From a883dc0b85e19a73a92e04f21e62af72bbeceea1 Mon Sep 17 00:00:00 2001 From: Ilya Lavrenov Date: Thu, 24 Mar 2022 22:27:29 +0300 Subject: [PATCH] DOCS: ported changes from 2022.1 release branch (#11206) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Extensibility guide with FE extensions and remove OV_FRAMEWORK_MAP from docs * Rework of Extensibility Intro, adopted examples to missing OPENVINO_FRAMEWORK_MAP * Removed OPENVINO_FRAMEWORK_MAP reference * Frontend extension detailed documentation * Fixed distributed snippets * Fixed snippet inclusion in FE extension document and chapter headers * Fixed wrong name in a snippet reference * Fixed test for template extension due to changed number of loaded extensions * Update docs/Extensibility_UG/frontend_extensions.md Co-authored-by: Ivan Tikhonov * Minor fixes in extension snippets * Small grammar fix Co-authored-by: Ivan Tikhonov Co-authored-by: Ivan Tikhonov * DOCS: transition banner (#10973) * transition banner * minor fix * update transition banner * updates * update custom.js * updates * updates * Documentation fixes (#11044) * Benchmark app usage * Fixed link to the devices * More fixes * Update docs/OV_Runtime_UG/multi_device.md Co-authored-by: Sergey Lyubimtsev * Removed several hardcoded links Co-authored-by: Sergey Lyubimtsev * Updated documentation for compile_tool (#11049) * Added deployment guide (#11060) * Added deployment guide * Added local distribution * Updates * Fixed more indentations * Removed obsolete code snippets (#11061) * Removed obsolete code snippets * NCC style * Fixed NCC for BA * Add a troubleshooting issue for PRC installation (#11074) * updates * adding gna to linux * add missing reference * update * Update docs/install_guides/installing-model-dev-tools.md Co-authored-by: Sergey Lyubimtsev * Update docs/install_guides/installing-model-dev-tools.md Co-authored-by: Sergey Lyubimtsev * Update docs/install_guides/installing-model-dev-tools.md Co-authored-by: Sergey Lyubimtsev * Update docs/install_guides/installing-model-dev-tools.md Co-authored-by: Sergey Lyubimtsev * Update docs/install_guides/installing-model-dev-tools.md Co-authored-by: Sergey Lyubimtsev * update * minor updates * add gna item to yum and apt * add gna to get started page * update reference formatting * merge commit * add a troubleshooting issue * update * update * fix CVS-71846 Co-authored-by: Sergey Lyubimtsev * DOCS: fixed hardcoded links (#11100) * Fixes * Use links * applying reviewers comments to the Opt Guide (#11093) * applying reviewrs comments * fixed refs, more structuring (bold, bullets, etc) * refactoring tput/latency sections * next iteration (mostly latency), also brushed the auto-batching and other sections * updates sync/async images * common opts brushed * WIP tput redesigned * minor brushing of common and auto-batching * Tput fully refactored * fixed doc name in the link * moved int8 perf counters to the right section * fixed links * fixed broken quotes * fixed more links * add ref to the internals to the TOC * Added a note on the batch size Co-authored-by: Andrey Zaytsev * [80085] New images for docs (#11114) * change doc structure * fix manager tools * fix manager tools 3 step * fix manager tools 3 step * new img * new img for OV Runtime * fix steps * steps * fix intendents * change list * fix space * fix space * code snippets fix * change display * Benchmarks 2022 1 (#11130) * Minor fixes * Updates for 2022.1 * Edits according to the review * Edits according to review comments * 
Edits according to review comments * Edits according to review comments * Fixed table * Edits according to review comments * Removed config for Intel® Core™ i7-11850HE * Removed forward-tacotron-duration-prediction-241 graph * Added resnet-18-pytorch * Add info about Docker images in Deployment guide (#11136) * Renamed user guides (#11137) * fix screenshot (#11140) * More conservative recommendations on dynamic shapes usage in docs (#11161) * More conservative recommendations about using dynamic shapes * Duplicated statement from C++ part to Python part of reshape doc (no semantical changes) * Update ShapeInference.md (#11168) * Benchmarks 2022 1 updates (#11180) * Updated graphs * Quick fix for TODO in Dynamic Shapes article * Anchor link fixes * Fixed DM config (#11199) * DOCS: doxy sphinxtabs (#11027) * initial implementation of doxy sphinxtabs * fixes * fixes * fixes * fixes * fixes * WA for ignored visibility attribute * Fixes Co-authored-by: Sergey Lyalin Co-authored-by: Ivan Tikhonov Co-authored-by: Nikolay Tyukaev Co-authored-by: Sergey Lyubimtsev Co-authored-by: Yuan Xu Co-authored-by: Maxim Shevtsov Co-authored-by: Andrey Zaytsev Co-authored-by: Tatiana Savina Co-authored-by: Ilya Naumov Co-authored-by: Evgenya Stepyreva --- .../ncc_naming_style/ncc_naming_style.cmake | 4 +- docs/Doxyfile.config | 4 + docs/Extensibility_UG/Intro.md | 122 +++- docs/Extensibility_UG/add_openvino_ops.md | 11 +- docs/Extensibility_UG/frontend_extensions.md | 105 +++ .../low_precision_transformations/lpt.md | 6 +- .../Deep_Learning_Model_Optimizer_DevGuide.md | 2 +- .../Getting_performance_numbers.md | 39 +- .../convert_model/Convert_Model_From_ONNX.md | 2 +- .../convert_model/Converting_Model.md | 2 +- .../Convert_GNMT_From_Tensorflow.md | 2 +- .../Convert_RetinaNet_From_Tensorflow.md | 2 +- docs/OV_Runtime_UG/Int8Inference.md | 33 +- docs/OV_Runtime_UG/PythonPackage_Overview.md | 14 - docs/OV_Runtime_UG/Python_API_exclusives.md | 88 +-- docs/OV_Runtime_UG/ShapeInference.md | 8 +- docs/OV_Runtime_UG/auto_device_selection.md | 4 +- docs/OV_Runtime_UG/automatic_batching.md | 97 +-- .../deployment}/deployment-manager-tool.md | 227 +++--- .../deployment/deployment_intro.md | 68 ++ .../deployment/local-distribution.md | 162 +++++ docs/OV_Runtime_UG/hetero_execution.md | 81 +-- .../img/configuration_dialog.png | 3 + .../img/deploy_encrypted_model.png | 4 +- docs/OV_Runtime_UG/img/selection_dialog.png | 3 + .../integrate_with_your_application.md | 205 +++--- docs/OV_Runtime_UG/layout_overview.md | 119 ++- .../common_inference_pipeline.md | 184 ++--- .../migration_ov_2_0/configure_devices.md | 216 +++--- .../migration_ov_2_0/deployment_migration.md | 4 +- docs/OV_Runtime_UG/migration_ov_2_0/intro.md | 2 +- .../migration_ov_2_0/preprocessing.md | 177 +++-- docs/OV_Runtime_UG/model_representation.md | 144 ++-- docs/OV_Runtime_UG/multi_device.md | 4 +- docs/OV_Runtime_UG/openvino_intro.md | 8 +- docs/OV_Runtime_UG/ov_dynamic_shapes.md | 122 ++-- docs/OV_Runtime_UG/ov_infer_request.md | 287 ++++---- docs/OV_Runtime_UG/performance_hints.md | 2 +- docs/OV_Runtime_UG/preprocessing_details.md | 281 ++++---- docs/OV_Runtime_UG/preprocessing_overview.md | 101 ++- .../preprocessing_usecase_save.md | 62 +- docs/OV_Runtime_UG/protecting_model_guide.md | 1 - .../supported_plugins/AutoPlugin_Debugging.md | 1 - .../supported_plugins/Device_Plugins.md | 28 +- docs/OV_Runtime_UG/supported_plugins/GNA.md | 121 ++-- docs/OV_Runtime_UG/supported_plugins/GPU.md | 48 +- .../supported_plugins/GPU_RemoteTensor_API.md | 191 ++--- 
.../supported_plugins/Supported_Devices.md | 14 - .../supported_plugins/config_properties.md | 178 ++--- docs/_static/css/custom.css | 32 + docs/_static/images/inputs_defined.png | 3 + docs/_static/images/omz_banner.png | 3 - docs/_static/images/original_model_banner.png | 3 - docs/_static/js/custom.js | 32 +- docs/_templates/layout.html | 10 + docs/benchmarks/performance_benchmarks.md | 12 +- docs/benchmarks/performance_benchmarks_faq.md | 61 +- .../performance_benchmarks_openvino.md | 366 +++++----- .../benchmarks/performance_benchmarks_ovms.md | 65 +- docs/benchmarks/performance_int8_vs_fp32.md | 675 +++++++++--------- docs/benchmarks/performance_ov_vs_tf.md | 102 +++ docs/documentation.md | 9 +- docs/doxyrest/frame/common/doc.lua | 24 +- docs/get_started/get_started_demos.md | 10 +- docs/how_tos/MonoDepth_how_to.md | 70 -- docs/how_tos/POT_how_to_example.md | 163 ----- docs/how_tos/how-to-links.md | 86 --- docs/img/configuration_dialog.png | 4 +- docs/img/deploy_encrypted_model.png | 4 +- docs/img/deploy_encrypted_model.png.vsdx | 3 + docs/img/deployment_full.png | 3 + docs/img/deployment_simplified.png | 3 + docs/img/int8vsfp32.png | 4 +- docs/img/selection_dialog.png | 4 +- ...hput_ovms_1gbps_facedetection0200_fp32.png | 3 + ...hput_ovms_1gbps_facedetection0200_int8.png | 3 + .../throughput_ovms_1gbps_googlenet4_fp32.png | 3 + ...oughput_ovms_1gbps_mobilnet3small_fp32.png | 3 + .../throughput_ovms_1gbps_resnet50_fp32.png | 3 + .../throughput_ovms_1gbps_resnet50_int8.png | 3 + docs/img/throughput_ovms_3dunet.png | 4 +- docs/img/throughput_ovms_alexnet.png | 3 + docs/img/throughput_ovms_bertsmall_fp32.png | 4 +- docs/img/throughput_ovms_bertsmall_int8.png | 4 +- ...throughput_ovms_braintumorsegmentation.png | 3 + docs/img/throughput_ovms_deeplabv3_fp32.png | 3 + ...throughput_ovms_facedetection0200_int8.png | 3 + docs/img/throughput_ovms_googlenet4_fp32.png | 3 + .../throughput_ovms_mobilenet3large_fp32.png | 4 +- docs/img/throughput_ovms_resnet50_fp32.png | 4 +- docs/img/throughput_ovms_resnet50_int8.png | 4 +- ...hroughput_ovms_unetcamvidonnx0001_fp32.png | 3 + ...hroughput_ovms_unetcamvidonnx0001_int8.png | 3 + docs/img/throughput_ovms_yolo3_fp32.png | 4 +- docs/img/throughput_ovms_yolo4_fp32.png | 4 +- docs/img/vtune_async.png | 4 +- docs/img/vtune_regular.png | 4 +- docs/index.rst | 2 +- .../install_guides/installing-openvino-apt.md | 16 +- .../installing-openvino-linux.md | 2 +- .../installing-openvino-macos.md | 2 +- .../installing-openvino-windows.md | 6 +- docs/install_guides/movidius-setup-guide.md | 1 - docs/install_guides/pypi-openvino-dev.md | 11 +- docs/install_guides/pypi-openvino-rt.md | 3 +- docs/install_guides/troubleshooting.md | 26 +- .../dldt_deployment_optimization_common.md | 25 +- .../dldt_deployment_optimization_guide.md | 42 +- .../dldt_deployment_optimization_hints.md | 2 +- .../dldt_deployment_optimization_internals.md | 24 + .../dldt_deployment_optimization_latency.md | 17 +- .../dldt_deployment_optimization_tput.md | 121 ++-- .../dldt_optimization_guide.md | 15 +- docs/snippets/CMakeLists.txt | 34 +- docs/snippets/Graph_debug_capabilities0.cpp | 13 - docs/snippets/Graph_debug_capabilities1.cpp | 13 - docs/snippets/InferenceEngine_QueryAPI0.cpp | 10 - docs/snippets/InferenceEngine_QueryAPI1.cpp | 10 - docs/snippets/InferenceEngine_QueryAPI2.cpp | 10 - docs/snippets/InferenceEngine_QueryAPI3.cpp | 12 - docs/snippets/InferenceEngine_QueryAPI4.cpp | 12 - docs/snippets/InferenceEngine_QueryAPI5.cpp | 12 - docs/snippets/dldt_optimization_guide1.cpp | 16 - 
docs/snippets/dldt_optimization_guide2.cpp | 14 - docs/snippets/dldt_optimization_guide3.cpp | 22 - docs/snippets/dldt_optimization_guide4.cpp | 20 - docs/snippets/dldt_optimization_guide5.cpp | 30 - docs/snippets/dldt_optimization_guide6.cpp | 24 - docs/snippets/dldt_optimization_guide7.cpp | 15 - docs/snippets/dldt_optimization_guide9.cpp | 3 +- docs/snippets/example_async_infer_request.cpp | 20 +- docs/snippets/movidius-programming-guide.cpp | 36 - docs/snippets/nGraphTutorial.cpp | 38 - docs/snippets/ov_extensions.cpp | 109 ++- docs/snippets/ov_extensions.py | 7 +- docs/snippets/ov_properties_api.py | 6 +- docs/template_extension/new/CMakeLists.txt | 6 - docs/template_extension/new/identity.hpp | 9 - docs/template_extension/new/ov_extension.cpp | 8 +- samples/c/hello_classification/README.md | 2 +- .../hello_nv12_input_classification/README.md | 2 +- samples/cpp/CMakeLists.txt | 2 +- samples/cpp/benchmark_app/README.md | 14 +- samples/cpp/benchmark_app/main.cpp | 6 +- samples/cpp/build_samples.sh | 4 +- samples/cpp/build_samples_msvc.bat | 4 +- .../cpp/classification_sample_async/README.md | 2 +- samples/cpp/hello_classification/README.md | 2 +- .../hello_nv12_input_classification/README.md | 2 +- samples/cpp/hello_reshape_ssd/README.md | 2 +- samples/cpp/hello_reshape_ssd/main.cpp | 2 +- samples/cpp/speech_sample/main.cpp | 6 +- .../classification_sample_async/README.md | 2 +- samples/python/hello_classification/README.md | 2 +- samples/python/hello_reshape_ssd/README.md | 2 +- .../model_creation_sample.py | 2 +- src/core/tests/extension.cpp | 4 +- tools/compile_tool/README.md | 20 +- tools/deployment_manager/configs/darwin.json | 1 + tools/deployment_manager/configs/linux.json | 1 + tools/deployment_manager/configs/windows.json | 1 + 161 files changed, 3254 insertions(+), 3078 deletions(-) create mode 100644 docs/Extensibility_UG/frontend_extensions.md delete mode 100644 docs/OV_Runtime_UG/PythonPackage_Overview.md rename docs/{install_guides => OV_Runtime_UG/deployment}/deployment-manager-tool.md (52%) create mode 100644 docs/OV_Runtime_UG/deployment/deployment_intro.md create mode 100644 docs/OV_Runtime_UG/deployment/local-distribution.md create mode 100644 docs/OV_Runtime_UG/img/configuration_dialog.png create mode 100644 docs/OV_Runtime_UG/img/selection_dialog.png create mode 100644 docs/_static/images/inputs_defined.png delete mode 100644 docs/_static/images/omz_banner.png delete mode 100644 docs/_static/images/original_model_banner.png create mode 100644 docs/benchmarks/performance_ov_vs_tf.md delete mode 100644 docs/how_tos/MonoDepth_how_to.md delete mode 100644 docs/how_tos/POT_how_to_example.md delete mode 100644 docs/how_tos/how-to-links.md create mode 100644 docs/img/deploy_encrypted_model.png.vsdx create mode 100644 docs/img/deployment_full.png create mode 100644 docs/img/deployment_simplified.png create mode 100644 docs/img/throughput_ovms_1gbps_facedetection0200_fp32.png create mode 100644 docs/img/throughput_ovms_1gbps_facedetection0200_int8.png create mode 100644 docs/img/throughput_ovms_1gbps_googlenet4_fp32.png create mode 100644 docs/img/throughput_ovms_1gbps_mobilnet3small_fp32.png create mode 100644 docs/img/throughput_ovms_1gbps_resnet50_fp32.png create mode 100644 docs/img/throughput_ovms_1gbps_resnet50_int8.png create mode 100644 docs/img/throughput_ovms_alexnet.png create mode 100644 docs/img/throughput_ovms_braintumorsegmentation.png create mode 100644 docs/img/throughput_ovms_deeplabv3_fp32.png create mode 100644 
docs/img/throughput_ovms_facedetection0200_int8.png create mode 100644 docs/img/throughput_ovms_googlenet4_fp32.png create mode 100644 docs/img/throughput_ovms_unetcamvidonnx0001_fp32.png create mode 100644 docs/img/throughput_ovms_unetcamvidonnx0001_int8.png create mode 100644 docs/optimization_guide/dldt_deployment_optimization_internals.md delete mode 100644 docs/snippets/Graph_debug_capabilities0.cpp delete mode 100644 docs/snippets/Graph_debug_capabilities1.cpp delete mode 100644 docs/snippets/InferenceEngine_QueryAPI0.cpp delete mode 100644 docs/snippets/InferenceEngine_QueryAPI1.cpp delete mode 100644 docs/snippets/InferenceEngine_QueryAPI2.cpp delete mode 100644 docs/snippets/InferenceEngine_QueryAPI3.cpp delete mode 100644 docs/snippets/InferenceEngine_QueryAPI4.cpp delete mode 100644 docs/snippets/InferenceEngine_QueryAPI5.cpp delete mode 100644 docs/snippets/dldt_optimization_guide1.cpp delete mode 100644 docs/snippets/dldt_optimization_guide2.cpp delete mode 100644 docs/snippets/dldt_optimization_guide3.cpp delete mode 100644 docs/snippets/dldt_optimization_guide4.cpp delete mode 100644 docs/snippets/dldt_optimization_guide5.cpp delete mode 100644 docs/snippets/dldt_optimization_guide6.cpp delete mode 100644 docs/snippets/dldt_optimization_guide7.cpp delete mode 100644 docs/snippets/movidius-programming-guide.cpp delete mode 100644 docs/snippets/nGraphTutorial.cpp diff --git a/cmake/developer_package/ncc_naming_style/ncc_naming_style.cmake b/cmake/developer_package/ncc_naming_style/ncc_naming_style.cmake index 8efdf859a42..c2da0f5a732 100644 --- a/cmake/developer_package/ncc_naming_style/ncc_naming_style.cmake +++ b/cmake/developer_package/ncc_naming_style/ncc_naming_style.cmake @@ -23,7 +23,7 @@ execute_process( ERROR_VARIABLE error_var) if(NOT clang_find_result EQUAL "0") - message(WARNING "Please, install libclang-[N]-dev package (required for ncc naming style check)") + message(WARNING "Please, install clang-[N] libclang-[N]-dev package (required for ncc naming style check)") message(WARNING "find_package(Clang) output: ${output_var}") message(WARNING "find_package(Clang) error: ${error_var}") set(ENABLE_NCC_STYLE OFF) @@ -106,9 +106,9 @@ function(ov_ncc_naming_style) "${NCC_STYLE_SOURCE_DIRECTORY}/*.cpp") list(APPEND NCC_STYLE_ADDITIONAL_INCLUDE_DIRECTORIES "${NCC_STYLE_SOURCE_DIRECTORY}") - # without it sources with same name from different directories will map to same .ncc_style target file(RELATIVE_PATH source_dir_rel ${CMAKE_SOURCE_DIR} ${NCC_STYLE_SOURCE_DIRECTORY}) + foreach(source IN LISTS sources) set(output_file "${ncc_style_bin_dir}/${source_dir_rel}/${source}.ncc_style") set(full_source_path "${NCC_STYLE_SOURCE_DIRECTORY}/${source}") diff --git a/docs/Doxyfile.config b/docs/Doxyfile.config index adffa442688..fa8a893de49 100644 --- a/docs/Doxyfile.config +++ b/docs/Doxyfile.config @@ -264,6 +264,10 @@ TAB_SIZE = 4 ALIASES = "ref_ie{1}=@ref InferenceEngine::\1 \"\1\"" ALIASES += sphinxdirective="\n\xmlonly" ALIASES += endsphinxdirective="\endxmlonly" +ALIASES += sphinxtabset="\n\xmlonly\endxmlonly\n" +ALIASES += endsphinxtabset="\n\xmlonly\endxmlonly\n" +ALIASES += sphinxtab{1}="\n\xmlonly\1\endxmlonly\n" +ALIASES += endsphinxtab="\n\xmlonly\endxmlonly\n" # Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C sources # only. Doxygen will then generate output that is more tailored for C. 
For diff --git a/docs/Extensibility_UG/Intro.md b/docs/Extensibility_UG/Intro.md index f0df72daf47..47de4c1f907 100644 --- a/docs/Extensibility_UG/Intro.md +++ b/docs/Extensibility_UG/Intro.md @@ -1,4 +1,4 @@ -# OpenVINO Extensibility Mechanism {#openvino_docs_Extensibility_UG_Intro} +# OpenVINO Extensibility Mechanism {#openvino_docs_Extensibility_UG_Intro} @sphinxdirective @@ -7,41 +7,67 @@ :hidden: openvino_docs_Extensibility_UG_add_openvino_ops + openvino_docs_Extensibility_UG_Frontend_Extensions openvino_docs_Extensibility_UG_GPU openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer @endsphinxdirective The Intel® Distribution of OpenVINO™ toolkit supports neural network models trained with various frameworks, including -TensorFlow, PyTorch, ONNX, PaddlePaddle, MXNet, Caffe, and Kaldi. The list of supported operations (layers) is different for +TensorFlow, PyTorch, ONNX, PaddlePaddle, MXNet, Caffe, and Kaldi. The list of supported operations is different for each of the supported frameworks. To see the operations supported by your framework, refer to [Supported Framework Operations](../MO_DG/prepare_model/Supported_Frameworks_Layers.md). -Custom operations, that is those not included in the list, are not recognized by OpenVINO™ out-of-the-box. Therefore, creating Intermediate Representation (IR) for a model using them requires additional steps. This guide illustrates the workflow for running inference on topologies featuring custom operations, allowing you to plug in your own implementation for existing or completely new operations. +Custom operations, that is, those not included in the list, are not recognized by OpenVINO™ out-of-the-box. The need for a custom operation may arise in two main cases: -If your model contains operations not normally supported by OpenVINO™, the OpenVINO™ Extensibility API lets you add support for those custom operations and use one implementation for Model Optimizer and OpenVINO™ Runtime. +1. A regular framework operation that is new or rarely used and therefore has not been supported in OpenVINO yet. -There are two steps to support inference of a model with custom operation(s): -1. Add support for a [custom operation in the Model Optimizer](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) so -the Model Optimizer can generate the IR with the operation. -2. Create a custom operation in it as described in the [Custom Operation](add_openvino_ops.md). +2. A new user operation that was created for a specific model topology by the model author using framework extension capabilities. -## OpenVINO™ Extensions +Importing models with such operations requires additional steps. This guide illustrates the workflow for running inference on models featuring custom operations, allowing you to plug in your own implementation for them. The OpenVINO™ Extensibility API lets you add support for those custom operations and use one implementation for both Model Optimizer and OpenVINO™ Runtime.
-OpenVINO™ provides extensions for: +Defining a new custom operation basically consists of two parts: - * [Custom OpenVINO™ Operation](add_openvino_ops.md): - - Enables the creation of unsupported operations - - Enables the use of `ov::Core::read_model` to read models with unsupported operations - - Provides a shape inference mechanism for custom operations - - Provides an evaluate method that allows you to support the operation on CPU or perform constant folding - * [Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md): - - Enables support of new operations to generate IR - - Enables support of custom transformations to replace sub-graphs for performance optimization +1. Definition of the operation semantics in OpenVINO, that is, the code that describes how this operation should be inferred, consuming input tensor(s) and producing output tensor(s). -> **NOTE**: This documentation is written based on the [Template extension](https://github.com/openvinotoolkit/openvino/tree/master/docs/template_extension/new), which demonstrates extension development details. You can review the complete code, which is fully compilable and up-to-date, to see how it works. +2. A mapping rule that facilitates conversion of the framework operation representation to the OpenVINO-defined operation semantics. -## Load extensions to OpenVINO™ Runtime +The first part is required for inference; the second part is required for successful import of a model containing such operations from the original framework model format. There are several options to implement each part; the next sections describe them in detail. + +## Definition of Operation Semantics + + +If the custom operation can be mathematically represented as a combination of existing OpenVINO operations and such a decomposition gives the desired performance, then a low-level operation implementation is not required. When deciding on the feasibility of such a decomposition, refer to the latest OpenVINO operation set. You can use any valid combination of existing operations. How to map a custom operation is described in the next section of this document. + +If such a decomposition is not possible, or turns out too bulky with many constituent operations that do not perform well, then a new class for the custom operation should be implemented as described in the [Custom Operation Guide](add_openvino_ops.md). + +Prefer implementing a custom operation class if you already have a generic C++ implementation of the operation kernel. Otherwise, try to decompose the operation first as described above and then, after verifying correctness of inference and the resulting performance, optionally invest in a bare-metal C++ implementation. + +## Mapping from Framework Operation + +Depending on the model format used for import, the mapping of a custom operation is implemented differently; choose one of the following: + +1. If the model is represented in ONNX (including models exported from PyTorch to ONNX) or PaddlePaddle format, then one of the classes from the [Frontend Extension API](frontend_extensions.md) should be used. It consists of several classes available in C++ which can be used with the Model Optimizer `--extensions` option or when the model is imported directly into OpenVINO Runtime using the `read_model` method. A Python API is also available for runtime model importing. + +2. If the model is represented in TensorFlow, Caffe, Kaldi or MXNet format, then [Model Optimizer Extensions](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) should be used.
This approach is available for model conversion in Model Optimizer only. + +The existence of two approaches is explained by the two different types of frontends used for model conversion in OpenVINO: new frontends (ONNX, PaddlePaddle) and legacy frontends (TensorFlow, Caffe, Kaldi and MXNet). Model Optimizer can use both types of frontends, in contrast to the direct import of a model with the `read_model` method, which can use new frontends only. Follow the appropriate guide referenced above to implement the mapping, depending on the framework frontend. + +If you are implementing extensions for the new ONNX or PaddlePaddle frontends and plan to use the Model Optimizer `--extension` option for model conversion, then the extensions should be + +1. Implemented in C++ only + +2. Compiled as a separate shared library (see details on how to do that later in this guide). + +You cannot write new frontend extensions using the Python API if you plan to use them with Model Optimizer. + +The remaining part of this guide uses the Frontend Extension API, which is applicable to new frontends. + +## Registering Extensions + +A custom operation class and a new mapping frontend extension class object should be registered to be usable in the OpenVINO runtime. + +> **NOTE**: This documentation is written based on the [Template extension](https://github.com/openvinotoolkit/openvino/tree/master/docs/template_extension/new), which demonstrates extension development details based on the minimalistic `Identity` operation, which is a placeholder for your real custom operation. You can review the complete code, which is fully compilable, to see how it works. To load the extensions to the `ov::Core` object, use the `ov::Core::add_extension` method, this method allows to load library with extensions or extensions from the code. @@ -49,27 +75,50 @@ To load the extensions to the `ov::Core` object, use the `ov::Core::add_extensio Extensions can be loaded from code with `ov::Core::add_extension` method: +@sphinxtabset + +@sphinxtab{C++} + +@snippet docs/snippets/ov_extensions.cpp add_extension + +@endsphinxtab + +@sphinxtab{Python} + +@snippet docs/snippets/ov_extensions.py add_extension + +@endsphinxtab + +@endsphinxtabset + +`Identity` is the custom operation class defined in the [Custom Operation Guide](add_openvino_ops.md). This is enough to enable reading an IR which uses the `Identity` extension operation emitted by Model Optimizer. To be able to load the original model directly into the runtime, you also need to add a mapping extension: + @sphinxdirective .. tab:: C++ .. doxygensnippet:: docs/snippets/ov_extensions.cpp :language: cpp - :fragment: add_extension + :fragment: add_frontend_extension .. tab:: Python .. doxygensnippet:: docs/snippets/ov_extensions.py :language: python - :fragment: add_extension + :fragment: add_frontend_extension @endsphinxdirective + +When the Python API is used, there is no way to implement a custom OpenVINO operation. Also, even if a custom OpenVINO operation is implemented in C++ and loaded into the runtime through a shared library, there is still no way to add a frontend mapping extension that refers to this custom operation. Use the C++ shared library approach to implement both the operation semantics and the framework mapping in this case. + +You can still use Python for operation mapping and decomposition as long as only operations from the standard OpenVINO operation set are used.
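Since the referenced `ov_extensions.cpp` snippets are not reproduced in this patch, here is a minimal C++ sketch of the registration flow described above. It assumes the `TemplateExtension::Identity` class from the template extension example; the header paths and exact calls are assumptions to be checked against the real snippet files rather than an authoritative implementation.

```cpp
#include <openvino/openvino.hpp>            // ov::Core, read_model
#include <openvino/frontend/extension.hpp>  // ov::frontend::OpExtension (assumed header path)
#include "identity.hpp"                     // TemplateExtension::Identity from the template extension

int main() {
    ov::Core core;

    // Register the custom operation itself, so an IR containing it can be read.
    core.add_extension<TemplateExtension::Identity>();

    // Register the mapping from the ONNX "Identity" type to the custom class,
    // so the original framework model can be loaded directly.
    core.add_extension(ov::frontend::OpExtension<TemplateExtension::Identity>("Identity"));

    // A model that uses the custom operation can now be loaded.
    auto model = core.read_model("model.onnx");
    return 0;
}
```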
### Create library with extensions -You need to create extension library in following cases: - - Load extensions to Model Optimizer - - Load extensions to Python application +You need to create an extension library in the following cases: + - Converting a model with custom operations in Model Optimizer + - Loading a model with custom operations in a Python application. This applies to both the framework model and the IR. + - Loading a model with custom operations in tools that support loading extensions from a library, for example `benchmark_app`. If you want to create an extension library, for example in order to load these extensions to the Model Optimizer, you need to do next steps: Create an entry point for extension library. OpenVINO™ provides an `OPENVINO_CREATE_EXTENSIONS()` macro, which allows to define an entry point to a library with OpenVINO™ Extensions. @@ -97,24 +146,25 @@ $ cmake --build . After the build you can use path to your extension library to load your extensions to OpenVINO™ Runtime: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_extensions.cpp - :language: cpp - :fragment: add_extension_lib +@snippet docs/snippets/ov_extensions.cpp add_extension_lib -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_extensions.py - :language: python - :fragment: add_extension_lib +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_extensions.py add_extension_lib + +@endsphinxtab + +@endsphinxtabset ## See Also * [OpenVINO Transformations](./ov_transformations.md) -* [Using Inference Engine Samples](../OV_Runtime_UG/Samples_Overview.md) +* [Using OpenVINO Runtime Samples](../OV_Runtime_UG/Samples_Overview.md) * [Hello Shape Infer SSD sample](../../samples/cpp/hello_reshape_ssd/README.md) + diff --git a/docs/Extensibility_UG/add_openvino_ops.md b/docs/Extensibility_UG/add_openvino_ops.md index 7c5ed06f1fd..c292060159d 100644 --- a/docs/Extensibility_UG/add_openvino_ops.md +++ b/docs/Extensibility_UG/add_openvino_ops.md @@ -1,4 +1,4 @@ -# Custom OpenVINO™ Operations {#openvino_docs_Extensibility_UG_add_openvino_ops} +# Custom OpenVINO™ Operations {#openvino_docs_Extensibility_UG_add_openvino_ops} OpenVINO™ Extension API allows you to register custom operations to support models with operations which OpenVINO™ does not support out-of-the-box. @@ -20,14 +20,10 @@ Follow the steps below to add a custom operation: 5. Override the `visit_attributes` method, which enables serialization and deserialization of operation attributes. An `AttributeVisitor` is passed to the method, and the implementation is expected to walk over all the attributes in the op using the type-aware `on_attribute` helper. Helpers are already implemented for standard C++ types like `int64_t`, `float`, `bool`, `vector`, and for existing OpenVINO defined types. -6. Override `evaluate`, which is an optional method that enables fallback of some devices to this implementation and the application of constant folding if there is a custom operation on the constant branch. If your operation contains `evaluate` method you also need to override the `has_evaluate` method, this method allow to get information about availability of `evaluate` method for the operation. - -7. Add the `OPENVINO_FRAMEWORK_MAP` macro if you want to map custom operation to framework operation with the same name. It is an optional macro which can be used for one to one mapping.
In order to use this macro please include frontend specific headers: - @snippet template_extension/new/identity.hpp op:frontend_include +6. Override `evaluate`, which is an optional method that enables fallback of some devices to this implementation and the application of constant folding if there is a custom operation on the constant branch. If your operation contains an `evaluate` method, you also need to override the `has_evaluate` method, which reports whether `evaluate` is available for the operation. Based on that, declaration of an operation class can look as follows: -@snippet template_extension/new/identity.hpp op:header ### Operation Constructors @@ -55,8 +51,9 @@ OpenVINO™ operation contains two constructors: @snippet template_extension/new/identity.cpp op:visit_attributes -### `evaluate()` and `has_evaluate()` +### evaluate() and has_evaluate() `ov::Node::evaluate` method enables you to apply constant folding to an operation. @snippet template_extension/new/identity.cpp op:evaluate + diff --git a/docs/Extensibility_UG/frontend_extensions.md b/docs/Extensibility_UG/frontend_extensions.md new file mode 100644 index 00000000000..7ad109752f6 --- /dev/null +++ b/docs/Extensibility_UG/frontend_extensions.md @@ -0,0 +1,105 @@ +# Frontend Extensions {#openvino_docs_Extensibility_UG_Frontend_Extensions} + +The goal of this chapter is to explain how to use Frontend extension classes to facilitate mapping of custom operations from the framework model representation to the OpenVINO representation. Refer to [Introduction to OpenVINO Extension](Intro.md) to understand the entire flow. + +This API is applicable to new frontends only, which currently exist for ONNX and PaddlePaddle. If a different model format is used, follow the legacy [Model Optimizer Extensions](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) guide. + +> **NOTE**: This documentation is written based on the [Template extension](https://github.com/openvinotoolkit/openvino/tree/master/docs/template_extension/new), which demonstrates extension development details based on the minimalistic `Identity` operation, which is a placeholder for your real custom operation. You can review the complete code, which is fully compilable, to see how it works. + +## Single Operation Mapping with OpExtension + +This section covers the case when a single operation in the framework representation is mapped to a single operation in the OpenVINO representation. This is called *one-to-one mapping*. The `OpExtension` class works well if all the following conditions are satisfied: + +1. The number of inputs to the operation in the framework representation is the same as in the OpenVINO representation. + +2. The number of outputs is also the same in both representations. + +3. Inputs can be indexed and are mapped in order, e.g. the input with index 0 in the framework representation maps to the input with index 0 in the OpenVINO representation, and so on. + +4. The same holds for outputs. + +5. Each attribute in the OpenVINO operation can be initialized from one of the attributes of the original operation or by some predefined constant value. The value of a copied attribute cannot contain an expression; the value is accepted as-is, so its type should be compatible. + +> **NOTE**: The `OpExtension` class is currently available for the ONNX frontend only. The PaddlePaddle frontend has named (not indexed) inputs and outputs for operations, therefore OpExtension mapping is not applicable in this case.
+ +The next example maps the ONNX operation with type [“Identity”]( https://github.com/onnx/onnx/blob/main/docs/Operators.md#Identity) to the OpenVINO template extension `Identity` class. + +@snippet ov_extensions.cpp frontend_extension_Identity_header +@snippet ov_extensions.cpp frontend_extension_Identity + +The mapping doesn’t involve any attributes, as the operation Identity doesn’t have any. + +Extension objects, like the just-constructed `extension`, can be added to the OpenVINO runtime just before loading a model that contains custom operations: + +@snippet ov_extensions.cpp frontend_extension_read_model + +Alternatively, extensions can be constructed in a separately compiled shared library. A separately compiled library can be used in Model Optimizer or `benchmark_app`. Read how to build and load such a library in the chapter “Create library with extensions” in [Introduction to OpenVINO Extension](Intro.md). + +If the operation has multiple inputs and/or outputs, they will be mapped in order. The type of elements in input/output tensors should match the expected types in the surrounding operations. For example, if a custom operation produces the `f32` data type, then the operation that consumes this output should also support `f32`. Otherwise, model conversion fails with an error; no automatic type conversion happens. + +### Converting to Standard OpenVINO Operation + +The `OpExtension` class can also be used when mapping to one of the operations from the standard OpenVINO operation set is all you need and there is no class like `TemplateExtension::Identity` implemented. + +Here is an example for a custom framework operation “MyRelu”. Suppose it is mathematically equivalent to the standard `Relu` that exists in the OpenVINO operation set, but for some reason has the type name “MyRelu”. In this case you can directly specify that the “MyRelu” -> `Relu` mapping should be used: + +@snippet ov_extensions.cpp frontend_extension_MyRelu + +In the resulting converted OpenVINO model, the “MyRelu” operation will be replaced by the standard operation `Relu` from the latest available OpenVINO operation set. Notice that when a standard operation is used, it can be specified using just a type string (“Relu”) instead of an `ov::opset8::Relu` class name as a template parameter for `OpExtension`. This method is available for operations from the standard operation set only. For a user’s custom OpenVINO operation, the corresponding class should always be specified as a template parameter, as was demonstrated with `TemplateExtension::Identity`. + +### Attributes Mapping + +As described above, `OpExtension` is useful when attributes can be mapped one by one or initialized by a constant. If the sets of attributes in the framework representation and the OpenVINO representation completely match by name and type, nothing needs to be specified in the `OpExtension` constructor parameters. The attributes are discovered and mapped automatically based on the `visit_attributes` method that should be defined for any OpenVINO operation. + +Imagine you have a `CustomOperation` class implementation that has two attributes with the names `attr1` and `attr2`: + +@snippet ov_extensions.cpp frontend_extension_CustomOperation + +And the original model in the framework representation also has an operation with the name “CustomOperation” with the same `attr1` and `attr2` attributes. Then, with the following code: + +@snippet ov_extensions.cpp frontend_extension_CustomOperation_as_is + +both `attr1` and `attr2` are copied from the framework representation to the OpenVINO representation automatically.
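The body of the `frontend_extension_CustomOperation_as_is` snippet is not visible in this patch; as an illustration only, such an “as-is” registration might look like the following sketch, assuming `core` is an `ov::Core` instance and `CustomOperation` is the class above (the exact `OpExtension` constructor form is an assumption to be verified against `docs/snippets/ov_extensions.cpp`):

```cpp
// Attributes attr1 and attr2 are discovered through CustomOperation::visit_attributes
// and copied by name, so only the framework type name needs to be given.
core.add_extension(ov::frontend::OpExtension<CustomOperation>("CustomOperation"));
```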
If for some reason the names of the attributes are different but the values can still be copied “as-is”, you can pass an attribute name mapping in the `OpExtension` constructor: + +@snippet ov_extensions.cpp frontend_extension_CustomOperation_rename + +Here, `fw_attr1` and `fw_attr2` are the names of the corresponding attributes in the framework operation representation. + +If copying an attribute is not what you need, `OpExtension` can also set an attribute to a predefined constant value. For the same `CustomOperation`, imagine you want to set `attr2` to the value 5 instead of copying it from `fw_attr2`. To achieve that, do the following: + +@snippet ov_extensions.cpp frontend_extension_CustomOperation_rename_set + +In conclusion, each attribute of the target OpenVINO operation should be initialized in one of three ways: + +1. Set automatically due to name matching + +2. Mapped by attribute name + +3. Set to a constant value + +This is achieved by specifying maps as arguments of the `OpExtension` constructor. + + +## Mapping to Multiple Operations with ConversionExtension + +The previous sections cover the case when a single operation is mapped to a single operation, with optional adjustments to names and attribute values. That is likely enough for your own custom operation with an existing C++ kernel implementation. In this case, the framework representation and the OpenVINO representation of the operation are under your control, and inputs/outputs/attributes can be aligned to make `OpExtension` usable. + +In case one-to-one mapping is not possible, a *decomposition to multiple operations* should be considered. It is achieved by using the more verbose and less automated `ConversionExtension` class. It enables writing arbitrary code to replace a single framework operation by multiple connected OpenVINO operations, constructing a dependency graph of any complexity. + +`ConversionExtension` maps a single operation to a function which builds a graph using OpenVINO operation classes. Follow the chapter [Build a Model in OpenVINO Runtime](@ref ov_ug_build_model) to learn how to use OpenVINO operation classes to build a model fragment for the replacement. + +The next example illustrates using `ConversionExtension` for the conversion of “ThresholdedRelu” from ONNX according to the formula: `ThresholdedRelu(x, alpha) -> Multiply(x, Convert(Greater(x, alpha), type=float))`. + +> **NOTE**: `ThresholdedRelu` is one of the standard ONNX operators, which is supported by the ONNX frontend natively out-of-the-box. Here we re-implement it to illustrate how you can add similar support for your own custom operation instead of `ThresholdedRelu`. + +@snippet ov_extensions.cpp frontend_extension_ThresholdedReLU_header +@snippet ov_extensions.cpp frontend_extension_ThresholdedReLU + +To access the original framework operation attribute values and connect to the inputs, the `node` object of type `NodeContext` is used. It has two main methods: + +* `NodeContext::get_input` to get the input with a given index, + +* `NodeContext::get_attribute` to get the attribute value with a given name. + +The conversion function should return a vector of node outputs that are mapped to the corresponding outputs of the original framework operation, in the same order.
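Because the `frontend_extension_ThresholdedReLU` snippet body is not included in this patch, the following sketch shows what such a `ConversionExtension` could look like. It follows the formula above; the header path and the exact API spelling are assumptions to be verified against `docs/snippets/ov_extensions.cpp`.

```cpp
#include <openvino/frontend/extension.hpp>  // ConversionExtension, NodeContext (assumed header path)
#include <openvino/opsets/opset8.hpp>

// ThresholdedRelu(x, alpha) -> Multiply(x, Convert(Greater(x, alpha), float))
core.add_extension(ov::frontend::ConversionExtension(
    "ThresholdedRelu",
    [](const ov::frontend::NodeContext& node) {
        auto x = node.get_input(0);
        auto alpha = ov::opset8::Constant::create(
            ov::element::f32, ov::Shape{}, {node.get_attribute<float>("alpha")});
        auto greater = std::make_shared<ov::opset8::Greater>(x, alpha);
        auto mask = std::make_shared<ov::opset8::Convert>(greater, ov::element::f32);
        // The single output maps to the single output of the original ThresholdedRelu node.
        return ov::OutputVector{std::make_shared<ov::opset8::Multiply>(x, mask)};
    }));
```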
+ diff --git a/docs/IE_PLUGIN_DG/plugin_transformation_pipeline/low_precision_transformations/lpt.md b/docs/IE_PLUGIN_DG/plugin_transformation_pipeline/low_precision_transformations/lpt.md index dc7e2db4f78..07c21dcaa9d 100644 --- a/docs/IE_PLUGIN_DG/plugin_transformation_pipeline/low_precision_transformations/lpt.md +++ b/docs/IE_PLUGIN_DG/plugin_transformation_pipeline/low_precision_transformations/lpt.md @@ -236,11 +236,11 @@ This step is optional. It modifies the nGraph function to a device-specific oper Let's explore quantized [TensorFlow* implementation of ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model. Use [Model Downloader](@ref omz_tools_downloader) tool to download the `fp16` model from [OpenVINO™ Toolkit - Open Model Zoo repository](https://github.com/openvinotoolkit/open_model_zoo): ```sh -./downloader.py --name resnet-50-tf --precisions FP16-INT8 +omz_downloader --name resnet-50-tf --precisions FP16-INT8 ``` After that you should quantize model by the [Model Quantizer](@ref omz_tools_downloader) tool. ```sh -./quantizer.py --model_dir public/resnet-50-tf --dataset_dir --precisions=FP16-INT8 +omz_quantizer --model_dir public/resnet-50-tf --dataset_dir --precisions=FP16-INT8 ``` ### Inference @@ -259,7 +259,7 @@ Result model depends on different factors: Information about layer precision is stored in the performance counters that are -available from the Inference Engine API. For example, the part of performance counters table for quantized [TensorFlow* implementation of ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model inference on CPU Plugin looks as follows: +available from the OpenVINO Runtime API. For example, the part of performance counters table for quantized [TensorFlow* implementation of ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model inference on CPU Plugin looks as follows: | layerName | execStatus | layerType | execType | realTime (ms) | cpuTime (ms) | diff --git a/docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md b/docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md index 0fa581a39ab..52c92f6174b 100644 --- a/docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md +++ b/docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md @@ -1,4 +1,4 @@ -# Model Optimizer User Guide {#openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide} +# Convert model with Model Optimizer {#openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide} @sphinxdirective diff --git a/docs/MO_DG/prepare_model/Getting_performance_numbers.md b/docs/MO_DG/prepare_model/Getting_performance_numbers.md index be253fc5709..419b2162a51 100644 --- a/docs/MO_DG/prepare_model/Getting_performance_numbers.md +++ b/docs/MO_DG/prepare_model/Getting_performance_numbers.md @@ -9,7 +9,7 @@ When evaluating performance of your model with the OpenVINO Runtime, you must me - Track separately the operations that happen outside the OpenVINO Runtime, like video decoding. -> **NOTE**: Some image pre-processing can be baked into the IR and accelerated accordingly. For more information, refer to [Embedding the Preprocessing](Additional_Optimizations.md). Also consider [_runtime_ preprocessing optimizations](../../optimization_guide/dldt_deployment_optimization_common). +> **NOTE**: Some image pre-processing can be baked into the IR and accelerated accordingly. For more information, refer to [Embedding the Preprocessing](Additional_Optimizations.md). 
Also consider [Runtime Optimizations of the Preprocessing](../../optimization_guide/dldt_deployment_optimization_common). ## Tip 2. Getting Credible Performance Numbers @@ -53,17 +53,32 @@ When comparing the OpenVINO Runtime performance with the framework or another re Further, finer-grained insights into inference performance breakdown can be achieved with device-specific performance counters and/or execution graphs. Both [C++](../../../samples/cpp/benchmark_app/README.md) and [Python](../../../tools/benchmark_tool/README.md) versions of the `benchmark_app` supports a `-pc` command-line parameter that outputs internal execution breakdown. -Below is example of CPU plugin output for a network (since the device is CPU, the layers wall clock `realTime` and the `cpu` time are the same): +For example, below is the part of performance counters for quantized [TensorFlow* implementation of ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model inference on [CPU Plugin](../../OV_Runtime_UG/supported_plugins/CPU.md). +Notice that since the device is CPU, the layers wall clock `realTime` and the `cpu` time are the same. Information about layer precision is also stored in the performance counters. + +| layerName | execStatus | layerType | execType | realTime (ms) | cpuTime (ms) | +| --------------------------------------------------------- | ---------- | ------------ | -------------------- | ------------- | ------------ | +| resnet\_model/batch\_normalization\_15/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_1x1\_I8 | 0.377 | 0.377 | +| resnet\_model/conv2d\_16/Conv2D/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 | +| resnet\_model/batch\_normalization\_16/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_I8 | 0.499 | 0.499 | +| resnet\_model/conv2d\_17/Conv2D/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 | +| resnet\_model/batch\_normalization\_17/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_1x1\_I8 | 0.399 | 0.399 | +| resnet\_model/add\_4/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 | +| resnet\_model/add\_4 | NOT\_RUN | Eltwise | undef | 0 | 0 | +| resnet\_model/add\_5/fq\_input\_1 | NOT\_RUN | FakeQuantize | undef | 0 | 0 | + + + The `exeStatus` column of the table includes possible values: + - `EXECUTED` - layer was executed by standalone primitive, + - `NOT_RUN` - layer was not executed by standalone primitive or was fused with another operation and executed in another layer primitive. + + The `execType` column of the table includes inference primitives with specific suffixes. The layers have the following marks: + * Suffix `I8` for layers that had 8-bit data type input and were computed in 8-bit precision + * Suffix `FP32` for layers computed in 32-bit precision + + All `Convolution` layers are executed in int8 precision. Rest layers are fused into Convolutions using post operations optimization technique, which is described in [Internal CPU Plugin Optimizations](../../OV_Runtime_UG/supported_plugins/CPU.md). + This contains layers name (as seen in IR), layers type and execution statistics. 
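The same per-layer counters that `benchmark_app -pc` prints can also be collected programmatically; a minimal C++ sketch (assuming an IR file named `model.xml` and the CPU device) could look like this:

```cpp
#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;
    // Profiling must be enabled when compiling the model to collect per-layer counters.
    auto compiled = core.compile_model("model.xml", "CPU", ov::enable_profiling(true));
    auto request = compiled.create_infer_request();
    request.infer();
    for (const auto& pi : request.get_profiling_info()) {
        std::cout << pi.node_name << "\t" << pi.node_type << "\t" << pi.exec_type << "\t"
                  << pi.real_time.count() << " us\t" << pi.cpu_time.count() << " us\n";
    }
    return 0;
}
```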
-``` -conv1 EXECUTED layerType: Convolution realTime: 706 cpu: 706 execType: jit_avx2 -conv2_1_x1 EXECUTED layerType: Convolution realTime: 137 cpu: 137 execType: jit_avx2_1x1 -fc6 EXECUTED layerType: Convolution realTime: 233 cpu: 233 execType: jit_avx2_1x1 -fc6_nChw8c_nchw EXECUTED layerType: Reorder realTime: 20 cpu: 20 execType: reorder -out_fc6 EXECUTED layerType: Output realTime: 3 cpu: 3 execType: unknown -relu5_9_x2 OPTIMIZED_OUT layerType: ReLU realTime: 0 cpu: 0 execType: undef -``` -This contains layers name (as seen in IR), layers type and execution statistics. Notice the `OPTIMIZED_OUT`, which indicates that the particular activation was fused into adjacent convolution. Both benchmark_app versions also support "exec_graph_path" command-line option governing the OpenVINO to output the same per-layer execution statistics, but in the form of the plugin-specific [Netron-viewable](https://netron.app/) graph to the specified file. Notice that on some devices, the execution graphs/counters may be pretty intrusive overhead-wise. @@ -71,4 +86,4 @@ Also, especially when performance-debugging the [latency case](../../optimizatio Finally, the performance statistics with both performance counters and execution graphs is averaged, so such a data for the [dynamically-shaped inputs](../../OV_Runtime_UG/ov_dynamic_shapes.md) should be measured carefully (ideally by isolating the specific shape and executing multiple times in a loop, to gather the reliable data). -OpenVINO in general and individual plugins are heavily instrumented with Intel® instrumentation and tracing technology (ITT), so another option is to compile the OpenVINO from the source code with the ITT enabled and using tools like [Intel® VTune™ Profiler](https://software.intel.com/en-us/vtune) to get detailed inference performance breakdown and additional insights in the application-level performance on the timeline view. \ No newline at end of file +OpenVINO in general and individual plugins are heavily instrumented with Intel® instrumentation and tracing technology (ITT), so another option is to compile the OpenVINO from the source code with the ITT enabled and using tools like [Intel® VTune™ Profiler](https://software.intel.com/en-us/vtune) to get detailed inference performance breakdown and additional insights in the application-level performance on the timeline view. diff --git a/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md b/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md index 08dacd50aa6..70baf49263d 100644 --- a/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md +++ b/docs/MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md @@ -1,4 +1,4 @@ -# Converting a ONNX* Model {#openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_ONNX} +# Converting an ONNX Model {#openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_ONNX} ## Introduction to ONNX [ONNX*](https://github.com/onnx/onnx) is a representation format for deep learning models. ONNX allows AI developers easily transfer models between different frameworks that helps to choose the best combination for them. Today, PyTorch\*, Caffe2\*, Apache MXNet\*, Microsoft Cognitive Toolkit\* and other tools are developing ONNX support. 
diff --git a/docs/MO_DG/prepare_model/convert_model/Converting_Model.md b/docs/MO_DG/prepare_model/convert_model/Converting_Model.md index e21acb7139d..6cfd5a77304 100644 --- a/docs/MO_DG/prepare_model/convert_model/Converting_Model.md +++ b/docs/MO_DG/prepare_model/convert_model/Converting_Model.md @@ -12,7 +12,7 @@ This is an offline approach to set static shapes and it can save time and memory To learn more about runtime shape change please see a dedicated article about [reshape feature](../../../OV_Runtime_UG/ShapeInference.md). For more information about the dynamic shapes, refer to [Dynamic Shapes](../../../OV_Runtime_UG/ov_dynamic_shapes.md) -OpenVINO Runtime API can have limitations to infer models with undefined dimensions on some hardware. +OpenVINO Runtime API can have limitations to infer models with undefined dimensions on some hardware (see [Features support matrix](../../../OV_Runtime_UG/supported_plugins/Device_Plugins.md) for reference). In this case, the `--input_shape` parameter and the [reshape method](../../../OV_Runtime_UG/ShapeInference.md) can help resolving undefined dimensions. Sometimes Model Optimizer is unable to convert models out-of-the-box (only the `--input_model` parameter is specified). diff --git a/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_GNMT_From_Tensorflow.md b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_GNMT_From_Tensorflow.md index a50f2dcacce..36f7066f388 100644 --- a/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_GNMT_From_Tensorflow.md +++ b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_GNMT_From_Tensorflow.md @@ -240,7 +240,7 @@ Outputs of the model: 1. With benchmark app: ```sh -python3 benchmark_app.py -m -d CPU +benchmark_app -m -d CPU ``` diff --git a/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_RetinaNet_From_Tensorflow.md b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_RetinaNet_From_Tensorflow.md index 510ff2f5862..cc7e1b584f2 100644 --- a/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_RetinaNet_From_Tensorflow.md +++ b/docs/MO_DG/prepare_model/convert_model/tf_specific/Convert_RetinaNet_From_Tensorflow.md @@ -3,7 +3,7 @@ This tutorial explains how to convert RetinaNet model to the Intermediate Representation (IR). [Public RetinaNet model](https://github.com/fizyr/keras-retinanet) does not contain pretrained TensorFlow\* weights. -To convert this model to the TensorFlow\* format, you can use [Reproduce Keras* to TensorFlow* Conversion tutorial](https://docs.openvino.ai/latest/omz_models_model_retinanet_tf.html). +To convert this model to the TensorFlow\* format, you can use [Reproduce Keras* to TensorFlow* Conversion tutorial](@ref omz_models_model_retinanet_tf). After you convert the model to TensorFlow* format, run the Model Optimizer command below: ```sh diff --git a/docs/OV_Runtime_UG/Int8Inference.md b/docs/OV_Runtime_UG/Int8Inference.md index 20f002b2a29..2ff1180b977 100644 --- a/docs/OV_Runtime_UG/Int8Inference.md +++ b/docs/OV_Runtime_UG/Int8Inference.md @@ -30,14 +30,12 @@ At runtime, the quantized model is loaded to the plugin. The plugin uses the `Lo Let's explore quantized [TensorFlow* implementation of the ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model. 
Use [Model Downloader](@ref omz_tools_downloader) to download the `FP16` model from [OpenVINO™ Toolkit - Open Model Zoo repository](https://github.com/openvinotoolkit/open_model_zoo): -> **NOTE**: If you installed OpenVINO with pip, use `omz_downloader` and `omz_quantizer` instead of `download.py` and `quantize.py`. See [Open Model Zoo documentation](https://github.com/openvinotoolkit/open_model_zoo/tree/master/tools/model_tools#model-downloader-usage). Replace `./benchmark_app` with `benchmark_app`. - ```sh -/tools/downloader/downloader.py --name resnet-50-tf --precisions FP16-INT8 +omz_downloader --name resnet-50-tf --precisions FP16-INT8 ``` After that you should quantize the model with the [Model Quantizer](@ref omz_tools_downloader) tool. ```sh -/tools/downloader/quantizer.py --model_dir public/resnet-50-tf --dataset_dir --precisions=FP16-INT8 +omz_quantizer --model_dir public/resnet-50-tf --dataset_dir --precisions=FP16-INT8 ``` The simplest way to infer the model and collect performance counters is the [Benchmark Application](../../samples/cpp/benchmark_app/README.md): @@ -61,30 +59,3 @@ For 8-bit integer computations, a model must be quantized. Quantized models can ![int8_flow] -## Performance Counters - -Information about layer precision is stored in the performance counters that are -available from the Inference Engine API. For example, the part of performance counters table for quantized [TensorFlow* implementation of ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model inference on [CPU Plugin](supported_plugins/CPU.md) looks as follows: - - -| layerName | execStatus | layerType | execType | realTime (ms) | cpuTime (ms) | -| --------------------------------------------------------- | ---------- | ------------ | -------------------- | ------------- | ------------ | -| resnet\_model/batch\_normalization\_15/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_1x1\_I8 | 0.377 | 0.377 | -| resnet\_model/conv2d\_16/Conv2D/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 | -| resnet\_model/batch\_normalization\_16/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_I8 | 0.499 | 0.499 | -| resnet\_model/conv2d\_17/Conv2D/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 | -| resnet\_model/batch\_normalization\_17/FusedBatchNorm/Add | EXECUTED | Convolution | jit\_avx512\_1x1\_I8 | 0.399 | 0.399 | -| resnet\_model/add\_4/fq\_input\_0 | NOT\_RUN | FakeQuantize | undef | 0 | 0 | -| resnet\_model/add\_4 | NOT\_RUN | Eltwise | undef | 0 | 0 | -| resnet\_model/add\_5/fq\_input\_1 | NOT\_RUN | FakeQuantize | undef | 0 | 0 | - - - The `exeStatus` column of the table includes possible values: - - `EXECUTED` - layer was executed by standalone primitive, - - `NOT_RUN` - layer was not executed by standalone primitive or was fused with another operation and executed in another layer primitive. - - The `execType` column of the table includes inference primitives with specific suffixes. The layers have the following marks: - * Suffix `I8` for layers that had 8-bit data type input and were computed in 8-bit precision - * Suffix `FP32` for layers computed in 32-bit precision - - All `Convolution` layers are executed in int8 precision. Rest layers are fused into Convolutions using post operations optimization technique, which is described in [Internal CPU Plugin Optimizations](supported_plugins/CPU.md). 
diff --git a/docs/OV_Runtime_UG/PythonPackage_Overview.md b/docs/OV_Runtime_UG/PythonPackage_Overview.md deleted file mode 100644 index 5e03eb3295c..00000000000 --- a/docs/OV_Runtime_UG/PythonPackage_Overview.md +++ /dev/null @@ -1,14 +0,0 @@ -# OpenVINO™ Python* Package - -OpenVINO™ Python\* package includes types to measure model and calibrate to low precision. - -The OpenVINO™ Python\* package available in the `/python/python3.X` directory. - -The OpenVINO™ Python\* package includes the following sub-packages: - - - [openvino.inference_engine](../../src/bindings/python/docs/api_overview.md) - Python\* wrapper on OpenVINO™ Inference Engine. - - `openvino.tools.accuracy_checker` - Measure accuracy. - - `openvino.tools.benchmark` - Measure latency and throughput. - -## See Also -* [Integrate with Customer Application New API](integrate_with_your_application.md) diff --git a/docs/OV_Runtime_UG/Python_API_exclusives.md b/docs/OV_Runtime_UG/Python_API_exclusives.md index 3d3375acb34..92e9fa0266d 100644 --- a/docs/OV_Runtime_UG/Python_API_exclusives.md +++ b/docs/OV_Runtime_UG/Python_API_exclusives.md @@ -6,25 +6,13 @@ OpenVINO™ Runtime Python API is exposing additional features and helpers to el `CompiledModel` can be easily created with the helper method. It hides `Core` creation and applies `AUTO` device by default. -@sphinxdirective - -.. doxygensnippet:: docs/snippets/ov_python_exclusives.py - :language: python - :fragment: [auto_compilation] - -@endsphinxdirective +@snippet docs/snippets/ov_python_exclusives.py auto_compilation ## Model/CompiledModel inputs and outputs Besides functions aligned to C++ API, some of them have their Pythonic counterparts or extensions. For example, `Model` and `CompiledModel` inputs/outputs can be accessed via properties. -@sphinxdirective - -.. doxygensnippet:: docs/snippets/ov_python_exclusives.py - :language: python - :fragment: [properties_example] - -@endsphinxdirective +@snippet docs/snippets/ov_python_exclusives.py properties_example Refer to Python API documentation on which helper functions or properties are available for different classes. @@ -32,37 +20,19 @@ Refer to Python API documentation on which helper functions or properties are av Python API allows passing data as tensors. `Tensor` object holds a copy of the data from the given array. `dtype` of numpy arrays is converted to OpenVINO™ types automatically. -@sphinxdirective - -.. doxygensnippet:: docs/snippets/ov_python_exclusives.py - :language: python - :fragment: [tensor_basics] - -@endsphinxdirective +@snippet docs/snippets/ov_python_exclusives.py tensor_basics ### Shared memory mode `Tensor` objects can share the memory with numpy arrays. By specifing `shared_memory` argument, a `Tensor` object does not perform copy of data and has access to the memory of the numpy array. -@sphinxdirective - -.. doxygensnippet:: docs/snippets/ov_python_exclusives.py - :language: python - :fragment: [tensor_shared_mode] - -@endsphinxdirective +@snippet docs/snippets/ov_python_exclusives.py tensor_shared_mode ### Slices of array's memory One of the `Tensor` class constructors allows to share the slice of array's memory. When `shape` is specified in the constructor that has the numpy array as first argument, it triggers the special shared memory mode. -@sphinxdirective - -.. 
doxygensnippet:: docs/snippets/ov_python_exclusives.py - :language: python - :fragment: [tensor_slice_mode] - -@endsphinxdirective +@snippet docs/snippets/ov_python_exclusives.py tensor_slice_mode ## Running inference @@ -70,35 +40,17 @@ Python API supports extra calling methods to synchronous and asynchronous modes All infer methods allow users to pass data as popular numpy arrays, gathered in either Python dicts or lists. -@sphinxdirective - -.. doxygensnippet:: docs/snippets/ov_python_exclusives.py - :language: python - :fragment: [passing_numpy_array] - -@endsphinxdirective +@snippet docs/snippets/ov_python_exclusives.py passing_numpy_array Results from inference can be obtained in various ways: -@sphinxdirective - -.. doxygensnippet:: docs/snippets/ov_python_exclusives.py - :language: python - :fragment: [getting_results] - -@endsphinxdirective +@snippet docs/snippets/ov_python_exclusives.py getting_results ### Synchronous mode - extended Python API provides different synchronous calls to infer model, which block the application execution. Additionally these calls return results of inference: -@sphinxdirective - -.. doxygensnippet:: docs/snippets/ov_python_exclusives.py - :language: python - :fragment: [sync_infer] - -@endsphinxdirective +@snippet docs/snippets/ov_python_exclusives.py sync_infer ### AsyncInferQueue @@ -108,25 +60,13 @@ Each job is distinguishable by unique `id`, which is in the range from 0 up to n Function call `start_async` is not required to be synchronized, it waits for any available job if queue is busy/overloaded. Every `AsyncInferQueue` code block should end with `wait_all` function. It provides "global" synchronization of all jobs in the pool and ensure that access to them is safe. -@sphinxdirective - -.. doxygensnippet:: docs/snippets/ov_python_exclusives.py - :language: python - :fragment: [asyncinferqueue] - -@endsphinxdirective +@snippet docs/snippets/ov_python_exclusives.py asyncinferqueue #### Acquire results from requests After the call to `wait_all`, jobs and their data can be safely accessed. Acquring of a specific job with `[id]` returns `InferRequest` object, which results in seamless retrieval of the output data. -@sphinxdirective - -.. doxygensnippet:: docs/snippets/ov_python_exclusives.py - :language: python - :fragment: [asyncinferqueue_access] - -@endsphinxdirective +@snippet docs/snippets/ov_python_exclusives.py asyncinferqueue_access #### Setting callbacks @@ -134,10 +74,4 @@ Another feature of `AsyncInferQueue` is ability of setting callbacks. When callb The callback of `AsyncInferQueue` is uniform for every job. When executed, GIL is acquired to ensure safety of data manipulation inside the function. -@sphinxdirective - -.. doxygensnippet:: docs/snippets/ov_python_exclusives.py - :language: python - :fragment: [asyncinferqueue_set_callback] - -@endsphinxdirective +@snippet docs/snippets/ov_python_exclusives.py asyncinferqueue_set_callback diff --git a/docs/OV_Runtime_UG/ShapeInference.md b/docs/OV_Runtime_UG/ShapeInference.md index 85b7cb75c61..a8b1ef80570 100644 --- a/docs/OV_Runtime_UG/ShapeInference.md +++ b/docs/OV_Runtime_UG/ShapeInference.md @@ -39,6 +39,8 @@ There are other approaches to change model input shapes during the stage of @@ -103,7 +105,7 @@ For example, [publicly available Inception family models from TensorFlow*](https - Changing the model input shape may significantly affect its accuracy. For example, Object Detection models from TensorFlow have resizing restrictions by design. 
To keep the model valid after the reshape, choose a new input shape that satisfies conditions listed in the `pipeline.config` file. -For details, refer to the Tensorflow Object Detection API models resizing techniques. +For details, refer to the Tensorflow Object Detection API models resizing techniques. ### How To Fix Non-Reshape-able Model @@ -179,6 +181,8 @@ There are other approaches to change model input shapes during the stage of @@ -230,7 +234,7 @@ Dictionary values (representing new shapes) could be @endsphinxdirective -Please find usage scenarios of `reshape` feature in our [samples](Samples_Overview.md) and [demos](ToDo), starting with [Hello Reshape Sample](../../samples/python/hello_reshape_ssd/README.html) +Please find usage scenarios of `reshape` feature in our [samples](Samples_Overview.md), starting with [Hello Reshape Sample](../../samples/python/hello_reshape_ssd/README.html) Practically, some models are not ready to be reshaped. In this case, a new input shape cannot be set with the Model Optimizer or the `Model.reshape` method. diff --git a/docs/OV_Runtime_UG/auto_device_selection.md b/docs/OV_Runtime_UG/auto_device_selection.md index bfb65a51dbf..d1a388be676 100644 --- a/docs/OV_Runtime_UG/auto_device_selection.md +++ b/docs/OV_Runtime_UG/auto_device_selection.md @@ -205,14 +205,14 @@ For unlimited device choice: @sphinxdirective .. code-block:: sh - ./benchmark_app –d AUTO –m -i -niter 1000 + benchmark_app –d AUTO –m -i -niter 1000 @endsphinxdirective For limited device choice: @sphinxdirective .. code-block:: sh - ./benchmark_app –d AUTO:CPU,GPU,MYRIAD –m -i -niter 1000 + benchmark_app –d AUTO:CPU,GPU,MYRIAD –m -i -niter 1000 @endsphinxdirective For more information, refer to the [C++](../../samples/cpp/benchmark_app/README.md) or [Python](../../tools/benchmark_tool/README.md) version instructions. diff --git a/docs/OV_Runtime_UG/automatic_batching.md b/docs/OV_Runtime_UG/automatic_batching.md index d21fe94b61e..54c1a6d26a3 100644 --- a/docs/OV_Runtime_UG/automatic_batching.md +++ b/docs/OV_Runtime_UG/automatic_batching.md @@ -9,77 +9,86 @@ The feature primarily targets existing code written for inferencing many request As explained below, the auto-batching functionality can be also used via a special *virtual* device. Batching is a straightforward way of leveraging the GPU compute power and saving on communication overheads. The automatic batching is _implicitly_ triggered on the GPU when the `ov::hint::PerformanceMode::THROUGHPUT` is specified for the `ov::hint::performance_mode` property for the compile_model or set_property calls. -@sphinxdirective -.. tab:: C++ +@sphinxtabset - .. doxygensnippet:: docs/snippets/ov_auto_batching.cpp - :language: cpp - :fragment: [compile_model] +@sphinxtab{C++} -.. tab:: Python +@snippet docs/snippets/ov_auto_batching.cpp compile_model + +@endsphinxtab + +@sphinxtab{Python} + +@snippet docs/snippets/ov_auto_batching.py compile_model + +@endsphinxtab + +@endsphinxtabset - .. doxygensnippet:: docs/snippets/ov_auto_batching.py - :language: python - :fragment: [compile_model] -@endsphinxdirective > **NOTE**: You can disable the Auto-Batching (for example, for the GPU device) from being triggered by the `ov::hint::PerformanceMode::THROUGHPUT`. To do that, pass the `ov::hint::allow_auto_batching` set to **false** in addition to the `ov::hint::performance_mode`: -@sphinxdirective -.. tab:: C++ - .. doxygensnippet:: docs/snippets/ov_auto_batching.cpp - :language: cpp - :fragment: [compile_model_no_auto_batching] +@sphinxtabset -.. 
tab:: Python +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_auto_batching.py - :language: python - :fragment: [compile_model_no_auto_batching] +@snippet docs/snippets/ov_auto_batching.cpp compile_model_no_auto_batching -@endsphinxdirective +@endsphinxtab + +@sphinxtab{Python} + +@snippet docs/snippets/ov_auto_batching.py compile_model_no_auto_batching + +@endsphinxtab + +@endsphinxtabset Alternatively, to enable the Auto-Batching in the legacy apps not akin to the notion of the performance hints, you may need to use the **explicit** device notion, such as 'BATCH:GPU'. In both cases (the *throughput* hint or explicit BATCH device), the optimal batch size selection happens automatically (the implementation queries the `ov::optimal_batch_size` property from the device, passing the model's graph as the parameter). The actual value depends on the model and device specifics, for example, on-device memory for the dGPUs. Auto-Batching support is not limited to the GPUs, but if a device does not support the `ov::optimal_batch_size` yet, it can work with the auto-batching only when specifying an explicit batch size, for example, "BATCH:(16)". This _automatic batch size selection_ assumes that the application queries the `ov::optimal_number_of_infer_requests` to create and run the returned number of requests simultaneously: -@sphinxdirective -.. tab:: C++ +@sphinxtabset - .. doxygensnippet:: docs/snippets/ov_auto_batching.cpp - :language: cpp - :fragment: [query_optimal_num_requests] +@sphinxtab{C++} -.. tab:: Python +@snippet docs/snippets/ov_auto_batching.cpp query_optimal_num_requests - .. doxygensnippet:: docs/snippets/ov_auto_batching.py - :language: python - :fragment: [query_optimal_num_requests] +@endsphinxtab + +@sphinxtab{Python} + +@snippet docs/snippets/ov_auto_batching.py query_optimal_num_requests + +@endsphinxtab + +@endsphinxtabset -@endsphinxdirective If not enough inputs were collected, the `timeout` value makes the transparent execution fall back to the execution of individual requests. Configuration-wise, this is the AUTO_BATCH_TIMEOUT property. The timeout, which adds itself to the execution time of the requests, heavily penalizes the performance. To avoid this, in cases when your parallel slack is bounded, give the OpenVINO an additional hint. For example, the application processes only 4 video streams, so there is no need to use a batch larger than 4. The most future-proof way to communicate the limitations on the parallelism is to equip the performance hint with the optional `ov::hint::num_requests` configuration key set to 4. For the GPU this will limit the batch size, for the CPU - the number of inference streams, so each device uses the `ov::hint::num_requests` while converting the hint to the actual device configuration options: -@sphinxdirective -.. tab:: C++ +@sphinxtabset - .. doxygensnippet:: docs/snippets/ov_auto_batching.cpp - :language: cpp - :fragment: [hint_num_requests] +@sphinxtab{C++} -.. tab:: Python +@snippet docs/snippets/ov_auto_batching.cpp hint_num_requests - .. doxygensnippet:: docs/snippets/ov_auto_batching.py - :language: python - :fragment: [hint_num_requests] +@endsphinxtab + +@sphinxtab{Python} + +@snippet docs/snippets/ov_auto_batching.py hint_num_requests + +@endsphinxtab + +@endsphinxtabset -@endsphinxdirective For the *explicit* usage, you can limit the batch size using "BATCH:GPU(4)", where 4 is the number of requests running in parallel. 
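As a rough illustration of the paragraph above, the following Python sketch compiles a model with the throughput hint plus a request limit of 4 and then creates the number of requests the device reports as optimal. The string property keys and the `"GPU"` target are assumptions; the C++/Python snippets referenced above remain the authoritative examples:

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical model path

# Throughput hint with an optional cap of 4 parallel requests
config = {"PERFORMANCE_HINT": "THROUGHPUT",
          "PERFORMANCE_HINT_NUM_REQUESTS": "4"}
compiled_model = core.compile_model(model, "GPU", config)

# Ask the (possibly auto-batched) compiled model how many requests to run in parallel
nireq = compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS")
infer_requests = [compiled_model.create_infer_request() for _ in range(nireq)]
```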
@@ -89,6 +98,7 @@ To achieve the best performance with the Automatic Batching, the application sho - Operate the number of inference requests that represents the multiple of the batch size. In the above example, for batch size 4, the application should operate 4, 8, 12, 16, etc. requests. - Use the requests, grouped by the batch size, together. For example, the first 4 requests are inferred, while the second group of the requests is being populated. Essentially, the Automatic Batching shifts the asynchronousity from the individual requests to the groups of requests that constitute the batches. - Balance the 'timeout' value vs the batch size. For example, in many cases having a smaller timeout value/batch size may yield better performance than large batch size, but with the timeout value that is not large enough to accommodate the full number of the required requests. + - When the Automatic Batching is enabled, the 'timeout' property of the `ov::CompiledModel` can be changed any time, even after model loading/compilation. For example, setting the value to 0 effectively disables the auto-batching, as requests' collection would be omitted. - Carefully apply the auto-batching to the pipelines. For example for the conventional video-sources->detection->classification flow, it is the most benefical to do auto-batching over the inputs to the detection stage. Whereas the resulting number of detections is usually fluent, which makes the auto-batching less applicable for the classification stage. The following are limitations of the current implementations: @@ -110,11 +120,12 @@ Following the OpenVINO convention for devices names, the *batching* device is na ### Testing Automatic Batching Performance with the Benchmark_App The `benchmark_app`, that exists in both [C++](../../samples/cpp/benchmark_app/README.md) and [Python](../../tools/benchmark_tool/README.md) versions, is the best way to evaluate the performance of the Automatic Batching: - The most straighforward way is performance hints: -- - benchmark_app **-hint tput** -d GPU -m 'path to your favorite model' + - benchmark_app **-hint tput** -d GPU -m 'path to your favorite model' - Overriding the strict rules of implicit reshaping by the batch dimension via the explicit device notion: -- - benchmark_app **-hint none -d BATCH:GPU** -m 'path to your favorite model' + - benchmark_app **-hint none -d BATCH:GPU** -m 'path to your favorite model' - Finally, overriding the automatically-deduced batch size as well: -- - $benchmark_app -hint none -d **BATCH:GPU(16)** -m 'path to your favorite model' + - $benchmark_app -hint none -d **BATCH:GPU(16)** -m 'path to your favorite model' + - notice that some shell versions (e.g. `bash`) may require adding quotes around complex device names, i.e. -d "BATCH:GPU(16)" The last example is also applicable to the CPU or any other device that generally supports the batched execution. 
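The explicit device notion shown in the `benchmark_app` examples above can be used directly from the API as well. A minimal Python sketch follows; the model path is a placeholder and the batch value of 16 simply mirrors the `BATCH:GPU(16)` example:

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical model path

# Let the BATCH device deduce the batch size for the underlying GPU ...
compiled_deduced = core.compile_model(model, "BATCH:GPU")

# ... or override the automatically deduced value, as with "-d BATCH:GPU(16)"
compiled_fixed = core.compile_model(model, "BATCH:GPU(16)")
```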
diff --git a/docs/install_guides/deployment-manager-tool.md b/docs/OV_Runtime_UG/deployment/deployment-manager-tool.md similarity index 52% rename from docs/install_guides/deployment-manager-tool.md rename to docs/OV_Runtime_UG/deployment/deployment-manager-tool.md index 23e1ecebbae..59b91a89f3b 100644 --- a/docs/install_guides/deployment-manager-tool.md +++ b/docs/OV_Runtime_UG/deployment/deployment-manager-tool.md @@ -1,4 +1,4 @@ -# OpenVINO™ Deployment Manager Guide {#openvino_docs_install_guides_deployment_manager_tool} +# Deployment Manager {#openvino_docs_install_guides_deployment_manager_tool} The Deployment Manager is a Python* command-line tool that creates a deployment package by assembling the model, IR files, your application, and associated dependencies into a runtime package for your target device. This tool is delivered within the Intel® Distribution of OpenVINO™ toolkit for Linux*, Windows* and macOS* release packages and is available after installation in the `/tools/deployment_manager` directory. @@ -6,17 +6,18 @@ The Deployment Manager is a Python* command-line tool that creates a deployment * Intel® Distribution of OpenVINO™ toolkit * To run inference on a target device other than CPU, device drivers must be pre-installed: - * **For Linux**, see the following sections in the [installation instructions for Linux](../install_guides/installing-openvino-linux.md): - * Steps for Intel® Processor Graphics (GPU) section - * Steps for Intel® Neural Compute Stick 2 section - * Steps for Intel® Vision Accelerator Design with Intel® Movidius™ VPUs - * **For Windows**, see the following sections in the [installation instructions for Windows](../install_guides/installing-openvino-windows.md): - * Steps for Intel® Processor Graphics (GPU) - * Steps for the Intel® Vision Accelerator Design with Intel® Movidius™ VPUs - * **For macOS**, see the following section in the [installation instructions for macOS](../install_guides/installing-openvino-macos.md): - * Steps for Intel® Neural Compute Stick 2 section - -> **IMPORTANT**: The operating system on the target system must be the same as the development system on which you are creating the package. For example, if the target system is Ubuntu 18.04, the deployment package must be created from the OpenVINO™ toolkit installed on Ubuntu 18.04. 
+ * **For Linux**, see the following sections in the [installation instructions for Linux](../../install_guides/installing-openvino-linux.md): + * Steps for [Intel® Processor Graphics (GPU)](../../install_guides/configurations-for-intel-gpu.md) section + * Steps for [Intel® Neural Compute Stick 2 section](../../install_guides/configurations-for-ncs2.md) + * Steps for [Intel® Vision Accelerator Design with Intel® Movidius™ VPUs](../../install_guides/installing-openvino-config-ivad-vpu.md) + * Steps for [Intel® Gaussian & Neural Accelerator (GNA)](../../install_guides/configurations-for-intel-gna.md) + * **For Windows**, see the following sections in the [installation instructions for Windows](../../install_guides/installing-openvino-windows.md): + * Steps for [Intel® Processor Graphics (GPU)](../../install_guides/configurations-for-intel-gpu.md) + * Steps for the [Intel® Vision Accelerator Design with Intel® Movidius™ VPUs](../../install_guides/installing-openvino-config-ivad-vpu.md) + * **For macOS**, see the following section in the [installation instructions for macOS](../../install_guides/installing-openvino-macos.md): + * Steps for [Intel® Neural Compute Stick 2 section](../../install_guides/configurations-for-ncs2.md) + +> **IMPORTANT**: The operating system on the target system must be the same as the development system on which you are creating the package. For example, if the target system is Ubuntu 18.04, the deployment package must be created from the OpenVINO™ toolkit installed on Ubuntu 18.04. > **TIP**: If your application requires additional dependencies, including the Microsoft Visual C++ Redistributable, use the ['--user_data' option](https://docs.openvino.ai/latest/openvino_docs_install_guides_deployment_manager_tool.html#run-standard-cli-mode) to add them to the deployment archive. Install these dependencies on the target host before running inference. @@ -31,77 +32,77 @@ There are two ways to create a deployment package that includes inference-relate .. raw:: html
- + @endsphinxdirective - + Interactive mode provides a user-friendly command-line interface that will guide you through the process with text prompts. -1. To launch the Deployment Manager in interactive mode, open a new terminal window, go to the Deployment Manager tool directory and run the tool script without parameters: +To launch the Deployment Manager in interactive mode, open a new terminal window, go to the Deployment Manager tool directory and run the tool script without parameters: - @sphinxdirective +@sphinxdirective - .. tab:: Linux - - .. code-block:: sh - - cd /tools/deployment_manager +.. tab:: Linux - ./deployment_manager.py + .. code-block:: sh - .. tab:: Windows - - .. code-block:: bat + cd /tools/deployment_manager + + ./deployment_manager.py + +.. tab:: Windows - cd \deployment_tools\tools\deployment_manager - .\deployment_manager.py - - .. tab:: macOS - - .. code-block:: sh + .. code-block:: bat + + cd \deployment_tools\tools\deployment_manager + .\deployment_manager.py + +.. tab:: macOS - cd /tools/deployment_manager - ./deployment_manager.py + .. code-block:: sh + + cd /tools/deployment_manager + ./deployment_manager.py @endsphinxdirective -2. The target device selection dialog is displayed: +The target device selection dialog is displayed: - ![Deployment Manager selection dialog](../img/selection_dialog.png) +![Deployment Manager selection dialog](../img/selection_dialog.png) - Use the options provided on the screen to complete selection of the target devices and press **Enter** to proceed to the package generation dialog. if you want to interrupt the generation process and exit the program, type **q** and press **Enter**. +Use the options provided on the screen to complete selection of the target devices and press **Enter** to proceed to the package generation dialog. if you want to interrupt the generation process and exit the program, type **q** and press **Enter**. -3. Once you accept the selection, the package generation dialog is displayed: +Once you accept the selection, the package generation dialog is displayed: - ![Deployment Manager configuration dialog](../img/configuration_dialog.png) +![Deployment Manager configuration dialog](../img/configuration_dialog.png) - The target devices you have selected at the previous step appear on the screen. To go back and change the selection, type **b** and press **Enter**. Use the options provided to configure the generation process, or use the default settings. +The target devices you have selected at the previous step appear on the screen. To go back and change the selection, type **b** and press **Enter**. Use the options provided to configure the generation process, or use the default settings. - * `o. Change output directory` (optional): Path to the output directory. By default, it's set to your home directory. +* `o. Change output directory` (optional): Path to the output directory. By default, it's set to your home directory. - * `u. Provide (or change) path to folder with user data` (optional): Path to a directory with user data (IRs, models, datasets, etc.) files and subdirectories required for inference, which will be added to the deployment archive. By default, it's set to `None`, which means you will separately copy the user data to the target system. +* `u. Provide (or change) path to folder with user data` (optional): Path to a directory with user data (IRs, models, datasets, etc.) files and subdirectories required for inference, which will be added to the deployment archive. 
By default, it's set to `None`, which means you will separately copy the user data to the target system. - * `t. Change archive name` (optional): Deployment archive name without extension. By default, it is set to `openvino_deployment_package`. +* `t. Change archive name` (optional): Deployment archive name without extension. By default, it is set to `openvino_deployment_package`. -4. Once all the parameters are set, type **g** and press **Enter** to generate the package for the selected target devices. To interrupt the generation process and exit the program, type **q** and press **Enter**. +Once all the parameters are set, type **g** and press **Enter** to generate the package for the selected target devices. To interrupt the generation process and exit the program, type **q** and press **Enter**. - The script successfully completes and the deployment package is generated in the specified output directory. +The script successfully completes and the deployment package is generated in the specified output directory. @sphinxdirective -.. raw:: html +.. raw:: html + +
- - @endsphinxdirective ### Run Standard CLI Mode - + @sphinxdirective .. raw:: html
- + @endsphinxdirective Alternatively, you can run the Deployment Manager tool in the standard CLI mode. In this mode, you specify the target devices and other parameters as command-line arguments of the Deployment Manager Python script. This mode facilitates integrating the tool in an automation pipeline. @@ -113,29 +114,29 @@ To launch the Deployment Manager tool in the standard mode, open a new terminal .. tab:: Linux .. code-block:: sh - + cd /tools/deployment_manager - ./deployment_manager.py <--targets> [--output_dir] [--archive_name] [--user_data] - -.. tab:: Windows + ./deployment_manager.py <--targets> [--output_dir] [--archive_name] [--user_data] - .. code-block:: bat +.. tab:: Windows - cd \deployment_tools\tools\deployment_manager + .. code-block:: bat + + cd \tools\deployment_manager .\deployment_manager.py <--targets> [--output_dir] [--archive_name] [--user_data] - -.. tab:: macOS + +.. tab:: macOS .. code-block:: sh cd /tools/deployment_manager ./deployment_manager.py <--targets> [--output_dir] [--archive_name] [--user_data] - + @endsphinxdirective The following options are available: -* `<--targets>` (required): List of target devices to run inference. To specify more than one target, separate them with spaces. For example: `--targets cpu gpu vpu`. You can get a list of currently available targets by running the program with the `-h` option. +* `<--targets>` (required): List of target devices to run inference. To specify more than one target, separate them with spaces. For example: `--targets cpu gpu vpu`. You can get a list of currently available targets by running the program with the `-h` option. * `[--output_dir]` (optional): Path to the output directory. By default, it is set to your home directory. @@ -147,80 +148,82 @@ The script successfully completes, and the deployment package is generated in th @sphinxdirective -.. raw:: html +.. raw:: html + +
- - @endsphinxdirective ## Deploy Package on Target Systems -After the Deployment Manager has successfully completed, you can find the generated `.tar.gz` (for Linux or macOS) or `.zip` (for Windows) package in the output directory you specified. +After the Deployment Manager has successfully completed, you can find the generated `.tar.gz` (for Linux or macOS) or `.zip` (for Windows) package in the output directory you specified. To deploy the OpenVINO Runtime components from the development machine to the target system, perform the following steps: 1. Copy the generated archive to the target system using your preferred method. 2. Unpack the archive into the destination directory on the target system (if your archive name is different from the default shown below, replace the `openvino_deployment_package` with the name you use). +@sphinxdirective - @sphinxdirective - - .. tab:: Linux - - .. code-block:: sh - - tar xf openvino_deployment_package.tar.gz -C - - .. tab:: Windows - - Use the archiver of your choice to unzip the file. - - .. tab:: macOS - - .. code-block:: sh - - tar xf openvino_deployment_package.tar.gz -C - - @endsphinxdirective +.. tab:: Linux + + .. code-block:: sh + + tar xf openvino_deployment_package.tar.gz -C + +.. tab:: Windows + + .. code-block:: bat + + Use the archiver of your choice to unzip the file. + +.. tab:: macOS + + .. code-block:: sh + + tar xf openvino_deployment_package.tar.gz -C + +@endsphinxdirective + + The package is unpacked to the destination directory and the following files and subdirectories are created: -The package is unpacked to the destination directory and the following files and subdirectories are created: - * `setupvars.sh` — Copy of `setupvars.sh` * `runtime` — Contains the OpenVINO runtime binary files. * `install_dependencies` — Snapshot of the `install_dependencies` directory from the OpenVINO installation directory. * `` — The directory with the user data (IRs, datasets, etc.) you specified while configuring the package. -3. For Linux, to run inference on a target Intel® GPU, Intel® Movidius™ VPU, or Intel® Vision Accelerator Design with Intel® Movidius™ VPUs, you need to install additional dependencies by running the `install_openvino_dependencies.sh` script on the target machine: - ```sh - cd /openvino/install_dependencies - sudo -E ./install_openvino_dependencies.sh - ``` +For Linux, to run inference on a target Intel® GPU, Intel® Movidius™ VPU, or Intel® Vision Accelerator Design with Intel® Movidius™ VPUs, you need to install additional dependencies by running the `install_openvino_dependencies.sh` script on the target machine: + +```sh +cd /openvino/install_dependencies +sudo -E ./install_openvino_dependencies.sh +``` + +Set up the environment variables: -4. Set up the environment variables: - - @sphinxdirective - - .. tab:: Linux - - .. code-block:: sh +@sphinxdirective - cd /openvino/ - source ./setupvars.sh - - .. tab:: Windows - - .. code-block:: bat +.. tab:: Linux - cd \openvino\ - .\setupvars.bat - - .. tab:: macOS - - .. code-block:: sh + .. code-block:: sh + + cd /openvino/ + source ./setupvars.sh - cd /openvino/ - source ./setupvars.sh - - @endsphinxdirective +.. tab:: Windows + + .. code-block:: bat + + cd \openvino\ + .\setupvars.bat + +.. tab:: macOS + + .. code-block:: sh + + cd /openvino/ + source ./setupvars.sh + +@endsphinxdirective You have now finished the deployment of the OpenVINO Runtime components to the target system. 
diff --git a/docs/OV_Runtime_UG/deployment/deployment_intro.md b/docs/OV_Runtime_UG/deployment/deployment_intro.md new file mode 100644 index 00000000000..6dbf2d71df4 --- /dev/null +++ b/docs/OV_Runtime_UG/deployment/deployment_intro.md @@ -0,0 +1,68 @@ +# Deploy with OpenVINO {#openvino_deployment_guide} + +@sphinxdirective + +.. toctree:: + :maxdepth: 1 + :hidden: + + openvino_docs_install_guides_deployment_manager_tool + openvino_docs_deploy_local_distribution + +@endsphinxdirective + +Once the [OpenVINO application development](../integrate_with_your_application.md) is finished, application developers usually need to deploy their applications to end users. There are several ways to achieve that: + +- Set a dependency on existing prebuilt packages (so-called _centralized distribution_): + - Using Debian / RPM packages, the recommended way for the family of Linux operating systems + - Using the pip package manager on PyPI, the default approach for Python-based applications + - Using Docker images. If the application should be deployed as a Docker image, a developer can use a pre-built OpenVINO runtime Docker image as a base image in the Dockerfile for the application container image. You can find more information about the available OpenVINO Docker images in the Install Guides for [Linux](../../install_guides/installing-openvino-docker-linux.md) and [Windows](../../install_guides/installing-openvino-docker-windows.md). +Also, if you need to customize the OpenVINO Docker image, you can use the [Docker CI Framework](https://github.com/openvinotoolkit/docker_ci) to generate a Dockerfile and build it. +- Ship the necessary OpenVINO functionality together with your application (so-called _local distribution_): + - Using the [OpenVINO Deployment Manager](deployment-manager-tool.md), which provides a convenient way to create a distribution package + - Using the advanced [Local distribution](local-distribution.md) approach + - Using a [static version of OpenVINO Runtime linked into the final app](https://github.com/openvinotoolkit/openvino/wiki/StaticLibraries) + +The table below shows which distribution type can be used depending on the target operating system: + +@sphinxdirective + +.. raw:: html + +
+ +@endsphinxdirective + +| Distribution type | Operating systems | +|-------------------|-------------------| +| Debian packages | Ubuntu 18.04 long-term support (LTS), 64-bit; Ubuntu 20.04 long-term support (LTS), 64-bit | +| RPM packages | Red Hat Enterprise Linux 8, 64-bit | +| Docker images | Ubuntu 18.04 long-term support (LTS), 64-bit; Ubuntu 20.04 long-term support (LTS), 64-bit; Red Hat Enterprise Linux 8, 64-bit; Windows Server Core base LTSC 2019, 64-bit; Windows 10, version 20H2, 64-bit | +| PyPI (pip package manager) | See [https://pypi.org/project/openvino/](https://pypi.org/project/openvino/) | +| [OpenVINO Deployment Manager](deployment-manager-tool.md) | All operating systems | +| [Local distribution](local-distribution.md) | All operating systems | +| [Build OpenVINO statically and link into the final app](https://github.com/openvinotoolkit/openvino/wiki/StaticLibraries) | All operating systems | + +@sphinxdirective + +.. raw:: html + +
+ +@endsphinxdirective + +Depending on the distribution type, the granularity of OpenVINO packages may vary: the PyPI distribution of [OpenVINO ships a single package 'openvino'](https://pypi.org/project/openvino/) containing all the runtime libraries and plugins, while more configurable ways like [Local distribution](local-distribution.md) provide higher granularity, so it is important to know some details about the set of libraries which are part of the OpenVINO Runtime package: + +![deployment_simplified] + +- The main library `openvino` is the one C++ user applications link against. The library provides the whole OpenVINO Runtime public API, covering OpenVINO API 2.0 as well as the Inference Engine and nGraph APIs. For C language applications, `openvino_c` is additionally required for distribution. +- The _optional_ plugin libraries like `openvino_intel_cpu_plugin` (matching the `openvino_.+_plugin` pattern) are used to provide inference capabilities on specific devices or additional capabilities like [Hetero execution](../hetero_execution.md) or [Multi-Device execution](../multi_device.md). +- The _optional_ plugin libraries like `openvino_ir_frontend` (matching `openvino_.+_frontend`) are used to provide capabilities to read models of different file formats like OpenVINO IR, ONNX or Paddle. + +_Optional_ here means that if the application does not use the capability enabled by the plugin, the plugin's library or package with the plugin is not needed in the final distribution. + +The information above covers the granularity aspects of most distribution types; more detailed information is needed and provided only for [Local Distribution](local-distribution.md). + +> **NOTE**: Depending on target OpenVINO devices, you also have to use [Configurations for GPU](../../install_guides/configurations-for-intel-gpu.md), [Configurations for GNA](../../install_guides/configurations-for-intel-gna.md), [Configurations for NCS2](../../install_guides/configurations-for-ncs2.md) or [Configurations for VPU](../../install_guides/installing-openvino-config-ivad-vpu.md) for proper configuration of deployed machines. + +[deployment_simplified]: ../../img/deployment_simplified.png diff --git a/docs/OV_Runtime_UG/deployment/local-distribution.md b/docs/OV_Runtime_UG/deployment/local-distribution.md new file mode 100644 index 00000000000..5d1af6a503c --- /dev/null +++ b/docs/OV_Runtime_UG/deployment/local-distribution.md @@ -0,0 +1,162 @@ +# Local distribution {#openvino_docs_deploy_local_distribution} + +The local distribution implies that each C or C++ application / installer will have its own copies of OpenVINO Runtime binaries. However, OpenVINO has a scalable plugin-based architecture, which implies that some components can be loaded at runtime only if they are really needed. So, it is important to understand which minimal set of libraries is really needed to deploy the application, and this guide helps to achieve that. + +> **NOTE**: The steps below are operating-system independent and refer to a library file name without any prefixes (like `lib` on Unix systems) or suffixes (like `.dll` on Windows OS). Do not put `.lib` files into the distribution on Windows, because such files are needed only at the linking stage. + +Local distribution is also appropriate for OpenVINO binaries built from sources using the [Build instructions](https://github.com/openvinotoolkit/openvino/wiki#how-to-build), but the guide below assumes that OpenVINO Runtime is built dynamically. 
In case of a [Static OpenVINO Runtime](https://github.com/openvinotoolkit/openvino/wiki/StaticLibraries), select the required OpenVINO capabilities at the CMake configuration stage using [CMake Options for Custom Compilation](https://github.com/openvinotoolkit/openvino/wiki/CMakeOptionsForCustomCompilation), then build and link the OpenVINO components into the final application. + +### C++ or C language + +Regardless of the language used to write the application, `openvino` must always be put into the final distribution, since it is the core library which orchestrates all the inference and frontend plugins. +If your application is written in C, you additionally need to put `openvino_c`. + +The `plugins.xml` file with information about inference devices must also be included as a support file for `openvino`. + +> **NOTE**: In the Intel Distribution of OpenVINO, `openvino` depends on the TBB libraries, which are used by OpenVINO Runtime to optimally saturate the devices with computations, so they must also be put into the distribution package. + +### Pluggable components + +The picture below demonstrates dependencies between the OpenVINO Runtime core and pluggable libraries: + +![deployment_full] + +#### Compute devices + +For each inference device, OpenVINO Runtime has its own plugin library: +- `openvino_intel_cpu_plugin` for [Intel CPU devices](../supported_plugins/CPU.md) +- `openvino_intel_gpu_plugin` for [Intel GPU devices](../supported_plugins/GPU.md) +- `openvino_intel_gna_plugin` for [Intel GNA devices](../supported_plugins/GNA.md) +- `openvino_intel_myriad_plugin` for [Intel MYRIAD devices](../supported_plugins/MYRIAD.md) +- `openvino_intel_hddl_plugin` for [Intel HDDL device](../supported_plugins/HDDL.md) +- `openvino_arm_cpu_plugin` for [ARM CPU devices](../supported_plugins/ARM_CPU.md) + +Depending on which devices are used in the application, put the appropriate libraries into the distribution package. + +As shown in the picture above, some plugin libraries may have OS-specific dependencies, which are either backend libraries or additional support files with firmware, etc. Refer to the table below for details: + +@sphinxdirective + +.. raw:: html + +
+ +@endsphinxdirective + +| Device | Dependency | +|-------------|------------| +| CPU | `-` | +| GPU | `OpenCL.dll`, `cache.json` | +| MYRIAD | `usb.dll`, `usb-ma2x8x.mvcmd`, `pcie-ma2x8x.elf` | +| HDDL | `bsl.dll`, `hddlapi.dll`, `json-c.dll`, `libcrypto-1_1-x64.dll`, `libssl-1_1-x64.dll`, `mvnc-hddl.dll` | +| GNA | `gna.dll` | +| Arm® CPU | `-` | + +@sphinxdirective + +.. raw:: html + +
+ +@endsphinxdirective +@sphinxdirective + +.. raw:: html + +
+ +@endsphinxdirective + +| Device | Dependency | +|-------------|-------------| +| CPU | `-` | +| GPU | `libOpenCL.so`, `cache.json` | +| MYRIAD | `libusb.so`, `usb-ma2x8x.mvcmd`, `pcie-ma2x8x.mvcmd` | +| HDDL | `libbsl.so`, `libhddlapi.so`, `libmvnc-hddl.so` | +| GNA | `gna.dll` | +| Arm® CPU | `-` | + +@sphinxdirective + +.. raw:: html + +
+ +@endsphinxdirective +@sphinxdirective + +.. raw:: html + +
+ +@endsphinxdirective + +| Device | Dependency | +|-------------|-------------| +| CPU | `-` | +| MYRIAD | `libusb.dylib`, `usb-ma2x8x.mvcmd`, `pcie-ma2x8x.mvcmd` | +| Arm® CPU | `-` | + +@sphinxdirective + +.. raw:: html + +
+ +@endsphinxdirective + +#### Execution capabilities + +`HETERO`, `MULTI`, `BATCH`, `AUTO` execution capabilities can also be used explicitly or implicitly by the application. Use the following recommendation scheme to decide whether to put the appropriate libraries to the distribution package: +- If [AUTO](../auto_device_selection.md) is used explicitly in the application or `ov::Core::compile_model` is used without specifying a device, put the `openvino_auto_plugin` to the distribution + > **NOTE**: Auto device selection relies on [inference device plugins](../supported_plugins/Device_Plugins.md), so if you are not sure which inference devices are available on the target machine, put all inference plugin libraries to the distribution. If the `ov::device::priorities` is used for `AUTO` to specify a limited device list, grab the corresponding device plugins only. + +- If [MULTI](../multi_device.md) is used explicitly, put the `openvino_auto_plugin` to the distribution +- If [HETERO](../hetero_execution.md) is either used explicitly or `ov::hint::performance_mode` is used with GPU, put the `openvino_hetero_plugin` to the distribution +- If [BATCH](../automatic_batching.md) is either used explicitly or `ov::hint::performance_mode` is used with GPU, put the `openvino_batch_plugin` to the distribution + +#### Reading models + +OpenVINO Runtime uses frontend libraries dynamically to read models in different formats: +- To read OpenVINO IR `openvino_ir_frontend` is used +- To read ONNX file format `openvino_onnx_frontend` is used +- To read Paddle file format `openvino_paddle_frontend` is used + +Depending on which model file formats are passed to `ov::Core::read_model` in the application, pick the appropriate libraries. + +> **NOTE**: The recommended way to optimize the size of the final distribution package is to [convert models using Model Optimizer](../../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) to OpenVINO IR; in this case, you don't have to keep the ONNX, Paddle and other frontend libraries in the distribution package. + +#### (Legacy) Preprocessing via G-API + +> **NOTE**: [G-API](../../gapi/gapi_intro.md) preprocessing is a legacy functionality; use [preprocessing capabilities from OpenVINO 2.0](../preprocessing_overview.md), which do not require any additional libraries. + +If the application uses `InferenceEngine::PreProcessInfo::setColorFormat` or `InferenceEngine::PreProcessInfo::setResizeAlgorithm` methods, OpenVINO Runtime dynamically loads the `openvino_gapi_preproc` plugin to perform preprocessing via G-API. + +### Examples + +#### CPU + IR in C-written application + +A C-written application performs inference on CPU and reads models stored as OpenVINO IR: +- the `openvino_c` library is the main dependency of the application. It links against this library +- `openvino` is used as a private dependency of `openvino_c` and is also used in the deployment +- `openvino_intel_cpu_plugin` is used for inference +- `openvino_ir_frontend` is used to read the source model + +#### MULTI execution on GPU and MYRIAD in tput mode + +A C++ written application performs inference [simultaneously on GPU and MYRIAD devices](../multi_device.md) with the `ov::hint::PerformanceMode::THROUGHPUT` property and reads models stored in ONNX file format: +- the `openvino` library is the main dependency of the application. 
It links against this library +- `openvino_intel_gpu_plugin` and `openvino_intel_myriad_plugin` are used for inference +- `openvino_auto_plugin` is used for `MULTI` multi-device execution +- `openvino_auto_batch_plugin` can be also put to the distribution to improve saturation of [Intel GPU](../supported_plugins/GPU.md) device. If there is no such plugin, [Automatic batching](../automatic_batching.md) is turned off. +- `openvino_onnx_frontend` is used to read source model + +#### Auto device selection between HDDL and CPU + +C++ written application performs inference with [automatic device selection](../auto_device_selection.md) with device list limited to HDDL and CPU, model is [created using C++ code](../model_representation.md): +- `openvino` library is a main dependency of the application. It links against this library +- `openvino_auto_plugin` is used to enable automatic device selection feature +- `openvino_intel_hddl_plugin` and `openvino_intel_cpu_plugin` are used for inference, `AUTO` selects between CPU and HDDL devices according to their physical existance on deployed machine. +- No frontend library is needed because `ov::Model` is created in code. + +[deployment_full]: ../../img/deployment_full.png diff --git a/docs/OV_Runtime_UG/hetero_execution.md b/docs/OV_Runtime_UG/hetero_execution.md index 2591232b6d7..1b802eb2793 100644 --- a/docs/OV_Runtime_UG/hetero_execution.md +++ b/docs/OV_Runtime_UG/hetero_execution.md @@ -31,21 +31,22 @@ Following the OpenVINO™ naming convention, the Hetero execution plugin is assi #### The Manual Mode It assumes setting affinities explicitly for all operations in the model using `ov::Node::get_rt_info` with the `"affinity"` key. -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_hetero.cpp - :language: cpp - :fragment: [set_manual_affinities] +@snippet docs/snippets/ov_hetero.cpp set_manual_affinities -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_hetero.py - :language: python - :fragment: [set_manual_affinities] +@sphinxtab{Python} + +@snippet docs/snippets/ov_hetero.py set_manual_affinities + +@endsphinxtab + +@endsphinxtabset -@endsphinxdirective @@ -55,40 +56,40 @@ It decides automatically which operation is assigned to which device according t The automatic mode causes "greedy" behavior and assigns all operations that can be executed on a given device to it, according to the priorities you specify (for example, `ov::device::priorities("GPU,CPU")`). It does not take into account device peculiarities such as the inability to infer certain operations without other special operations placed before or after that layer. If the device plugin does not support the subgraph topology constructed by the HETERO device, then you should set affinity manually. -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_hetero.cpp - :language: cpp - :fragment: [compile_model] +@snippet docs/snippets/ov_hetero.cpp compile_model -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_hetero.py - :language: python - :fragment: [compile_model] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_hetero.py compile_model + +@endsphinxtab + +@endsphinxtabset #### Using Manual and Automatic Modes in Combination In some cases you may need to consider manually adjusting affinities which were set automatically. It usually serves minimizing the number of total subgraphs to optimize memory transfers. 
To do it, you need to "fix" the automatically assigned affinities like so: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_hetero.cpp - :language: cpp - :fragment: [fix_automatic_affinities] +@snippet docs/snippets/ov_hetero.cpp fix_automatic_affinities -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_hetero.py - :language: python - :fragment: [fix_automatic_affinities] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_hetero.py fix_automatic_affinities + +@endsphinxtab + +@endsphinxtabset Importantly, the automatic mode will not work if any operation in a model has its `"affinity"` already initialized. @@ -97,21 +98,21 @@ Importantly, the automatic mode will not work if any operation in a model has it ### Configure fallback devices If you want different devices in Hetero execution to have different device-specific configuration options, you can use the special helper property `ov::device::properties`: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_hetero.cpp - :language: cpp - :fragment: [configure_fallback_devices] +@snippet docs/snippets/ov_hetero.cpp configure_fallback_devices -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_hetero.py - :language: python - :fragment: [configure_fallback_devices] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_hetero.py configure_fallback_devices + +@endsphinxtab + +@endsphinxtabset In the example above, the `GPU` device is configured to enable profiling data and uses the default execution precision, while `CPU` has the configuration property to perform inference in `fp32`. diff --git a/docs/OV_Runtime_UG/img/configuration_dialog.png b/docs/OV_Runtime_UG/img/configuration_dialog.png new file mode 100644 index 00000000000..e8f3995d432 --- /dev/null +++ b/docs/OV_Runtime_UG/img/configuration_dialog.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2d6db31aee32fc54a0c58fff77aca191070da87a85148998ed837e81cd3b708e +size 42540 diff --git a/docs/OV_Runtime_UG/img/deploy_encrypted_model.png b/docs/OV_Runtime_UG/img/deploy_encrypted_model.png index 9338c59dcf2..419e0a22fb6 100644 --- a/docs/OV_Runtime_UG/img/deploy_encrypted_model.png +++ b/docs/OV_Runtime_UG/img/deploy_encrypted_model.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:25ed719bdd525dc0b606ef17a3fec5303ea032dfe6b2d167e1b19b6100b6fb37 -size 16516 +oid sha256:9ba2a85ae6c93405f9b6e11c3c41ab20ffe13e8ae64403fa9802af6d96b314b1 +size 35008 diff --git a/docs/OV_Runtime_UG/img/selection_dialog.png b/docs/OV_Runtime_UG/img/selection_dialog.png new file mode 100644 index 00000000000..86570aae170 --- /dev/null +++ b/docs/OV_Runtime_UG/img/selection_dialog.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0812f173a2fca3a3fce86d5b1df36e4d956c35bb09fcadbab0f26f17ccc97b5e +size 43417 diff --git a/docs/OV_Runtime_UG/integrate_with_your_application.md b/docs/OV_Runtime_UG/integrate_with_your_application.md index 6bc3644da58..e12bc161356 100644 --- a/docs/OV_Runtime_UG/integrate_with_your_application.md +++ b/docs/OV_Runtime_UG/integrate_with_your_application.md @@ -8,6 +8,7 @@ openvino_docs_OV_Runtime_UG_Model_Representation openvino_docs_OV_Runtime_UG_Infer_request + openvino_docs_OV_Runtime_UG_Python_API_exclusives @endsphinxdirective @@ -27,39 +28,39 @@ This section provides step-by-step instructions to implement a typical inference Include next 
files to work with OpenVINO™ Runtime: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/src/main.cpp - :language: cpp - :fragment: [include] +@snippet docs/snippets/src/main.cpp include -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/src/main.py - :language: python - :fragment: [import] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/src/main.py import + +@endsphinxtab + +@endsphinxtabset Use the following code to create OpenVINO™ Core to manage available devices and read model objects: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/src/main.cpp - :language: cpp - :fragment: [part1] +@snippet docs/snippets/src/main.cpp part1 -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/src/main.py - :language: python - :fragment: [part1] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/src/main.py part1 + +@endsphinxtab + +@endsphinxtabset ### Step 2. Compile the Model @@ -67,61 +68,73 @@ Use the following code to create OpenVINO™ Core to manage available devices an Compile the model for a specific device using `ov::Core::compile_model()`: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. tab:: IR +@sphinxtabset - .. doxygensnippet:: docs/snippets/src/main.cpp - :language: cpp - :fragment: [part2_1] +@sphinxtab{IR} - .. tab:: ONNX +@snippet docs/snippets/src/main.cpp part2_1 - .. doxygensnippet:: docs/snippets/src/main.cpp - :language: cpp - :fragment: [part2_2] +@endsphinxtab - .. tab:: PaddlePaddle +@sphinxtab{ONNX} - .. doxygensnippet:: docs/snippets/src/main.cpp - :language: cpp - :fragment: [part2_3] +@snippet docs/snippets/src/main.cpp part2_2 - .. tab:: ov::Model +@endsphinxtab - .. doxygensnippet:: docs/snippets/src/main.cpp - :language: cpp - :fragment: [part2_4] +@sphinxtab{PaddlePaddle} -.. tab:: Python +@snippet docs/snippets/src/main.cpp part2_3 - .. tab:: IR +@endsphinxtab - .. doxygensnippet:: docs/snippets/src/main.py - :language: python - :fragment: [part2_1] +@sphinxtab{ov::Model} - .. tab:: ONNX +@snippet docs/snippets/src/main.cpp part2_4 - .. doxygensnippet:: docs/snippets/src/main.py - :language: python - :fragment: [part2_2] +@endsphinxtab - .. tab:: PaddlePaddle +@endsphinxtabset - .. doxygensnippet:: docs/snippets/src/main.py - :language: python - :fragment: [part2_3] +@endsphinxtab - .. tab:: ov::Model +@sphinxtab{Python} - .. doxygensnippet:: docs/snippets/src/main.py - :language: python - :fragment: [part2_4] +@sphinxtabset -@endsphinxdirective +@sphinxtab{IR} + +@snippet docs/snippets/src/main.py part2_1 + +@endsphinxtab + +@sphinxtab{ONNX} + +@snippet docs/snippets/src/main.py part2_2 + +@endsphinxtab + +@sphinxtab{PaddlePaddle} + +@snippet docs/snippets/src/main.py part2_3 + +@endsphinxtab + +@sphinxtab{ov::Model} + +@snippet docs/snippets/src/main.py part2_4 + +@endsphinxtab + +@endsphinxtabset + +@endsphinxtab + +@endsphinxtabset The `ov::Model` object represents any models inside the OpenVINO™ Runtime. For more details please read article about [OpenVINO™ Model representation](model_representation.md). @@ -134,61 +147,61 @@ To learn how to change the device configuration, read the [Query device properti `ov::InferRequest` class provides methods for model inference in OpenVINO™ Runtime. Create an infer request using the following code (see [InferRequest detailed documentation](./ov_infer_request.md) for more details): -@sphinxdirective +@sphinxtabset -.. 
tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/src/main.cpp - :language: cpp - :fragment: [part3] +@snippet docs/snippets/src/main.cpp part3 -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/src/main.py - :language: python - :fragment: [part3] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/src/main.py part3 + +@endsphinxtab + +@endsphinxtabset ### Step 4. Set Inputs You can use external memory to create `ov::Tensor` and use the `ov::InferRequest::set_input_tensor` method to put this tensor on the device: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/src/main.cpp - :language: cpp - :fragment: [part4] +@snippet docs/snippets/src/main.cpp part4 -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/src/main.py - :language: python - :fragment: [part4] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/src/main.py part4 + +@endsphinxtab + +@endsphinxtabset ### Step 5. Start Inference OpenVINO™ Runtime supports inference in either synchronous or asynchronous mode. Using the Async API can improve application's overall frame-rate, because rather than wait for inference to complete, the app can keep working on the host, while the accelerator is busy. You can use `ov::InferRequest::start_async` to start model inference in the asynchronous mode and call `ov::InferRequest::wait` to wait for the inference results: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/src/main.cpp - :language: cpp - :fragment: [part5] +@snippet docs/snippets/src/main.cpp part5 -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/src/main.py - :language: python - :fragment: [part5] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/src/main.py part5 + +@endsphinxtab + +@endsphinxtabset This section demonstrates a simple pipeline, to get more information about other ways to perform inference, read the dedicated ["Run inference" section](./ov_infer_request.md). @@ -196,21 +209,21 @@ This section demonstrates a simple pipeline, to get more information about other Go over the output tensors and process the inference results. -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/src/main.cpp - :language: cpp - :fragment: [part6] +@snippet docs/snippets/src/main.cpp part6 -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/src/main.py - :language: python - :fragment: [part6] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/src/main.py part6 + +@endsphinxtab + +@endsphinxtabset ## Link and Build Your C++ Application with OpenVINO™ Runtime diff --git a/docs/OV_Runtime_UG/layout_overview.md b/docs/OV_Runtime_UG/layout_overview.md index c164fb8a1be..493e56b08c9 100644 --- a/docs/OV_Runtime_UG/layout_overview.md +++ b/docs/OV_Runtime_UG/layout_overview.md @@ -20,82 +20,82 @@ Reasons when you may want to care about input/output layout: ### Short The easiest way is to fully specify each dimension with one alphabetical letter -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_layout.cpp - :language: cpp - :fragment: [ov:layout:simple] +@snippet docs/snippets/ov_layout.cpp ov:layout:simple -.. tab:: Python +@endsphinxtab - .. 
doxygensnippet:: docs/snippets/ov_layout.py - :language: python - :fragment: [ov:layout:simple] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_layout.py ov:layout:simple + +@endsphinxtab + +@endsphinxtabset This assigns 'N' to first dimension, 'C' to second, 'H' to 3rd and 'W' to 4th ### Advanced Advanced syntax allows assigning a word to a dimension. To do this, wrap layout with square brackets `[]` and specify each name separated by comma `,` -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_layout.cpp - :language: cpp - :fragment: [ov:layout:complex] +@snippet docs/snippets/ov_layout.cpp ov:layout:complex -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_layout.py - :language: python - :fragment: [ov:layout:complex] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_layout.py ov:layout:complex + +@endsphinxtab + +@endsphinxtabset ### Partially defined layout If some dimension is not important, it's name can be set to `?` -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_layout.cpp - :language: cpp - :fragment: [ov:layout:partially_defined] +@snippet docs/snippets/ov_layout.cpp ov:layout:partially_defined -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_layout.py - :language: python - :fragment: [ov:layout:partially_defined] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_layout.py ov:layout:partially_defined + +@endsphinxtab + +@endsphinxtabset ### Dynamic layout If number of dimensions is not important, ellipsis `...` can be used to specify variadic number of dimensions. -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_layout.cpp - :language: cpp - :fragment: [ov:layout:dynamic] +@snippet docs/snippets/ov_layout.cpp ov:layout:dynamic -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_layout.py - :language: python - :fragment: [ov:layout:dynamic] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_layout.py ov:layout:dynamic + +@endsphinxtab + +@endsphinxtabset ### Predefined names @@ -108,22 +108,21 @@ Layout has pre-defined some widely used in computer vision dimension names: These names are used in [PreProcessing API](./preprocessing_overview.md) and there is a set of helper functions to get appropriate dimension index from layout -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_layout.cpp - :language: cpp - :fragment: [ov:layout:predefined] +@snippet docs/snippets/ov_layout.cpp ov:layout:predefined -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_layout.py - :language: python - :fragment: [ov:layout:predefined] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_layout.py ov:layout:predefined +@endsphinxtab + +@endsphinxtabset ### Equality @@ -133,21 +132,21 @@ Layout names are case-insensitive, which means that ```Layout("NCHW") == Layout( Layout can be converted to string in advanced syntax format. Can be useful for debugging and serialization purposes -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_layout.cpp - :language: cpp - :fragment: [ov:layout:dump] +@snippet docs/snippets/ov_layout.cpp ov:layout:dump -.. tab:: Python +@endsphinxtab - .. 
doxygensnippet:: docs/snippets/ov_layout.py - :language: python - :fragment: [ov:layout:dump] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_layout.py ov:layout:dump + +@endsphinxtab + +@endsphinxtabset ## See also diff --git a/docs/OV_Runtime_UG/migration_ov_2_0/common_inference_pipeline.md b/docs/OV_Runtime_UG/migration_ov_2_0/common_inference_pipeline.md index c7d0e6594f8..7f3aa58d28b 100644 --- a/docs/OV_Runtime_UG/migration_ov_2_0/common_inference_pipeline.md +++ b/docs/OV_Runtime_UG/migration_ov_2_0/common_inference_pipeline.md @@ -80,162 +80,162 @@ OpenVINO™ Runtime API 2.0: Inference Engine API fills inputs as `I32` precision (**not** aligned with the original model): -@sphinxdirective +@sphinxtabset -.. tab:: IR v10 +@sphinxtab{IR v10} - .. doxygensnippet:: docs/snippets/ie_common.cpp - :language: cpp - :fragment: [ie:get_input_tensor] +@snippet docs/snippets/ie_common.cpp ie:get_input_tensor -.. tab:: IR v11 +@endsphinxtab - .. doxygensnippet:: docs/snippets/ie_common.cpp - :language: cpp - :fragment: [ie:get_input_tensor] +@sphinxtab{IR v11} -.. tab:: ONNX +@snippet docs/snippets/ie_common.cpp ie:get_input_tensor - .. doxygensnippet:: docs/snippets/ie_common.cpp - :language: cpp - :fragment: [ie:get_input_tensor] +@endsphinxtab -.. tab:: Model created in code +@sphinxtab{ONNX} - .. doxygensnippet:: docs/snippets/ie_common.cpp - :language: cpp - :fragment: [ie:get_input_tensor] +@snippet docs/snippets/ie_common.cpp ie:get_input_tensor -@endsphinxdirective +@endsphinxtab + +@sphinxtab{Model created in code} + +@snippet docs/snippets/ie_common.cpp ie:get_input_tensor + +@endsphinxtab + +@endsphinxtabset OpenVINO™ Runtime API 2.0 fills inputs as `I64` precision (aligned with the original model): -@sphinxdirective +@sphinxtabset -.. tab:: IR v10 +@sphinxtab{IR v10} - .. doxygensnippet:: docs/snippets/ov_common.cpp - :language: cpp - :fragment: [ov_api_2_0:get_input_tensor_v10] +@snippet docs/snippets/ov_common.cpp ov_api_2_0:get_input_tensor_v10 -.. tab:: IR v11 +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_common.cpp - :language: cpp - :fragment: [ov_api_2_0:get_input_tensor_aligned] +@sphinxtab{IR v11} -.. tab:: ONNX +@snippet docs/snippets/ov_common.cpp ov_api_2_0:get_input_tensor_aligned - .. doxygensnippet:: docs/snippets/ov_common.cpp - :language: cpp - :fragment: [ov_api_2_0:get_input_tensor_aligned] +@endsphinxtab -.. tab:: Model created in code +@sphinxtab{ONNX} - .. doxygensnippet:: docs/snippets/ov_common.cpp - :language: cpp - :fragment: [ov_api_2_0:get_input_tensor_aligned] +@snippet docs/snippets/ov_common.cpp ov_api_2_0:get_input_tensor_aligned -@endsphinxdirective +@endsphinxtab + +@sphinxtab{Model created in code} + +@snippet docs/snippets/ov_common.cpp ov_api_2_0:get_input_tensor_aligned + +@endsphinxtab + +@endsphinxtabset ## 6. Start Inference Inference Engine API: -@sphinxdirective +@sphinxtabset -.. tab:: Sync +@sphinxtab{Sync} - .. doxygensnippet:: docs/snippets/ie_common.cpp - :language: cpp - :fragment: [ie:inference] +@snippet docs/snippets/ie_common.cpp ie:inference -.. tab:: Async +@endsphinxtab - .. doxygensnippet:: docs/snippets/ie_common.cpp - :language: cpp - :fragment: [ie:start_async_and_wait] +@sphinxtab{Async} -@endsphinxdirective +@snippet docs/snippets/ie_common.cpp ie:start_async_and_wait + +@endsphinxtab + +@endsphinxtabset OpenVINO™ Runtime API 2.0: -@sphinxdirective +@sphinxtabset -.. tab:: Sync +@sphinxtab{Sync} - .. 
doxygensnippet:: docs/snippets/ov_common.cpp - :language: cpp - :fragment: [ov_api_2_0:inference] +@snippet docs/snippets/ov_common.cpp ov_api_2_0:inference -.. tab:: Async +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_common.cpp - :language: cpp - :fragment: [ov_api_2_0:start_async_and_wait] +@sphinxtab{Async} -@endsphinxdirective +@snippet docs/snippets/ov_common.cpp ov_api_2_0:start_async_and_wait + +@endsphinxtab + +@endsphinxtabset ## 7. Process the Inference Results Inference Engine API processes outputs as `I32` precision (**not** aligned with the original model): -@sphinxdirective +@sphinxtabset -.. tab:: IR v10 +@sphinxtab{IR v10} - .. doxygensnippet:: docs/snippets/ie_common.cpp - :language: cpp - :fragment: [ie:get_output_tensor] +@snippet docs/snippets/ov_common.cpp ov_api_2_0:inference -.. tab:: IR v11 +@endsphinxtab - .. doxygensnippet:: docs/snippets/ie_common.cpp - :language: cpp - :fragment: [ie:get_output_tensor] +@sphinxtab{IR v11} -.. tab:: ONNX +@snippet docs/snippets/ie_common.cpp ie:get_output_tensor - .. doxygensnippet:: docs/snippets/ie_common.cpp - :language: cpp - :fragment: [ie:get_output_tensor] +@endsphinxtab -.. tab:: Model created in code +@sphinxtab{ONNX} - .. doxygensnippet:: docs/snippets/ie_common.cpp - :language: cpp - :fragment: [ie:get_output_tensor] +@snippet docs/snippets/ie_common.cpp ie:get_output_tensor -@endsphinxdirective +@endsphinxtab + +@sphinxtab{Model created in code} + +@snippet docs/snippets/ie_common.cpp ie:get_output_tensor + +@endsphinxtab + +@endsphinxtabset OpenVINO™ Runtime API 2.0 processes outputs: - For IR v10 as `I32` precision (**not** aligned with the original model) to match **old** behavior - For IR v11, ONNX, ov::Model, Paddle as `I64` precision (aligned with the original model) to match **new** behavior -@sphinxdirective +@sphinxtabset -.. tab:: IR v10 +@sphinxtab{IR v10} - .. doxygensnippet:: docs/snippets/ov_common.cpp - :language: cpp - :fragment: [ov_api_2_0:get_output_tensor_v10] +@snippet docs/snippets/ov_common.cpp ov_api_2_0:get_output_tensor_v10 -.. tab:: IR v11 +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_common.cpp - :language: cpp - :fragment: [ov_api_2_0:get_output_tensor_aligned] +@sphinxtab{IR v11} -.. tab:: ONNX +@snippet docs/snippets/ov_common.cpp ov_api_2_0:get_output_tensor_aligned - .. doxygensnippet:: docs/snippets/ov_common.cpp - :language: cpp - :fragment: [ov_api_2_0:get_output_tensor_aligned] +@endsphinxtab -.. tab:: Model created in code +@sphinxtab{ONNX} - .. doxygensnippet:: docs/snippets/ov_common.cpp - :language: cpp - :fragment: [ov_api_2_0:get_output_tensor_aligned] +@snippet docs/snippets/ov_common.cpp ov_api_2_0:get_output_tensor_aligned -@endsphinxdirective +@endsphinxtab + +@sphinxtab{Model created in code} + +@snippet docs/snippets/ov_common.cpp ov_api_2_0:get_output_tensor_aligned + +@endsphinxtab + +@endsphinxtabset diff --git a/docs/OV_Runtime_UG/migration_ov_2_0/configure_devices.md b/docs/OV_Runtime_UG/migration_ov_2_0/configure_devices.md index 1ddd799a352..fb773c58549 100644 --- a/docs/OV_Runtime_UG/migration_ov_2_0/configure_devices.md +++ b/docs/OV_Runtime_UG/migration_ov_2_0/configure_devices.md @@ -20,160 +20,184 @@ The snippets below show how to migrate from Inference Engine device configuratio Inference Engine API: -@sphinxdirective +@sphinxtabset -.. tab:: Devices +@sphinxtab{Devices} - .. 
doxygensnippet:: docs/snippets/ov_properties_migration.cpp - :language: cpp - :fragment: [core_set_config] +@snippet docs/snippets/ov_properties_migration.cpp core_set_config -.. tab:: Model Loading +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_properties_migration.cpp - :language: cpp - :fragment: [core_load_network] +@sphinxtab{Model Loading} -.. tab:: Execution +@snippet docs/snippets/ov_properties_migration.cpp core_load_network - .. doxygensnippet:: docs/snippets/ov_properties_migration.cpp - :language: cpp - :fragment: [executable_network_set_config] +@endsphinxtab -@endsphinxdirective +@sphinxtab{Execution} + +@snippet docs/snippets/ov_properties_migration.cpp executable_network_set_config + +@endsphinxtab + +@endsphinxtabset OpenVINO Runtime API 2.0: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. tab:: Devices +@sphinxtabset - .. doxygensnippet:: docs/snippets/ov_properties_migration.cpp - :language: cpp - :fragment: [core_set_property] +@sphinxtab{Devices} - .. tab:: Model Loading +@snippet docs/snippets/ov_properties_migration.cpp core_set_property - .. doxygensnippet:: docs/snippets/ov_properties_migration.cpp - :language: cpp - :fragment: [core_compile_model] +@endsphinxtab - .. tab:: Execution +@sphinxtab{Model Loading} - .. doxygensnippet:: docs/snippets/ov_properties_migration.cpp - :language: cpp - :fragment: [compiled_model_set_property] +@snippet docs/snippets/ov_properties_migration.cpp core_compile_model -.. tab:: Python +@endsphinxtab - .. tab:: Devices +@sphinxtab{Execution} - .. doxygensnippet:: docs/snippets/ov_properties_migration.py - :language: python - :fragment: [core_set_property] +@snippet docs/snippets/ov_properties_migration.cpp compiled_model_set_property - .. tab:: Model Loading +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_properties_migration.py - :language: python - :fragment: [core_compile_model] +@endsphinxtabset - .. tab:: Execution +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_properties_migration.py - :language: python - :fragment: [compiled_model_set_property] +@sphinxtab{Python} -@endsphinxdirective +@sphinxtabset + +@sphinxtab{Devices} + +@snippet docs/snippets/ov_properties_migration.py core_set_property + +@endsphinxtab + +@sphinxtab{Model Loading} + +@snippet docs/snippets/ov_properties_migration.py core_compile_model + +@endsphinxtab + +@sphinxtab{Execution} + +@snippet docs/snippets/ov_properties_migration.py compiled_model_set_property + +@endsphinxtab + +@endsphinxtabset + +@endsphinxtab + +@endsphinxtabset ### Get information Inference Engine API: -@sphinxdirective +@sphinxtabset -.. tab:: Device configuration +@sphinxtab{Device Configuration} - .. doxygensnippet:: docs/snippets/ov_properties_migration.cpp - :language: cpp - :fragment: [core_get_config] +@snippet docs/snippets/ov_properties_migration.cpp core_get_config -.. tab:: Device metrics +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_properties_migration.cpp - :language: cpp - :fragment: [core_get_metric] +@sphinxtab{Device metrics} -.. tab:: Execution config +@snippet docs/snippets/ov_properties_migration.cpp core_get_metric - .. doxygensnippet:: docs/snippets/ov_properties_migration.cpp - :language: cpp - :fragment: [executable_network_get_config] +@endsphinxtab -.. tab:: Execution metrics +@sphinxtab{Execution config} - .. 
doxygensnippet:: docs/snippets/ov_properties_migration.cpp - :language: cpp - :fragment: [executable_network_get_metric] +@snippet docs/snippets/ov_properties_migration.cpp executable_network_get_config -@endsphinxdirective +@endsphinxtab + +@sphinxtab{Execution metrics} + +@snippet docs/snippets/ov_properties_migration.cpp executable_network_get_metric + +@endsphinxtab + +@endsphinxtabset OpenVINO Runtime API 2.0: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. tab:: Device configuration +@sphinxtabset - .. doxygensnippet:: docs/snippets/ov_properties_migration.cpp - :language: cpp - :fragment: [core_get_rw_property] +@sphinxtab{Device Configuration} - .. tab:: Device metrics +@snippet docs/snippets/ov_properties_migration.cpp core_get_rw_property - .. doxygensnippet:: docs/snippets/ov_properties_migration.cpp - :language: cpp - :fragment: [core_get_ro_property] +@endsphinxtab - .. tab:: Execution config +@sphinxtab{Device metrics} - .. doxygensnippet:: docs/snippets/ov_properties_migration.cpp - :language: cpp - :fragment: [compiled_model_get_rw_property] +@snippet docs/snippets/ov_properties_migration.cpp core_get_ro_property - .. tab:: Execution metrics +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_properties_migration.cpp - :language: cpp - :fragment: [compiled_model_get_ro_property] +@sphinxtab{Execution config} -.. tab:: Python +@snippet docs/snippets/ov_properties_migration.cpp compiled_model_get_rw_property - .. tab:: Device configuration +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_properties_migration.py - :language: python - :fragment: [core_get_rw_property] +@sphinxtab{Execution metrics} - .. tab:: Device metrics +@snippet docs/snippets/ov_properties_migration.cpp compiled_model_get_ro_property - .. doxygensnippet:: docs/snippets/ov_properties_migration.py - :language: python - :fragment: [core_get_ro_property] +@endsphinxtab - .. tab:: Execution config +@endsphinxtabset - .. doxygensnippet:: docs/snippets/ov_properties_migration.py - :language: python - :fragment: [compiled_model_get_rw_property] +@endsphinxtab - .. tab:: Execution metrics +@sphinxtab{Python} - .. doxygensnippet:: docs/snippets/ov_properties_migration.py - :language: python - :fragment: [compiled_model_get_ro_property] +@sphinxtabset -@endsphinxdirective +@sphinxtab{Device Configuration} + +@snippet docs/snippets/ov_properties_migration.py core_get_rw_property + +@endsphinxtab + +@sphinxtab{Device metrics} + +@snippet docs/snippets/ov_properties_migration.py core_get_ro_property + +@endsphinxtab + +@sphinxtab{Execution config} + +@snippet docs/snippets/ov_properties_migration.py compiled_model_get_rw_property + +@endsphinxtab + +@sphinxtab{Execution metrics} + +@snippet docs/snippets/ov_properties_migration.py compiled_model_get_ro_property + +@endsphinxtab + +@endsphinxtabset + +@endsphinxtab + +@endsphinxtabset diff --git a/docs/OV_Runtime_UG/migration_ov_2_0/deployment_migration.md b/docs/OV_Runtime_UG/migration_ov_2_0/deployment_migration.md index 0eb86abd370..9bc193382a8 100644 --- a/docs/OV_Runtime_UG/migration_ov_2_0/deployment_migration.md +++ b/docs/OV_Runtime_UG/migration_ov_2_0/deployment_migration.md @@ -14,7 +14,7 @@ Starting from OpenVINO 2022.1, Model Optimizer, Post-Training Optimization tool The structure of OpenVINO 2022.1 installer package has been organized as below: - The `runtime` folder includes headers, libraries and CMake interfaces. 
-- The `tools` folder contains [the compile tool](../../../tools/compile_tool/README.md), [deployment manager](../../install_guides/deployment-manager-tool.md) and a set of `requirements.txt` files with links to the corresponding versions of the `openvino-dev` package. +- The `tools` folder contains [the compile tool](../../../tools/compile_tool/README.md), [deployment manager](../../OV_Runtime_UG/deployment/deployment-manager-tool.md) and a set of `requirements.txt` files with links to the corresponding versions of the `openvino-dev` package. - The `python` folder contains the Python version for OpenVINO Runtime. ## Installing OpenVINO Development Tools via PyPI @@ -153,7 +153,7 @@ To build applications without CMake interface, you can also use MSVC IDE, UNIX m ## Clearer Library Structure for Deployment -OpenVINO 2022.1 has reorganized the libraries to make it easier for deployment. In previous versions, to perform deployment steps, you have to use several libraries. Now you can just use `openvino` or `openvino_c` based on your developing language plus necessary plugins to complete your task. For example, `openvino_intel_cpu_plugin` and `openvino_ir_frontend` plugins will enable you to load OpenVINO IRs and perform inference on CPU device. +OpenVINO 2022.1 has reorganized the libraries to make it easier for deployment. In previous versions, to perform deployment steps, you have to use several libraries. Now you can just use `openvino` or `openvino_c` based on your developing language plus necessary plugins to complete your task. For example, `openvino_intel_cpu_plugin` and `openvino_ir_frontend` plugins will enable you to load OpenVINO IRs and perform inference on CPU device (see [Local distribution with OpenVINO](../deployment/local-distribution.md) for more details). Here you can find some detailed comparisons on library structure between OpenVINO 2022.1 and previous versions: diff --git a/docs/OV_Runtime_UG/migration_ov_2_0/intro.md b/docs/OV_Runtime_UG/migration_ov_2_0/intro.md index 5f31ed51913..b824f55c650 100644 --- a/docs/OV_Runtime_UG/migration_ov_2_0/intro.md +++ b/docs/OV_Runtime_UG/migration_ov_2_0/intro.md @@ -1,4 +1,4 @@ -# OpenVINO™ Transition Guide for API 2.0 {#openvino_2_0_transition_guide} +# Transition to OpenVINO™ 2.0 {#openvino_2_0_transition_guide} @sphinxdirective diff --git a/docs/OV_Runtime_UG/migration_ov_2_0/preprocessing.md b/docs/OV_Runtime_UG/migration_ov_2_0/preprocessing.md index a860ac261f6..db0481d792f 100644 --- a/docs/OV_Runtime_UG/migration_ov_2_0/preprocessing.md +++ b/docs/OV_Runtime_UG/migration_ov_2_0/preprocessing.md @@ -25,23 +25,11 @@ In order to utilize preprocessing following imports must be added. Inference Engine API: -@sphinxdirective - -.. doxygensnippet:: docs/snippets/ov_preprocessing_migration.py - :language: python - :fragment: [imports] - -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing_migration.py imports OpenVINO Runtime API 2.0: -@sphinxdirective - -.. doxygensnippet:: docs/snippets/ov_preprocessing_migration.py - :language: python - :fragment: [ov_imports] - -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing_migration.py ov_imports There are two different namespaces `runtime`, which contains OpenVINO Runtime API classes and `preprocess` which provides Preprocessing API. @@ -50,153 +38,154 @@ There are two different namespaces `runtime`, which contains OpenVINO Runtime AP Inference Engine API: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. 
doxygensnippet:: docs/snippets/ov_preprocessing_migration.cpp - :language: cpp - :fragment: [mean_scale] +@snippet docs/snippets/ov_preprocessing_migration.cpp mean_scale -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing_migration.py - :language: python - :fragment: [mean_scale] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing_migration.py mean_scale + +@endsphinxtab + +@endsphinxtabset OpenVINO Runtime API 2.0: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing_migration.cpp - :language: cpp - :fragment: [ov_mean_scale] +@snippet docs/snippets/ov_preprocessing_migration.cpp ov_mean_scale -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing_migration.py - :language: python - :fragment: [ov_mean_scale] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing_migration.py ov_mean_scale + +@endsphinxtab + +@endsphinxtabset ### Precision and layout conversions Inference Engine API: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing_migration.cpp - :language: cpp - :fragment: [conversions] +@snippet docs/snippets/ov_preprocessing_migration.cpp conversions -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing_migration.py - :language: python - :fragment: [conversions] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing_migration.py conversions + +@endsphinxtab + +@endsphinxtabset OpenVINO Runtime API 2.0: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing_migration.cpp - :language: cpp - :fragment: [ov_conversions] +@snippet docs/snippets/ov_preprocessing_migration.cpp ov_conversions -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing_migration.py - :language: python - :fragment: [ov_conversions] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing_migration.py ov_conversions + +@endsphinxtab + +@endsphinxtabset ### Image scaling Inference Engine API: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing_migration.cpp - :language: cpp - :fragment: [image_scale] +@snippet docs/snippets/ov_preprocessing_migration.cpp image_scale -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing_migration.py - :language: python - :fragment: [image_scale] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing_migration.py image_scale + +@endsphinxtab + +@endsphinxtabset OpenVINO Runtime API 2.0: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing_migration.cpp - :language: cpp - :fragment: [ov_image_scale] +@snippet docs/snippets/ov_preprocessing_migration.cpp ov_image_scale -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing_migration.py - :language: python - :fragment: [ov_image_scale] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing_migration.py ov_image_scale + +@endsphinxtab + +@endsphinxtabset ### Color space conversions Inference Engine API: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. 
doxygensnippet:: docs/snippets/ov_preprocessing_migration.cpp - :language: cpp - :fragment: [color_space] +@snippet docs/snippets/ov_preprocessing_migration.cpp color_space -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing_migration.py - :language: python - :fragment: [color_space] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing_migration.py color_space + +@endsphinxtab + +@endsphinxtabset OpenVINO Runtime API 2.0: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing_migration.cpp - :language: cpp - :fragment: [ov_color_space] +@snippet docs/snippets/ov_preprocessing_migration.cpp ov_color_space -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing_migration.py - :language: python - :fragment: [ov_color_space] +@sphinxtab{Python} + +@snippet docs/snippets/ov_preprocessing_migration.py ov_color_space + +@endsphinxtab + +@endsphinxtabset -@endsphinxdirective **See also:** - [Preprocessing details](../preprocessing_details.md) diff --git a/docs/OV_Runtime_UG/model_representation.md b/docs/OV_Runtime_UG/model_representation.md index 4e49108f2fe..593dadc4f24 100644 --- a/docs/OV_Runtime_UG/model_representation.md +++ b/docs/OV_Runtime_UG/model_representation.md @@ -8,7 +8,7 @@ Sinks of the graph have no consumers and are not included in the results vector. Each operation in `ov::Model` has the `std::shared_ptr` type. -For details on how to build a model in OpenVINO™ Runtime, see the [Build a Model in OpenVINO™ Runtime](@ref build_model) section. +For details on how to build a model in OpenVINO™ Runtime, see the [Build a Model in OpenVINO™ Runtime](@ref ov_ug_build_model) section. OpenVINO™ Runtime allows to use different approaches to work with model inputs/outputs: - `ov::Model::inputs()`/`ov::Model::outputs()` methods allow to get vector of all input/output ports. @@ -16,21 +16,21 @@ OpenVINO™ Runtime allows to use different approaches to work with model inputs - Methods `ov::Model::input()` and `ov::Model::output()` can be used with index of input or output from the framework model to get specific port by index. - You can use tensor name of input or output from the original framework model together with methods `ov::Model::input()` or `ov::Model::output()` to get specific port. It means that you don't need to have any additional mapping of names from framework to OpenVINO, as it was before, OpenVINO™ Runtime allows using of native framework tensor names. -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_model_snippets.cpp - :language: cpp - :fragment: [all_inputs_ouputs] +@snippet docs/snippets/ov_model_snippets.cpp all_inputs_ouputs -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_model_snippets.py - :language: python - :fragment: [all_inputs_ouputs] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_model_snippets.py all_inputs_ouputs + +@endsphinxtab + +@endsphinxtabset OpenVINO™ Runtime model representation uses special classes to work with model data types and shapes. For data types the `ov::element::Type` is used. @@ -42,21 +42,21 @@ OpenVINO™ Runtime provides two types for shape representation: * `ov::PartialShape` - Represents dynamic shapes. That means that the rank or some of dimensions are dynamic (dimension defines an interval or undefined). 
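+As a quick illustration (this is not taken from the snippet files referenced in this guide, and the concrete dimension values are assumptions made up for the example), a minimal sketch of how the two shape types could be constructed:
+
+```cpp
+#include <openvino/openvino.hpp>
+#include <iostream>
+
+int main() {
+    // Fully static shape: every dimension is a known positive number.
+    ov::Shape static_shape{1, 3, 224, 224};
+
+    // Partially dynamic shape: dynamic batch plus bounded height/width intervals.
+    ov::PartialShape dynamic_shape{ov::Dimension(), 3, ov::Dimension(100, 224), ov::Dimension(100, 224)};
+
+    std::cout << "static rank: " << static_shape.size() << '\n';
+    std::cout << "dynamic shape is static: " << std::boolalpha << dynamic_shape.is_static() << '\n';
+    return 0;
+}
+```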
`ov::PartialShape` can be converted to `ov::Shape` using the `get_shape()` method if all dimensions are static; otherwise the conversion raises an exception. -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_model_snippets.cpp - :language: cpp - :fragment: [ov:partial_shape] +@snippet docs/snippets/ov_model_snippets.cpp ov:partial_shape -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_model_snippets.py - :language: python - :fragment: [ov:partial_shape] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_model_snippets.py ov:partial_shape + +@endsphinxtab + +@endsphinxtabset But in most cases before getting static shape using `get_shape()` method, you need to check that shape is static. @@ -72,7 +72,7 @@ Each OpenVINO™ Release introduces new operations and add these operations to a For a complete list of operation sets supported in OpenVINO™ toolkit, see [Available Operations Sets](../ops/opset.md). To add support of custom operations, see the [Add Custom OpenVINO Operations](../Extensibility_UG/Intro.md) document. -## Build a Model in OpenVINO™ Runtime {#build_model} +## Build a Model in OpenVINO™ Runtime {#ov_ug_build_model} You can create a model from source. This section illustrates how to construct a model composed of operations from an available operation set. @@ -80,78 +80,79 @@ Operation set `opsetX` integrates a list of pre-compiled operations that work To build an `ov::Model` instance from `opset8` operations, include the following files: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_model_snippets.cpp - :language: cpp - :fragment: [ov:include] +@snippet docs/snippets/ov_model_snippets.cpp ov:include -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_model_snippets.py - :language: python - :fragment: [import] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_model_snippets.py import + +@endsphinxtab + +@endsphinxtabset The following code demonstrates how to create a simple model: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_model_snippets.cpp - :language: cpp - :fragment: [ov:create_simple_model] +@snippet docs/snippets/ov_model_snippets.cpp ov:create_simple_model -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_model_snippets.py - :language: python - :fragment: [ov:create_simple_model] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_model_snippets.py ov:create_simple_model + +@endsphinxtab + +@endsphinxtabset The following code creates a model with several outputs: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_model_snippets.cpp - :language: cpp - :fragment: [ov:create_advanced_model] +@snippet docs/snippets/ov_model_snippets.cpp ov:create_advanced_model -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_model_snippets.py - :language: python - :fragment: [ov:create_advanced_model] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_model_snippets.py ov:create_advanced_model + +@endsphinxtab + +@endsphinxtabset ## Model debug capabilities OpenVINO™ provides several debug capabilities: - To receive additional messages about applied model modifications, rebuild the OpenVINO™ Runtime library with the `-DENABLE_OPENVINO_DEBUG=ON` option. 
- Model can be visualized to image from the xDot format: -@sphinxdirective -.. tab:: C++ + @sphinxtabset - .. doxygensnippet:: docs/snippets/ov_model_snippets.cpp - :language: cpp - :fragment: [ov:visualize] + @sphinxtab{C++} -.. tab:: Python + @snippet docs/snippets/ov_model_snippets.cpp ov:visualize - .. doxygensnippet:: docs/snippets/ov_model_snippets.py - :language: python - :fragment: [ov:visualize] + @endsphinxtab -@endsphinxdirective + @sphinxtab{Python} + + @snippet docs/snippets/ov_model_snippets.py ov:visualize + + @endsphinxtab + +@endsphinxtabset `ov::pass::VisualizeTree` can be parametrized via environment variables: @@ -163,21 +164,20 @@ OpenVINO™ provides several debug capabilities: OV_VISUALIZE_TREE_MEMBERS_NAME=1 - print member names - Also model can be serialized to IR: -@sphinxdirective -.. tab:: C++ + @sphinxtabset - .. doxygensnippet:: docs/snippets/ov_model_snippets.cpp - :language: cpp - :fragment: [ov:serialize] + @sphinxtab{C++} -.. tab:: Python + @snippet docs/snippets/ov_model_snippets.cpp ov:serialize - .. doxygensnippet:: docs/snippets/ov_model_snippets.py - :language: python - :fragment: [ov:serialize] + @endsphinxtab -@endsphinxdirective + @sphinxtab{Python} + + @snippet docs/snippets/ov_model_snippets.py ov:serialize + + @endsphinxtab ## See Also diff --git a/docs/OV_Runtime_UG/multi_device.md b/docs/OV_Runtime_UG/multi_device.md index 7075294188f..6950b58a729 100644 --- a/docs/OV_Runtime_UG/multi_device.md +++ b/docs/OV_Runtime_UG/multi_device.md @@ -310,11 +310,9 @@ Note that while the performance of accelerators works well with Multi-Device, th Every OpenVINO sample that supports the `-d` (which stands for "device") command-line option transparently accepts Multi-Device. The [Benchmark application](../../tools/benchmark_tool/README.md) is the best reference for the optimal usage of Multi-Device. As discussed earlier, you do not need to set up the number of requests, CPU streams or threads because the application provides optimal performance out of the box. Below is an example command to evaluate CPU+GPU performance with the Benchmark application: ```sh -./benchmark_app.py –d MULTI:CPU,GPU –m +benchmark_app –d MULTI:CPU,GPU –m ``` -> **NOTE**: If you installed OpenVINO with pip, use `benchmark_app -d MULTI:CPU,GPU -m ` - The Multi-Device plugin supports FP16 IR files. The CPU plugin automatically upconverts it to FP32 and the other devices support it natively. Note that no demos are (yet) fully optimized for Multi-Device, by means of supporting the ov::optimal_number_of_infer_requests property, using the GPU streams/throttling, and so on. ### Video: MULTI Plugin diff --git a/docs/OV_Runtime_UG/openvino_intro.md b/docs/OV_Runtime_UG/openvino_intro.md index e5864a5f9d6..a20c0fb2349 100644 --- a/docs/OV_Runtime_UG/openvino_intro.md +++ b/docs/OV_Runtime_UG/openvino_intro.md @@ -1,8 +1,8 @@ -# OpenVINO™ Runtime User Guide {#openvino_docs_OV_Runtime_User_Guide} +# Performing inference with OpenVINO Runtime {#openvino_docs_OV_Runtime_User_Guide} @sphinxdirective -.. _deep learning inference engine: +.. _deep learning openvino runtime: .. toctree:: :maxdepth: 1 @@ -19,8 +19,6 @@ openvino_docs_OV_UG_Performance_Hints openvino_docs_OV_UG_Automatic_Batching openvino_docs_IE_DG_network_state_intro - openvino_docs_OV_Runtime_UG_Python_API_exclusives - openvino_2_0_transition_guide @endsphinxdirective @@ -46,6 +44,6 @@ The scheme below illustrates the typical workflow for deploying a trained deep l - * - **Inference Engine Concept**. 
Duration: 3:43 + * - **OpenVINO Runtime Concept**. Duration: 3:43 @endsphinxdirective diff --git a/docs/OV_Runtime_UG/ov_dynamic_shapes.md b/docs/OV_Runtime_UG/ov_dynamic_shapes.md index 0208c2fc974..aae2fb06757 100644 --- a/docs/OV_Runtime_UG/ov_dynamic_shapes.md +++ b/docs/OV_Runtime_UG/ov_dynamic_shapes.md @@ -36,6 +36,7 @@ Apply those methods only if native dynamic shape API described in the following The decision about using dynamic shapes should be based on proper benchmarking of real application with real data. That's because unlike statically shaped models, inference of dynamically shaped ones takes different inference time depending on input data shape or input tensor content. +Also using the dynamic shapes can bring more overheads in memory and running time per each inference call depending on hardware plugin and model used. ## Dynamic Shapes without Tricks @@ -51,21 +52,20 @@ To avoid the tricks mentioned in the previous section there is a way to directly This is achieved with the same reshape method that is used for alternating static shape of inputs. Dynamic dimensions are specified as `-1` or `ov::Dimension()` instead of a positive number used for static dimensions: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_dynamic_shapes.cpp - :language: cpp - :fragment: [ov_dynamic_shapes:reshape_undefined] +@snippet docs/snippets/ov_dynamic_shapes.cpp ov_dynamic_shapes:reshape_undefined -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_dynamic_shapes.py - :language: python - :fragment: [reshape_undefined] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_dynamic_shapes.py reshape_undefined +@endsphinxtab + +@endsphinxtabset To simplify the code, the examples assume that the model has a single input and single output. However, there are no limitations on the number of inputs and outputs to apply dynamic shapes. @@ -78,10 +78,10 @@ If such a model is converted with Model Optimizer or read directly by Core::read Such dimensions automatically treated as dynamic ones. So you don't need to call reshape if undefined dimensions are already configured in the original model or in the IR file. -If the input model has undefined dimensions that you are not going to change during the inference, you can set them to static values, using the same `reshape` method of the model. +If the input model has undefined dimensions that you are not going to change during the inference, it is recommended to set them to static values, using the same `reshape` method of the model. From the API perspective any combination of dynamic and static dimensions can be configured. -Model Optimizer provides capability to reshape the model during the conversion, including specifying dynamic dimensions. +Model Optimizer provides identical capability to reshape the model during the conversion, including specifying dynamic dimensions. Use this capability to save time on calling `reshape` method in the end application. To get information about setting input shapes using Model Optimizer, refer to [Setting Input Shapes](../MO_DG/prepare_model/convert_model/Converting_Model.md) @@ -90,21 +90,19 @@ To get information about setting input shapes using Model Optimizer, refer to [S Besides marking a dimension just dynamic, you can also specify lower and/or upper bounds that define a range of allowed values for the dimension. Bounds are coded as arguments for `ov::Dimension`: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. 
doxygensnippet:: docs/snippets/ov_dynamic_shapes.cpp - :language: cpp - :fragment: [ov_dynamic_shapes:reshape_bounds] +@snippet docs/snippets/ov_dynamic_shapes.cpp ov_dynamic_shapes:reshape_bounds -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_dynamic_shapes.py - :language: python - :fragment: [reshape_bounds] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_dynamic_shapes.py reshape_bounds +@endsphinxtab +@endsphinxtabset Information about bounds gives opportunity for the inference plugin to apply additional optimizations. Using dynamic shapes assumes the plugins apply more loose optimization technique during model compilation @@ -114,8 +112,8 @@ For the same reason it is not recommended to leave dimensions as undefined witho When specifying bounds, the lower bound is not so important as upper bound, because knowing of upper bound allows inference devices to more precisely allocate memory for intermediate tensors for inference and use lesser number of tuned kernels for different sizes. Precisely speaking benefits of specifying lower or upper bound is device dependent. -Depending on the plugin specifying upper bounds can be required. -. +Depending on the plugin specifying upper bounds can be required. For information about dynamic shapes support on different devices, see the [Features Support Matrix](@ref features_support_matrix). + If users known lower and upper bounds for dimension it is recommended to specify them even when plugin can execute model without the bounds. ### Setting Input Tensors @@ -124,21 +122,19 @@ Preparing model with the reshape method was the first step. The second step is passing a tensor with an appropriate shape to infer request. This is similar to [regular steps](integrate_with_your_application.md), but now we can pass tensors with different shapes for the same executable model and even for the same inference request: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_dynamic_shapes.cpp - :language: cpp - :fragment: [ov_dynamic_shapes:set_input_tensor] +@snippet docs/snippets/ov_dynamic_shapes.cpp ov_dynamic_shapes:set_input_tensor -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_dynamic_shapes.py - :language: python - :fragment: [set_input_tensor] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_dynamic_shapes.py set_input_tensor +@endsphinxtab +@endsphinxtabset In the example above `set_input_tensor` is used to specify input tensors. The real dimensions of the tensor is always static, because it is a concrete tensor and it doesn't have any dimension variations in contrast to model inputs. @@ -149,21 +145,20 @@ Without doing that, the tensor returned by `get_input_tensor` is an empty tensor Setting shape for input tensor is required when the corresponding input has at least one dynamic dimension regardless of bounds information. The following example makes the same sequence of two infer request as the previous example but using `get_input_tensor` instead of `set_input_tensor`: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_dynamic_shapes.cpp - :language: cpp - :fragment: [ov_dynamic_shapes:get_input_tensor] +@snippet docs/snippets/ov_dynamic_shapes.cpp ov_dynamic_shapes:get_input_tensor -.. tab:: Python +@endsphinxtab - .. 
doxygensnippet:: docs/snippets/ov_dynamic_shapes.py - :language: python - :fragment: [get_input_tensor] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_dynamic_shapes.py get_input_tensor +@endsphinxtab + +@endsphinxtabset ### Dynamic Shapes in Outputs @@ -174,41 +169,40 @@ The same is true for other dimensions, like sequence length for NLP models or sp Whether or not output has dynamic dimensions can be examined by querying output partial shape after model read or reshape. The same is applicable for inputs. For example: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_dynamic_shapes.cpp - :language: cpp - :fragment: [ov_dynamic_shapes:print_dynamic] +@snippet docs/snippets/ov_dynamic_shapes.cpp ov_dynamic_shapes:print_dynamic -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_dynamic_shapes.py - :language: python - :fragment: [print_dynamic] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_dynamic_shapes.py print_dynamic +@endsphinxtab + +@endsphinxtabset Appearing `?` or ranges like `1..10` means there are dynamic dimensions in corresponding inputs or outputs. Or more programmatically: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_dynamic_shapes.cpp - :language: cpp - :fragment: [ov_dynamic_shapes:detect_dynamic] +@snippet docs/snippets/ov_dynamic_shapes.cpp ov_dynamic_shapes:detect_dynamic -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_dynamic_shapes.py - :language: python - :fragment: [detect_dynamic] +@sphinxtab{Python} + +@snippet docs/snippets/ov_dynamic_shapes.py detect_dynamic +@endsphinxtab + +@endsphinxtabset -@endsphinxdirective If at least one dynamic dimension exists in output of the model, shape of the corresponding output tensor will be set as the result of inference call. Before the first inference, memory for such a tensor is not allocated and has shape `[0]`. diff --git a/docs/OV_Runtime_UG/ov_infer_request.md b/docs/OV_Runtime_UG/ov_infer_request.md index 2574f907320..157c9f3ac9d 100644 --- a/docs/OV_Runtime_UG/ov_infer_request.md +++ b/docs/OV_Runtime_UG/ov_infer_request.md @@ -8,21 +8,21 @@ This class allows to set and get data for model inputs, outputs and run inferenc `ov::InferRequest` can be created from the `ov::CompiledModel`: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_infer_request.cpp - :language: cpp - :fragment: [create_infer_request] +@snippet docs/snippets/ov_infer_request.cpp create_infer_request -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_infer_request.py - :language: python - :fragment: [create_infer_request] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_infer_request.py create_infer_request + +@endsphinxtab + +@endsphinxtabset ## Run inference @@ -32,188 +32,195 @@ This class allows to set and get data for model inputs, outputs and run inferenc You can use `ov::InferRequest::infer`, which blocks the application execution, to infer model in the synchronous mode: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_infer_request.cpp - :language: cpp - :fragment: [sync_infer] +@snippet docs/snippets/ov_infer_request.cpp sync_infer -.. tab:: Python +@endsphinxtab - .. 
doxygensnippet:: docs/snippets/ov_infer_request.py - :language: python - :fragment: [sync_infer] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_infer_request.py sync_infer + +@endsphinxtab + +@endsphinxtabset ### Asynchronous mode Asynchronous mode can improve application's overall frame-rate, because rather than wait for inference to complete, the app can keep working on the host, while the accelerator is busy. You can use `ov::InferRequest::start_async` to infer model in the asynchronous mode: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_infer_request.cpp - :language: cpp - :fragment: [async_infer] +@snippet docs/snippets/ov_infer_request.cpp async_infer -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_infer_request.py - :language: python - :fragment: [async_infer] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_infer_request.py async_infer + +@endsphinxtab + +@endsphinxtabset Asynchronous mode supports two ways the application waits for inference results: * `ov::InferRequest::wait_for` - specifies the maximum duration in milliseconds to block the method. The method is blocked until the specified time has passed, or the result becomes available, whichever comes first. -@sphinxdirective + @sphinxtabset -.. tab:: C++ + @sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_infer_request.cpp - :language: cpp - :fragment: [wait_for] + @snippet docs/snippets/ov_infer_request.cpp wait_for -.. tab:: Python + @endsphinxtab - .. doxygensnippet:: docs/snippets/ov_infer_request.py - :language: python - :fragment: [wait_for] + @sphinxtab{Python} + + @snippet docs/snippets/ov_infer_request.py wait_for + + @endsphinxtab + + @endsphinxtabset -@endsphinxdirective * `ov::InferRequest::wait` - waits until inference result becomes available -@sphinxdirective + @sphinxtabset -.. tab:: C++ + @sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_infer_request.cpp - :language: cpp - :fragment: [wait] + @snippet docs/snippets/ov_infer_request.cpp wait -.. tab:: Python + @endsphinxtab - .. doxygensnippet:: docs/snippets/ov_infer_request.py - :language: python - :fragment: [wait] + @sphinxtab{Python} -@endsphinxdirective + @snippet docs/snippets/ov_infer_request.py wait + + @endsphinxtab + + @endsphinxtabset Both methods are thread-safe. When you are running several inference requests in parallel, a device can process them simultaneously, with no garauntees on the completion order. This may complicate a possible logic based on the `ov::InferRequest::wait` (unless your code needs to wait for the _all_ requests). For multi-request scenarios, consider using the `ov::InferRequest::set_callback` method to set a callback which is called upon completion of the request: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_infer_request.cpp - :language: cpp - :fragment: [set_callback] +@snippet docs/snippets/ov_infer_request.cpp set_callback -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_infer_request.py - :language: python - :fragment: [set_callback] +@sphinxtab{Python} + +@snippet docs/snippets/ov_infer_request.py set_callback + +@endsphinxtab + +@endsphinxtabset -@endsphinxdirective > **NOTE**: Use weak reference of infer_request (`ov::InferRequest*`, `ov::InferRequest&`, `std::weal_ptr`, etc.) in the callback. It is necessary to avoid cyclic references. 
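+The callback flow described above can be put together roughly as follows. This is a hedged, self-contained sketch rather than one of the documented snippets; the model file name `model.xml` and the `CPU` device are assumptions:
+
+```cpp
+#include <openvino/openvino.hpp>
+#include <iostream>
+
+int main() {
+    ov::Core core;
+    auto model = core.read_model("model.xml");                     // assumed IR file
+    ov::CompiledModel compiled = core.compile_model(model, "CPU"); // assumed device
+    ov::InferRequest request = compiled.create_infer_request();
+
+    // The callback receives a non-null std::exception_ptr if inference failed.
+    // `request` is captured by reference (not by value) to avoid a cyclic reference.
+    request.set_callback([&request](std::exception_ptr ex) {
+        if (ex) {
+            try { std::rethrow_exception(ex); }
+            catch (const std::exception& e) { std::cerr << "Inference failed: " << e.what() << '\n'; }
+            return;
+        }
+        ov::Tensor output = request.get_output_tensor();
+        std::cout << "Inference done, output elements: " << output.get_size() << '\n';
+    });
+
+    // In a real application the input tensor(s) would be filled first, e.g. via request.get_input_tensor().
+    request.start_async();
+    request.wait(); // block here only because this example has nothing else to do on the host
+    return 0;
+}
+```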
For more details, check [Classification Sample Async](../../samples/cpp/classification_sample_async/README.md). You can use the `ov::InferRequest::cancel` method if you want to abort execution of the current inference request: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_infer_request.cpp - :language: cpp - :fragment: [cancel] +@snippet docs/snippets/ov_infer_request.cpp cancel -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_infer_request.py - :language: python - :fragment: [cancel] +@sphinxtab{Python} + +@snippet docs/snippets/ov_infer_request.py cancel + +@endsphinxtab + +@endsphinxtabset -@endsphinxdirective ## Working with Input and Output tensors `ov::InferRequest` allows to get input/output tensors by tensor name, index, port and without any arguments in case if model has only one input or output. * `ov::InferRequest::get_input_tensor`, `ov::InferRequest::set_input_tensor`, `ov::InferRequest::get_output_tensor`, `ov::InferRequest::set_output_tensor` methods without arguments can be used to get or set input/output tensor for model with only one input/output: -@sphinxdirective -.. tab:: C++ + @sphinxtabset - .. doxygensnippet:: docs/snippets/ov_infer_request.cpp - :language: cpp - :fragment: [get_set_one_tensor] + @sphinxtab{C++} -.. tab:: Python + @snippet docs/snippets/ov_infer_request.cpp get_set_one_tensor - .. doxygensnippet:: docs/snippets/ov_infer_request.py - :language: python - :fragment: [get_set_one_tensor] + @endsphinxtab -@endsphinxdirective + @sphinxtab{Python} + + @snippet docs/snippets/ov_infer_request.py get_set_one_tensor + + @endsphinxtab + + @endsphinxtabset * `ov::InferRequest::get_input_tensor`, `ov::InferRequest::set_input_tensor`, `ov::InferRequest::get_output_tensor`, `ov::InferRequest::set_output_tensor` methods with argument can be used to get or set input/output tensor by input/output index: -@sphinxdirective + + @sphinxtabset -.. tab:: C++ + @sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_infer_request.cpp - :language: cpp - :fragment: [get_set_index_tensor] + @snippet docs/snippets/ov_infer_request.cpp get_set_index_tensor -.. tab:: Python + @endsphinxtab - .. doxygensnippet:: docs/snippets/ov_infer_request.py - :language: python - :fragment: [get_set_index_tensor] + @sphinxtab{Python} -@endsphinxdirective + @snippet docs/snippets/ov_infer_request.py get_set_index_tensor + + @endsphinxtab + + @endsphinxtabset * `ov::InferRequest::get_tensor`, `ov::InferRequest::set_tensor` methods can be used to get or set input/output tensor by tensor name: -@sphinxdirective -.. tab:: C++ + @sphinxtabset - .. doxygensnippet:: docs/snippets/ov_infer_request.cpp - :language: cpp - :fragment: [get_set_tensor] + @sphinxtab{C++} -.. tab:: Python + @snippet docs/snippets/ov_infer_request.cpp get_set_tensor - .. doxygensnippet:: docs/snippets/ov_infer_request.py - :language: python - :fragment: [get_set_tensor] + @endsphinxtab -@endsphinxdirective + @sphinxtab{Python} + + @snippet docs/snippets/ov_infer_request.py get_set_tensor + + @endsphinxtab + + @endsphinxtabset * `ov::InferRequest::get_tensor`, `ov::InferRequest::set_tensor` methods can be used to get or set input/output tensor by port: -@sphinxdirective -.. tab:: C++ + @sphinxtabset - .. doxygensnippet:: docs/snippets/ov_infer_request.cpp - :language: cpp - :fragment: [get_set_tensor_by_port] + @sphinxtab{C++} -.. tab:: Python + @snippet docs/snippets/ov_infer_request.cpp get_set_tensor_by_port - .. 
doxygensnippet:: docs/snippets/ov_infer_request.py - :language: python - :fragment: [get_set_tensor_by_port] + @endsphinxtab -@endsphinxdirective + @sphinxtab{Python} + + @snippet docs/snippets/ov_infer_request.py get_set_tensor_by_port + + @endsphinxtab + + @endsphinxtabset ## Examples of InferRequest usages @@ -222,58 +229,58 @@ You can use the `ov::InferRequest::cancel` method if you want to abort execution `ov::InferRequest` can be used to organize cascade of models. You need to have infer requests for each model. In this case you can get output tensor from the first request using `ov::InferRequest::get_tensor` and set it as input for the second request using `ov::InferRequest::set_tensor`. But be careful, shared tensors across compiled models can be rewritten by the first model if the first infer request is run once again, while the second model has not started yet. -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_infer_request.cpp - :language: cpp - :fragment: [cascade_models] +@snippet docs/snippets/ov_infer_request.cpp cascade_models -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_infer_request.py - :language: python - :fragment: [cascade_models] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_infer_request.py cascade_models + +@endsphinxtab + +@endsphinxtabset ### Using of ROI tensors It is possible to re-use shared input by several models. You do not need to allocate separate input tensor for a model if it processes a ROI object located inside of already allocated input of a previous model. For instance, when the first model detects objects in a video frame (stored as input tensor) and the second model accepts detected bounding boxes (ROI inside of the frame) as input. In this case, it is allowed to re-use pre-allocated input tensor (used by the first model) by the second model and just crop ROI without allocation of new memory using `ov::Tensor` with passing of `ov::Tensor` and `ov::Coordinate` as parameters. -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_infer_request.cpp - :language: cpp - :fragment: [roi_tensor] +@snippet docs/snippets/ov_infer_request.cpp roi_tensor -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_infer_request.py - :language: python - :fragment: [roi_tensor] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_infer_request.py roi_tensor + +@endsphinxtab + +@endsphinxtabset ### Using of remote tensors You can create a remote tensor to work with remote device memory. `ov::RemoteContext` allows to create remote tensor. -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_infer_request.cpp - :language: cpp - :fragment: [remote_tensor] +@snippet docs/snippets/ov_infer_request.cpp remote_tensor -.. tab:: Python +@endsphinxtab - .. 
doxygensnippet:: docs/snippets/ov_infer_request.py - :language: python - :fragment: [remote_tensor] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_infer_request.py remote_tensor + +@endsphinxtab + +@endsphinxtabset diff --git a/docs/OV_Runtime_UG/performance_hints.md b/docs/OV_Runtime_UG/performance_hints.md index 5e81921854b..051d8d66df4 100644 --- a/docs/OV_Runtime_UG/performance_hints.md +++ b/docs/OV_Runtime_UG/performance_hints.md @@ -1,6 +1,6 @@ # High-level Performance Hints {#openvino_docs_OV_UG_Performance_Hints} -Each of the OpenVINO's [supported devices](supported_plugins/Supported_Devices.md) offers low-level performance settings. Tweaking this detailed configuration requires deep architecture understanding. +Each of the OpenVINO's [supported devices](supported_plugins/Device_Plugins.md) offers low-level performance settings. Tweaking this detailed configuration requires deep architecture understanding. Also, while the performance may be optimal for the specific combination of the device and the inferred model, the resulting configuration is not necessarily optimal for another device or model. The OpenVINO performance hints are the new way to configure the performance with the _portability_ in mind. diff --git a/docs/OV_Runtime_UG/preprocessing_details.md b/docs/OV_Runtime_UG/preprocessing_details.md index b7fa4e97161..5df0ebe3015 100644 --- a/docs/OV_Runtime_UG/preprocessing_details.md +++ b/docs/OV_Runtime_UG/preprocessing_details.md @@ -6,58 +6,59 @@ If your model has only one input, then simple ov::preprocess::PrePostProcessor::input() will get a reference to preprocessing builder for this input (tensor, steps, model): -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:input_1] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:input_1 -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:input_1] +@sphinxtab{Python} + +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:input_1 + +@endsphinxtab + +@endsphinxtabset -@endsphinxdirective In general, when model has multiple inputs/outputs, each one can be addressed by tensor name -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:input_name] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:input_name -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:input_name] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:input_name + +@endsphinxtab + +@endsphinxtabset Or by it's index -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:input_index] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:input_index -.. tab:: Python +@endsphinxtab - .. 
doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:input_index] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:input_index + +@endsphinxtab + +@endsphinxtabset C++ references: * ov::preprocess::InputTensorInfo @@ -74,40 +75,39 @@ C++ references: Typical data normalization includes 2 operations for each data item: subtract mean value and divide to standard deviation. This can be done with the following code: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:mean_scale] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:mean_scale -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:mean_scale] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:mean_scale +@endsphinxtab + +@endsphinxtabset In Computer Vision area normalization is usually done separately for R, G, B values. To do this, [layout with 'C' dimension](./layout_overview.md) shall be defined. Example: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:mean_scale_array] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:mean_scale_array -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:mean_scale_array] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:mean_scale_array + +@endsphinxtab + +@endsphinxtabset C++ references: * ov::preprocess::PreProcessSteps::mean() @@ -120,21 +120,21 @@ In Computer Vision, image is represented by array of unsigned 8-but integer valu To integrate precision conversion into execution graph as a preprocessing step, just do: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:convert_element_type] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:convert_element_type -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:convert_element_type] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:convert_element_type + +@endsphinxtab + +@endsphinxtabset C++ references: * ov::preprocess::InputTensorInfo::set_element_type() @@ -147,39 +147,40 @@ Transposing of matrices/tensors is a typical operation in Deep Learning - you ma Using [layout](./layout_overview.md) of user's tensor and layout of original model conversion can be done implicitly -@sphinxdirective -.. tab:: C++ +@sphinxtabset - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:convert_layout] +@sphinxtab{C++} -.. tab:: Python +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:convert_layout - .. 
doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:convert_layout] +@endsphinxtab -@endsphinxdirective +@sphinxtab{Python} + +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:convert_layout + +@endsphinxtab + +@endsphinxtabset Or if you prefer manual transpose of axes without usage of [layout](./layout_overview.md) in your code, just do: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:convert_layout_2] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:convert_layout_2 -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:convert_layout_2] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:convert_layout_2 + +@endsphinxtab + +@endsphinxtabset It performs the same transpose, but we believe that approach using source and destination layout can be easier to read and understand @@ -195,39 +196,39 @@ Resizing of image is a typical preprocessing step for computer vision tasks. Wit To resize the input image, it is needed to define `H` and `W` dimensions of [layout](./layout_overview.md) -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:resize_1] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:resize_1 -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:resize_1] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:resize_1 + +@endsphinxtab + +@endsphinxtabset Or in case if original model has known spatial dimensions (widht+height), target width/height can be omitted -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:resize_2] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:resize_2 -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:resize_2] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:resize_2 + +@endsphinxtab + +@endsphinxtabset C++ references: * ov::preprocess::PreProcessSteps::resize() @@ -238,41 +239,41 @@ C++ references: Typical use case is to reverse color channels from RGB to BGR and wise versa. To do this, specify source color format in `tensor` section and perform `convert_color` preprocessing operation. In example below, user has `BGR` image and needs to convert it to `RGB` as required for model's input -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:convert_color_1] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:convert_color_1 -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:convert_color_1] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:convert_color_1 + +@endsphinxtab + +@endsphinxtabset #### Color conversion - NV12/I420 Preprocessing also support YUV-family source color formats, i.e. NV12 and I420. 
In advanced cases such YUV images can be splitted into separate planes, e.g. for NV12 images Y-component may come from one source and UV-component comes from another source. Concatenating such components in user's application manually is not a perfect solution from performance and device utilization perspectives, so there is a way to use Preprocessing API. For such cases there is `NV12_TWO_PLANES` and `I420_THREE_PLANES` source color formats, which will split original `input` to 2 or 3 inputs -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:convert_color_2] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:convert_color_2 -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:convert_color_2] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:convert_color_2 + +@endsphinxtab + +@endsphinxtabset In this example, original `input` is being split to `input/y` and `input/uv` inputs. You can fill `input/y` from one source, and `input/uv` from another source. Color conversion to `RGB` will be performed using these sources, it is more optimal as there will be no additional copies of NV12 buffers. @@ -289,21 +290,21 @@ Preprocessing API also allows adding custom preprocessing steps into execution g If there is a need to insert some additional operations to execution graph right after input, like some specific crops and/or resizes - Preprocessing API can be a good choice to implement this -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:custom] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:custom -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:custom] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:custom + +@endsphinxtab + +@endsphinxtabset C++ references: * ov::preprocess::PreProcessSteps::custom() @@ -324,21 +325,21 @@ Comparing to preprocessing, there is not so much operations needed to do in post Usage of these operations is similar to Preprocessing. Some example is shown below: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:postprocess] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:postprocess -.. tab:: Python +@endsphinxtab - .. 
doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:postprocess] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:postprocess + +@endsphinxtab + +@endsphinxtabset C++ references: * ov::preprocess::PostProcessSteps diff --git a/docs/OV_Runtime_UG/preprocessing_overview.md b/docs/OV_Runtime_UG/preprocessing_overview.md index 2b5c5ee0be8..ccbced19b4f 100644 --- a/docs/OV_Runtime_UG/preprocessing_overview.md +++ b/docs/OV_Runtime_UG/preprocessing_overview.md @@ -1,4 +1,4 @@ -# Overview of Preprocessing API {#openvino_docs_OV_Runtime_UG_Preprocessing_Overview} +# Optimize Preprocessing {#openvino_docs_OV_Runtime_UG_Preprocessing_Overview} @sphinxdirective @@ -45,42 +45,41 @@ Intuitively, Preprocessing API consists of the following parts: `ov::preprocess::PrePostProcessor` class allows specifying preprocessing and postprocessing steps for model read from disk. -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:create] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:create -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:create] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:create + +@endsphinxtab + +@endsphinxtabset ### Declare user's data format To address particular input of model/preprocessor, use `ov::preprocess::PrePostProcessor::input(input_name)` method -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:tensor] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:tensor -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:tensor] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:tensor +@endsphinxtab + +@endsphinxtabset Here we've specified all information about user's input: - Precision is U8 (unsigned 8-bit integer) @@ -92,21 +91,21 @@ Here we've specified all information about user's input: Model's input already has information about precision and shape. Preprocessing API is not intended to modify this. The only thing that may be specified is input's data [layout](./layout_overview.md) -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:model] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:model -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:model] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:model + +@endsphinxtab + +@endsphinxtabset Now, if model's input has `{1,3,224,224}` shape, preprocessing will be able to identify that model's `height=224`, `width=224`, `channels=3`. Height/width information is necessary for 'resize', and `channels` is needed for mean/scale normalization @@ -115,21 +114,21 @@ Now, if model's input has `{1,3,224,224}` shape, preprocessing will be able to i Now we can define sequence of preprocessing steps: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. 
doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:steps] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:steps -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:steps] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:steps + +@endsphinxtab + +@endsphinxtabset Here: - Convert U8 to FP32 precision @@ -143,21 +142,21 @@ Here: We've finished with preprocessing steps declaration, now it is time to build it. For debugging purposes it is possible to print `PrePostProcessor` configuration on screen: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:build] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:build -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:build] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:build + +@endsphinxtab + +@endsphinxtabset After this, `model` will accept U8 input with `{1, 480, 640, 3}` shape, with `BGR` channels order. All conversion steps will be integrated into execution graph. Now you can load model on device and pass your image to model as is, without any data manipulation on application's side diff --git a/docs/OV_Runtime_UG/preprocessing_usecase_save.md b/docs/OV_Runtime_UG/preprocessing_usecase_save.md index b0f5c023cd3..ebea93da552 100644 --- a/docs/OV_Runtime_UG/preprocessing_usecase_save.md +++ b/docs/OV_Runtime_UG/preprocessing_usecase_save.md @@ -18,59 +18,61 @@ In case if you have some preprocessing steps which can't be integrated into exec Let's consider the example, there is an original `ONNX` model which takes one `float32` input with shape `{1, 3, 224, 224}` with `RGB` channels order, with mean/scale values applied. User's application can provide `BGR` image buffer with not fixed size. Additionally, we'll also imagine that our application provides input images as batches, each batch contains 2 images. Here is how model conversion code may look like in your model preparation script - Includes / Imports -@sphinxdirective -.. tab:: C++ +@sphinxtabset - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:save_headers] +@sphinxtab{C++} -.. tab:: Python +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:save_headers - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:save_headers] +@endsphinxtab -@endsphinxdirective +@sphinxtab{Python} + +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:save_headers + +@endsphinxtab + +@endsphinxtabset - Preprocessing & Saving to IR code -@sphinxdirective -.. tab:: C++ +@sphinxtabset - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:save] +@sphinxtab{C++} -.. tab:: Python +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:save - .. 
doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:save] +@endsphinxtab -@endsphinxdirective +@sphinxtab{Python} + +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:save + +@endsphinxtab + +@endsphinxtabset ## Application code - load model to target device After this, your application's code can load saved file and don't perform preprocessing anymore. In this example we'll also enable [model caching](./Model_caching_overview.md) to minimize load time when cached model is available -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_preprocessing.cpp - :language: cpp - :fragment: [ov:preprocess:save_load] +@snippet docs/snippets/ov_preprocessing.cpp ov:preprocess:save_load -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_preprocessing.py - :language: python - :fragment: [ov:preprocess:save_load] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_preprocessing.py ov:preprocess:save_load + +@endsphinxtab + +@endsphinxtabset ## See Also diff --git a/docs/OV_Runtime_UG/protecting_model_guide.md b/docs/OV_Runtime_UG/protecting_model_guide.md index 222bdb90ffc..84b7e401ac5 100644 --- a/docs/OV_Runtime_UG/protecting_model_guide.md +++ b/docs/OV_Runtime_UG/protecting_model_guide.md @@ -52,7 +52,6 @@ should be called with `weights` passed as an empty `ov::Tensor`. ## Additional Resources - Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) -- OpenVINO™ toolkit online documentation: [https://docs.openvino.ai](https://docs.openvino.ai) - Model Optimizer Developer Guide: [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) - [OpenVINO™ runTime User Guide](openvino_intro.md) - For more information on Sample Applications, see the [OpenVINO Samples Overview](Samples_Overview.md) diff --git a/docs/OV_Runtime_UG/supported_plugins/AutoPlugin_Debugging.md b/docs/OV_Runtime_UG/supported_plugins/AutoPlugin_Debugging.md index a2546b01e56..2c457ffd072 100644 --- a/docs/OV_Runtime_UG/supported_plugins/AutoPlugin_Debugging.md +++ b/docs/OV_Runtime_UG/supported_plugins/AutoPlugin_Debugging.md @@ -69,7 +69,6 @@ All major performance calls of both OpenVINO™ Runtime and the AUTO plugin are @endsphinxdirective For more information, you can refer to: -* [OpenVINO profiling](https://docs.openvino.ai/latest/groupie_dev_profiling.html) * [Intel® VTune™ Profiler User Guide](https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/api-support/instrumentation-and-tracing-technology-apis.html) ### Analyze Code Performance on Linux diff --git a/docs/OV_Runtime_UG/supported_plugins/Device_Plugins.md b/docs/OV_Runtime_UG/supported_plugins/Device_Plugins.md index 2e79c9a7bc7..9c31d027c14 100644 --- a/docs/OV_Runtime_UG/supported_plugins/Device_Plugins.md +++ b/docs/OV_Runtime_UG/supported_plugins/Device_Plugins.md @@ -36,21 +36,21 @@ OpenVINO runtime also has several execution capabilities which work on top of ot Devices similar to the ones we have used for benchmarking can be accessed using [Intel® DevCloud for the Edge](https://devcloud.intel.com/edge/), a remote development environment with access to Intel® hardware and the latest versions of the Intel® Distribution of the OpenVINO™ Toolkit. [Learn more](https://devcloud.intel.com/edge/get_started/devcloud/) or [Register here](https://inteliot.force.com/DevcloudForEdge/s/). 
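As a quick, self-contained companion to the device overview above and the feature matrix that follows, the hedged C++ sketch below lists the device plugins visible to a given OpenVINO installation. It is an illustration only (not one of the `docs/snippets` fragments referenced on this page), and the reported device names will differ per machine.

```cpp
#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;
    // Enumerate the devices currently exposed by the installed plugins, e.g. "CPU", "GPU", "GNA".
    for (const auto& device : core.get_available_devices()) {
        // Query the human-readable device name reported by the corresponding plugin.
        std::cout << device << ": " << core.get_property(device, ov::device::full_name) << std::endl;
    }
    return 0;
}
```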
- -## Features support matrix +@anchor features_support_matrix +## Features Support Matrix The table below demonstrates support of key features by OpenVINO device plugins. -| Capability | [CPU](CPU.md) | [GPU](GPU.md) | [GNA](GNA.md) | [VPU](VPU.md) | [Arm® CPU](ARM_CPU.md) | -| ---------- | --- | --- | --- | --- | --- | -| [Heterogeneous execution](../hetero_execution.md)| Yes | Yes | No | ? | Yes | -| [Multi-device execution](../multi_device.md) | Yes | Yes | Partial | ? | Yes | -| [Automatic batching](../automatic_batching.md) | No | Yes | No | ? | No | -| [Multi-stream execution](@ref openvino_docs_optimization_guide_dldt_optimization_guide) | Yes | Yes | No | ? | Yes | -| [Models caching](../Model_caching_overview.md) | Yes | Partial | Yes | ? | No | -| [Dynamic shapes](../ov_dynamic_shapes.md) | Yes | Partial | No | ? | No | -| Import/Export | Yes | No | Yes | ? | No | -| [Preprocessing acceleration](../preprocessing_overview.md) | Yes | Yes | No | ? | Partial | -| [Stateful models](../network_state_intro.md) | Yes | No | Yes | ? | No | -| [Extensibility](@ref openvino_docs_Extensibility_UG_Intro) | Yes | Yes | No | ? | No | +| Capability | [CPU](CPU.md) | [GPU](GPU.md) | [GNA](GNA.md) |[Arm® CPU](ARM_CPU.md) | +| ---------- | --- | --- | --- | --- | +| [Heterogeneous execution](../hetero_execution.md)| Yes | Yes | No | Yes | +| [Multi-device execution](../multi_device.md) | Yes | Yes | Partial | Yes | +| [Automatic batching](../automatic_batching.md) | No | Yes | No | No | +| [Multi-stream execution](../../optimization_guide/dldt_deployment_optimization_tput.md) | Yes | Yes | No | Yes | +| [Models caching](../Model_caching_overview.md) | Yes | Partial | Yes | No | +| [Dynamic shapes](../ov_dynamic_shapes.md) | Yes | Partial | No | No | +| [Import/Export](../../../tools/compile_tool/README.md) | Yes | No | Yes | No | +| [Preprocessing acceleration](../preprocessing_overview.md) | Yes | Yes | No | Partial | +| [Stateful models](../network_state_intro.md) | Yes | No | Yes | No | +| [Extensibility](@ref openvino_docs_Extensibility_UG_Intro) | Yes | Yes | No | No | For more details on plugin specific feature limitation, see corresponding plugin pages. diff --git a/docs/OV_Runtime_UG/supported_plugins/GNA.md b/docs/OV_Runtime_UG/supported_plugins/GNA.md index af7f7f16ee0..edb907578ef 100644 --- a/docs/OV_Runtime_UG/supported_plugins/GNA.md +++ b/docs/OV_Runtime_UG/supported_plugins/GNA.md @@ -63,28 +63,23 @@ Starting with 2021.4.1 release of OpenVINO and 03.00.00.1363 version of Windows* to assure that workloads satisfy real-time execution. In this mode, the GNA driver automatically falls back on CPU for a particular infer request if the HW queue is not empty, so there is no need for explicitly switching between GNA and CPU. -@sphinxdirective -.. tab:: C++ +@sphinxtabset - .. doxygensnippet:: docs/snippets/gna/configure.cpp - :language: cpp - :fragment: [include] +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/gna/configure.cpp - :language: cpp - :fragment: [ov_gna_exec_mode_hw_with_sw_fback] +@snippet docs/snippets/gna/configure.cpp include +@snippet docs/snippets/gna/configure.cpp ov_gna_exec_mode_hw_with_sw_fback -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/gna/configure.py - :language: python - :fragment: [import] +@sphinxtab{Python} - .. 
doxygensnippet:: docs/snippets/gna/configure.py - :language: python - :fragment: [ov_gna_exec_mode_hw_with_sw_fback] +@snippet docs/snippets/gna/configure.py import +@snippet docs/snippets/gna/configure.py ov_gna_exec_mode_hw_with_sw_fback -@endsphinxdirective +@endsphinxtab + +@endsphinxtabset > **NOTE**: Due to the "first come - first served" nature of GNA driver and the QoS feature, this mode may lead to increased CPU consumption if there are several clients using GNA simultaneously. @@ -125,37 +120,39 @@ The GNA plugin supports import/export capability which helps to significantly de If you are willing to export a model for a specific version of GNA HW, please use the `ov::intel_gna::compile_target` property and then export the model: -@sphinxdirective -.. tab:: C++ +@sphinxtabset - .. doxygensnippet:: docs/snippets/gna/import_export.cpp - :language: cpp - :fragment: [ov_gna_export] +@sphinxtab{C++} -.. tab:: Python +@snippet docs/snippets/gna/import_export.cpp ov_gna_export - .. doxygensnippet:: docs/snippets/gna/import_export.py - :language: python - :fragment: [ov_gna_export] +@endsphinxtab -@endsphinxdirective +@sphinxtab{Python} + +@snippet docs/snippets/gna/import_export.py ov_gna_export + +@endsphinxtab + +@endsphinxtabset Import model: -@sphinxdirective -.. tab:: C++ +@sphinxtabset - .. doxygensnippet:: docs/snippets/gna/import_export.cpp - :language: cpp - :fragment: [ov_gna_import] +@sphinxtab{C++} -.. tab:: Python +@snippet docs/snippets/gna/import_export.cpp ov_gna_import - .. doxygensnippet:: docs/snippets/gna/import_export.py - :language: python - :fragment: [ov_gna_import] +@endsphinxtab -@endsphinxdirective +@sphinxtab{Python} + +@snippet docs/snippets/gna/import_export.py ov_gna_import + +@endsphinxtab + +@endsphinxtabset [Compile Tool](@ref openvino_inference_engine_tools_compile_tool_README) or [Speech C++ Sample](@ref openvino_inference_engine_samples_speech_sample_README) can be used to compile model. @@ -287,47 +284,45 @@ Intel® GNA plugin supports the processing of context-windowed speech frames in Please refer to [Layout API overview](@ref openvino_docs_OV_Runtime_UG_Layout_Overview) to determine batch dimension. -To set layout of model inputs in runtime use [Preprocessing API](@ref openvino_docs_OV_Runtime_UG_Preprocessing_Overview): +To set layout of model inputs in runtime use [Optimize Preprocessing](@ref openvino_docs_OV_Runtime_UG_Preprocessing_Overview) guide: -@sphinxdirective -.. tab:: C++ +@sphinxtabset - .. doxygensnippet:: docs/snippets/gna/set_batch.cpp - :language: cpp - :fragment: [include] +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/gna/set_batch.cpp - :language: cpp - :fragment: [ov_gna_set_nc_layout] +@snippet docs/snippets/gna/set_batch.cpp include +@snippet docs/snippets/gna/set_batch.cpp ov_gna_set_nc_layout -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/gna/set_batch.py - :language: python - :fragment: [import] +@sphinxtab{Python} - .. doxygensnippet:: docs/snippets/gna/set_batch.py - :language: python - :fragment: [ov_gna_set_nc_layout] +@snippet docs/snippets/gna/set_batch.py import +@snippet docs/snippets/gna/set_batch.py ov_gna_set_nc_layout + +@endsphinxtab + +@endsphinxtabset -@endsphinxdirective then set batch size: -@sphinxdirective -.. tab:: C++ +@sphinxtabset - .. doxygensnippet:: docs/snippets/gna/set_batch.cpp - :language: cpp - :fragment: [ov_gna_set_batch_size] +@sphinxtab{C++} -.. tab:: Python +@snippet docs/snippets/gna/set_batch.cpp ov_gna_set_batch_size - .. 
doxygensnippet:: docs/snippets/gna/set_batch.py - :language: python - :fragment: [ov_gna_set_batch_size] +@endsphinxtab + +@sphinxtab{Python} + +@snippet docs/snippets/gna/set_batch.py ov_gna_set_batch_size + +@endsphinxtab + +@endsphinxtabset -@endsphinxdirective Increasing batch size only improves efficiency of `MatMul` layers. diff --git a/docs/OV_Runtime_UG/supported_plugins/GPU.md b/docs/OV_Runtime_UG/supported_plugins/GPU.md index 7099ccc307b..645a0e937ea 100644 --- a/docs/OV_Runtime_UG/supported_plugins/GPU.md +++ b/docs/OV_Runtime_UG/supported_plugins/GPU.md @@ -44,27 +44,27 @@ Available devices: Then device name can be passed to `ov::Core::compile_model()` method: -@sphinxdirective +@sphinxtabset -.. tab:: Running on default device +@sphinxtab{Running on default device} - .. doxygensnippet:: docs/snippets/gpu/compile_model.cpp - :language: cpp - :fragment: [compile_model_default_gpu] +@snippet docs/snippets/gpu/compile_model.cpp compile_model_default_gpu -.. tab:: Running on specific GPU +@endsphinxtab - .. doxygensnippet:: docs/snippets/gpu/compile_model.cpp - :language: cpp - :fragment: [compile_model_gpu_with_id] +@sphinxtab{Running on specific GPU} -.. tab:: Running on specific tile +@snippet docs/snippets/gpu/compile_model.cpp compile_model_gpu_with_id - .. doxygensnippet:: docs/snippets/gpu/compile_model.cpp - :language: cpp - :fragment: [compile_model_gpu_with_id_and_tile] +@endsphinxtab -@endsphinxdirective +@sphinxtab{Running on specific tile} + +@snippet docs/snippets/gpu/compile_model.cpp compile_model_gpu_with_id_and_tile + +@endsphinxtab + +@endsphinxtabset ## Supported inference data types GPU plugin supports the following data types as inference precision of internal primitives: @@ -102,21 +102,21 @@ GPU plugin is capable of reporting `ov::max_batch_size` and `ov::optimal_batch_s thus automatic batching is automatically enabled when `ov::optimal_batch_size` is > 1 and `ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT)` is set. Alternatively it can be enabled explicitly via the device notion, e.g. `"BATCH:GPU"`. -@sphinxdirective +@sphinxtabset -.. tab:: Batching via BATCH plugin +@sphinxtab{Batching via BATCH plugin} - .. doxygensnippet:: docs/snippets/gpu/compile_model.cpp - :language: cpp - :fragment: [compile_model_batch_plugin] +@snippet docs/snippets/gpu/compile_model.cpp compile_model_batch_plugin -.. tab:: Batching via throughput hint +@endsphinxtab - .. doxygensnippet:: docs/snippets/gpu/compile_model.cpp - :language: cpp - :fragment: [compile_model_auto_batch] +@sphinxtab{Batching via throughput hint} -@endsphinxdirective +@snippet docs/snippets/gpu/compile_model.cpp compile_model_auto_batch + +@endsphinxtab + +@endsphinxtabset See [Automatic batching page](../automatic_batching.md) for more details. 
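To complement the two tabs above (which pull their code from `docs/snippets/gpu/compile_model.cpp`), here is a hedged, self-contained sketch of both batching options; the model path is a placeholder and the code is illustrative rather than part of the snippet files shipped with the docs.

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Placeholder path to an OpenVINO IR model.
    auto model = core.read_model("model.xml");

    // Option 1: request batching explicitly through the virtual BATCH device.
    auto compiled_explicit = core.compile_model(model, "BATCH:GPU");

    // Option 2: ask for throughput-oriented execution on GPU; automatic batching
    // is applied implicitly when the plugin reports ov::optimal_batch_size > 1.
    auto compiled_hint = core.compile_model(
        model, "GPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
    return 0;
}
```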
diff --git a/docs/OV_Runtime_UG/supported_plugins/GPU_RemoteTensor_API.md b/docs/OV_Runtime_UG/supported_plugins/GPU_RemoteTensor_API.md index 638f4735293..fa98c5f9cac 100644 --- a/docs/OV_Runtime_UG/supported_plugins/GPU_RemoteTensor_API.md +++ b/docs/OV_Runtime_UG/supported_plugins/GPU_RemoteTensor_API.md @@ -1,5 +1,4 @@ -Remote Tensor API of GPU Plugin {#openvino_docs_OV_UG_supported_plugins_GPU_RemoteTensor_API} -================================ +# Remote Tensor API of GPU Plugin {#openvino_docs_OV_UG_supported_plugins_GPU_RemoteTensor_API} The GPU plugin implementation of the `ov::RemoteContext` and `ov::RemoteTensor` interfaces supports GPU pipeline developers who need video memory sharing and interoperability with existing native APIs @@ -38,49 +37,59 @@ additional parameter. To create `ov::RemoteContext` object for user context, explicitly provide the context to the plugin using constructor for one of `ov::RemoteContext` derived classes. -@sphinxdirective +@sphinxtabset -.. tab:: Linux +@sphinxtab{Linux} - .. tab:: Create from cl_context +@sphinxtabset - .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp - :language: cpp - :fragment: [context_from_cl_context] +@sphinxtab{Create from cl_context} - .. tab:: Create from cl_queue +@snippet docs/snippets/gpu/remote_objects_creation.cpp context_from_cl_context - .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp - :language: cpp - :fragment: [context_from_cl_queue] +@endsphinxtab - .. tab:: Create from VADisplay +@sphinxtab{Create from cl_queue} - .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp - :language: cpp - :fragment: [context_from_va_display] +@snippet docs/snippets/gpu/remote_objects_creation.cpp context_from_cl_queue -.. tab:: Windows +@endsphinxtab - .. tab:: Create from cl_context +@sphinxtab{Create from VADisplay} - .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp - :language: cpp - :fragment: [context_from_cl_context] +@snippet docs/snippets/gpu/remote_objects_creation.cpp context_from_va_display - .. tab:: Create from cl_queue +@endsphinxtab - .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp - :language: cpp - :fragment: [context_from_cl_queue] +@endsphinxtabset - .. tab:: Create from ID3D11Device +@endsphinxtab - .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp - :language: cpp - :fragment: [context_from_d3d_device] +@sphinxtab{Windows} -@endsphinxdirective +@sphinxtabset + +@sphinxtab{Create from cl_context} + +@snippet docs/snippets/gpu/remote_objects_creation.cpp context_from_cl_context + +@endsphinxtab + +@sphinxtab{Create from cl_queue} + +@snippet docs/snippets/gpu/remote_objects_creation.cpp context_from_cl_queue + +@endsphinxtab + +@sphinxtab{Create from ID3D11Device} + +@snippet docs/snippets/gpu/remote_objects_creation.cpp context_from_d3d_device + +@endsphinxtab + +@endsphinxtabset + +@endsphinxtabset ### Getting RemoteContext from the plugin @@ -91,22 +100,21 @@ Once the plugin options are changed, the internal context is replaced by the new To request the current default context of the plugin use one of the following methods: -@sphinxdirective +@sphinxtabset -.. tab:: Get context from Core +@sphinxtab{Get context from Core} - .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp - :language: cpp - :fragment: [default_context_from_core] +@snippet docs/snippets/gpu/remote_objects_creation.cpp default_context_from_core -.. tab:: Get context from CompiledModel +@endsphinxtab - .. 
doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp - :language: cpp - :fragment: [default_context_from_model] +@sphinxtab{Get context from CompiledModel} +@snippet docs/snippets/gpu/remote_objects_creation.cpp default_context_from_model -@endsphinxdirective +@endsphinxtab + +@endsphinxtabset ## Memory sharing between application and GPU plugin @@ -118,61 +126,72 @@ of the `ov::RemoteContext` sub-classes. `ov::intel_gpu::ocl::ClContext` has multiple overloads of `create_tensor` methods which allow to wrap pre-allocated native handles with `ov::RemoteTensor` object or request plugin to allocate specific device memory. See code snippets below for more details. -@sphinxdirective +@sphinxtabset -.. tab:: Wrap native handles +@sphinxtab{Wrap native handles} - .. tab:: USM pointer +@sphinxtabset - .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp - :language: cpp - :fragment: [wrap_usm_pointer] +@sphinxtab{USM pointer} - .. tab:: cl_mem +@snippet docs/snippets/gpu/remote_objects_creation.cpp wrap_usm_pointer - .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp - :language: cpp - :fragment: [wrap_cl_mem] +@endsphinxtab - .. tab:: cl::Buffer +@sphinxtab{cl_mem} - .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp - :language: cpp - :fragment: [wrap_cl_buffer] +@snippet docs/snippets/gpu/remote_objects_creation.cpp wrap_cl_mem - .. tab:: cl::Image2D +@endsphinxtab - .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp - :language: cpp - :fragment: [wrap_cl_image] +@sphinxtab{cl::Buffer} - .. tab:: biplanar NV12 surface +@snippet docs/snippets/gpu/remote_objects_creation.cpp wrap_cl_buffer - .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp - :language: cpp - :fragment: [wrap_nv12_surface] +@endsphinxtab -.. tab:: Allocate device memory +@sphinxtab{cl::Image2D} - .. tab:: USM host memory +@snippet docs/snippets/gpu/remote_objects_creation.cpp wrap_cl_image - .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp - :language: cpp - :fragment: [allocate_usm_host] +@endsphinxtab - .. tab:: USM device memory +@sphinxtab{biplanar NV12 surface} - .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp - :language: cpp - :fragment: [allocate_usm_device] +@snippet docs/snippets/gpu/remote_objects_creation.cpp wrap_nv12_surface - .. tab:: cl::Buffer +@endsphinxtab - .. 
doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp - :language: cpp - :fragment: [allocate_cl_buffer] +@endsphinxtabset +@endsphinxtab -@endsphinxdirective +@sphinxtab{Allocate device memory} + +@sphinxtabset + +@sphinxtab{USM host memory} + +@snippet docs/snippets/gpu/remote_objects_creation.cpp allocate_usm_host + +@endsphinxtab + +@sphinxtab{USM device memory} + +@snippet docs/snippets/gpu/remote_objects_creation.cpp allocate_usm_device + +@endsphinxtab + +@sphinxtab{cl::Buffer} + +@snippet docs/snippets/gpu/remote_objects_creation.cpp allocate_cl_buffer + +@endsphinxtab + +@endsphinxtabset + +@endsphinxtab + +@endsphinxtabset `ov::intel_gpu::ocl::D3DContext` and `ov::intel_gpu::ocl::VAContext` classes are derived from `ov::intel_gpu::ocl::ClContext`, thus they provide functionality described above and extends it @@ -192,22 +211,22 @@ should be added before model compilation: Since `ov::intel_gpu::ocl::ClImage2DTensor` (and derived classes) doesn't support batched surfaces, in cases when batching and surface sharing are required at the same time, user need to set inputs via `ov::InferRequest::set_tensors` method with vector of shared surfaces for each plane: -@sphinxdirective +@sphinxtabset -.. tab:: Single batch +@sphinxtab{Single batch} - .. doxygensnippet:: docs/snippets/gpu/preprocessing.cpp - :language: cpp - :fragment: [single_batch] +@snippet docs/snippets/gpu/preprocessing.cpp single_batch -.. tab:: Multiple batches +@endsphinxtab - .. doxygensnippet:: docs/snippets/gpu/preprocessing.cpp - :language: cpp - :fragment: [batched_case] +@sphinxtab{Multiple batches} +@snippet docs/snippets/gpu/preprocessing.cpp batched_case + +@endsphinxtab + +@endsphinxtabset -@endsphinxdirective I420 color format can be processed in similar way diff --git a/docs/OV_Runtime_UG/supported_plugins/Supported_Devices.md b/docs/OV_Runtime_UG/supported_plugins/Supported_Devices.md index 9661eb86c3e..ec7afd53fd9 100644 --- a/docs/OV_Runtime_UG/supported_plugins/Supported_Devices.md +++ b/docs/OV_Runtime_UG/supported_plugins/Supported_Devices.md @@ -20,20 +20,6 @@ The OpenVINO Runtime provides unique capabilities to infer deep learning models Devices similar to the ones we have used for benchmarking can be accessed using [Intel® DevCloud for the Edge](https://devcloud.intel.com/edge/), a remote development environment with access to Intel® hardware and the latest versions of the Intel® Distribution of the OpenVINO™ Toolkit. [Learn more](https://devcloud.intel.com/edge/get_started/devcloud/) or [Register here](https://inteliot.force.com/DevcloudForEdge/s/). -The table below shows the plugin libraries and additional dependencies for Linux, Windows and macOS platforms. 
- -| Plugin | Library name for Linux | Dependency libraries for Linux | Library name for Windows | Dependency libraries for Windows | Library name for macOS | Dependency libraries for macOS | -|--------|-----------------------------|-------------------------------------------------------------|--------------------------|--------------------------------------------------------------------------------------------------------|------------------------------|---------------------------------------------| -| CPU | `libopenvino_intel_cpu_plugin.so` | | `openvino_intel_cpu_plugin.dll` | | `libopenvino_intel_cpu_plugin.so` | | -| GPU | `libopenvino_intel_gpu_plugin.so` | `libOpenCL.so` | `openvino_intel_gpu_plugin.dll` | `OpenCL.dll` | Is not supported | - | -| MYRIAD | `libopenvino_intel_myriad_plugin.so` | `libusb.so` | `openvino_intel_myriad_plugin.dll`| `usb.dll` | `libopenvino_intel_myriad_plugin.so` | `libusb.dylib` | -| HDDL | `libintel_hddl_plugin.so` | `libbsl.so`, `libhddlapi.so`, `libmvnc-hddl.so` | `intel_hddl_plugin.dll` | `bsl.dll`, `hddlapi.dll`, `json-c.dll`, `libcrypto-1_1-x64.dll`, `libssl-1_1-x64.dll`, `mvnc-hddl.dll` | Is not supported | - | -| GNA | `libopenvino_intel_gna_plugin.so` | `libgna.so`, | `openvino_intel_gna_plugin.dll` | `gna.dll` | Is not supported | - | -| HETERO | `libopenvino_hetero_plugin.so` | Same as for selected plugins | `openvino_hetero_plugin.dll` | Same as for selected plugins | `libopenvino_hetero_plugin.so` | Same as for selected plugins | -| MULTI | `libopenvino_auto_plugin.so` | Same as for selected plugins | `openvino_auto_plugin.dll` | Same as for selected plugins | `libopenvino_auto_plugin.so` | Same as for selected plugins | -| AUTO | `libopenvino_auto_plugin.so` | Same as for selected plugins | `openvino_auto_plugin.dll` | Same as for selected plugins | `libopenvino_auto_plugin.so` | Same as for selected plugins | -| BATCH | `libopenvino_auto_batch_plugin.so` | Same as for selected plugins | `openvino_auto_batch_plugin.dll` | Same as for selected plugins | `libopenvino_auto_batch_plugin.so` | Same as for selected plugins | - ## Supported Configurations The OpenVINO Runtime can inference models in different formats with various input and output formats. diff --git a/docs/OV_Runtime_UG/supported_plugins/config_properties.md b/docs/OV_Runtime_UG/supported_plugins/config_properties.md index 5b0fef66c20..465ebc1a05d 100644 --- a/docs/OV_Runtime_UG/supported_plugins/config_properties.md +++ b/docs/OV_Runtime_UG/supported_plugins/config_properties.md @@ -21,21 +21,22 @@ Refer to the [Hello Query Device С++ Sample](../../../samples/cpp/hello_query_d Based on read-only property `ov::available_devices`, OpenVINO Core collects information about currently available devices enabled by OpenVINO plugins and returns information using the `ov::Core::get_available_devices` method: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_properties_api.cpp - :language: cpp - :fragment: [get_available_devices] +@snippet docs/snippets/ov_properties_api.cpp get_available_devices -.. tab:: Python +@endsphinxtab - .. 
doxygensnippet:: docs/snippets/ov_properties_api.py - :language: python - :fragment: [get_available_devices] +@sphinxtab{Python} + +@snippet docs/snippets/ov_properties_api.py get_available_devices + +@endsphinxtab + +@endsphinxtabset -@endsphinxdirective The function returns a list of available devices, for example: @@ -73,27 +74,41 @@ For documentation about OpenVINO common device-independent properties, refer to The code below demonstrates how to query `HETERO` device priority of devices which will be used to infer the model: -@snippet snippets/ov_properties_api.cpp hetero_priorities +@sphinxtabset + +@sphinxtab{C++} + +@snippet docs/snippets/ov_properties_api.cpp hetero_priorities + +@endsphinxtab + +@sphinxtab{Python} + +@snippet docs/snippets/ov_properties_api.py hetero_priorities + +@endsphinxtab + +@endsphinxtabset > **NOTE**: All properties have a type, which is specified during property declaration. Based on this, actual type under `auto` is automatically deduced by C++ compiler. To extract device properties such as available devices (`ov::available_devices`), device name (`ov::device::full_name`), supported properties (`ov::supported_properties`), and others, use the `ov::Core::get_property` method: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_properties_api.cpp - :language: cpp - :fragment: [cpu_device_name] +@snippet docs/snippets/ov_properties_api.cpp cpu_device_name -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_properties_api.py - :language: python - :fragment: [cpu_device_name] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_properties_api.py cpu_device_name + +@endsphinxtab + +@endsphinxtabset A returned value appears as follows: `Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz`. @@ -109,21 +124,21 @@ A returned value appears as follows: `Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz`. accept variadic list of properties as last arguments. Each property in such parameters lists should be used as function call to pass property value with specified property type. -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_properties_api.cpp - :language: cpp - :fragment: [compile_model_with_property] +@snippet docs/snippets/ov_properties_api.cpp compile_model_with_property -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_properties_api.py - :language: python - :fragment: [compile_model_with_property] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_properties_api.py compile_model_with_property + +@endsphinxtab + +@endsphinxtabset The example below specifies hints that a model should be compiled to be inferenced with multiple inference requests in parallel to achive best throughput while inference should be performed without accuracy loss with FP32 precision. @@ -131,21 +146,21 @@ The example below specifies hints that a model should be compiled to be inferenc `ov::Core::set_property` with a given device name should be used to set global configuration properties which are the same accross multiple `ov::Core::compile_model`, `ov::Core::query_model`, etc. calls, while setting property on the specific `ov::Core::compile_model` call applies properties only for current call: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. 
doxygensnippet:: docs/snippets/ov_properties_api.cpp - :language: cpp - :fragment: [core_set_property_then_compile] +@snippet docs/snippets/ov_properties_api.cpp core_set_property_then_compile -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_properties_api.py - :language: python - :fragment: [core_set_property_then_compile] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_properties_api.py core_set_property_then_compile + +@endsphinxtab + +@endsphinxtabset ### Properties on CompiledModel level @@ -153,74 +168,75 @@ The example below specifies hints that a model should be compiled to be inferenc The `ov::CompiledModel::get_property` method is used to get property values the compiled model has been created with or a compiled model level property such as `ov::optimal_number_of_infer_requests`: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_properties_api.cpp - :language: cpp - :fragment: [optimal_number_of_infer_requests] +@snippet docs/snippets/ov_properties_api.cpp optimal_number_of_infer_requests -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_properties_api.py - :language: python - :fragment: [optimal_number_of_infer_requests] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_properties_api.py optimal_number_of_infer_requests + +@endsphinxtab + +@endsphinxtabset Or the current temperature of the `MYRIAD` device: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_properties_api.cpp - :language: cpp - :fragment: [device_thermal] +@snippet docs/snippets/ov_properties_api.cpp device_thermal -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_properties_api.py - :language: python - :fragment: [device_thermal] +@sphinxtab{Python} + +@snippet docs/snippets/ov_properties_api.py device_thermal + +@endsphinxtab + +@endsphinxtabset -@endsphinxdirective Or the number of threads that would be used for inference on `CPU` device: -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_properties_api.cpp - :language: cpp - :fragment: [inference_num_threads] +@snippet docs/snippets/ov_properties_api.cpp inference_num_threads -.. tab:: Python +@endsphinxtab - .. doxygensnippet:: docs/snippets/ov_properties_api.py - :language: python - :fragment: [inference_num_threads] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_properties_api.py inference_num_threads + +@endsphinxtab + +@endsphinxtabset #### Setting properties for compiled model The only mode that supports this method is [Multi-Device execution](../multi_device.md): -@sphinxdirective +@sphinxtabset -.. tab:: C++ +@sphinxtab{C++} - .. doxygensnippet:: docs/snippets/ov_properties_api.cpp - :language: cpp - :fragment: [multi_device] +@snippet docs/snippets/ov_properties_api.cpp multi_device -.. tab:: Python +@endsphinxtab - .. 
doxygensnippet:: docs/snippets/ov_properties_api.py - :language: python - :fragment: [multi_device] +@sphinxtab{Python} -@endsphinxdirective +@snippet docs/snippets/ov_properties_api.py multi_device + +@endsphinxtab + +@endsphinxtabset diff --git a/docs/_static/css/custom.css b/docs/_static/css/custom.css index 763b9f7505b..fd48b99c149 100644 --- a/docs/_static/css/custom.css +++ b/docs/_static/css/custom.css @@ -78,6 +78,38 @@ div.highlight { color: #fff; } +/* Transition banner */ +.transition-banner { + top: 60px; + background: #76CEFF; + position: fixed; + text-align: center; + color: white; + z-index: 1001; + display: block; + padding:0 2rem; + font-size: var(--pst-sidebar-font-size); + border: none; + border-radius: 0; + font-weight: bold; +} + +.transition-banner > p { + margin-bottom: 0; +} + +.transition-banner .close { + padding: 0 1.25rem; + color: #000; +} + + +@media (max-width: 720px) { + .transition-banner { + margin-top: 2rem; + } +} + @media (min-width: 1200px) { .container, .container-lg, .container-md, .container-sm, .container-xl { max-width: 1800px; diff --git a/docs/_static/images/inputs_defined.png b/docs/_static/images/inputs_defined.png new file mode 100644 index 00000000000..2abeb53b706 --- /dev/null +++ b/docs/_static/images/inputs_defined.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:50c191b2949e981811fbdd009b4138d88d2731432f308de010757058061cacbe +size 37171 diff --git a/docs/_static/images/omz_banner.png b/docs/_static/images/omz_banner.png deleted file mode 100644 index 32a9e7f899e..00000000000 --- a/docs/_static/images/omz_banner.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:6a3d820b43de20a74d857cb720783c1b579624afde26a2bb4bb097ba8fd0bd79 -size 2052 diff --git a/docs/_static/images/original_model_banner.png b/docs/_static/images/original_model_banner.png deleted file mode 100644 index 76d5e0dd808..00000000000 --- a/docs/_static/images/original_model_banner.png +++ /dev/null @@ -1,3 +0,0 @@ -version https://git-lfs.github.com/spec/v1 -oid sha256:04e954b5e1501f958ea2c03303760786ca7a57aaf6de335cb936750c675e6107 -size 1626 diff --git a/docs/_static/js/custom.js b/docs/_static/js/custom.js index 23763b3f0ca..8e323f6fa6e 100644 --- a/docs/_static/js/custom.js +++ b/docs/_static/js/custom.js @@ -28,7 +28,7 @@ var wapSection = 'openvinotoolkit'; // legal notice for benchmarks function addLegalNotice() { if (window.location.href.indexOf('openvino_docs_performance_') !== -1) { - var legalNotice = $('

Results may vary. For workloads and configurations visit: www.intel.com/PerformanceIndex and Legal Information.

'); + var legalNotice = $('

Results may vary. For workloads visit: workloads and for configurations visit: configurations. See also Legal Information.

'); $('body').append(legalNotice); } } @@ -45,6 +45,7 @@ $(document).ready(function () { addTableSort(); } addLegalNotice(); + createSphinxTabSets(); }); // Determine where we'd go if clicking on a version selector option @@ -54,6 +55,35 @@ function getPageUrlWithVersion(version) { return encodeURI(newURL); } + +function createSphinxTabSets() { + var sphinxTabSets = $('.sphinxtabset'); + var tabSetCount = 1000; + sphinxTabSets.each(function() { + var tabSet = $(this); + var inputCount = 1; + tabSet.addClass('tab-set docutils'); + tabSetCount++; + tabSet.find('> .sphinxtab').each(function() { + var tab = $(this); + var checked = ''; + var tabValue = tab.attr('data-sphinxtab-value'); + if (inputCount == 1) { + checked = 'checked'; + } + var input = $(``); + input.insertBefore(tab); + var label = $(``); + label.click(onLabelClick); + label.insertBefore(tab); + inputCount++; + tab.addClass('tab-content docutils'); + }); + + }) + ready(); // # this function is available from tabs.js +} + function updateTitleTag() { var title = $('title'); var currentVersion = getCurrentVersion(); diff --git a/docs/_templates/layout.html b/docs/_templates/layout.html index de7a83e8c19..7ec0e82fc47 100644 --- a/docs/_templates/layout.html +++ b/docs/_templates/layout.html @@ -14,3 +14,13 @@ {% endblock %} + +{% block docs_navbar %} +{{ super() }} + +{% endblock %} diff --git a/docs/benchmarks/performance_benchmarks.md b/docs/benchmarks/performance_benchmarks.md index 2e11602b110..26cdf7419bc 100644 --- a/docs/benchmarks/performance_benchmarks.md +++ b/docs/benchmarks/performance_benchmarks.md @@ -9,16 +9,16 @@ openvino_docs_performance_benchmarks_openvino openvino_docs_performance_benchmarks_ovms - + @endsphinxdirective -The [Intel® Distribution of OpenVINO™ toolkit](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html) helps accelerate deep learning inference across a variety of Intel® processors and accelerators. +The [Intel® Distribution of OpenVINO™ toolkit](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html) helps accelerate deep learning inference across a variety of Intel® processors and accelerators. -The benchmarks below demonstrate high performance gains on several public neural networks on multiple Intel® CPUs, GPUs and VPUs covering a broad performance range. Use this data to help you decide which hardware is best for your applications and solutions, or to plan your AI workload on the Intel computing already included in your solutions. +The benchmarks below demonstrate high performance gains on several public neural networks on multiple Intel® CPUs, GPUs and VPUs covering a broad performance range. Use this data to help you decide which hardware is best for your applications and solutions, or to plan your AI workload on the Intel computing already included in your solutions. 
-Use the links below to review the benchmarking results for each alternative: +Use the links below to review the benchmarking results for each alternative: -* [Intel® Distribution of OpenVINO™ toolkit Benchmark Results](performance_benchmarks_openvino.md) -* [OpenVINO™ Model Server Benchmark Results](performance_benchmarks_ovms.md) +* [Intel® Distribution of OpenVINO™ toolkit Benchmark Results](performance_benchmarks_openvino.md) +* [OpenVINO™ Model Server Benchmark Results](performance_benchmarks_ovms.md) Performance for a particular application can also be evaluated virtually using [Intel® DevCloud for the Edge](https://devcloud.intel.com/edge/), a remote development environment with access to Intel® hardware and the latest versions of the Intel® Distribution of the OpenVINO™ Toolkit. [Learn more](https://devcloud.intel.com/edge/get_started/devcloud/) or [Register here](https://inteliot.force.com/DevcloudForEdge/s/). diff --git a/docs/benchmarks/performance_benchmarks_faq.md b/docs/benchmarks/performance_benchmarks_faq.md index b628b12f116..9f8dcbe053c 100644 --- a/docs/benchmarks/performance_benchmarks_faq.md +++ b/docs/benchmarks/performance_benchmarks_faq.md @@ -6,7 +6,7 @@ The following questions and answers are related to [performance benchmarks](./pe New performance benchmarks are typically published on every `major.minor` release of the Intel® Distribution of OpenVINO™ toolkit. #### 2. Where can I find the models used in the performance benchmarks? -All of the models used are included in the toolkit's [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo) GitHub repository. +All of the models used are included in the toolkit's [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo) GitHub repository. #### 3. Will there be new models added to the list used for benchmarking? The models used in the performance benchmarks were chosen based on general adoption and usage in deployment scenarios. We're continuing to add new models that support a diverse set of workloads and usage. @@ -15,38 +15,45 @@ The models used in the performance benchmarks were chosen based on general adopt CF means Caffe*, while TF means TensorFlow*. #### 5. How can I run the benchmark results on my own? -All of the performance benchmarks were generated using the open-sourced tool within the Intel® Distribution of OpenVINO™ toolkit called `benchmark_app`, which is available in both [C++](../../samples/cpp/benchmark_app/README.md) and [Python](../../tools/benchmark_tool/README.md). +All of the performance benchmarks were generated using the open-sourced tool within the Intel® Distribution of OpenVINO™ toolkit called `benchmark_app`, which is available in both [C++](../../samples/cpp/benchmark_app/README.md) and [Python](../../tools/benchmark_tool/README.md). #### 6. What image sizes are used for the classification network models? The image size used in the inference depends on the network being benchmarked. The following table shows the list of input sizes for each network model. 
-| **Model** | **Public Network** | **Task** | **Input Size** (Height x Width) | +| **Model** | **Public Network** | **Task** | **Input Size** (Height x Width) | |------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|-----------------------------|-----------------------------------| -| [bert-large-uncased-whole-word-masking-squad](https://github.com/openvinotoolkit/open_model_zoo/tree/develop/models/intel/bert-large-uncased-whole-word-masking-squad-int8-0001) | BERT-large |question / answer |384| +| [bert-base-cased](https://github.com/PaddlePaddle/PaddleNLP/tree/v2.1.1) | BERT | question / answer | 124 | +| [bert-large-uncased-whole-word-masking-squad](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/bert-large-uncased-whole-word-masking-squad-int8-0001) | BERT-large | question / answer | 384 | +| [bert-small-uncased-whole-masking-squad-0002](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/bert-small-uncased-whole-word-masking-squad-0002) | BERT-small | question / answer | 384 | | [brain-tumor-segmentation-0001-MXNET](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/brain-tumor-segmentation-0001) | brain-tumor-segmentation-0001 | semantic segmentation | 128x128x128 | | [brain-tumor-segmentation-0002-CF2](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/brain-tumor-segmentation-0002) | brain-tumor-segmentation-0002 | semantic segmentation | 128x128x128 | -| [deeplabv3-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/deeplabv3) | DeepLab v3 Tf | semantic segmentation | 513x513 | -| [densenet-121-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/densenet-121-tf) | Densenet-121 Tf | classification | 224x224 | +| [deeplabv3-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/deeplabv3) | DeepLab v3 Tf | semantic segmentation | 513x513 | +| [densenet-121-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/densenet-121-tf) | Densenet-121 Tf | classification | 224x224 | +| [efficientdet-d0](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/efficientdet-d0-tf) | Efficientdet | classification | 512x512 | | [facenet-20180408-102900-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/facenet-20180408-102900) | FaceNet TF | face recognition | 160x160 | -| [faster_rcnn_resnet50_coco-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/faster_rcnn_resnet50_coco) | Faster RCNN Tf | object detection | 600x1024 | -| [inception-v4-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/develop/models/public/googlenet-v4-tf) | Inception v4 Tf (aka GoogleNet-V4) | classification | 299x299 | -| [inception-v3-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/googlenet-v3) | Inception v3 Tf | classification | 299x299 | -| [mobilenet-ssd-CF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/mobilenet-ssd) | SSD (MobileNet)_COCO-2017_Caffe | object detection | 300x300 | -| [mobilenet-v2-1.0-224-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/mobilenet-v2-1.0-224) | MobileNet v2 Tf | classification | 224x224 | -| 
[mobilenet-v2-pytorch](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/mobilenet-v2-pytorch ) | Mobilenet V2 PyTorch | classification | 224x224 | -| [resnet-18-pytorch](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-18-pytorch) | ResNet-18 PyTorch | classification | 224x224 | +| [Facedetection0200](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/intel/face-detection-0200) | FaceDetection0200 | detection | 256x256 | +| [faster_rcnn_resnet50_coco-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/faster_rcnn_resnet50_coco) | Faster RCNN Tf | object detection | 600x1024 | +| [forward-tacotron-duration-prediction](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/forward-tacotron) | ForwardTacotron | text to speech | 241 | +| [inception-v4-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/develop/models/public/googlenet-v4-tf) | Inception v4 Tf (aka GoogleNet-V4) | classification | 299x299 | +| [inception-v3-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/googlenet-v3) | Inception v3 Tf | classification | 299x299 | +| [mask_rcnn_resnet50_atrous_coco](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/mask_rcnn_resnet50_atrous_coco) | Mask R-CNN ResNet50 Atrous | instance segmentation | 800x1365 | +| [mobilenet-ssd-CF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/mobilenet-ssd) | SSD (MobileNet)_COCO-2017_Caffe | object detection | 300x300 | +| [mobilenet-v2-1.0-224-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/mobilenet-v2-1.0-224) | MobileNet v2 Tf | classification | 224x224 | +| [mobilenet-v2-pytorch](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/mobilenet-v2-pytorch ) | Mobilenet V2 PyTorch | classification | 224x224 | +| [Mobilenet-V3-small](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/mobilenet-v3-small-1.0-224-tf) | Mobilenet-V3-1.0-224 | classifier | 224x224 | +| [Mobilenet-V3-large](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/mobilenet-v3-large-1.0-224-tf) | Mobilenet-V3-1.0-224 | classifier | 224x224 | +| [pp-ocr-rec](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.1/) | PP-OCR | optical character recognition | 32x640 | +| [pp-yolo](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.1) | PP-YOLO | detection | 640x640 | +| [resnet-18-pytorch](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-18-pytorch) | ResNet-18 PyTorch | classification | 224x224 | | [resnet-50-pytorch](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-pytorch) | ResNet-50 v1 PyTorch | classification | 224x224 | -| [resnet-50-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) | ResNet-50_v1_ILSVRC-2012 | classification | 224x224 | -| [se-resnext-50-CF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/se-resnext-50) | Se-ResNext-50_ILSVRC-2012_Caffe | classification | 224x224 | -| [squeezenet1.1-CF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/squeezenet1.1) | SqueezeNet_v1.1_ILSVRC-2012_Caffe | classification | 227x227 | -| [ssd300-CF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/ssd300) | SSD (VGG-16)_VOC-2007_Caffe | object detection | 
300x300 | -| [yolo_v4-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v4-tf) | Yolo-V4 TF | object detection | 608x608 | +| [resnet-50-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) | ResNet-50_v1_ILSVRC-2012 | classification | 224x224 | +| [yolo_v4-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v4-tf) | Yolo-V4 TF | object detection | 608x608 | | [ssd_mobilenet_v1_coco-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/ssd_mobilenet_v1_coco) | ssd_mobilenet_v1_coco | object detection | 300x300 | | [ssdlite_mobilenet_v2-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/ssdlite_mobilenet_v2) | ssdlite_mobilenet_v2 | object detection | 300x300 | | [unet-camvid-onnx-0001](https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/intel/unet-camvid-onnx-0001/description/unet-camvid-onnx-0001.md) | U-Net | semantic segmentation | 368x480 | | [yolo-v3-tiny-tf](https://github.com/openvinotoolkit/open_model_zoo/tree/develop/models/public/yolo-v3-tiny-tf) | YOLO v3 Tiny | object detection | 416x416 | +| [yolo-v3](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v3-tf) | YOLO v3 | object detection | 416x416 | | [ssd-resnet34-1200-onnx](https://github.com/openvinotoolkit/open_model_zoo/tree/develop/models/public/ssd-resnet34-1200-onnx) | ssd-resnet34 onnx model | object detection | 1200x1200 | -| [vgg19-caffe](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/vgg19-caffe2) | VGG-19 | classification | 224x224| - + #### 7. Where can I purchase the specific hardware used in the benchmarking? Intel partners with various vendors all over the world. Visit the [Intel® AI: In Production Partners & Solutions Catalog](https://www.intel.com/content/www/us/en/internet-of-things/ai-in-production/partners-solutions-catalog.html) for a list of Equipment Makers and the [Supported Devices](../OV_Runtime_UG/supported_plugins/Supported_Devices.md) documentation. You can also remotely test and run models before purchasing any hardware by using [Intel® DevCloud for the Edge](http://devcloud.intel.com/edge/). @@ -54,18 +61,10 @@ Intel partners with various vendors all over the world. Visit the [Intel® AI: I We published a set of guidelines and recommendations to optimize your models available in the [optimization guide](../optimization_guide/dldt_optimization_guide.md). For further support, please join the conversation in the [Community Forum](https://software.intel.com/en-us/forums/intel-distribution-of-openvino-toolkit). #### 9. Why are INT8 optimized models used for benchmarking on CPUs with no VNNI support? -The benefit of low-precision optimization using the OpenVINO™ toolkit model optimizer extends beyond processors supporting VNNI through Intel® DL Boost. The reduced bit width of INT8 compared to FP32 allows Intel® CPU to process the data faster and thus offers better throughput on any converted model agnostic of the intrinsically supported low-precision optimizations within Intel® hardware. Please refer to [INT8 vs. FP32 Comparison on Select Networks and Platforms](performance_int8_vs_fp32.md) for comparison on boost factors for different network models and a selection of Intel® CPU architectures, including AVX-2 with Intel® Core™ i7-8700T, and AVX-512 (VNNI) with Intel® Xeon® 5218T and Intel® Xeon® 8270. 
+The benefit of low-precision optimization using the OpenVINO™ toolkit model optimizer extends beyond processors supporting VNNI through Intel® DL Boost. The reduced bit width of INT8 compared to FP32 allows Intel® CPUs to process data faster and thus offers better throughput on any converted model, regardless of the low-precision optimizations intrinsically supported by the Intel® hardware. Refer to [Model Accuracy for INT8 and FP32 Precision](performance_int8_vs_fp32.md) for a comparison of boost factors for different network models and a selection of Intel® CPU architectures, including AVX-2 with Intel® Core™ i7-8700T, and AVX-512 (VNNI) with Intel® Xeon® 5218T and Intel® Xeon® 8270.
 
-#### 10. Previous releases included benchmarks on googlenet-v1-CF (Caffe). Why is there no longer benchmarks on this neural network model?
-We replaced googlenet-v1-CF to resnet-18-pytorch due to changes in developer usage. The public model resnet-18 is used by many developers as an Image Classification model. This pre-optimized model was also trained on the ImageNet database, similar to googlenet-v1-CF. Both googlenet-v1-CF and resnet-18 will remain part of the Open Model Zoo. Developers are encouraged to utilize resnet-18-pytorch for Image Classification use cases.
-
-#### 11. Why have resnet-50-CF, mobilenet-v1-1.0-224-CF, mobilenet-v2-CF and resnet-101-CF been removed?
-The CAFFE version of resnet-50, mobilenet-v1-1.0-224 and mobilenet-v2 have been replaced with their TensorFlow and PyTorch counterparts. Resnet-50-CF is replaced by resnet-50-TF, mobilenet-v1-1.0-224-CF is replaced by mobilenet-v1-1.0-224-TF and mobilenet-v2-CF is replaced by mobilenetv2-PyTorch. Resnet-50-CF an resnet-101-CF are no longer maintained at their public source repos.
-
-#### 12. Where can I search for OpenVINO™ performance results based on HW-platforms?
+#### 10. Where can I search for OpenVINO™ performance results based on HW-platforms?
The website format has changed to support the more common search approach of looking for the performance of a given neural network model on different HW-platforms, as opposed to reviewing a given HW-platform's performance on different neural network models.

-#### 13. How is Latency measured?
+#### 11. How is Latency measured?
Latency is measured by running the OpenVINO™ Runtime in synchronous mode. In synchronous mode, each frame or image is processed through the entire set of stages (pre-processing, inference, post-processing) before the next frame or image is processed. This KPI is relevant for applications where the inference on a single image is required, for example the analysis of an ultrasound image in a medical application or the analysis of a seismic image in the oil & gas industry. Other use cases include real-time or near real-time applications, such as an industrial robot's response to changes in its environment or obstacle avoidance for autonomous vehicles, where a quick response to the result of the inference is required.
-
-For more complete information about performance and benchmark results, visit: [www.intel.com/benchmarks](https://www.intel.com/benchmarks) and [Optimization Notice](https://software.intel.com/articles/optimization-notice). [Legal Information](../Legal_Information.md).
diff --git a/docs/benchmarks/performance_benchmarks_openvino.md b/docs/benchmarks/performance_benchmarks_openvino.md
index 89b282ebf6a..17d35a7b2e5 100644
--- a/docs/benchmarks/performance_benchmarks_openvino.md
+++ b/docs/benchmarks/performance_benchmarks_openvino.md
@@ -4,26 +4,33 @@
 .. 
toctree:: :maxdepth: 1 :hidden: - + openvino_docs_performance_benchmarks_faq - Download Performance Data Spreadsheet in MS Excel* Format + Download Performance Data Spreadsheet in MS Excel* Format openvino_docs_performance_int8_vs_fp32 - + @endsphinxdirective -This benchmark setup includes a single machine on which both the benchmark application and the OpenVINO™ installation reside. +This benchmark setup includes a single machine on which both the benchmark application and the OpenVINO™ installation reside. -The benchmark application loads the OpenVINO Runtime (SW) at runtime and executes inferences on the specified hardware (CPU, GPU or VPU). The benchmark application measures the time spent on actual inferencing (excluding any pre or post processing) and then reports on the inferences per second (or Frames Per Second). For more information on the benchmark application, please also refer to the entry 5 of the [FAQ section](performance_benchmarks_faq.md). - -Devices similar to the ones we have used for benchmarking can be accessed using [Intel® DevCloud for the Edge](https://devcloud.intel.com/edge/), a remote development environment with access to Intel® hardware and the latest versions of the Intel® Distribution of the OpenVINO™ Toolkit. [Learn more](https://devcloud.intel.com/edge/get_started/devcloud/) or [Register here](https://inteliot.force.com/DevcloudForEdge/s/). +The benchmark application loads the OpenVINO™ Runtime and executes inferences on the specified hardware (CPU, GPU or VPU). The benchmark application measures the time spent on actual inferencing (excluding any pre or post processing) and then reports on the inferences per second (or Frames Per Second). For more information on the benchmark application, please also refer to the entry 5 of the [FAQ section](performance_benchmarks_faq.md). Measuring inference performance involves many variables and is extremely use-case and application dependent. We use the below four parameters for measurements, which are key elements to consider for a successful deep learning inference application: - **Throughput** - Measures the number of inferences delivered within a latency threshold. (for example, number of Frames Per Second - FPS). When deploying a system with deep learning inference, select the throughput that delivers the best trade-off between latency and power for the price and performance that meets your requirements. - **Value** - While throughput is important, what is more critical in edge AI deployments is the performance efficiency or performance-per-cost. Application performance in throughput per dollar of system cost is the best measure of value. - **Efficiency** - System power is a key consideration from the edge to the data center. When selecting deep learning solutions, power efficiency (throughput/watt) is a critical factor to consider. Intel designs provide excellent power efficiency for running deep learning workloads. -- **Latency** - This measures the synchronous execution of inference requests and is reported in milliseconds. Each inference request (for example: preprocess, infer, postprocess) is allowed to complete before the next is started. This performance metric is relevant in usage scenarios where a single image input needs to be acted upon as soon as possible. 
An example would be the healthcare sector where medical personnel only request analysis of a single ultra sound scanning image or in real-time or near real-time applications for example an industrial robot's response to actions in its environment or obstacle avoidance for autonomous vehicles. +- **Latency** - This measures the synchronous execution of inference requests and is reported in milliseconds. Each inference request (for example: preprocess, infer, postprocess) is allowed to complete before the next is started. This performance metric is relevant in usage scenarios where a single image input needs to be acted upon as soon as possible. An example would be the healthcare sector where medical personnel only request analysis of a single ultra sound scanning image or in real-time or near real-time applications for example an industrial robot's response to actions in its environment or obstacle avoidance for autonomous vehicles. + +## bert-base-cased [124] + +@sphinxdirective +.. raw:: html + +
+ +@endsphinxdirective ## bert-large-uncased-whole-word-masking-squad-int8-0001 [384] @@ -32,7 +39,7 @@ Measuring inference performance involves many variables and is extremely use-cas .. raw:: html
- + @endsphinxdirective ## deeplabv3-TF [513x513] @@ -41,7 +48,7 @@ Measuring inference performance involves many variables and is extremely use-cas .. raw:: html
- + @endsphinxdirective ## densenet-121-TF [224x224] @@ -50,7 +57,16 @@ Measuring inference performance involves many variables and is extremely use-cas .. raw:: html
- + +@endsphinxdirective + +## efficientdet-d0 [512x512] + +@sphinxdirective +.. raw:: html + +
+ @endsphinxdirective ## faster-rcnn-resnet50-coco-TF [600x1024] @@ -59,17 +75,7 @@ Measuring inference performance involves many variables and is extremely use-cas .. raw:: html
- -@endsphinxdirective - -## inception-v3-TF [299x299] - -@sphinxdirective -.. raw:: html - -
- @endsphinxdirective ## inception-v4-TF [299x299] @@ -78,7 +84,7 @@ Measuring inference performance involves many variables and is extremely use-cas .. raw:: html
- + @endsphinxdirective ## mobilenet-ssd-CF [300x300] @@ -87,7 +93,7 @@ Measuring inference performance involves many variables and is extremely use-cas .. raw:: html
- + @endsphinxdirective ## mobilenet-v2-pytorch [224x224] @@ -96,7 +102,7 @@ Measuring inference performance involves many variables and is extremely use-cas .. raw:: html
- + @endsphinxdirective ## resnet-18-pytorch [224x224] @@ -105,71 +111,17 @@ Measuring inference performance involves many variables and is extremely use-cas .. raw:: html
- + @endsphinxdirective + ## resnet_50_TF [224x224] @sphinxdirective .. raw:: html
- -@endsphinxdirective -## se-resnext-50-CF [224x224] - -@sphinxdirective -.. raw:: html - -
- -@endsphinxdirective - -## squeezenet1.1-CF [227x227] - -@sphinxdirective -.. raw:: html - -
- -@endsphinxdirective - -## ssd300-CF [300x300] - -@sphinxdirective -.. raw:: html - -
- -@endsphinxdirective - -## yolo-v3-tiny-tf [416x416] - -@sphinxdirective -.. raw:: html - -
- -@endsphinxdirective - -## yolo_v4-tf [608x608] - -@sphinxdirective -.. raw:: html - -
- -@endsphinxdirective - - -## unet-camvid-onnx-0001 [368x480] - -@sphinxdirective -.. raw:: html - -
- @endsphinxdirective ## ssd-resnet34-1200-onnx [1200x1200] @@ -178,27 +130,45 @@ Measuring inference performance involves many variables and is extremely use-cas .. raw:: html
- + @endsphinxdirective -## vgg19-caffe [224x224] +## unet-camvid-onnx-0001 [368x480] @sphinxdirective .. raw:: html -
- +
+ +@endsphinxdirective + +## yolo-v3-tiny-tf [416x416] + +@sphinxdirective +.. raw:: html + +
+ +@endsphinxdirective + +## yolo_v4-tf [608x608] + +@sphinxdirective +.. raw:: html + +
+ @endsphinxdirective ## Platform Configurations -Intel® Distribution of OpenVINO™ toolkit performance benchmark numbers are based on release 2021.4. +Intel® Distribution of OpenVINO™ toolkit performance benchmark numbers are based on release 2022.1. -Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. Performance results are based on testing as of June 18, 2021 and may not reflect all publicly available updates. See configuration disclosure for details. No product can be absolutely secure. +Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. Performance results are based on testing as of March 17, 2022 and may not reflect all publicly available updates. See configuration disclosure for details. No product can be absolutely secure. Performance varies by use, configuration and other factors. Learn more at [www.intel.com/PerformanceIndex](https://www.intel.com/PerformanceIndex). -Your costs and results may vary. +Your costs and results may vary. © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. @@ -208,91 +178,113 @@ Testing by Intel done on: see test date for each HW platform below. **CPU Inference Engines** -| Configuration | Intel® Xeon® E-2124G | Intel® Xeon® W1290P | Intel® Xeon® Silver 4216R | -| ------------------------------- | ---------------------- | --------------------------- | ---------------------------- | -| Motherboard | ASUS* WS C246 PRO | ASUS* WS W480-ACE | Intel® Server Board S2600STB | -| CPU | Intel® Xeon® E-2124G CPU @ 3.40GHz | Intel® Xeon® W-1290P CPU @ 3.70GHz | Intel® Xeon® Silver 4216R CPU @ 2.20GHz | -| Hyper Threading | OFF | ON | ON | -| Turbo Setting | ON | ON | ON | -| Memory | 2 x 16 GB DDR4 2666MHz | 4 x 16 GB DDR4 @ 2666MHz |12 x 32 GB DDR4 2666MHz | -| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | -| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic | 5.3.0-24-generic | -| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc. | Intel Corporation | -| BIOS Version | 0904 | 607 | SE5C620.86B.02.01.
0013.121520200651 | -| BIOS Release | April 12, 2019 | May 29, 2020 | December 15, 2020 | -| BIOS Settings | Select optimized default settings,
save & exit | Select optimized default settings,
save & exit | Select optimized default settings,
change power policy
to "performance",
save & exit | -| Batch size | 1 | 1 | 1 -| Precision | INT8 | INT8 | INT8 -| Number of concurrent inference requests | 4 | 5 | 32 -| Test Date | June 18, 2021 | June 18, 2021 | June 18, 2021 -| Rated maximum TDP/socket in Watt | [71](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html#tab-blade-1-0-1) | [125](https://ark.intel.com/content/www/us/en/ark/products/199336/intel-xeon-w-1290p-processor-20m-cache-3-70-ghz.html) | [125](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html#tab-blade-1-0-1) | -| CPU Price/socket on June 21, 2021, USD
Prices may vary | [213](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html) | [539](https://ark.intel.com/content/www/us/en/ark/products/199336/intel-xeon-w-1290p-processor-20m-cache-3-70-ghz.html) |[1,002](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html) | +| Configuration | Intel® Xeon® E-2124G | Intel® Xeon® W1290P | +| ------------------------------- | ---------------------- | --------------------------- | +| Motherboard | ASUS* WS C246 PRO | ASUS* WS W480-ACE | +| CPU | Intel® Xeon® E-2124G CPU @ 3.40GHz | Intel® Xeon® W-1290P CPU @ 3.70GHz | +| Hyper Threading | OFF | ON | +| Turbo Setting | ON | ON | +| Memory | 2 x 16 GB DDR4 2666MHz | 4 x 16 GB DDR4 @ 2666MHz | +| Operating System | Ubuntu* 20.04.3 LTS | Ubuntu* 20.04.3 LTS | +| Kernel Version | 5.4.0-42-generic | 5.4.0-42-generic | +| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc. | +| BIOS Version | 1901 | 2301 | +| BIOS Release | September 24, 2021 | July 8, 2021 | +| BIOS Settings | Select optimized default settings,
save & exit | Select optimized default settings,
save & exit | +| Batch size | 1 | 1 | +| Precision | INT8 | INT8 | +| Number of concurrent inference requests | 4 | 5 | +| Test Date | March 17, 2022 | March 17, 2022 | +| Rated maximum TDP/socket in Watt | [71](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html#tab-blade-1-0-1) | [125](https://ark.intel.com/content/www/us/en/ark/products/199336/intel-xeon-w-1290p-processor-20m-cache-3-70-ghz.html) | +| CPU Price/socket on Feb 14, 2022, USD
Prices may vary | [213](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html) | [539](https://ark.intel.com/content/www/us/en/ark/products/199336/intel-xeon-w-1290p-processor-20m-cache-3-70-ghz.html) | **CPU Inference Engines (continue)** -| Configuration | Intel® Xeon® Gold 5218T | Intel® Xeon® Platinum 8270 | Intel® Xeon® Platinum 8380 | -| ------------------------------- | ---------------------------- | ---------------------------- | -----------------------------------------| -| Motherboard | Intel® Server Board S2600STB | Intel® Server Board S2600STB | Intel Corporation / WilsonCity | -| CPU | Intel® Xeon® Gold 5218T CPU @ 2.10GHz | Intel® Xeon® Platinum 8270 CPU @ 2.70GHz | Intel® Xeon® Platinum 8380 CPU @ 2.30GHz | -| Hyper Threading | ON | ON | ON | -| Turbo Setting | ON | ON | ON | -| Memory | 12 x 32 GB DDR4 2666MHz | 12 x 32 GB DDR4 2933MHz | 16 x 16 GB DDR4 3200MHz | -| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | -| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic | 5.3.0-24-generic | -| BIOS Vendor | Intel Corporation | Intel Corporation | Intel Corporation | -| BIOS Version | SE5C620.86B.02.01.
0013.121520200651 | SE5C620.86B.02.01.
0013.121520200651 | WLYDCRB1.SYS.0020.
P86.2103050636 | -| BIOS Release | December 15, 2020 | December 15, 2020 | March 5, 2021 | -| BIOS Settings | Select optimized default settings,
change power policy to "performance",
save & exit | Select optimized default settings,
change power policy to "performance",
save & exit | Select optimized default settings,
change power policy to "performance",
save & exit | -| Batch size | 1 | 1 | 1 | -| Precision | INT8 | INT8 | INT8 | -| Number of concurrent inference requests |32 | 52 | 80 | -| Test Date | June 18, 2021 | June 18, 2021 | June 18, 2021 | -| Rated maximum TDP/socket in Watt | [105](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html#tab-blade-1-0-1) | [205](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html#tab-blade-1-0-1) | [270](https://ark.intel.com/content/www/us/en/ark/products/212287/intel-xeon-platinum-8380-processor-60m-cache-2-30-ghz.html) | -| CPU Price/socket on June 21, 2021, USD
Prices may vary | [1,349](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html) | [7,405](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html) | [8,099](https://ark.intel.com/content/www/us/en/ark/products/212287/intel-xeon-platinum-8380-processor-60m-cache-2-30-ghz.html) | +| Configuration | Intel® Xeon® Silver 4216R | Intel® Xeon® Silver 4316 | +| ------------------------------- | ---------------------- | --------------------------- | +| Motherboard | Intel® Server Board S2600STB | Intel Corporation / WilsonCity | +| CPU | Intel® Xeon® Silver 4216R CPU @ 2.20GHz | Intel® Xeon® Silver 4316 CPU @ 2.30GHz | +| Hyper Threading | ON | ON | +| Turbo Setting | ON | ON | +| Memory | 12 x 32 GB DDR4 2666MHz | 16 x 32 GB DDR4 @ 2666MHz | +| Operating System | Ubuntu* 20.04.3 LTS | Ubuntu* 20.04.3 LTS | +| Kernel Version | 5.3.0-24-generic | 5.4.0-100-generic | +| BIOS Vendor | Intel Corporation | Intel Corporation | +| BIOS Version | SE5C620.86B.02.01.
0013.121520200651 | WLYDCRB1.SYS.0021.
P41.2109200451 | +| BIOS Release | December 15, 2020 | September 20, 2021 | +| BIOS Settings | Select optimized default settings,
change power policy
to "performance",
save & exit | Select optimized default settings,
save & exit | +| Batch size | 1 | 1 | +| Precision | INT8 | INT8 | +| Number of concurrent inference requests | 32 | 10 | +| Test Date | March 17, 2022 | March 17, 2022 | +| Rated maximum TDP/socket in Watt | [125](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html#tab-blade-1-0-1) | [150](https://ark.intel.com/content/www/us/en/ark/products/215270/intel-xeon-silver-4316-processor-30m-cache-2-30-ghz.html)| +| CPU Price/socket on June 21, 2021, USD
Prices may vary | [1,002](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html) | [1083](https://ark.intel.com/content/www/us/en/ark/products/215270/intel-xeon-silver-4316-processor-30m-cache-2-30-ghz.html)| + +**CPU Inference Engines (continue)** + +| Configuration | Intel® Xeon® Gold 5218T | Intel® Xeon® Platinum 8270 | Intel® Xeon® Platinum 8380 | +| ------------------------------- | ---------------------------- | ---------------------------- | -----------------------------------------| +| Motherboard | Intel® Server Board S2600STB | Intel® Server Board S2600STB | Intel Corporation / WilsonCity | +| CPU | Intel® Xeon® Gold 5218T CPU @ 2.10GHz | Intel® Xeon® Platinum 8270 CPU @ 2.70GHz | Intel® Xeon® Platinum 8380 CPU @ 2.30GHz | +| Hyper Threading | ON | ON | ON | +| Turbo Setting | ON | ON | ON | +| Memory | 12 x 32 GB DDR4 2666MHz | 12 x 32 GB DDR4 2933MHz | 16 x 16 GB DDR4 3200MHz | +| Operating System | Ubuntu* 20.04.3 LTS | Ubuntu* 20.04.3 LTS | Ubuntu* 20.04.1 LTS | +| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic | 5.4.0-64-generic | +| BIOS Vendor | Intel Corporation | Intel Corporation | Intel Corporation | +| BIOS Version | SE5C620.86B.02.01.
0013.121520200651 | SE5C620.86B.02.01.
0013.121520200651 | WLYDCRB1.SYS.0020.
P86.2103050636 | +| BIOS Release | December 15, 2020 | December 15, 2020 | March 5, 2021 | +| BIOS Settings | Select optimized default settings,
change power policy to "performance",
save & exit | Select optimized default settings,
change power policy to "performance",
save & exit | Select optimized default settings,
change power policy to "performance",
save & exit | +| Batch size | 1 | 1 | 1 | +| Precision | INT8 | INT8 | INT8 | +| Number of concurrent inference requests | 32 | 52 | 80 | +| Test Date | March 17, 2022 | March 17, 2022 | March 17, 2022 | +| Rated maximum TDP/socket in Watt | [105](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html#tab-blade-1-0-1) | [205](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html#tab-blade-1-0-1) | [270](https://mark.intel.com/content/www/us/en/secure/mark/products/212287/intel-xeon-platinum-8380-processor-60m-cache-2-30-ghz.html#tab-blade-1-0-1) | +| CPU Price/socket on Feb 14, 2022, USD
Prices may vary | [1,349](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html) | [7,405](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html) | [8,099](https://mark.intel.com/content/www/us/en/secure/mark/products/212287/intel-xeon-platinum-8380-processor-60m-cache-2-30-ghz.html#tab-blade-1-0-0) | **CPU Inference Engines (continue)** -| Configuration | Intel® Core™ i7-8700T | Intel® Core™ i9-10920X | -| -------------------- | ----------------------------------- |--------------------------------------| -| Motherboard | GIGABYTE* Z370M DS3H-CF | ASUS* PRIME X299-A II | -| CPU | Intel® Core™ i7-8700T CPU @ 2.40GHz | Intel® Core™ i9-10920X CPU @ 3.50GHz | -| Hyper Threading | ON | ON | -| Turbo Setting | ON | ON | -| Memory | 4 x 16 GB DDR4 2400MHz | 4 x 16 GB DDR4 2666MHz | -| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | -| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic | -| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | -| BIOS Version | F14c | 1004 | -| BIOS Release | March 23, 2021 | March 19, 2021 | -| BIOS Settings | Select optimized default settings,
set OS type to "other",
save & exit | Default Settings | -| Batch size | 1 | 1 | -| Precision | INT8 | INT8 | -| Number of concurrent inference requests |4 | 24 | -| Test Date | June 18, 2021 | June 18, 2021 | -| Rated maximum TDP/socket in Watt | [35](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html#tab-blade-1-0-1) | [165](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) | -| CPU Price/socket on June 21, 2021, USD
Prices may vary | [303](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html) | [700](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) | +| Configuration | Intel® Core™ i9-10920X | Intel® Core™ i9-10900TE | Intel® Core™ i9-12900 | +| -------------------- | -------------------------------------| ----------------------- | -------------------------------------------------------------- | +| Motherboard | ASUS* PRIME X299-A II | B595 | Intel Corporation
internal/Reference
Validation Platform | +| CPU | Intel® Core™ i9-10920X CPU @ 3.50GHz | Intel® Core™ i9-10900TE CPU @ 1.80GHz | 12th Gen Intel® Core™ i9-12900 | +| Hyper Threading | ON | ON | OFF | +| Turbo Setting | ON | ON | - | +| Memory | 4 x 16 GB DDR4 2666MHz | 2 x 8 GB DDR4 @ 2400 MHz | 4 x 8 GB DDR4 4800MHz | +| Operating System | Ubuntu 20.04.3 LTS | Ubuntu 20.04.3 LTS | Microsoft Windows 10 Pro | +| Kernel Version | 5.4.0-42-generic | 5.4.0-42-generic | 10.0.19043 N/A Build 19043 | +| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | Intel Corporation | +| BIOS Version | 1004 | Z667AR10.BIN | ADLSFWI1.R00.2303.
B00.2107210432 | +| BIOS Release | March 19, 2021 | July 15, 2020 | July 21, 2021 | +| BIOS Settings | Default Settings | Default Settings | Default Settings | +| Batch size | 1 | 1 | 1 | +| Precision | INT8 | INT8 | INT8 | +| Number of concurrent inference requests | 24 | 5 | 4 | +| Test Date | March 17, 2022 | March 17, 2022 | March 17, 2022 | +| Rated maximum TDP/socket in Watt | [165](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) | [35](https://ark.intel.com/content/www/us/en/ark/products/203901/intel-core-i910900te-processor-20m-cache-up-to-4-60-ghz.html) | [65](https://ark.intel.com/content/www/us/en/ark/products/134597/intel-core-i912900-processor-30m-cache-up-to-5-10-ghz.html) | +| CPU Price/socket on Feb 14, 2022, USD
Prices may vary | [700](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) | [444](https://ark.intel.com/content/www/us/en/ark/products/203901/intel-core-i910900te-processor-20m-cache-up-to-4-60-ghz.html) | [519](https://ark.intel.com/content/www/us/en/ark/products/134597/intel-core-i912900-processor-30m-cache-up-to-5-10-ghz.html)| **CPU Inference Engines (continue)** -| Configuration | 11th Gen Intel® Core™ i7-1185G7 | 11th Gen Intel® Core™ i7-11850HE | -| -------------------- | --------------------------------|----------------------------------| -| Motherboard | Intel Corporation
internal/Reference
Validation Platform | Intel Corporation
internal/Reference
Validation Platform | -| CPU | 11th Gen Intel® Core™ i7-1185G7 @ 3.00GHz | 11th Gen Intel® Core™ i7-11850HE @ 2.60GHz | -| Hyper Threading | ON | ON | -| Turbo Setting | ON | ON | -| Memory | 2 x 8 GB DDR4 3200MHz | 2 x 16 GB DDR4 3200MHz | -| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04.4 LTS | -| Kernel Version | 5.8.0-05-generic | 5.8.0-050800-generic | -| BIOS Vendor | Intel Corporation | Intel Corporation | -| BIOS Version | TGLSFWI1.R00.3425.
A00.2010162309 | TGLIFUI1.R00.4064.
A01.2102200132 | -| BIOS Release | October 16, 2020 | February 20, 2021 | -| BIOS Settings | Default Settings | Default Settings | -| Batch size | 1 | 1 | -| Precision | INT8 | INT8 | -| Number of concurrent inference requests |4 | 4 | -| Test Date | June 18, 2021 | June 18, 2021 | -| Rated maximum TDP/socket in Watt | [28](https://ark.intel.com/content/www/us/en/ark/products/208664/intel-core-i7-1185g7-processor-12m-cache-up-to-4-80-ghz-with-ipu.html) | [45](https://ark.intel.com/content/www/us/en/ark/products/213799/intel-core-i7-11850h-processor-24m-cache-up-to-4-80-ghz.html) | -| CPU Price/socket on June 21, 2021, USD
Prices may vary | [426](https://ark.intel.com/content/www/us/en/ark/products/208664/intel-core-i7-1185g7-processor-12m-cache-up-to-4-80-ghz-with-ipu.html) | [395](https://ark.intel.com/content/www/us/en/ark/products/213799/intel-core-i7-11850h-processor-24m-cache-up-to-4-80-ghz.html) | +| Configuration | Intel® Core™ i7-8700T | Intel® Core™ i7-1185G7 | +| -------------------- | ----------------------------------- | -------------------------------- | +| Motherboard | GIGABYTE* Z370M DS3H-CF | Intel Corporation
internal/Reference
Validation Platform | +| CPU | Intel® Core™ i7-8700T CPU @ 2.40GHz | Intel® Core™ i7-1185G7 @ 3.00GHz | +| Hyper Threading | ON | ON | +| Turbo Setting | ON | ON | +| Memory | 4 x 16 GB DDR4 2400MHz | 2 x 8 GB DDR4 3200MHz | +| Operating System | Ubuntu 20.04.3 LTS | Ubuntu 20.04.3 LTS | +| Kernel Version | 5.4.0-42-generic | 5.8.0-050800-generic | +| BIOS Vendor | American Megatrends Inc.* | Intel Corporation | +| BIOS Version | F14c | TGLSFWI1.R00.4391.
A00.2109201819 | +| BIOS Release | March 23, 2021 | September 20, 2021 | +| BIOS Settings | Select optimized default settings,
set OS type to "other",
save & exit | Default Settings | +| Batch size | 1 | 1 | +| Precision | INT8 | INT8 | +| Number of concurrent inference requests | 4 | 4 | +| Test Date | March 17, 2022 | March 17, 2022 | +| Rated maximum TDP/socket in Watt | [35](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html#tab-blade-1-0-1) | [28](https://ark.intel.com/content/www/us/en/ark/products/208664/intel-core-i7-1185g7-processor-12m-cache-up-to-4-80-ghz-with-ipu.html) | +| CPU Price/socket on Feb 14, 2022, USD
Prices may vary | [303](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html) | [426](https://ark.intel.com/content/www/us/en/ark/products/208664/intel-core-i7-1185g7-processor-12m-cache-up-to-4-80-ghz-with-ipu.html) | **CPU Inference Engines (continue)** @@ -303,18 +295,18 @@ Testing by Intel done on: see test date for each HW platform below. | Hyper Threading | OFF | OFF | ON | | Turbo Setting | OFF | ON | ON | | Memory | 4 x 8 GB DDR4 2400MHz | 2 x 16 GB DDR4 2666MHz | 2 x 16 GB DDR4 @ 2666MHz | -| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | -| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic | 5.3.0-24-generic | +| Operating System | Ubuntu* 20.04.3 LTS | Ubuntu* 20.04.3 LTS | Ubuntu* 20.04.3 LTS | +| Kernel Version | 5.3.0-24-generic | 5.4.0-42-generic | 5.4.0-42-generic | | BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | American Megatrends Inc.* | -| BIOS Version | F8 | 2401 | F3 | -| BIOS Release | May 24, 2019 | July 12, 2019 | March 25, 2020 | +| BIOS Version | F8 | 3004 | F21 | +| BIOS Release | May 24, 2019 | July 12, 2021 | November 23, 2021 | | BIOS Settings | Select optimized default settings,
set OS type to "other",
save & exit | Select optimized default settings,
save & exit | Select optimized default settings,
set OS type to "other",
save & exit | | Batch size | 1 | 1 | 1 | | Precision | INT8 | INT8 | INT8 | -| Number of concurrent inference requests | 4 | 3 | 4 | -| Test Date | June 18, 2021 | June 18, 2021 | June 18, 2021 | -| Rated maximum TDP/socket in Watt | [65](https://ark.intel.com/content/www/us/en/ark/products/126688/intel-core-i3-8100-processor-6m-cache-3-60-ghz.html#tab-blade-1-0-1)| [65](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html#tab-blade-1-0-1)| [35](https://ark.intel.com/content/www/us/en/ark/products/203891/intel-core-i5-10500te-processor-12m-cache-up-to-3-70-ghz.html) | -| CPU Price/socket on June 21, 2021, USD
Prices may vary | [117](https://ark.intel.com/content/www/us/en/ark/products/126688/intel-core-i3-8100-processor-6m-cache-3-60-ghz.html) | [192](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html) | [195](https://ark.intel.com/content/www/us/en/ark/products/203891/intel-core-i5-10500te-processor-12m-cache-up-to-3-70-ghz.html) | +| Number of concurrent inference requests | 4 | 3 | 4 | +| Test Date | March 17, 2022 | March 17, 2022 | March 17, 2022 | +| Rated maximum TDP/socket in Watt | [65](https://ark.intel.com/content/www/us/en/ark/products/126688/intel-core-i3-8100-processor-6m-cache-3-60-ghz.html#tab-blade-1-0-1)| [65](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html#tab-blade-1-0-1)| [35](https://ark.intel.com/content/www/us/en/ark/products/203891/intel-core-i5-10500te-processor-12m-cache-up-to-3-70-ghz.html) | +| CPU Price/socket on Feb 14, 2022, USD
Prices may vary | [117](https://ark.intel.com/content/www/us/en/ark/products/126688/intel-core-i3-8100-processor-6m-cache-3-60-ghz.html) | [192](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html) | [195](https://ark.intel.com/content/www/us/en/ark/products/203891/intel-core-i5-10500te-processor-12m-cache-up-to-3-70-ghz.html) | **CPU Inference Engines (continue)** @@ -325,46 +317,42 @@ Testing by Intel done on: see test date for each HW platform below. | CPU | Intel Atom® Processor E3940 @ 1.60GHz | Intel Atom® x6425RE
Processor @ 1.90GHz | Intel® Celeron®
6305E @ 1.80GHz | | Hyper Threading | OFF | OFF | OFF | | Turbo Setting | ON | ON | ON | -| Memory | 1 x 8 GB DDR3 1600MHz | 2 x 4GB DDR4 3200MHz | 2 x 8 GB DDR4 3200MHz | -| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu 18.04.5 LTS | -| Kernel Version | 5.3.0-24-generic | 5.8.0-050800-generic | 5.8.0-050800-generic | +| Memory | 1 x 8 GB DDR3 1600MHz | 2 x 4GB DDR4 3200MHz | 2 x 8 GB DDR4 3200MHz | +| Operating System | Ubuntu* 20.04.3 LTS | Ubuntu* 20.04.3 LTS | Ubuntu 20.04.3 LTS | +| Kernel Version | 5.4.0-42-generic | 5.13.0-27-generic | 5.13.0-1008-intel | | BIOS Vendor | American Megatrends Inc.* | Intel Corporation | Intel Corporation | -| BIOS Version | 5.12 | EHLSFWI1.R00.2463.
A03.2011200425 | TGLIFUI1.R00.4064.A02.2102260133 | -| BIOS Release | September 6, 2017 | November 22, 2020 | February 26, 2021 | +| BIOS Version | 5.12 | EHLSFWI1.R00.3273.
A01.2106300759 | TGLIFUI1.R00.4064.A02.2102260133 | +| BIOS Release | September 6, 2017 | June 30, 2021 | February 26, 2021 | | BIOS Settings | Default settings | Default settings | Default settings | | Batch size | 1 | 1 | 1 | | Precision | INT8 | INT8 | INT8 | | Number of concurrent inference requests | 4 | 4 | 4| -| Test Date | June 18, 2021 | June 18, 2021 | June 18, 2021 | -| Rated maximum TDP/socket in Watt | [9.5](https://ark.intel.com/content/www/us/en/ark/products/96485/intel-atom-x5-e3940-processor-2m-cache-up-to-1-80-ghz.html) | [12](https://ark.intel.com/content/www/us/en/ark/products/207899/intel-atom-x6425re-processor-1-5m-cache-1-90-ghz.html) | [15](https://ark.intel.com/content/www/us/en/ark/products/208072/intel-celeron-6305e-processor-4m-cache-1-80-ghz.html)| -| CPU Price/socket on June 21, 2021, USD
Prices may vary | [34](https://ark.intel.com/content/www/us/en/ark/products/96485/intel-atom-x5-e3940-processor-2m-cache-up-to-1-80-ghz.html) | [59](https://ark.intel.com/content/www/us/en/ark/products/207899/intel-atom-x6425re-processor-1-5m-cache-1-90-ghz.html) |[107](https://ark.intel.com/content/www/us/en/ark/products/208072/intel-celeron-6305e-processor-4m-cache-1-80-ghz.html) | - - +| Test Date | March 17, 2022 | March 17, 2022 | March 17, 2022 | +| Rated maximum TDP/socket in Watt | [9.5](https://ark.intel.com/content/www/us/en/ark/products/96485/intel-atom-x5-e3940-processor-2m-cache-up-to-1-80-ghz.html) | [12](https://mark.intel.com/content/www/us/en/secure/mark/products/207907/intel-atom-x6425e-processor-1-5m-cache-up-to-3-00-ghz.html#tab-blade-1-0-1) | [15](https://ark.intel.com/content/www/us/en/ark/products/208072/intel-celeron-6305e-processor-4m-cache-1-80-ghz.html)| +| CPU Price/socket on Feb 14, 2022, USD
Prices may vary | [34](https://ark.intel.com/content/www/us/en/ark/products/96485/intel-atom-x5-e3940-processor-2m-cache-up-to-1-80-ghz.html) | [59](https://ark.intel.com/content/www/us/en/ark/products/207899/intel-atom-x6425re-processor-1-5m-cache-1-90-ghz.html) |[107](https://ark.intel.com/content/www/us/en/ark/products/208072/intel-celeron-6305e-processor-4m-cache-1-80-ghz.html) | **Accelerator Inference Engines** -| Configuration | Intel® Neural Compute Stick 2 | Intel® Vision Accelerator Design
with Intel® Movidius™ VPUs (Mustang-V100-MX8) | +| Configuration | Intel® Neural Compute Stick 2 | Intel® Vision Accelerator Design
with Intel® Movidius™ VPUs (Mustang-V100-MX8) | | --------------------------------------- | ------------------------------------- | ------------------------------------- | | VPU | 1 X Intel® Movidius™ Myriad™ X MA2485 | 8 X Intel® Movidius™ Myriad™ X MA2485 | | Connection | USB 2.0/3.0 | PCIe X4 | | Batch size | 1 | 1 | | Precision | FP16 | FP16 | | Number of concurrent inference requests | 4 | 32 | -| Rated maximum TDP/socket in Watt | 2.5 | [30](https://www.arrow.com/en/products/mustang-v100-mx8-r10/iei-technology?gclid=Cj0KCQiA5bz-BRD-ARIsABjT4ng1v1apmxz3BVCPA-tdIsOwbEjTtqnmp_rQJGMfJ6Q2xTq6ADtf9OYaAhMUEALw_wcB) | -| CPU Price/socket on June 21, 2021, USD
Prices may vary | [69](https://ark.intel.com/content/www/us/en/ark/products/140109/intel-neural-compute-stick-2.html) | [425](https://www.arrow.com/en/products/mustang-v100-mx8-r10/iei-technology?gclid=Cj0KCQiA5bz-BRD-ARIsABjT4ng1v1apmxz3BVCPA-tdIsOwbEjTtqnmp_rQJGMfJ6Q2xTq6ADtf9OYaAhMUEALw_wcB) | +| Rated maximum TDP/socket in Watt | 2.5 | [30](https://www.mouser.com/ProductDetail/IEI/MUSTANG-V100-MX8-R10?qs=u16ybLDytRaZtiUUvsd36w%3D%3D) | +| CPU Price/socket on Feb 14, 2022, USD
Prices may vary | [69](https://ark.intel.com/content/www/us/en/ark/products/140109/intel-neural-compute-stick-2.html) | [492](https://www.mouser.com/ProductDetail/IEI/MUSTANG-V100-MX8-R10?qs=u16ybLDytRaZtiUUvsd36w%3D%3D) | | Host Computer | Intel® Core™ i7 | Intel® Core™ i5 | | Motherboard | ASUS* Z370-A II | Uzelinfo* / US-E1300 | | CPU | Intel® Core™ i7-8700 CPU @ 3.20GHz | Intel® Core™ i5-6600 CPU @ 3.30GHz | | Hyper Threading | ON | OFF | | Turbo Setting | ON | ON | | Memory | 4 x 16 GB DDR4 2666MHz | 2 x 16 GB DDR4 2400MHz | -| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | +| Operating System | Ubuntu* 20.04.3 LTS | Ubuntu* 20.04.3 LTS | | Kernel Version | 5.0.0-23-generic | 5.0.0-23-generic | | BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | | BIOS Version | 411 | 5.12 | | BIOS Release | September 21, 2018 | September 21, 2018 | -| Test Date | June 18, 2021 | June 18, 2021 | +| Test Date | March 17, 2022 | March 17, 2022 | -Please follow this link for more detailed configuration descriptions: [Configuration Details](https://docs.openvino.ai/resources/benchmark_files/system_configurations_2021.4.html) - -Results may vary. For workloads and configurations visit: [www.intel.com/PerformanceIndex](https://www.intel.com/PerformanceIndex) and [Legal Information](../Legal_Information.md). +For more detailed configuration descriptions, see [Configuration Details](https://docs.openvino.ai/resources/benchmark_files/system_configurations_2022.1.html). \ No newline at end of file diff --git a/docs/benchmarks/performance_benchmarks_ovms.md b/docs/benchmarks/performance_benchmarks_ovms.md index d7393aa5047..68fb10b3987 100644 --- a/docs/benchmarks/performance_benchmarks_ovms.md +++ b/docs/benchmarks/performance_benchmarks_ovms.md @@ -1,6 +1,6 @@ # OpenVINO™ Model Server Benchmark Results {#openvino_docs_performance_benchmarks_ovms} -OpenVINO™ Model Server is an open-source, production-grade inference platform that exposes a set of models via a convenient inference API over gRPC or HTTP/REST. It employs the OpenVINO Runtime libraries for from the Intel® Distribution of OpenVINO™ toolkit to extend workloads across Intel® hardware including CPU, GPU and others. +OpenVINO™ Model Server is an open-source, production-grade inference platform that exposes a set of models via a convenient inference API over gRPC or HTTP/REST. It employs the OpenVINO™ Runtime libraries from the Intel® Distribution of OpenVINO™ toolkit to extend workloads across Intel® hardware including CPU, GPU and others. 
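For orientation, the sketch below shows what a single inference request against such a server can look like over the HTTP/REST interface. It is a minimal illustration only: the address `localhost:9000`, the model name `resnet`, the TensorFlow-Serving-style `:predict` endpoint, and the dummy input tensor are assumptions, not part of the measured setup.

```python
# Hypothetical REST client for an OpenVINO Model Server instance.
# Assumes a server on localhost:9000 exposing a model named "resnet"
# through a TensorFlow-Serving-compatible /v1/models/<name>:predict endpoint.
import json
import urllib.request

import numpy as np

# Dummy NCHW tensor standing in for real, preprocessed input data.
batch = np.zeros((1, 3, 224, 224), dtype=np.float32)

payload = json.dumps({"inputs": batch.tolist()}).encode("utf-8")
request = urllib.request.Request(
    "http://localhost:9000/v1/models/resnet:predict",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    result = json.loads(response.read())
    print(list(result.keys()))  # output names depend on the served model
```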
![OpenVINO™ Model Server](../img/performance_benchmarks_ovms_01.png) @@ -22,30 +22,43 @@ OpenVINO™ Model Server is measured in multiple-client-single-server configurat ![](../img/throughput_ovms_resnet50_int8.png) ## resnet-50-TF (FP32) ![](../img/throughput_ovms_resnet50_fp32_bs_1.png) -## 3D U-Net (FP32) -![](../img/throughput_ovms_3dunet.png) +## googlenet-v4-TF (FP32) +![](../img/throughput_ovms_googlenet4_fp32.png) ## yolo-v3-tf (FP32) ![](../img/throughput_ovms_yolo3_fp32.png) -## yolo-v3-tiny-tf (FP32) -![](../img/throughput_ovms_yolo3tiny_fp32.png) ## yolo-v4-tf (FP32) ![](../img/throughput_ovms_yolo4_fp32.png) -## bert-small-uncased-whole-word-masking-squad-0002 (FP32) -![](../img/throughput_ovms_bertsmall_fp32.png) +## brain-tumor-segmentation-0002 +![](../img/throughput_ovms_braintumorsegmentation.png) +## alexnet +![](../img/throughput_ovms_alexnet.png) +## mobilenet-v3-large-1.0-224-TF (FP32) +![](../img/throughput_ovms_mobilenet3large_fp32.png) +## deeplabv3 (FP32) +![](../img/throughput_ovms_deeplabv3_fp32.png) ## bert-small-uncased-whole-word-masking-squad-int8-0002 (INT8) ![](../img/throughput_ovms_bertsmall_int8.png) -## bert-large-uncased-whole-word-masking-squad-0001 (FP32) -![](../img/throughput_ovms_bertlarge_fp32.png) -## bert-large-uncased-whole-word-masking-squad-int8-0001 (INT8) -![](../img/throughput_ovms_bertlarge_int8.png) -## mobilenet-v3-large-1.0-224-tf (FP32) -![](../img/throughput_ovms_mobilenet3large_fp32.png) -## ssd_mobilenet_v1_coco (FP32) -![](../img/throughput_ovms_ssdmobilenet1_fp32.png) +## bert-small-uncased-whole-word-masking-squad-0002 (FP32) +![](../img/throughput_ovms_bertsmall_fp32.png) +## 3D U-Net (FP32) +![](../img/throughput_ovms_3dunet.png) + +## Image Compression for Improved Throughput +OpenVINO Model Server supports compressed binary input data (images in JPEG and PNG formats) for vision processing models. This +feature improves overall performance on networks where the bandwidth constitutes a system bottleneck. A good example of such use could be wireless 5G communication, a typical 1 Gbit/sec Ethernet network or a usage scenario with many client machines issuing a high rate of inference requests to one single central OpenVINO model server. Generally the performance improvement increases with increased compressibility of the data/image. The decompression on the server-side is performed by the OpenCV library. Please refer to [Supported Image Formats](#supported-image-formats-for-ovms-compression). + +### googlenet-v4-tf (FP32) +![](../img/throughput_ovms_1gbps_googlenet4_fp32.png) + +### resnet-50-tf (INT8) +![](../img/throughput_ovms_1gbps_resnet50_int8.png) + +### resnet-50-tf (FP32) +![](../img/throughput_ovms_1gbps_resnet50_fp32.png) ## Platform Configurations -OpenVINO™ Model Server performance benchmark numbers are based on release 2021.4. Performance results are based on testing as of June 17, 2021 and may not reflect all publicly available updates. +OpenVINO™ Model Server performance benchmark numbers are based on release 2021.4. Performance results are based on testing as of June 17, 2021 and may not reflect all publicly available updates. ### Platform with Intel® Xeon® Platinum 8260M @@ -463,6 +476,22 @@ OpenVINO™ Model Server performance benchmark numbers are based on release 2021 -@endsphinxdirective +@endsphinxdirective -Results may vary. For workloads and configurations visit: [www.intel.com/PerformanceIndex](https://www.intel.com/PerformanceIndex) and [Legal Information](../Legal_Information.md). 
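As a companion to the Image Compression section above and the format list that follows, the sketch below illustrates one way a client might submit an already-compressed JPEG instead of a decoded tensor. The exact request layout depends on the OpenVINO Model Server version and API in use; the base64-encoded `b64` field, the `resnet` model name, and the file name are assumptions for illustration only.

```python
# Hypothetical sketch: send a compressed JPEG as binary input to OVMS,
# letting the server decode it (see the Image Compression section above).
# Endpoint shape, model name, and file name are illustrative assumptions.
import base64
import json
import urllib.request

with open("input.jpg", "rb") as image_file:  # placeholder file name
    encoded = base64.b64encode(image_file.read()).decode("ascii")

payload = json.dumps({"instances": [{"b64": encoded}]}).encode("utf-8")
request = urllib.request.Request(
    "http://localhost:9000/v1/models/resnet:predict",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))
```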
+## Supported Image Formats for OVMS Compression +- Always supported: + + - Portable image format - *.pbm, *.pgm, *.ppm *.pxm, *.pnm + - Radiance HDR - *.hdr, *.pic + - Sun rasters - *.sr, *.ras + - Windows bitmaps - *.bmp, *.dib + +- Limited support (please see OpenCV documentation): + + - Raster and Vector geospatial data supported by GDAL + - JPEG files - *.jpeg, *.jpg, *.jpe + - Portable Network Graphics - *.png + - TIFF files - *.tiff, *.tif + - OpenEXR Image files - *.exr + - JPEG 2000 files - *.jp2 + - WebP - *.webp diff --git a/docs/benchmarks/performance_int8_vs_fp32.md b/docs/benchmarks/performance_int8_vs_fp32.md index e060d3bbe73..68c2a224f19 100644 --- a/docs/benchmarks/performance_int8_vs_fp32.md +++ b/docs/benchmarks/performance_int8_vs_fp32.md @@ -1,211 +1,4 @@ -# INT8 vs FP32 Comparison on Select Networks and Platforms {#openvino_docs_performance_int8_vs_fp32} - -The table below illustrates the speed-up factor for the performance gain by switching from an FP32 representation of an OpenVINO™ supported model to its INT8 representation. - -@sphinxdirective -.. raw:: html - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Intel® Core™
i7-8700T
Intel® Core™
i7-1185G7
Intel® Xeon®
W-1290P
Intel® Xeon®
Platinum
8270
OpenVINO
benchmark
model name
DatasetThroughput speed-up FP16-INT8 vs FP32
bert-large-
uncased-whole-word-
masking-squad-0001
SQuAD1.63.11.52.5
brain-tumor-
segmentation-
0001-MXNET
BraTS1.62.01.81.8
deeplabv3-TFVOC 2012
Segmentation
1.93.02.83.1
densenet-121-TFImageNet1.83.51.93.8
facenet-
20180408-
102900-TF
LFW2.13.62.23.7
faster_rcnn_
resnet50_coco-TF
MS COCO1.93.72.03.4
inception-v3-TFImageNet1.93.82.04.1
mobilenet-
ssd-CF
VOC20121.63.11.93.6
mobilenet-v2-1.0-
224-TF
ImageNet1.52.41.83.9
mobilenet-v2-
pytorch
ImageNet1.72.41.94.0
resnet-18-
pytorch
ImageNet1.93.72.14.2
resnet-50-
pytorch
ImageNet1.93.62.03.9
resnet-50-
TF
ImageNet1.93.62.03.9
squeezenet1.1-
CF
ImageNet1.73.21.83.4
ssd_mobilenet_
v1_coco-tf
VOC20121.83.12.03.6
ssd300-CFMS COCO1.84.21.93.9
ssdlite_
mobilenet_
v2-TF
MS COCO1.72.52.43.5
yolo_v4-TFMS COCO1.93.62.03.4
unet-camvid-onnx-0001MS COCO1.73.91.73.7
ssd-resnet34-
1200-onnx
MS COCO1.74.01.73.4
googlenet-v4-tfImageNet1.93.92.04.1
vgg19-caffeImageNet1.94.72.04.5
yolo-v3-tiny-tfMS COCO1.73.41.93.5
- -@endsphinxdirective +# Model Accuracy for INT8 and FP32 Precision {#openvino_docs_performance_int8_vs_fp32} The following table shows the absolute accuracy drop that is calculated as the difference in accuracy between the FP32 representation of a model and its INT8 representation. @@ -228,45 +21,72 @@ The following table shows the absolute accuracy drop that is calculated as the d Metric Name Absolute Accuracy Drop, % + + bert-base-cased + SST-2 + accuracy + 0.57 + 0.11 + 0.11 + 0.57 + bert-large-uncased-whole-word-masking-squad-0001 - SQuAD + SQUAD F1 - 0.62 - 0.71 - 0.62 - 0.62 + 0.76 + 0.59 + 0.68 + 0.76 brain-tumor-
segmentation-
0001-MXNET BraTS Dice-index@
Mean@
Overall Tumor - 0.08 0.10 0.10 - 0.08 + 0.10 + 0.10 + + + brain-tumor-
segmentation-
0001-ONNX + BraTS + Dice-index@
Mean@
Overall Tumor + 0.11 + 0.12 + 0.12 + 0.11 deeplabv3-TF - VOC 2012
Segmentation + VOC2012 mean_iou - 0.09 - 0.41 - 0.41 - 0.09 + 0.03 + 0.42 + 0.42 + 0.03 densenet-121-TF ImageNet - acc@top-1 - 0.49 + accuracy@top1 + 0.50 0.56 0.56 - 0.49 + 0.50 + + + efficientdet-d0-tf + COCO2017 + coco_precision + 0.55 + 0.81 + 0.81 + 0.55 facenet-
20180408-
102900-TF - LFW + LFW_MTCNN pairwise_
accuracy
_subsets 0.05 0.12 @@ -275,170 +95,365 @@ The following table shows the absolute accuracy drop that is calculated as the d faster_rcnn_
resnet50_coco-TF - MS COCO + COCO2017 coco_
precision - 0.09 - 0.09 - 0.09 - 0.09 + 0.16 + 0.16 + 0.16 + 0.16 - inception-v3-TF + googlenet-v3-tf ImageNet - acc@top-1 + accuracy@top1 + 0.01 + 0.01 + 0.01 + 0.01 + + + googlenet-v4-tf + ImageNet + accuracy@top1 + 0.09 + 0.06 + 0.06 + 0.09 + + + mask_rcnn_resnet50_
atrous_coco-tf + COCO2017 + coco_orig_precision 0.02 - 0.01 - 0.01 + 0.10 + 0.10 0.02 - mobilenet-
ssd-CF + mobilenet-
ssd-caffe VOC2012 mAP - 0.06 - 0.04 - 0.04 - 0.06 + 0.51 + 0.54 + 0.54 + 0.51 mobilenet-v2-1.0-
224-TF ImageNet acc@top-1 - 0.40 - 0.76 - 0.76 - 0.40 + 0.35 + 0.79 + 0.79 + 0.35 - + mobilenet-v2-
PYTORCH ImageNet acc@top-1 - 0.36 - 0.52 - 0.52 - 0.36 + 0.34 + 0.58 + 0.58 + 0.34 resnet-18-
pytorch ImageNet acc@top-1 + 0.29 0.25 0.25 - 0.25 - 0.25 + 0.29 resnet-50-
PYTORCH ImageNet acc@top-1 - 0.19 - 0.21 - 0.21 - 0.19 + 0.24 + 0.20 + 0.20 + 0.24 resnet-50-
TF ImageNet acc@top-1 - 0.11 - 0.11 - 0.11 - 0.11 - - - squeezenet1.1-
CF - ImageNet - acc@top-1 - 0.64 - 0.66 - 0.66 - 0.64 + 0.10 + 0.09 + 0.09 + 0.10 ssd_mobilenet_
v1_coco-tf - VOC2012 - COCO mAp - 0.17 - 2.96 - 2.96 - 0.17 - - - ssd300-CF - MS COCO - COCO mAp - 0.18 + COCO2017 + coco_precision + 0.23 3.06 3.06 - 0.18 + 0.17 ssdlite_
mobilenet_
v2-TF - MS COCO - COCO mAp - 0.11 - 0.43 - 0.43 - 0.11 - - - yolo_v4-TF - MS COCO - COCO mAp - 0.06 - 0.03 - 0.03 - 0.06 - - - unet-camvid-
onnx-0001 - MS COCO - COCO mAp - 0.29 - 0.29 - 0.31 - 0.29 + COCO2017 + coco_precision + 0.09 + 0.44 + 0.44 + 0.09 ssd-resnet34-
1200-onnx - MS COCO - COCO mAp - 0.02 - 0.03 - 0.03 - 0.02 - - - googlenet-v4-tf - ImageNet + COCO2017 COCO mAp + 0.09 0.08 - 0.06 - 0.06 - 0.06 + 0.09 + 0.09 - vgg19-caffe - ImageNet - COCO mAp - 0.02 - 0.04 - 0.04 - 0.02 + unet-camvid-
onnx-0001 + CamVid + mean_iou@mean + 0.33 + 0.33 + 0.33 + 0.33 yolo-v3-tiny-tf - MS COCO + COCO2017 COCO mAp - 0.02 - 0.6 - 0.6 - 0.02 + 0.05 + 0.08 + 0.08 + 0.05 + + + yolo_v4-TF + COCO2017 + COCO mAp + 0.03 + 0.01 + 0.01 + 0.03 @endsphinxdirective -![INT8 vs FP32 Comparison](../img/int8vsfp32.png) +The table below illustrates the speed-up factor for the performance gain by switching from an FP32 representation of an OpenVINO™ supported model to its INT8 representation. -For more complete information about performance and benchmark results, visit: [www.intel.com/benchmarks](https://www.intel.com/benchmarks) and [Optimization Notice](https://software.intel.com/articles/optimization-notice). [Legal Information](../Legal_Information.md). +@sphinxdirective +.. raw:: html + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Throughput speed-up FP16-INT8 vs FP32 (per platform):

| OpenVINO benchmark model name | Dataset | Intel® Core™ i7-8700T | Intel® Core™ i7-1185G7 | Intel® Xeon® W-1290P | Intel® Xeon® Platinum 8270 |
| --- | --- | --- | --- | --- | --- |
| bert-base-cased | SST-2 | 1.5 | 3.0 | 1.4 | 2.4 |
| bert-large-uncased-whole-word-masking-squad-0001 | SQUAD | 1.7 | 3.2 | 1.7 | 3.3 |
| brain-tumor-segmentation-0001-MXNET | BraTS | 1.6 | 2.0 | 1.9 | 2.1 |
| brain-tumor-segmentation-0001-ONNX | BraTS | 2.6 | 3.2 | 3.3 | 3.0 |
| deeplabv3-TF | VOC2012 | 1.9 | 3.1 | 3.5 | 3.8 |
| densenet-121-TF | ImageNet | 1.7 | 3.3 | 1.9 | 3.7 |
| efficientdet-d0-tf | COCO2017 | 1.6 | 1.9 | 2.5 | 2.3 |
| facenet-20180408-102900-TF | LFW_MTCNN | 2.1 | 3.5 | 2.4 | 3.4 |
| faster_rcnn_resnet50_coco-TF | COCO2017 | 1.9 | 3.7 | 1.9 | 3.3 |
| googlenet-v3-tf | ImageNet | 1.9 | 3.7 | 2.0 | 4.0 |
| googlenet-v4-tf | ImageNet | 1.9 | 3.7 | 2.0 | 4.2 |
| mask_rcnn_resnet50_atrous_coco-tf | COCO2017 | 1.6 | 3.6 | 1.6 | 2.3 |
| mobilenet-ssd-caffe | VOC2012 | 1.6 | 3.1 | 2.2 | 3.8 |
| mobilenet-v2-1.0-224-TF | ImageNet | 1.5 | 2.4 | 2.1 | 3.3 |
| mobilenet-v2-PYTORCH | ImageNet | 1.5 | 2.4 | 2.1 | 3.4 |
| resnet-18-pytorch | ImageNet | 2.0 | 4.1 | 2.2 | 4.1 |
| resnet-50-PYTORCH | ImageNet | 1.9 | 3.5 | 2.1 | 4.0 |
| resnet-50-TF | ImageNet | 1.9 | 3.5 | 2.0 | 4.0 |
| ssd_mobilenet_v1_coco-tf | COCO2017 | 1.7 | 3.1 | 2.2 | 3.6 |
| ssdlite_mobilenet_v2-TF | COCO2017 | 1.6 | 2.4 | 2.7 | 3.2 |
| ssd-resnet34-1200-onnx | COCO2017 | 1.7 | 4.0 | 1.7 | 3.2 |
| unet-camvid-onnx-0001 | CamVid | 1.6 | 4.6 | 1.6 | 6.2 |
| yolo-v3-tiny-tf | COCO2017 | 1.8 | 3.4 | 2.0 | 3.5 |
| yolo_v4-TF | COCO2017 | 2.3 | 3.4 | 2.4 | 3.1 |
+ +@endsphinxdirective + +![INT8 vs FP32 Comparison](../img/int8vsfp32.png) \ No newline at end of file diff --git a/docs/benchmarks/performance_ov_vs_tf.md b/docs/benchmarks/performance_ov_vs_tf.md new file mode 100644 index 00000000000..55ba34455c5 --- /dev/null +++ b/docs/benchmarks/performance_ov_vs_tf.md @@ -0,0 +1,102 @@ +# OpenVINO™ and TensorFlow Comparison on Select Networks and Platforms + +This page presents the results of comparing OpenVINO™ and TensorFlow executing benchmarking on the same hardware platforms, and using neural network models based on the same original source models. All models were converted using the processes and conversion tools native to each framework. The hardware platforms represent a broad performance range, covering Intel® Celeron®, Intel® Core™, and Intel® Xeon® Scalable based platforms. (Refer to [System Description](https://docs.openvino.ai/resources/benchmark_files/system_configurations_2022.1.html) for further details). + +## deeplabv3 + +@sphinxdirective +.. raw:: html + +
+ +@endsphinxdirective + +## densenet-121 + +@sphinxdirective +.. raw:: html + +
+ +@endsphinxdirective + +## facenet-20180408-102900 + +@sphinxdirective +.. raw:: html + +
+ +@endsphinxdirective + +## faster_rcnn_resnet50_coco + +@sphinxdirective +.. raw:: html + +
+ +@endsphinxdirective + +## inception-v3 + +@sphinxdirective +.. raw:: html + +
+ +@endsphinxdirective + +## inception-v4 + +@sphinxdirective +.. raw:: html + +
+ +@endsphinxdirective + +## resnet-50 + +@sphinxdirective +.. raw:: html + +
+ +@endsphinxdirective + +## ssd_mobilenet_v1_coco + +@sphinxdirective +.. raw:: html + +
+ +@endsphinxdirective + +## ssd_resnet34_1200x1200 + +@sphinxdirective +.. raw:: html + +
+ +@endsphinxdirective + +## yolo-v3-tiny + +@sphinxdirective +.. raw:: html + +
+ +@endsphinxdirective + +## YOLOv4 + +@sphinxdirective +.. raw:: html + +
+ +@endsphinxdirective \ No newline at end of file diff --git a/docs/documentation.md b/docs/documentation.md index 776be27753e..36732f14532 100644 --- a/docs/documentation.md +++ b/docs/documentation.md @@ -17,7 +17,8 @@ :hidden: openvino_docs_OV_Runtime_User_Guide - openvino_docs_install_guides_deployment_manager_tool + openvino_2_0_transition_guide + openvino_deployment_guide openvino_inference_engine_tools_compile_tool_README @@ -42,8 +43,8 @@ workbench_docs_Workbench_DG_Introduction workbench_docs_Workbench_DG_Install workbench_docs_Workbench_DG_Work_with_Models_and_Sample_Datasets - workbench_docs_Workbench_DG_User_Guide - workbench_docs_security_Workbench + Tutorials + User Guide workbench_docs_Workbench_DG_Troubleshooting .. toctree:: @@ -92,7 +93,7 @@ This section provides reference documents that guide you through developing your With the [Model Downloader](@ref omz_tools_downloader) and [Model Optimizer](MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) guides, you will learn to download pre-trained models and convert them for use with the OpenVINO™ toolkit. You can provide your own model or choose a public or Intel model from a broad selection provided in the [Open Model Zoo](model_zoo.md). ## Deploying Inference -The [OpenVINO™ Runtime User Guide](OV_Runtime_UG/openvino_intro.md) explains the process of creating your own application that runs inference with the OpenVINO™ toolkit. The [API Reference](./api_references.html) defines the OpenVINO Runtime API for Python, C++, and C. The OpenVINO Runtime API is what you'll use to create an OpenVINO™ inference application, use enhanced operations sets and other features. After writing your application, you can use the [Deployment Manager](install_guides/deployment-manager-tool.md) for deploying to target devices. +The [OpenVINO™ Runtime User Guide](./OV_Runtime_UG/openvino_intro.md) explains the process of creating your own application that runs inference with the OpenVINO™ toolkit. The [API Reference](./api_references.html) defines the OpenVINO Runtime API for Python, C++, and C. The OpenVINO Runtime API is what you'll use to create an OpenVINO™ inference application, use enhanced operations sets and other features. After writing your application, you can use the [Deployment with OpenVINO](./OV_Runtime_UG/deployment/deployment_intro.md) for deploying to target devices. ## Tuning for Performance The toolkit provides a [Performance Optimization Guide](optimization_guide/dldt_optimization_guide.md) and utilities for squeezing the best performance out of your application, including [Accuracy Checker](@ref omz_tools_accuracy_checker), [Post-Training Optimization Tool](@ref pot_README), and other tools for measuring accuracy, benchmarking performance, and tuning your application. diff --git a/docs/doxyrest/frame/common/doc.lua b/docs/doxyrest/frame/common/doc.lua index 780cb1ac74a..2b60339fbda 100644 --- a/docs/doxyrest/frame/common/doc.lua +++ b/docs/doxyrest/frame/common/doc.lua @@ -464,6 +464,24 @@ function formatDocBlock_sphinxdirective(block, context) return "\n\n" .. code .. "\n\n" end +function formatDocBlock_sphinxtabset(block, context) + return "\n\n" .. ".. raw:: html\n\n
" .. "\n\n" +end + +function formatDocBlock_endsphinxtabset(block, context) + return "\n\n" .. ".. raw:: html\n\n
" .. "\n\n" +end + +function formatDocBlock_sphinxtab(block, context) + local code = getCodeDocBlockContents(block, context) + return "\n\n" .. '.. raw:: html\n\n
' .. "\n\n" +end + +function formatDocBlock_endsphinxtab(block, context) + return "\n\n" .. ".. raw:: html\n\n
" .. "\n\n" +end + + -- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - g_blockKindFormatMap = @@ -500,7 +518,11 @@ g_blockKindFormatMap = ["plantuml"] = function(b, c) return formatDocBlock_graph(b, c, "uml") end, ["msc"] = function(b, c) return formatDocBlock_graph(b, c, "msc") end, ["blockquote"] = formatDocBlock_blockquote, - ["sphinxdirective"] = formatDocBlock_sphinxdirective + ["sphinxdirective"] = formatDocBlock_sphinxdirective, + ["sphinxtabset"] = formatDocBlock_sphinxtabset, + ["endsphinxtabset"] = formatDocBlock_endsphinxtabset, + ["sphinxtab"] = formatDocBlock_sphinxtab, + ["endsphinxtab"] = formatDocBlock_endsphinxtab } function getDocBlockContents(block, context) diff --git a/docs/get_started/get_started_demos.md b/docs/get_started/get_started_demos.md index 9a6a34fa256..f93167f61d2 100644 --- a/docs/get_started/get_started_demos.md +++ b/docs/get_started/get_started_demos.md @@ -11,8 +11,6 @@ You will perform the following steps: 4. Run inference on the sample and see the results: - Image Classification Code Sample -If you installed OpenVINO™ via `pip` you can quickly getting started with the product by using these [tutorials](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks). - This guide assumes you completed all installation and configuration steps. If you have not yet installed and configured the toolkit: @sphinxdirective @@ -94,13 +92,13 @@ This guide uses the OpenVINO™ Model Downloader to get pre-trained models. You * List the models available in the downloader ``` sh - info_dumper --print_all + omz_info_dumper --print_all ``` * Use `grep` to list models that have a specific name pattern ``` sh - info_dumper --print_all | grep + omz_info_dumper --print_all | grep ``` * Use Model Downloader to download models. @@ -244,7 +242,7 @@ Create an `` directory to contain the model's Intermediate Representatio @endsphinxdirective -The OpenVINO Runtime can perform inference on different precision formats, such as FP32, FP16, or INT8. To generate an IR with a specific precision, run the Model Optimizer with the appropriate `--data_type` option. +The OpenVINO Runtime can infer models where floating-point weights are [compressed to FP16](../MO_DG/prepare_model/FP16_Compression.md). To generate an IR with a specific precision, run the Model Optimizer with the appropriate `--data_type` option. 
Generic Model Optimizer script: @@ -370,7 +368,7 @@ To run the **Image Classification** code sample with an input image using the IR @endsphinxdirective -The following commands run the Image Classification Code Sample using the [`dog.bmp`](https://storage.openvinotoolkit.org/data/test_data/images/224x224/dog.bmp) file as an input image, the model in IR format from the `ir` directory, and on different hardware devices: +The following commands run the Image Classification Code Sample using the [dog.bmp](https://storage.openvinotoolkit.org/data/test_data/images/224x224/dog.bmp) file as an input image, the model in IR format from the `ir` directory, and on different hardware devices: **CPU:** @sphinxdirective diff --git a/docs/how_tos/MonoDepth_how_to.md b/docs/how_tos/MonoDepth_how_to.md deleted file mode 100644 index 53d36524d4b..00000000000 --- a/docs/how_tos/MonoDepth_how_to.md +++ /dev/null @@ -1,70 +0,0 @@ -# OpenVINO™ MonoDepth Python Demo - -This tutorial describes the example from the following YouTube* video: -/// - -To learn more about how to run the MonoDepth Python* demo application, refer to the [documentation](https://docs.openvino.ai/latest/omz_demos_monodepth_demo_python.html). - -Tested on OpenVINO™ 2021, Ubuntu 18.04. - -## 1. Set Environment - -Define the OpenVINO™ install directory: -``` -export OV=/opt/intel/openvino_2022/ -``` -Define the working directory. Make sure the directory exist: -``` -export WD=~/MonoDepth_Python/ -``` - -## 2. Install Prerequisits - -Initialize OpenVINO™: -``` -source $OV/setupvars.sh -``` - -Install the Model Optimizer prerequisites: -``` -cd $OV/tools/model_optimizer/install_prerequisites/ -sudo ./install_prerequisites.sh -``` - -Install the Model Downloader prerequisites: - -``` -cd $OV/extras/open_model_zoo/tools/downloader/ -python3 -mpip install --user -r ./requirements.in -sudo python3 -mpip install --user -r ./requirements-pytorch.in -sudo python3 -mpip install --user -r ./requirements-caffe2.in -``` - -## 3. Download Models - -Download all models from the Demo Models list: -``` -python3 $OV/extras/open_model_zoo/tools/downloader/downloader.py --list $OV/deployment_tools/inference_engine/demos/python_demos/monodepth_demo/models.lst -o $WD -``` - -## 4. Convert Models to Intermediate Representation (IR) - -Use the convert script to convert the models to ONNX*, and then to IR format: -``` -cd $WD -python3 $OV/extras/open_model_zoo/tools/downloader/converter.py --list $OV/deployment_tools/inference_engine/demos/python_demos/monodepth_demo/models.lst -``` - -## 5. Run Demo - -Install required Python modules, for example, kiwisolver or cycler, if you get missing module indication. - -Use your input image: -``` -python3 $OV/inference_engine/demos/python_demos/monodepth_demo/monodepth_demo.py -m $WD/public/midasnet/FP32/midasnet.xml -i input-image.jpg -``` -Check the result depth image: -``` -eog disp.png & -``` -You can also try to use another model. Note that the algorithm is the same, but the depth map will be different. 
diff --git a/docs/how_tos/POT_how_to_example.md b/docs/how_tos/POT_how_to_example.md deleted file mode 100644 index b6a948fd2c6..00000000000 --- a/docs/how_tos/POT_how_to_example.md +++ /dev/null @@ -1,163 +0,0 @@ -# Post-Training Optimization Tool - A real example - -This tutorial describes the example from the following YouTube* video: -https://www.youtube.com/watch?v=cGQesbWuRhk&t=49s - - -Watch this video to learn the basics of Post-training Optimization Tool (POT): - https://www.youtube.com/watch?v=SvkI25Ca_SQ - -The example has been tested on OpenVINO™ 2021 on Ubuntu 18.04 Operating System. - - -## 1. Installation - -Install OpenVINO™ toolkit and Model Optimizer, Accuracy Checker, and Post-training Optimization Tool components. - -1. Define the OpenVINO™ install directory: -``` -export OV=/opt/intel/openvino_2022/ -``` -2. Install the Model Optimizer prerequisites: -``` -cd $OV/tools/model_optimizer/install_prerequisites -sudo ./install_prerequisites.sh -``` -3. Install the Accuracy Checker requirements: -``` -cd $OV/tools/accuracy_checker -sudo python3 setup.py install -``` -4. Install the Post-training Optimization Tool: -``` -cd $OV/tools/post_training_optimization_toolkit -sudo python3 setup.py install -``` - -## 2. Download Model - -This tutorial describes MobileNet v2 model from PyTorch* framework. You can choose any other model. - -Download the MobileNet v2 PyTorch* model using the commands below: -``` -mkdir ~/POT -``` -``` -cd ~/POT -``` -``` -python3 $OV/extras/open_model_zoo/tools/downloader/downloader.py --name mobilenet-v2-pytorch -o . -``` - -## 3. Prepare Model for Inference - -Install requirements for PyTorch using the commands below: -``` -cd $OV/extras/open_model_zoo/tools/downloader -``` -``` -python3 -mpip install --user -r ./requirements-pytorch.in -``` - -You can find the parameters for Mobilnet v2 conversion here: -``` -vi /opt/intel/openvino_2022/extras/open_model_zoo/models/public/mobilenet-v2-pytorch/model.yml -``` - -Convert the model from PyTorch to ONNX*: -``` -cd ~/POT/public/mobilenet-v2-pytorch -python3 /opt/intel/openvino_2022/extras/open_model_zoo/tools/downloader/pytorch_to_onnx.py \ - --model-name=MobileNetV2 \ - --model-path=. \ - --weights=mobilenet-v2.pth \ - --import-module=MobileNetV2 \ - --input-shape=1,3,224,224 / - --output-file=mobilenet-v2.onnx \ - --input-names=data \ - --output-names=prob - -``` -Convert the model from ONNX to the OpenVINO™ Intermediate Representation (IR): -``` -mo \ - -m mobilenet-v2.onnx \ - --input=data \ - --mean_values=data[123.675,116.28,103.53] \ - --scale_values=data[58.624,57.12,57.375] \ - --reverse_input_channels \ - --output=prob -``` - -Move the IR files to my directory: - -``` -mv mobilenet-v2.xml ~/POT/model.xml -mv mobilenet-v2.bin ~/POT/model.bin -``` - -## 4. Edit Configurations - -Edit the configuration files: -``` -sudo vi $OV/tools/accuracy_checker/dataset_definitions.yml -(edit imagenet_1000_classes) -``` -``` -export DEFINITIONS_FILE=/opt/intel/openvino_2022/tools/accuracy_checker/dataset_definitions.yml -``` - -Copy the JSON file to my directory and edit: - -``` -cp $OV/tools/post_training_optimization_toolkit/configs/examples/quantization/classification/mobilenetV2_pytorch_int8.json ~/POT -``` -``` -vi mobilenetV2_pytorch_int8.json -``` - -Copy the YML file to my directory and edit: - -``` -cp /opt/intel/openvino_2022/tools/accuracy_checker/configs/mobilenet-v2.yml ~/POT -``` -``` -vi mobilenet-v2.yml -``` - -## 5. 
Run Baseline - -Run Accuracy Checker on the original model: - -``` -accuracy_check -c mobilenet-v2.yml -``` - -Install the Benchmark Tool first. To learn more about Benchmark Tool refer to [Benchmark C++ Tool](https://docs.openvino.ai/latest/openvino_inference_engine_samples_benchmark_app_README.html) - or [Benchmark Python* Tool](https://docs.openvino.ai/latest/openvino_inference_engine_tools_benchmark_tool_README.html). - -Run performance benchmark: -``` -~/inference_engine_cpp_samples_build/intel64/Release/benchmark_app -m ~/POT/model.xml -``` - -## 6. Run Integer Calibration - -You can edit the JSON file to switch between two modes of calibration: - - - AccuracyAwareQuantization - - DefaultQuantization - - -``` -pot --config /home/~/POT/mobilenetV2_pytorch_int8.json \ - --output-dir /home/~/POT/ \ - --evaluate \ - --log-level INFO -``` - -Run the Benchmark Tool for the calibrated model. Make sure the name contains `DafultQuantization/.../optimized/...` - -``` -~/inference_engine_cpp_samples_build/intel64/Release/benchmark_app -m mobilenetv2_DefaultQuantization/2021-03-07/optimized/mobilenetv2.xml -``` diff --git a/docs/how_tos/how-to-links.md b/docs/how_tos/how-to-links.md deleted file mode 100644 index e808efa1ef9..00000000000 --- a/docs/how_tos/how-to-links.md +++ /dev/null @@ -1,86 +0,0 @@ -# "Hot Topic" How-To Links - -## Blogs & Articles - -* [Maximize CPU Inference Performance with Improved Threads and Memory Management in Intel® Distribution of OpenVINO™ toolkit](https://www.edge-ai-vision.com/2020/03/maximize-cpu-inference-performance-with-improved-threads-and-memory-management-in-intel-distribution-of-openvino-toolkit/) -* [Simplifying Cloud to Edge AI Deployments with the Intel® Distribution of OpenVINO™ Toolkit, Microsoft Azure, and ONNX Runtime](https://www.intel.ai/microsoft-azure-openvino-toolkit/#gs.11oa13) -* [Streamline your Intel® Distribution of OpenVINO™ Toolkit development with Deep Learning Workbench](https://www.intel.ai/openvino-dlworkbench/#gs.wwj3bq) -* [Enhanced Low-Precision Pipeline to Accelerate Inference with OpenVINO Toolkit](https://www.intel.ai/open-vino-low-precision-pipeline/) -* [Improving DL Performance Using Binary Convolution Support in OpenVINO Toolkit](https://www.intel.ai/binary-convolution-openvino) -* [Automatic Multi-Device Inference with the Intel® Distribution of OpenVINO™ toolkit](https://www.intel.ai/automatic-multi-device-inference-with-intel-distribution-of-openvino-toolkit/) -* [CPU Inference Performance Boost with “Throughput” Mode in the Intel® Distribution of OpenVINO™ Toolkit](https://www.intel.ai/cpu-inference-performance-boost-openvino/) -* [Introducing int8 quantization for fast CPU inference using OpenVINO](https://www.intel.ai/introducing-int8-quantization-for-fast-cpu-inference-using-openvino/) -* [Accelerate Vision-based AI with Intel® Distribution of OpenVINO™ Toolkit](https://www.intel.ai/accelerate-vision-based-ai-with-intel-distribution-of-openvino-toolkit/) - -## Custom Operations Guide -To learn about what is *custom operation* and how to work with them in the Deep Learning Deployment Toolkit, see the [Custom Operations Guide](../Extensibility_UG/Intro.md). - -## Introducing OpenVINO™ and Computer Vision | IoT Developer Show Season 2 | Intel Software - -@sphinxdirective -.. raw:: html - - - -@endsphinxdirective - -## OpenVINO™ Toolkit and Two Hardware Development Kits | IoT Developer Show Season 2 | Intel Software - -@sphinxdirective -.. 
raw:: html - - - -@endsphinxdirective - -## Intel Demonstration of High Performance Vision Deployment - The OpenVINO Toolkit in Action - -@sphinxdirective -.. raw:: html - - - -@endsphinxdirective - -## Computer Vision at the Edge with OpenVINO by Krishnakumar Shetti at ODSC_India - -@sphinxdirective -..raw:: html - - - -@endsphinxdirective - -## Model optimizer concept - -@sphinxdirective -.. raw:: html - - - -@endsphinxdirective - -## Computer Vision with Intel - -@sphinxdirective -.. raw:: html - - - -@endsphinxdirective - -## Case Studies - -|| Link to tutorial | -|:---:|:---:| -|![dl_healthcare]"" | [Deep Learning for Healthcare Imaging](https://ai.intel.com/wp-content/uploads/sites/53/2018/03/IntelSWDevTools_OptimizeDLforHealthcare.pdf) | -|![performance-boost-dl]"" | [Performance Boost for a Deep Learning Algorithm](https://software.intel.com/en-us/download/geovision-case-study) | -|![digital-security-surveillance]"" | [Digital Security & Surveillance Solutions](https://software.intel.com/en-us/download/agent-vi-case-study) | -|![robotics-with-AI]"" | [Robotics with AI for Industry 4.0](https://software.intel.com/en-us/download/intel-vision-accelerator-design-products-intel-nexcom-solution-brief) | -|![people-counter-syestem]"" | [People Counter Reference Implementation](https://software.intel.com/en-us/articles/iot-reference-implementation-people-counter) | - -[dl_healthcare]: ../img/DL-for-Healthcare-Imaging.jpg -[performance-boost-dl]: ../img/performance-boost-DL-algorithm.jpg -[digital-security-surveillance]: ../img/digital-security-surveillance.jpg -[robotics-with-AI]: ../img/robotics-with-AI.jpg -[people-counter-syestem]: ../img/people-counter-syestem.jpg diff --git a/docs/img/configuration_dialog.png b/docs/img/configuration_dialog.png index ffd02aff241..e8f3995d432 100644 --- a/docs/img/configuration_dialog.png +++ b/docs/img/configuration_dialog.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:a9a30b2cc5ca8ebe2da122247e292a9b415beb7bb6fbfd88f6843061d81a9e83 -size 29381 +oid sha256:2d6db31aee32fc54a0c58fff77aca191070da87a85148998ed837e81cd3b708e +size 42540 diff --git a/docs/img/deploy_encrypted_model.png b/docs/img/deploy_encrypted_model.png index 9338c59dcf2..419e0a22fb6 100644 --- a/docs/img/deploy_encrypted_model.png +++ b/docs/img/deploy_encrypted_model.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:25ed719bdd525dc0b606ef17a3fec5303ea032dfe6b2d167e1b19b6100b6fb37 -size 16516 +oid sha256:9ba2a85ae6c93405f9b6e11c3c41ab20ffe13e8ae64403fa9802af6d96b314b1 +size 35008 diff --git a/docs/img/deploy_encrypted_model.png.vsdx b/docs/img/deploy_encrypted_model.png.vsdx new file mode 100644 index 00000000000..07d756a23ec --- /dev/null +++ b/docs/img/deploy_encrypted_model.png.vsdx @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cfa6b834abf8d7add9877791f5340126c77ba6df6f7b026ecd96576af2e16816 +size 53871 diff --git a/docs/img/deployment_full.png b/docs/img/deployment_full.png new file mode 100644 index 00000000000..e55d453c572 --- /dev/null +++ b/docs/img/deployment_full.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ff1c40e8c72d33a3cdeea09f771f4a799990e9e96d0d75257b21c6e9c447c7be +size 62393 diff --git a/docs/img/deployment_simplified.png b/docs/img/deployment_simplified.png new file mode 100644 index 00000000000..b73d3a71491 --- /dev/null +++ b/docs/img/deployment_simplified.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:08c2d103bec9bac58fc9ccb6801e950f233b93c6b034e92bc354b6fb2af86d5f +size 25758 diff --git a/docs/img/int8vsfp32.png b/docs/img/int8vsfp32.png index cc361c56905..fed768e8897 100644 --- a/docs/img/int8vsfp32.png +++ b/docs/img/int8vsfp32.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:8cbe1a1c1dc477edc6909a011c1467b375f4f2ba868007befa4b2eccbaa2f2b1 -size 28229 +oid sha256:b9f29fd468777e09c1e02bdf23996c5a05c7aa14ccee73cb6c48e9afae39af16 +size 30476 diff --git a/docs/img/selection_dialog.png b/docs/img/selection_dialog.png index fa9e97725d3..86570aae170 100644 --- a/docs/img/selection_dialog.png +++ b/docs/img/selection_dialog.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:aee73cd3275e6aaeb13a3df843ce23889cadc6e7e4d031349de7c4dfe851c2f5 -size 25629 +oid sha256:0812f173a2fca3a3fce86d5b1df36e4d956c35bb09fcadbab0f26f17ccc97b5e +size 43417 diff --git a/docs/img/throughput_ovms_1gbps_facedetection0200_fp32.png b/docs/img/throughput_ovms_1gbps_facedetection0200_fp32.png new file mode 100644 index 00000000000..532469b3173 --- /dev/null +++ b/docs/img/throughput_ovms_1gbps_facedetection0200_fp32.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f167fd98d3dd4ff262ffc2ab97d7b1d872365e1770581624442c5dd6c5a01331 +size 33415 diff --git a/docs/img/throughput_ovms_1gbps_facedetection0200_int8.png b/docs/img/throughput_ovms_1gbps_facedetection0200_int8.png new file mode 100644 index 00000000000..f8b6f45ab79 --- /dev/null +++ b/docs/img/throughput_ovms_1gbps_facedetection0200_int8.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4f989adc2f4c855187bc7f739217f7505ae777ae5d740ef919f86ea800f03ef7 +size 32752 diff --git a/docs/img/throughput_ovms_1gbps_googlenet4_fp32.png b/docs/img/throughput_ovms_1gbps_googlenet4_fp32.png new file mode 100644 index 00000000000..d46ea2239ec --- /dev/null +++ b/docs/img/throughput_ovms_1gbps_googlenet4_fp32.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63f12ac11942b361f13784200ef5db387273d10e354baa11172397d9e620ce14 +size 32874 diff --git a/docs/img/throughput_ovms_1gbps_mobilnet3small_fp32.png b/docs/img/throughput_ovms_1gbps_mobilnet3small_fp32.png new file mode 100644 index 00000000000..999426b6066 --- /dev/null +++ b/docs/img/throughput_ovms_1gbps_mobilnet3small_fp32.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9118eefc96b0976fb808a5aef74200b74914f82c08664f1f830f464cb89462b7 +size 32466 diff --git a/docs/img/throughput_ovms_1gbps_resnet50_fp32.png b/docs/img/throughput_ovms_1gbps_resnet50_fp32.png new file mode 100644 index 00000000000..cb9bdc84169 --- /dev/null +++ b/docs/img/throughput_ovms_1gbps_resnet50_fp32.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eaf6ef20963b4b4db19fa073db76fb96e870f2a51adea430e4a1fb1c98e8acdb +size 32146 diff --git a/docs/img/throughput_ovms_1gbps_resnet50_int8.png b/docs/img/throughput_ovms_1gbps_resnet50_int8.png new file mode 100644 index 00000000000..650cb366a76 --- /dev/null +++ b/docs/img/throughput_ovms_1gbps_resnet50_int8.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:389b21ea0516150a9802085b198b84f4d6d3db448608be1b4890f732af3636d3 +size 30504 diff --git a/docs/img/throughput_ovms_3dunet.png b/docs/img/throughput_ovms_3dunet.png index cebe9eb4c68..4503ea54b50 100644 --- a/docs/img/throughput_ovms_3dunet.png +++ b/docs/img/throughput_ovms_3dunet.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid 
sha256:d4cbf542d393f920c5731ce973f09836e08aaa35987ef0a19355e3e895179936 -size 17981 +oid sha256:7f3a79cef5d6c50567e6bf4bef0d071fd27fa3c9750d3916a36294a8a7779569 +size 25153 diff --git a/docs/img/throughput_ovms_alexnet.png b/docs/img/throughput_ovms_alexnet.png new file mode 100644 index 00000000000..687ff857bf1 --- /dev/null +++ b/docs/img/throughput_ovms_alexnet.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0b197bb16147e11fbf0e2b0f68ba10725618e9d551990d03195b6fd40a69c944 +size 23627 diff --git a/docs/img/throughput_ovms_bertsmall_fp32.png b/docs/img/throughput_ovms_bertsmall_fp32.png index 7e1a1785d55..5ec96dc68ac 100644 --- a/docs/img/throughput_ovms_bertsmall_fp32.png +++ b/docs/img/throughput_ovms_bertsmall_fp32.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:5663cfab7a1611e921fc0b775d946009d6f7a7019e5e9dc6ebe96ccb6c6f1d7f -size 20145 +oid sha256:ba8a65e7bb68b394454353659900f963456115c2307a4c023a702de7218a3431 +size 31563 diff --git a/docs/img/throughput_ovms_bertsmall_int8.png b/docs/img/throughput_ovms_bertsmall_int8.png index d5edcccac5f..c51fd436d65 100644 --- a/docs/img/throughput_ovms_bertsmall_int8.png +++ b/docs/img/throughput_ovms_bertsmall_int8.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:aad18293f64089992862e6a17b5271cc982da89b6b7493516a59252368945c87 -size 20998 +oid sha256:918a16791237ca578839102f3f7b2434a66e0ed8ce6a6f26c9de69a0944c6efc +size 25087 diff --git a/docs/img/throughput_ovms_braintumorsegmentation.png b/docs/img/throughput_ovms_braintumorsegmentation.png new file mode 100644 index 00000000000..5fce4008354 --- /dev/null +++ b/docs/img/throughput_ovms_braintumorsegmentation.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2f675d5e45ad3566b73c73a20af5d31db2c67a7d0a88d462907bd30ca895d6a9 +size 27407 diff --git a/docs/img/throughput_ovms_deeplabv3_fp32.png b/docs/img/throughput_ovms_deeplabv3_fp32.png new file mode 100644 index 00000000000..1acaf8af6de --- /dev/null +++ b/docs/img/throughput_ovms_deeplabv3_fp32.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bc9a1631c4f233afde2c15dde7ce5f9aeffaa4e3f6ec2db5c802dc6211b76300 +size 26651 diff --git a/docs/img/throughput_ovms_facedetection0200_int8.png b/docs/img/throughput_ovms_facedetection0200_int8.png new file mode 100644 index 00000000000..fe2d0dcca57 --- /dev/null +++ b/docs/img/throughput_ovms_facedetection0200_int8.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:80bdac5e802b26913b691f593d54d27e94f19138be24fe920ede2bd46b98434e +size 22778 diff --git a/docs/img/throughput_ovms_googlenet4_fp32.png b/docs/img/throughput_ovms_googlenet4_fp32.png new file mode 100644 index 00000000000..927e4b59444 --- /dev/null +++ b/docs/img/throughput_ovms_googlenet4_fp32.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0ca07985e1a3e73773a669e50015d18b7d0648915d053dc233fc5c0318bacc18 +size 23027 diff --git a/docs/img/throughput_ovms_mobilenet3large_fp32.png b/docs/img/throughput_ovms_mobilenet3large_fp32.png index bae4a1b9a7c..9c253b22c2e 100644 --- a/docs/img/throughput_ovms_mobilenet3large_fp32.png +++ b/docs/img/throughput_ovms_mobilenet3large_fp32.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:70daf9e0016e56d8c7bb2f0efe2ac592434962bb8bea95f9120acd7b14d8b5b0 -size 21763 +oid sha256:1ec1a855a90b370a97ee4eb7f4968930bdae676d1b2b5b0960e21e4f65ab9388 +size 29372 diff --git 
a/docs/img/throughput_ovms_resnet50_fp32.png b/docs/img/throughput_ovms_resnet50_fp32.png index 324acaf22ec..d26281deb7f 100644 --- a/docs/img/throughput_ovms_resnet50_fp32.png +++ b/docs/img/throughput_ovms_resnet50_fp32.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:3ad19ace847da73176f20f21052f9dd23fd65779f4e1027b2debdaf8fc772c00 -size 18735 +oid sha256:0c78ba46253ad1120c238433ead210d1c343168611178f63bd3395b06e89bb9c +size 23012 diff --git a/docs/img/throughput_ovms_resnet50_int8.png b/docs/img/throughput_ovms_resnet50_int8.png index 8601a4c244e..00f013e1550 100644 --- a/docs/img/throughput_ovms_resnet50_int8.png +++ b/docs/img/throughput_ovms_resnet50_int8.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:5d96e146a1b7d4e48b683de3ed7665c41244ec68cdad94eb79ac497948af9b08 -size 21255 +oid sha256:f10b3e9ca512d99847f962622174b092dd9c67b0531b2c4619f409d04e8bae60 +size 22626 diff --git a/docs/img/throughput_ovms_unetcamvidonnx0001_fp32.png b/docs/img/throughput_ovms_unetcamvidonnx0001_fp32.png new file mode 100644 index 00000000000..10c4d2e3a0a --- /dev/null +++ b/docs/img/throughput_ovms_unetcamvidonnx0001_fp32.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:928e9ae9c4ae890905211f821c719e7cad9f0cca63e6d0346b004f706bfb2a86 +size 22141 diff --git a/docs/img/throughput_ovms_unetcamvidonnx0001_int8.png b/docs/img/throughput_ovms_unetcamvidonnx0001_int8.png new file mode 100644 index 00000000000..d73571b5852 --- /dev/null +++ b/docs/img/throughput_ovms_unetcamvidonnx0001_int8.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a6304535c70dbc3d6ad627c0a52159307427084098a004f6d81e739f1bf72563 +size 22652 diff --git a/docs/img/throughput_ovms_yolo3_fp32.png b/docs/img/throughput_ovms_yolo3_fp32.png index 50090422314..1533017c898 100644 --- a/docs/img/throughput_ovms_yolo3_fp32.png +++ b/docs/img/throughput_ovms_yolo3_fp32.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:b16674fabd80d73e455c276ef262f3d0a1cf6b00152340dd4e2645330f358432 -size 19341 +oid sha256:c28fa3a68be41373bd86dfa6b849ca023c617c73af804f4f233f2f74b25bc12b +size 22361 diff --git a/docs/img/throughput_ovms_yolo4_fp32.png b/docs/img/throughput_ovms_yolo4_fp32.png index c1bb655a5cd..018ddfd4cae 100644 --- a/docs/img/throughput_ovms_yolo4_fp32.png +++ b/docs/img/throughput_ovms_yolo4_fp32.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:f472d1fa6058d7ce988e9a2da8b5c6c106d8aa7e90bf2d383d2eaf685a725ab4 -size 19107 +oid sha256:ffd27dd41cf945563aaa8067ed50274070f83df41ff0dc0786a9d097ae12617b +size 22363 diff --git a/docs/img/vtune_async.png b/docs/img/vtune_async.png index 044d7a606e0..b503e607549 100644 --- a/docs/img/vtune_async.png +++ b/docs/img/vtune_async.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:c47ede993681ba3f0a3e3f4274369ee1854365b1bcd1b5cb0f649a781fdf51bd -size 6215 +oid sha256:1af95a7e8f12f3e663530e6d7eb6b48633f759aa7d83459633f36655a67047e8 +size 174761 diff --git a/docs/img/vtune_regular.png b/docs/img/vtune_regular.png index 9d01e7627ad..b4e2b5547a3 100644 --- a/docs/img/vtune_regular.png +++ b/docs/img/vtune_regular.png @@ -1,3 +1,3 @@ version https://git-lfs.github.com/spec/v1 -oid sha256:9a4fce51076df19fbca04a36d6886765771f8ffc174bebbd751bfc77d91ab1f2 -size 7081 +oid sha256:63b9d3bbea1efba0d30c465dcaa3552a61c5c4317d073f8993ec08f3f9db051b +size 132583 diff --git a/docs/index.rst b/docs/index.rst index 5aa299039ae..2951262f181 100644 --- 
a/docs/index.rst +++ b/docs/index.rst @@ -29,7 +29,7 @@ OpenVINO™ Documentation alt="OpenVINO allows to process models built with Caffe, Keras, mxnet, TensorFlow, ONNX, and PyTorch. They can be easily optimized and deployed on devices running Windows, Linux, or MacOS." />

Check the full range of supported hardware in the Supported Devices page and see how it stacks up in our Performance Benchmarks page.
Supports deployment on Windows, Linux, and macOS.

diff --git a/docs/install_guides/installing-openvino-apt.md b/docs/install_guides/installing-openvino-apt.md index d97d8a211c7..25a7aa209d8 100644 --- a/docs/install_guides/installing-openvino-apt.md +++ b/docs/install_guides/installing-openvino-apt.md @@ -38,21 +38,21 @@ The complete list of supported hardware is available in the [Release Notes](http > **NOTE**: You might need to install GnuPG: `sudo apt-get install gnupg` 2. Add the repository via the following command: - @sphinxdirective +@sphinxdirective - .. tab:: Ubuntu 18 +.. tab:: Ubuntu 18 - .. code-block:: sh + .. code-block:: sh - echo "deb https://apt.repos.intel.com/openvino/2022 bionic main" | sudo tee /etc/apt/sources.list.d/intel-openvino-2022.list + echo "deb https://apt.repos.intel.com/openvino/2022 bionic main" | sudo tee /etc/apt/sources.list.d/intel-openvino-2022.list - .. tab:: Ubuntu 20 +.. tab:: Ubuntu 20 - .. code-block:: sh + .. code-block:: sh - echo "deb https://apt.repos.intel.com/openvino/2022 focal main" | sudo tee /etc/apt/sources.list.d/intel-openvino-2022.list + echo "deb https://apt.repos.intel.com/openvino/2022 focal main" | sudo tee /etc/apt/sources.list.d/intel-openvino-2022.list - @endsphinxdirective +@endsphinxdirective 3. Update the list of packages via the update command: diff --git a/docs/install_guides/installing-openvino-linux.md b/docs/install_guides/installing-openvino-linux.md index 90574b4763f..b0abda42a4e 100644 --- a/docs/install_guides/installing-openvino-linux.md +++ b/docs/install_guides/installing-openvino-linux.md @@ -225,7 +225,7 @@ To uninstall the toolkit, follow the steps on the [Uninstalling page](uninstalli .. dropdown:: Additional Resources * Converting models for use with OpenVINO™: :ref:`Model Optimizer Developer Guide ` - * Writing your own OpenVINO™ applications: :ref:`OpenVINO™ Runtime User Guide ` + * Writing your own OpenVINO™ applications: :ref:`OpenVINO™ Runtime User Guide ` * Sample applications: :ref:`OpenVINO™ Toolkit Samples Overview ` * Pre-trained deep learning models: :ref:`Overview of OpenVINO™ Toolkit Pre-Trained Models ` * IoT libraries and code samples in the GitHUB repository: `Intel® IoT Developer Kit`_ diff --git a/docs/install_guides/installing-openvino-macos.md b/docs/install_guides/installing-openvino-macos.md index 383e56524b3..81e4f01c25e 100644 --- a/docs/install_guides/installing-openvino-macos.md +++ b/docs/install_guides/installing-openvino-macos.md @@ -143,7 +143,7 @@ To uninstall the toolkit, follow the steps on the [Uninstalling page](uninstalli .. dropdown:: Additional Resources * Converting models for use with OpenVINO™: :ref:`Model Optimizer Developer Guide ` - * Writing your own OpenVINO™ applications: :ref:`OpenVINO™ Runtime User Guide ` + * Writing your own OpenVINO™ applications: :ref:`OpenVINO™ Runtime User Guide ` * Sample applications: :ref:`OpenVINO™ Toolkit Samples Overview ` * Pre-trained deep learning models: :ref:`Overview of OpenVINO™ Toolkit Pre-Trained Models ` * IoT libraries and code samples in the GitHUB repository: `Intel® IoT Developer Kit`_ diff --git a/docs/install_guides/installing-openvino-windows.md b/docs/install_guides/installing-openvino-windows.md index 0aa6244de66..bfa8f374a04 100644 --- a/docs/install_guides/installing-openvino-windows.md +++ b/docs/install_guides/installing-openvino-windows.md @@ -83,7 +83,7 @@ To check **Release Notes** please visit: [Release Notes](https://software.intel. The core components are now installed. Continue to the next section to configure environment. 
-## Step 2: Configure the Environment +## Step 2: Configure the Environment > **NOTE**: If you installed the Intel® Distribution of OpenVINO™ to a non-default install directory, replace `C:\Program Files (x86)\Intel` with that directory in this guide's instructions. @@ -99,7 +99,7 @@ You must update several environment variables before you can compile and run Ope The environment variables are set. Next, you can download some additional tools. -## Step 3 (Optional): Download additional components +## Step 3 (Optional): Download Additional Components > **NOTE**: Since the OpenVINO™ 2022.1 release, the following development tools: Model Optimizer, Post-Training Optimization Tool, Model Downloader and other Open Model Zoo tools, Accuracy Checker, and Annotation Converter are not part of the installer. The OpenVINO™ Development Tools can only be installed via PyPI now. See [Install OpenVINO™ Development Tools](installing-model-dev-tools.md) for detailed steps. @@ -180,7 +180,7 @@ To uninstall the toolkit, follow the steps on the [Uninstalling page](uninstalli .. dropdown:: Additional Resources * Converting models for use with OpenVINO™: :ref:`Model Optimizer Developer Guide ` - * Writing your own OpenVINO™ applications: :ref:`OpenVINO™ Runtime User Guide ` + * Writing your own OpenVINO™ applications: :ref:`OpenVINO™ Runtime User Guide ` * Sample applications: :ref:`OpenVINO™ Toolkit Samples Overview ` * Pre-trained deep learning models: :ref:`Overview of OpenVINO™ Toolkit Pre-Trained Models ` * IoT libraries and code samples in the GitHUB repository: `Intel® IoT Developer Kit`_ diff --git a/docs/install_guides/movidius-setup-guide.md b/docs/install_guides/movidius-setup-guide.md index 993d52dae57..81940bb34a6 100644 --- a/docs/install_guides/movidius-setup-guide.md +++ b/docs/install_guides/movidius-setup-guide.md @@ -127,7 +127,6 @@ This setting reports the total FPS for the dispatching hddl_service (which will ## Additional Resources - [Intel Distribution of OpenVINO Toolkit home page](https://software.intel.com/en-us/openvino-toolkit) -- [Intel Distribution of OpenVINO Toolkit documentation](https://docs.openvino.ai) - [Troubleshooting Guide](troubleshooting.md) - [Intel® Vision Accelerator Design with Intel® Movidius™ VPUs HAL Configuration Guide](/downloads/595850_Intel_Vision_Accelerator_Design_with_Intel_Movidius_VPUs-HAL Configuration Guide_rev1.3.pdf) - [Intel® Vision Accelerator Design with Intel® Movidius™ VPUs Workload Distribution User Guide](/downloads/613514_Intel Vision Accelerator Design with Intel Movidius VPUs Workload Distribution_UG_r0.9.pdf) diff --git a/docs/install_guides/pypi-openvino-dev.md b/docs/install_guides/pypi-openvino-dev.md index 977e21048bc..251edd42738 100644 --- a/docs/install_guides/pypi-openvino-dev.md +++ b/docs/install_guides/pypi-openvino-dev.md @@ -8,11 +8,11 @@ OpenVINO™ toolkit is a comprehensive toolkit for quickly developing applicatio | Component | Console Script | Description | |------------------|---------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| [Model Optimizer](https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) | `mo` |**Model Optimizer** imports, converts, and 
optimizes models that were trained in popular frameworks to a format usable by OpenVINO components. 
Supported frameworks include Caffe\*, TensorFlow\*, MXNet\*, PaddlePaddle\*, and ONNX\*. | -| [Benchmark Tool](https://docs.openvino.ai/latest/openvino_inference_engine_tools_benchmark_tool_README.html)| `benchmark_app` | **Benchmark Application** allows you to estimate deep learning inference performance on supported devices for synchronous and asynchronous modes. | -| [Accuracy Checker](https://docs.openvino.ai/latest/omz_tools_accuracy_checker.html) and
[Annotation Converter](https://docs.openvino.ai/latest/omz_tools_accuracy_checker_annotation_converters.html) | `accuracy_check`
`convert_annotation` |**Accuracy Checker** is a deep learning accuracy validation tool that allows you to collect accuracy metrics against popular datasets. The main advantages of the tool are the flexibility of configuration and a set of supported datasets, preprocessing, postprocessing, and metrics.
**Annotation Converter** is a utility that prepares datasets for evaluation with Accuracy Checker. | -| [Post-Training Optimization Tool](https://docs.openvino.ai/latest/pot_README.html)| `pot` |**Post-Training Optimization Tool** allows you to optimize trained models with advanced capabilities, such as quantization and low-precision optimizations, without the need to retrain or fine-tune models. Optimizations are also available through the [API](https://docs.openvino.ai/latest/pot_compression_api_README.html). | -| [Model Downloader and other Open Model Zoo tools](https://docs.openvino.ai/latest/omz_tools_downloader.html)| `omz_downloader`
`omz_converter`
`omz_quantizer`
`omz_info_dumper`| **Model Downloader** is a tool for getting access to the collection of high-quality and extremely fast pre-trained deep learning [public](https://docs.openvino.ai/latest/omz_models_group_public.html) and [Intel](https://docs.openvino.ai/latest/omz_models_group_intel.html)-trained models. These free pre-trained models can be used to speed up the development and production deployment process without training your own models. The tool downloads model files from online sources and, if necessary, patches them to make them more usable with Model Optimizer. A number of additional tools are also provided to automate the process of working with downloaded models:
**Model Converter** is a tool for converting Open Model Zoo models that are stored in an original deep learning framework format into the OpenVINO Intermediate Representation (IR) using Model Optimizer.
**Model Quantizer** is a tool for automatic quantization of full-precision models in the IR format into low-precision versions using the Post-Training Optimization Tool.
**Model Information Dumper** is a helper utility for dumping information about the models to a stable, machine-readable format. +| [Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) | `mo` |**Model Optimizer** imports, converts, and optimizes models that were trained in popular frameworks to a format usable by OpenVINO components. 
Supported frameworks include Caffe\*, TensorFlow\*, MXNet\*, PaddlePaddle\*, and ONNX\*. | +| [Benchmark Tool](../../tools/benchmark_tool/README.md)| `benchmark_app` | **Benchmark Application** allows you to estimate deep learning inference performance on supported devices for synchronous and asynchronous modes. | +| [Accuracy Checker](@ref omz_tools_accuracy_checker) and
[Annotation Converter](@ref omz_tools_accuracy_checker_annotation_converters) | `accuracy_check`
`convert_annotation` |**Accuracy Checker** is a deep learning accuracy validation tool that allows you to collect accuracy metrics against popular datasets. The main advantages of the tool are the flexibility of configuration and a set of supported datasets, preprocessing, postprocessing, and metrics.
**Annotation Converter** is a utility that prepares datasets for evaluation with Accuracy Checker. | +| [Post-Training Optimization Tool](../../tools/pot/README.md)| `pot` |**Post-Training Optimization Tool** allows you to optimize trained models with advanced capabilities, such as quantization and low-precision optimizations, without the need to retrain or fine-tune models. Optimizations are also available through the [API](../../tools/pot/openvino/tools/pot/api/README.md). | +| [Model Downloader and other Open Model Zoo tools](@ref omz_tools_downloader)| `omz_downloader`
`omz_converter`
`omz_quantizer`
`omz_info_dumper`| **Model Downloader** is a tool for getting access to the collection of high-quality and extremely fast pre-trained deep learning [public](@ref omz_models_group_public) and [Intel](@ref omz_models_group_intel)-trained models. These free pre-trained models can be used to speed up the development and production deployment process without training your own models. The tool downloads model files from online sources and, if necessary, patches them to make them more usable with Model Optimizer. A number of additional tools are also provided to automate the process of working with downloaded models:
**Model Converter** is a tool for converting Open Model Zoo models that are stored in an original deep learning framework format into the OpenVINO Intermediate Representation (IR) using Model Optimizer.
**Model Quantizer** is a tool for automatic quantization of full-precision models in the IR format into low-precision versions using the Post-Training Optimization Tool.
**Model Information Dumper** is a helper utility for dumping information about the models to a stable, machine-readable format. The developer package also installs the OpenVINO™ Runtime package as a dependency. @@ -146,7 +146,6 @@ sudo apt-get install libpython3.7 ## Additional Resources - [Intel® Distribution of OpenVINO™ toolkit](https://software.intel.com/en-us/openvino-toolkit) -- [OpenVINO™ toolkit online documentation](https://docs.openvino.ai) - [OpenVINO™ Notebooks](https://github.com/openvinotoolkit/openvino_notebooks) Copyright © 2018-2022 Intel Corporation diff --git a/docs/install_guides/pypi-openvino-rt.md b/docs/install_guides/pypi-openvino-rt.md index ffc6b18e3d1..9e28be5f980 100644 --- a/docs/install_guides/pypi-openvino-rt.md +++ b/docs/install_guides/pypi-openvino-rt.md @@ -4,7 +4,7 @@ OpenVINO™ toolkit is a comprehensive toolkit for quickly developing applications and solutions that solve a variety of tasks including emulation of human vision, automatic speech recognition, natural language processing, recommendation systems, and many others. Based on latest generations of artificial neural networks, including Convolutional Neural Networks (CNNs), recurrent and attention-based networks, the toolkit extends computer vision and non-vision workloads across Intel® hardware, maximizing performance. It accelerates applications with high-performance, AI and deep learning inference deployed from edge to cloud. -[OpenVINO™ Runtime](https://docs.openvino.ai/latest/openvino_docs_OV_Runtime_User_Guide.html) package for Python includes a set of libraries for an easy inference integration into your Python applications and supports of heterogeneous execution across Intel® CPU and Intel® GPU hardware. +[OpenVINO™ Runtime](../OV_Runtime_UG/openvino_intro.md) package for Python includes a set of libraries for an easy inference integration into your Python applications and supports of heterogeneous execution across Intel® CPU and Intel® GPU hardware. ## System Requirements The complete list of supported hardware is available in the [Release Notes](https://www.intel.com/content/www/us/en/developer/articles/release-notes/openvino-relnotes.html). @@ -90,7 +90,6 @@ sudo apt-get install libpython3.7 ## Additional Resources - [Intel® Distribution of OpenVINO™ toolkit](https://software.intel.com/en-us/openvino-toolkit) -- [OpenVINO™ toolkit online documentation](https://docs.openvino.ai) - [OpenVINO™ Notebooks](https://github.com/openvinotoolkit/openvino_notebooks) Copyright © 2018-2022 Intel Corporation diff --git a/docs/install_guides/troubleshooting.md b/docs/install_guides/troubleshooting.md index caf9225b60c..7be40768e38 100644 --- a/docs/install_guides/troubleshooting.md +++ b/docs/install_guides/troubleshooting.md @@ -2,11 +2,33 @@ -## Issues with Installing OpenVINO™ for Linux from Docker +## Errors with Installing via PIP for PRC Users + +PRC users might encounter errors while downloading sources via PIP during OpenVINO™ installation. To resolve the issues, try one of the following options: + +* Add the download source using the ``-i`` parameter with the Python ``pip`` command. For example: + + ``` sh + pip install openvino-dev -i https://mirrors.aliyun.com/pypi/simple/ + ``` + Use the ``--trusted-host`` parameter if the URL above is ``http`` instead of ``https``. + You can also run the following command to install specific framework. 
For example: + + ``` + pip install openvino-dev[tensorflow2] -i https://mirrors.aliyun.com/pypi/simple/ + ``` + +* If you run into incompatibility issues between components after installing OpenVINO, try running ``requirements.txt`` with the following command: + + ``` sh + pip install -r /tools/requirements.txt + ``` + +## Issues with Installing OpenVINO on Linux from Docker ### Proxy Issues -If you met proxy issues during the installation with Docker, please set up proxy settings for Docker. See the Proxy section in the [Install the DL Workbench from DockerHub*](https://docs.openvino.ai/latest/workbench_docs_Workbench_DG_Prerequisites.html#set-proxy) topic. +If you met proxy issues during the installation with Docker, please set up proxy settings for Docker. See the Proxy section in the [Install the DL Workbench from DockerHub](https://docs.openvino.ai/latest/workbench_docs_Workbench_DG_Prerequisites.html#set-proxy) topic. ### Permission Errors for /dev/shm diff --git a/docs/optimization_guide/dldt_deployment_optimization_common.md b/docs/optimization_guide/dldt_deployment_optimization_common.md index 5844230245d..9fe1da418e2 100644 --- a/docs/optimization_guide/dldt_deployment_optimization_common.md +++ b/docs/optimization_guide/dldt_deployment_optimization_common.md @@ -5,19 +5,19 @@ In many cases, a network expects a pre-processed image, so make sure you do not perform unnecessary steps in your code: - Model Optimizer can efficiently bake the mean and normalization (scale) values into the model (for example, to the weights of the first convolution). Please see [relevant Model Optimizer command-line options](../MO_DG/prepare_model/Additional_Optimizations.md). - Let the OpenVINO accelerate other means of [Image Pre-processing and Conversion](../OV_Runtime_UG/preprocessing_overview.md). -- Note that in many cases, you can directly share the (input) data with the OpenVINO, for example consider [remote tensors API of the GPU Plugin](../OV_Runtime_UG//supported_plugins/GPU_RemoteTensor_API.md). +- You can directly input a data that is already in the _on-device_ memory, by using the [remote tensors API of the GPU Plugin](../OV_Runtime_UG//supported_plugins/GPU_RemoteTensor_API.md). -## Prefer OpenVINO Async API +## Prefer OpenVINO Async API The API of the inference requests offers Sync and Async execution. While the `ov::InferRequest::infer()` is inherently synchronous and executes immediately (effectively serializing the execution flow in the current application thread), the Async "splits" the `infer()` into `ov::InferRequest::start_async()` and `ov::InferRequest::wait()`. Please consider the [API examples](../OV_Runtime_UG/ov_infer_request.md). A typical use-case for the `ov::InferRequest::infer()` is running a dedicated application thread per source of inputs (e.g. a camera), so that every step (frame capture, processing, results parsing and associated logic) is kept serial within the thread. In contrast, the `ov::InferRequest::start_async()` and `ov::InferRequest::wait()` allow the application to continue its activities and poll or wait for the inference completion when really needed. So one reason for using asynchronous code is _efficiency_. -**NOTE**: Although the Synchronous API can be somewhat easier to start with, in the production code always prefer to use the Asynchronous (callbacks-based, below) API, as it is the most general and scalable way to implement the flow control for any possible number of requests (and hence both latency and throughput scenarios). 
+> **NOTE**: Although the Synchronous API can be somewhat easier to start with, in the production code always prefer to use the Asynchronous (callbacks-based, below) API, as it is the most general and scalable way to implement the flow control for any possible number of requests (and hence both latency and throughput scenarios). -Let's see how the OpenVINO Async API can improve overall throughput rate of the application. The key advantage of the Async approach is as follows: while a device is busy with the inference, the application can do other things in parallel (e.g. populating inputs or scheduling other requests) rather than wait for the inference to complete. +Let's see how the OpenVINO Async API can improve overall frame rate of the application. The key advantage of the Async approach is as follows: while a device is busy with the inference, the application can do other things in parallel (e.g. populating inputs or scheduling other requests) rather than wait for the current inference to complete first. -In the example below, inference is applied to the results of the video decoding. So it is possible to keep two parallel infer requests, and while the current is processed, the input frame for the next is being captured. This essentially hides the latency of capturing, so that the overall frame rate is rather determined only by the slowest part of the pipeline (decoding IR inference) and not by the sum of the stages. +In the example below, inference is applied to the results of the video decoding. So it is possible to keep two parallel infer requests, and while the current is processed, the input frame for the next is being captured. This essentially hides the latency of capturing, so that the overall frame rate is rather determined only by the slowest part of the pipeline (decoding vs inference) and not by the sum of the stages. You can compare the pseudo-codes for the regular and async-based approaches: @@ -36,6 +36,8 @@ You can compare the pseudo-codes for the regular and async-based approaches: The technique can be generalized to any available parallel slack. For example, you can do inference and simultaneously encode the resulting or previous frames or run further inference, like emotion detection on top of the face detection results. Refer to the [Object Detection С++ Demo](@ref omz_demos_object_detection_demo_cpp), [Object Detection Python Demo](@ref omz_demos_object_detection_demo_python)(latency-oriented Async API showcase) and [Benchmark App Sample](../../samples/cpp/benchmark_app/README.md) for complete examples of the Async API in action. +> **NOTE**: Using the Asynchronous API is a must for [throughput-oriented scenarios](./dldt_deployment_optimization_tput.md). + ### Notes on Callbacks Notice that the Async's `ov::InferRequest::wait()` waits for the specific request only. However, running multiple inference requests in parallel provides no guarantees on the completion order. This may complicate a possible logic based on the `ov::InferRequest::wait`. The most scalable approach is using callbacks (set via the `ov::InferRequest::set_callback`) that are executed upon completion of the request. The callback functions will be used by the OpenVINO runtime to notify on the results (or errors. This is more event-driven approach. 
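For illustration only (not part of the patch above), a minimal C++ sketch of the callback-based Async flow described in this section could look as follows; the model path and device name are placeholders:

```cpp
#include <openvino/openvino.hpp>

#include <exception>
#include <iostream>

int main() {
    ov::Core core;
    // "model.xml" and "CPU" are placeholder values for this sketch.
    ov::CompiledModel compiled_model = core.compile_model("model.xml", "CPU");
    ov::InferRequest request = compiled_model.create_infer_request();

    // The callback is run by the OpenVINO runtime when the request completes (or fails).
    request.set_callback([&request](std::exception_ptr ex) {
        if (ex) {
            // Report the error; keep the callback work minimal and thread-safe.
            std::cerr << "Inference failed" << std::endl;
            return;
        }
        ov::Tensor output = request.get_output_tensor();
        std::cout << "Done, output bytes: " << output.get_byte_size() << std::endl;
    });

    // ... populate inputs, e.g. via request.get_input_tensor().data() ...
    request.start_async();  // returns immediately
    // ... capture the next frame or schedule other requests in parallel ...
    request.wait();         // block only when the result is actually needed
    return 0;
}
```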
@@ -44,8 +46,13 @@ Few important points on the callbacks: - It is the application responsibility to ensure that any callback function is thread-safe - Although executed asynchronously by a dedicated threads the callbacks should NOT include heavy operations (e.g. I/O) and/or blocking calls. Keep the work done by any callback to a minimum. -## "get_tensor" Idiom +## "get_tensor" Idiom +Within the OpenVINO, each device may have different internal requirements on the memory padding, alignment, etc for intermediate tensors. The **input/output tensors** are also accessible by the application code. +As every `ov::InferRequest` is created by the particular instance of the `ov::CompiledModel`(that is already device-specific) the requirements are respected and the requests' input/output tensors are still device-friendly. +Thus: +* `get_tensor` (that offers the `data()` method to get a system-memory pointer to the tensor's content), is a recommended way to populate the inference inputs (and read back the outputs) **from/to the host memory** + * For example, for the GPU device, the inputs/outputs tensors are mapped to the host (which is fast) only when the `get_tensor` is used, while for the `set_tensor` a copy into the internal GPU structures may happen +* In contrast, when the input tensors are already in the **on-device memory** (e.g. as a result of the video-decoding), prefer the `set_tensor` as a zero-copy way to proceed + * Consider [GPU device Remote tensors API](../OV_Runtime_UG//supported_plugins/GPU_RemoteTensor_API.md). -`get_tensor` is a recommended way to populate the inference inputs (and read back the outputs), as it internally allocates the data with right padding/alignment for the device. For example, the GPU inputs/outputs tensors are mapped to the host (which is fast) only when the `get_tensor` is used, while for the `set_tensor` a copy into the internal GPU structures may happen. -Please consider the [API examples](../OV_Runtime_UG/ov_infer_request.md). -In contrast, the `set_tensor` is a preferable way to handle remote tensors, [for example with the GPU device](../OV_Runtime_UG//supported_plugins/GPU_RemoteTensor_API.md). +Please consider the [API examples](../OV_Runtime_UG/ov_infer_request.md) for `get_tensor` and `set_tensor`. \ No newline at end of file diff --git a/docs/optimization_guide/dldt_deployment_optimization_guide.md b/docs/optimization_guide/dldt_deployment_optimization_guide.md index fe13deb6801..5ddbbd7637e 100644 --- a/docs/optimization_guide/dldt_deployment_optimization_guide.md +++ b/docs/optimization_guide/dldt_deployment_optimization_guide.md @@ -5,40 +5,40 @@ .. toctree:: :maxdepth: 1 :hidden: - + openvino_docs_deployment_optimization_guide_common openvino_docs_deployment_optimization_guide_latency openvino_docs_deployment_optimization_guide_tput openvino_docs_deployment_optimization_guide_hints + openvino_docs_deployment_optimization_guide_internals @endsphinxdirective ## Deployment Optimizations Overview {#openvino_docs_deployment_optimization_guide_overview} -Runtime or deployment optimizations focus is tuning of the inference parameters (e.g. optimal number of the requests executed simultaneously) and other means of how a model is _executed_. +Runtime or deployment optimizations are focused on tuning of the inference _parameters_ (e.g. optimal number of the requests executed simultaneously) and other means of how a model is _executed_. -Here, possible optimization should start with defining the use-case. 
For example, whether the target scenario emphasizes throughput over latency like processing millions of samples by overnight jobs in the data centers. -In contrast, real-time usages would likely trade off the throughput to deliver the results at minimal latency. -Often this is a combined scenario that targets highest possible throughput while maintaining a specific latency threshold. +As referenced in the parent [performance introduction topic](./dldt_optimization_guide.md), the [dedicated document](./model_optimization_guide.md) covers the **model-level optimizations** like quantization that unlocks the [int8 inference](../OV_Runtime_UG/Int8Inference.md). Model-optimizations are most general and help any scenario and any device (that accelerated the quantized models). The relevant _runtime_ configuration is `ov::hint::inference_precision` allowing the devices to trade the accuracy for the performance (e.g. by allowing the fp16/bf16 execution for the layers that remain in fp32 after quantization of the original fp32 model). -Each of the [OpenVINO supported devices](../OV_Runtime_UG/supported_plugins/Device_Plugins.md) offers low-level performance configuration. This allows to leverage the optimal model performance on the _specific_ device, but may require careful re-tuning when the model or device has changed. -**If the performance portability is of concern, consider using the [OpenVINO High-Level Performance Hints](../OV_Runtime_UG/performance_hints.md) first.** +Then, possible optimization should start with defining the use-case. For example, whether the target scenario emphasizes throughput over latency like processing millions of samples by overnight jobs in the data centers. +In contrast, real-time usages would likely trade off the throughput to deliver the results at minimal latency. Often this is a combined scenario that targets highest possible throughput while maintaining a specific latency threshold. +Below you can find summary on the associated tips. -Finally, how the full-stack application uses the inference component _end-to-end_ is important. -For example, what are the stages that needs to be orchestrated? In some cases a significant part of the workload time is spent on bringing and preparing the input data. As detailed in the section on the [general optimizations](./dldt_deployment_optimization_common.md), the inputs population can be performed asynchronously to the inference. Also, in many cases the (image) [pre-processing can be offloaded to the OpenVINO](../OV_Runtime_UG/preprocessing_overview.md). For variably-sized inputs, consider [dynamic shapes](../OV_Runtime_UG/ov_dynamic_shapes.md) to efficiently connect the data input pipeline and the model inference. -These are common performance tricks that help both latency and throughput scenarios. +How the full-stack application uses the inference component _end-to-end_ is also important. For example, what are the stages that needs to be orchestrated? In some cases a significant part of the workload time is spent on bringing and preparing the input data. Below you can find multiple tips on connecting the data input pipeline and the model inference efficiently. +These are also common performance tricks that help both latency and throughput scenarios. - Similarly, the _model-level_ optimizations like [quantization that unlocks the int8 inference](../OV_Runtime_UG/Int8Inference.md) are general and help any scenario. 
As referenced in the [performance introduction topic](./dldt_optimization_guide.md), these are covered in the [dedicated document](./model_optimization_guide.md). Additionally, the `ov::hint::inference_precision` allows the devices to trade the accuracy for the performance at the _runtime_ (e.g. by allowing the fp16/bf16 execution for the layers that remain in fp32 after quantization of the original fp32 model). - -Further documents cover the _runtime_ performance optimizations topics. Please also consider [matrix support of the features by the individual devices](../OV_Runtime_UG/supported_plugins/Device_Plugins.md). +Further documents cover the associated _runtime_ performance optimization topics. Please also consider the [matrix of feature support by the individual devices](@ref features_support_matrix). -[General, application-level optimizations](./dldt_deployment_optimization_common.md): - -* Inputs Pre-processing with the OpenVINO +**General, application-level optimizations**, and specifically: -* Async API and 'get_tensor' Idiom +* [Inputs Pre-processing with the OpenVINO](../OV_Runtime_UG/preprocessing_overview.md) + +* [Async API and 'get_tensor' Idiom](./dldt_deployment_optimization_common.md) + +* For variably-sized inputs, consider [dynamic shapes](../OV_Runtime_UG/ov_dynamic_shapes.md) + +**Use-case specific optimizations** along with some implementation details: -Use-case specific optimizations along with some implementation details: - * Optimizing for [throughput](./dldt_deployment_optimization_tput.md) and [latency](./dldt_deployment_optimization_latency.md) - -* [OpenVINO's high-level performance hints](./dldt_deployment_optimization_hints.md) as the portable, future-proof approach for performance configuration + +* [OpenVINO's high-level performance hints](./dldt_deployment_optimization_hints.md) as the portable, future-proof approach for performance configuration that does not require re-tuning when the model or device has changed. + * **If the performance portability is of concern, consider using the [hints](../OV_Runtime_UG/performance_hints.md) first.** \ No newline at end of file diff --git a/docs/optimization_guide/dldt_deployment_optimization_hints.md b/docs/optimization_guide/dldt_deployment_optimization_hints.md index c06cfc4caa2..e4fa34a2906 100644 --- a/docs/optimization_guide/dldt_deployment_optimization_hints.md +++ b/docs/optimization_guide/dldt_deployment_optimization_hints.md @@ -12,7 +12,7 @@ Also, while the resulting performance may be optimal for the specific combinatio Beyond execution _parameters_ there are potentially many device-specific details like _scheduling_ that greatly affect the performance. Specifically, GPU-oriented tricks like batching, which combines many (potentially tens) of input images to achieve optimal throughput, do not always map well to the CPU, as e.g. detailed in the next sections. The hints allow to really hide _execution_ specifics required to saturate the device. For example, no need to explicitly combine multiple inputs into a batch to achieve good GPU performance. -Instead, it is possible to keep a separate infer request per camera or another source of input and process the requests in parallel using OpenVINO Async API. +Instead, it is possible to keep a separate infer request per camera or another source of input and process the requests in parallel using the Async API, as explained in the [common-optimizations section](@ref openvino_docs_deployment_optimization_guide_common).
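As a hedged sketch of the hint-driven configuration discussed here: the device and the "model.xml" path below are placeholders, and the number of in-flight requests is simply whatever the compiled model recommends:

```cpp
#include <openvino/openvino.hpp>

#include <vector>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");   // hypothetical model path

    // Let the device derive the batching/streams settings from the target scenario
    auto compiled_model = core.compile_model(model, "GPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

    // Query how many requests the device would like to keep in flight
    uint32_t nireq = compiled_model.get_property(ov::optimal_number_of_infer_requests);

    // For example, one asynchronous request per camera or other input source
    std::vector<ov::InferRequest> requests;
    for (uint32_t i = 0; i < nireq; ++i)
        requests.push_back(compiled_model.create_infer_request());
    return 0;
}
```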
The only requirement for the application to leverage the throughput is about **running multiple inference requests in parallel**. OpenVINO's device-specific implementation of the hints will take care of the rest. This allows a developer to greatly simplify the app-logic. diff --git a/docs/optimization_guide/dldt_deployment_optimization_internals.md b/docs/optimization_guide/dldt_deployment_optimization_internals.md new file mode 100644 index 00000000000..5899e85f4e7 --- /dev/null +++ b/docs/optimization_guide/dldt_deployment_optimization_internals.md @@ -0,0 +1,24 @@ +# Further Low-Level Implementation Details {#openvino_docs_deployment_optimization_guide_internals} +## Throughput on the CPU: Internals +As explained in the [throughput-related section](./dldt_deployment_optimization_tput.md), the OpenVINO streams are a means of running multiple requests in parallel. +In order to best serve multiple inference requests executed simultaneously, the inference threads are grouped/pinned to the particular CPU cores, constituting the "CPU" streams. +This provides much better performance for the networks than batching, especially for the many-core machines: +![](../img/cpu_streams_explained_1.png) + +Compared with the batching, the parallelism is somewhat transposed (i.e. performed over inputs, with much less synchronization within CNN ops): +![](../img/cpu_streams_explained.png) + +Notice that [high-level performance hints](../OV_Runtime_UG/performance_hints.md) allow the implementation to select the optimal number of the streams, _depending on the model compute demands_ and CPU capabilities (including [int8 inference](../OV_Runtime_UG/Int8Inference.md) hardware acceleration, number of cores, etc). + +## Automatic Batching Internals +As explained in the section on the [automatic batching](../OV_Runtime_UG/automatic_batching.md), the feature performs on-the-fly grouping of the inference requests to improve device utilization. +The Automatic Batching relaxes the requirement for an application to saturate devices like GPU by _explicitly_ using a large batch. It performs transparent inputs gathering from +individual inference requests followed by the actual batched execution, with no programming effort from the user: +![](../img/BATCH_device.PNG) + +Essentially, the Automatic Batching shifts the asynchronicity from the individual requests to the groups of requests that constitute the batches. Thus, for the execution to be efficient it is very important that the requests arrive in a timely manner, without causing a batching timeout. +Normally, the timeout should never be hit. It is rather a graceful way to handle the application exit (when the inputs are not arriving anymore, so the full batch is not possible to collect). + +So if your workload experiences the timeouts (resulting in a performance drop, as the timeout value adds itself to the latency of every request), consider balancing the timeout value against the batch size. For example, in many cases a smaller timeout value and batch size may yield better performance than a large batch size coupled with a timeout value that cannot guarantee accommodating the full number of the required requests. + +Finally, following the "get_tensor idiom" section from the [general optimizations](./dldt_deployment_optimization_common.md) helps the Automatic Batching to save on inputs/outputs copies. Thus, in your application always prefer the "get" versions of the tensors' data access APIs.
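To illustrate the Automatic Batching flow described above, here is a hedged sketch that assumes the explicit `BATCH:GPU(4)` device notation and the `ov::auto_batch_timeout` property are available in this release; the batch size of 4, the 50 ms timeout and the model path are placeholder values only:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");   // hypothetical model path

    // Wrap the GPU with the "BATCH" virtual device, requesting a batch of 4, and keep
    // the timeout moderate so that late-arriving requests do not pay a large penalty
    auto compiled_model = core.compile_model(model, "BATCH:GPU(4)",
        ov::auto_batch_timeout(50));

    // The application still submits individual requests; grouping into batches of 4
    // happens transparently inside the BATCH device
    ov::InferRequest request = compiled_model.create_infer_request();
    request.start_async();
    request.wait();
    return 0;
}
```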
diff --git a/docs/optimization_guide/dldt_deployment_optimization_latency.md b/docs/optimization_guide/dldt_deployment_optimization_latency.md index cf75edc6bc1..17362f92044 100644 --- a/docs/optimization_guide/dldt_deployment_optimization_latency.md +++ b/docs/optimization_guide/dldt_deployment_optimization_latency.md @@ -5,25 +5,26 @@ .. toctree:: :maxdepth: 1 :hidden: - + openvino_docs_IE_DG_Model_caching_overview @endsphinxdirective ## Latency Specifics A significant fraction of applications focused on the situations where typically a single model is loaded (and single input is used) at a time. -This is a regular "consumer" use case and a default (also for the legacy reasons) performance setup for any OpenVINO device. -Notice that an application can create more than one request if needed (for example to support asynchronous inputs population), the question is really about how many requests are being executed in parallel. +This is a regular "consumer" use case. +While an application can create more than one request if needed (for example to support [asynchronous inputs population](./dldt_deployment_optimization_common.md)), the inference performance depends on **how many requests are being inferenced in parallel** on a device. Similarly, when multiple models are served on the same device, it is important whether the models are executed simultaneously, or in chain (for example in the inference pipeline). -As expected, the lowest latency is achieved with only one concurrent inference at a moment. Accordingly, any additional concurrency usually results in the latency growing fast. +As expected, the easiest way to achieve the lowest latency is **running only one concurrent inference at a moment** on the device. Accordingly, any additional concurrency usually results in the latency growing fast. -However, for example, specific configurations, like multi-socket CPUs can deliver as high number of requests (at the same minimal latency) as there are NUMA nodes in the machine. -Thus, human expertise is required to get the most out of the device even in the latency case. Consider using [OpenVINO high-level performance hints](../OV_Runtime_UG/performance_hints.md) instead. +However, some conventional "root" devices (e.g. CPU or GPU) can be in fact internally composed of several "sub-devices". In many cases letting the OpenVINO to transparently leverage the "sub-devices" helps to improve the application throughput (e.g. serve multiple clients simultaneously) without degrading the latency. For example, multi-socket CPUs can deliver as high number of requests (at the same minimal latency) as there are NUMA nodes in the machine. Similarly, a multi-tile GPU (which is essentially multiple GPUs in a single package), can deliver a multi-tile scalability with the number of inference requests, while preserving the single-tile latency. -**NOTE**: [OpenVINO performance hints](./dldt_deployment_optimization_hints.md) is a recommended way for performance configuration, which is both device-agnostic and future-proof. +Thus, human expertise is required to get more _throughput_ out of the device even in the inherently latency-oriented cases. OpenVINO can take this configuration burden via [high-level performance hints](../OV_Runtime_UG/performance_hints.md). -In the case when there are multiple models to be used simultaneously, consider using different devices for inferencing the different models. 
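A minimal sketch of delegating the latency-oriented configuration to the hints, as suggested above; the "model.xml" path and the CPU device are assumptions for illustration only:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");   // hypothetical model path

    // The LATENCY hint keeps the number of streams (and other internals) at the
    // bare minimum needed to serve a single in-flight request with minimal delay
    auto compiled_model = core.compile_model(model, "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));

    ov::InferRequest request = compiled_model.create_infer_request();
    request.infer();   // a single concurrent inference for the lowest latency
    return 0;
}
```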
Finally, when multiple models are executed in parallel on the device, using additional `ov::hint::model_priority` may help to define relative priorities of the models (please refer to the documentation on the [matrix features support for OpenVINO devices](../OV_Runtime_UG/supported_plugins/Device_Plugins.md) to check for the support of the feature by the specific device). +> **NOTE**: [OpenVINO performance hints](./dldt_deployment_optimization_hints.md) is a recommended way for performance configuration, which is both device-agnostic and future-proof. + +In the case when there are multiple models to be used simultaneously, consider using different devices for inferencing the different models. Finally, when multiple models are executed in parallel on the device, using additional `ov::hint::model_priority` may help to define relative priorities of the models (please refer to the documentation on the [matrix features support for OpenVINO devices](@ref features_support_matrix) to check for the support of the feature by the specific device). ## First-Inference Latency and Model Load/Compile Time There are cases when model loading/compilation are heavily contributing to the end-to-end latencies. diff --git a/docs/optimization_guide/dldt_deployment_optimization_tput.md b/docs/optimization_guide/dldt_deployment_optimization_tput.md index 5fdfe20bc57..d1ca74439b2 100644 --- a/docs/optimization_guide/dldt_deployment_optimization_tput.md +++ b/docs/optimization_guide/dldt_deployment_optimization_tput.md @@ -1,68 +1,79 @@ # Optimizing for Throughput {#openvino_docs_deployment_optimization_guide_tput} ## General Throughput Considerations -As described in the section on the [latency-specific considerations](./dldt_deployment_optimization_latency.md) one possible use-case is delivering the every single request at the minimal delay. -Throughput on the other hand, is about inference scenarios in which potentially large number of inference requests are served simultaneously. -Here, the overall application throughput can be significantly improved with the right performance configuration. -Also, if the model is not already compute- or memory bandwidth-limited, the associated increase in latency is not linearly dependent on the number of requests executed in parallel. +As described in the section on the [latency-specific considerations](./dldt_deployment_optimization_latency.md) one possible use-case is delivering every single request at the minimal delay. +Throughput on the other hand, is about inference scenarios in which potentially large **number of inference requests are served simultaneously to improve the device utilization**. -With the OpenVINO there two major means of running the multiple requests simultaneously: batching and "streams", explained in this document. -Yet, different GPUs behave differently with batch sizes, just like different CPUs require different number of execution streams to maximize the throughput. -Predicting inference performance is difficult and and finding optimal execution parameters requires direct experiments measurements. -One possible throughput optimization strategy is to set an upper bound for latency and then increase the batch size or number of the streams until that tail latency is met (or the throughput is not growing anymore). -Also, consider [Deep Learning Workbench](https://docs.openvino.ai/latest/workbench_docs_Workbench_DG_Introduction.html). +Here, the overall application inference rate can be significantly improved with the right performance configuration. 
+Also, if the model is not already memory bandwidth-limited, the associated increase in latency is not linearly dependent on the number of requests executed in parallel. +With the OpenVINO there are two major means of processing multiple inputs simultaneously: **batching** and **streams**, explained in this document. -Finally, the [automatic multi-device execution](../OV_Runtime_UG/multi_device.md) helps to improve the throughput, please also see the section below. -While the same approach of optimizing the parameters of each device separately does work, the resulting multi-device performance is a fraction (that is different for different models) of the “ideal” (plain sum) performance. +## OpenVINO Streams +As detailed in the [common-optimizations section](@ref openvino_docs_deployment_optimization_guide_common), running multiple inference requests asynchronously is important for general application efficiency. +The [Asynchronous API](./dldt_deployment_optimization_common.md) is in fact the "application side" of scheduling, as every device internally implements a queue. The queue acts as a buffer, storing the inference requests until retrieved by the device at its own pace. +Further, the devices may actually process multiple inference requests in parallel in order to improve the device utilization and overall throughput. This parallelism is commonly referred to as 'streams'. Some devices (like GPU) may run several requests per stream to amortize the host-side costs. +Notice that streams are **really executing the requests in parallel, but not in lockstep** (as e.g. the batching does), which makes the streams fully compatible with [dynamically-shaped inputs](../OV_Runtime_UG/ov_dynamic_shapes.md) when individual requests can have different shapes. + +For efficient asynchronous execution, the streams are actually handling inference with a special pool of threads. +So each time you start inference requests (potentially from different application threads), they are actually muxed into an inference queue of the particular `ov::CompiledModel`. +If there is a vacant stream, it pops the request from the queue and actually expedites that to the on-device execution. + +The multi-streams approach is inherently throughput-oriented, as every stream requires dedicated device memory to do inference in parallel to the rest of the streams. +Although similar, the streams are always preferable compared to creating multiple `ov::CompiledModel` instances for the same model, as weights memory is shared across streams, reducing the overall memory consumption. +Notice that the streams inflate the model load/compilation time. +Finally, using streams does increase the latency of an individual request; this is why, for example, the [latency hint](./dldt_deployment_optimization_hints.md) governs a device to create a bare minimum of streams (usually just one). +Please find the considerations for the optimal number of the streams in the later sections. + +## Batching +Hardware accelerators like GPUs are optimized for massive compute parallelism, so the batching helps to saturate the device and leads to higher throughput. +While the streams (described earlier) already allow hiding the communication overheads and certain bubbles in the scheduling, running multiple OpenCL kernels simultaneously is less GPU-efficient, compared to calling a kernel on multiple inputs at once. +As explained in the next section, the batching is a must to leverage maximum throughput on the GPUs.
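For illustration, a hedged sketch of configuring the streams explicitly; the model path is a placeholder, the stream count of 4 is an arbitrary example value, and `ov::streams::AUTO` defers the choice to the device:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");   // hypothetical model path

    // Explicitly request 4 CPU streams (an example value; in practice it should
    // reflect the average number of requests processed in parallel)
    auto cpu_model = core.compile_model(model, "CPU", ov::num_streams(4));

    // Or defer the choice of the stream count to the device itself
    auto gpu_model = core.compile_model(model, "GPU", ov::num_streams(ov::streams::AUTO));
    return 0;
}
```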
+ +There are two primary ways of using the batching to help application performance: +* Collecting the inputs explicitly on the application side and then _sending these batched requests to the OpenVINO_ + * Although this gives flexibility with the possible batching strategies, the approach requires redesigning the application logic +* _Sending individual requests_, while configuring the OpenVINO to collect and perform inference on the requests in batch [automatically](../OV_Runtime_UG/automatic_batching.md). +In both cases, optimal batch size is very device-specific. Also, as explained below, the optimal batch size depends on the model, inference precision and other factors. + +## Choosing the Batch Size and Number of Streams +Predicting the inference performance is difficult and finding optimal execution parameters requires direct experiments with measurements. +One possible throughput optimization strategy is to **set an upper bound for latency and then increase the batch size or number of the streams until that tail latency is met (or the throughput is not growing anymore)**. +Also, consider [Deep Learning Workbench](@ref workbench_docs_Workbench_DG_Introduction) that builds handy latency vs throughput charts, iterating over possible values of the batch size and number of streams. + +Different devices behave differently with the batch sizes. The optimal batch size depends on the model, inference precision and other factors. Similarly, different devices require a different number of execution streams to maximize the throughput. +Below are general recommendations: +* For the **CPU always prefer the streams** over the batching + * Create as many streams as your application runs the requests simultaneously + * Number of streams should be enough to meet the _average_ parallel slack rather than the peak load + * _Maximum number of streams_ equals **total number of CPU cores** + * As explained in the [CPU streams internals](dldt_deployment_optimization_internals.md), the CPU cores are evenly distributed between streams, so one core per stream is the finest-grained configuration +* For the **GPU**: + * When the parallel slack is small (e.g. only 2-4 requests executed simultaneously), then using the streams for the GPU may suffice + * Notice that the GPU runs 2 requests per stream + * _Maximum number of streams_ is usually 2; for more portability, consider using the `ov::streams::AUTO` (`GPU_THROUGHPUT_AUTO` in the pre-OpenVINO 2.0 parlance) + * Typically, for 4 and more requests the batching delivers better throughput for the GPUs + * Batch size can be calculated as "number of inference requests executed _in parallel_" divided by the "number of requests that the streams consume" + * E.g. if you process 16 cameras (by 16 requests inferenced _simultaneously_) with 2 GPU streams (each can process 2 requests), the batch size per request is 16/(2*2)=4 + +> **NOTE**: When playing with [dynamically-shaped inputs](../OV_Runtime_UG/ov_dynamic_shapes.md), use only the streams (no batching), as they tolerate individual requests having different shapes. + +> **NOTE**: Using the [High-Level Performance Hints](../OV_Runtime_UG/performance_hints.md) explained in the next section is the most portable and future-proof option, allowing the OpenVINO to find the best combination of streams and batching for a given scenario and model. + +## OpenVINO Hints: Selecting Optimal Execution and Parameters **Automatically** Overall, the latency-throughput is not linearly dependent and very _device_ specific.
It is also tightly integrated with _model_ characteristics. As for the possible inference devices the scenery had already become pretty diverse, the OpenVINO has introduced the dedicated notion of the high-level performance configuration "hints" to describe the target application scenarios. The hints are described [here](./dldt_deployment_optimization_hints.md). -**NOTE**: [OpenVINO performance hints](./dldt_deployment_optimization_hints.md) is a recommended way for performance configuration, which is both device-agnostic and future-proof. +The hints also obviate the need for explicit (application-side) batching. With the hints, the only requirement for the application is to run multiple individual requests using [Async API](./dldt_deployment_optimization_common.md) and let the OpenVINO decide whether to collect the requests and execute them in batch, streams, or both. -The rest of the document provides low-level details on the OpenVINO's low-level ways to optimize the throughput. +> **NOTE**: [OpenVINO performance hints](./dldt_deployment_optimization_hints.md) is a recommended way for performance configuration, which is both device-agnostic and future-proof. -## Low-Level Implementation Details -### OpenVINO Streams -As detailed in the section OpenVINO Async API running multiple inference requests asynchronously is important for general application efficiency. -Additionally, most devices support running multiple inference requests in parallel in order to improve the device utilization. The _level_ of the parallelism (i.e. how many requests are really executed in parallel on the device) is commonly referred as a number of 'streams'. Some devices run several requests per stream to amortize the host-side costs. -Notice that streams (that can be considered as independent queues) are really executing the requests in parallel, but not in the lock step (as e.g. the batching does), this makes the streams much more compatible with [dynamically-shaped inputs](../OV_Runtime_UG/ov_dynamic_shapes.md) when individual requests can have different shapes. +## Multi-Device Execution +OpenVINO offers _automatic_, [scalable multi-device inference](../OV_Runtime_UG/multi_device.md). This is a simple, _application-transparent_ way to improve the throughput. No need to re-architect existing applications for any explicit multi-device support: no explicit network loading to each device, no separate per-device queues, no additional logic to balance the inference requests between devices, etc. From the application point of view, it is communicating with a single device that internally handles the actual machinery. +Just like with other throughput-oriented scenarios, there are two major prerequisites for optimal multi-device performance (see the sketch after this list): +* Using the [Asynchronous API](@ref openvino_docs_deployment_optimization_guide_common) and [callbacks](../OV_Runtime_UG/ov_infer_request.md) in particular +* Providing the multi-device (and hence the underlying devices) with enough data to crunch. As the inference requests are naturally independent data pieces, the multi-device performs load-balancing at the “requests” (outermost) level to minimize the scheduling overhead.
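A hedged sketch of the multi-device setup outlined in the list above; the model path, the `MULTI:GPU,CPU` priority order and the request count of 4 are illustrative assumptions:

```cpp
#include <openvino/openvino.hpp>

#include <exception>
#include <vector>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");   // hypothetical model path

    // A single "MULTI" device that internally balances requests between GPU and CPU
    auto compiled_model = core.compile_model(model, "MULTI:GPU,CPU");

    // Keep the multi-device busy with several asynchronous requests
    std::vector<ov::InferRequest> requests;
    for (int i = 0; i < 4; ++i) {
        requests.push_back(compiled_model.create_infer_request());
        requests.back().set_callback([](std::exception_ptr) { /* completion or error */ });
        requests.back().start_async();
    }
    for (auto& request : requests)
        request.wait();
    return 0;
}
```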
-If there is a vacant stream, it pops the request from the queue and actually expedites that to the on-device execution. - -The usage of multiple streams is an inherently throughput-oriented approach, as every stream requires a dedicated memory to operate in parallel to the rest streams (read-only data like weights are usually shared between all streams). -Also, the streams inflate the load/compilation time. -This is why the [latency hint](./dldt_deployment_optimization_hints.md) governs a device to create a bare minimum of streams (usually just one). - -Finally, the streams are always preferable compared to creating multiple instances of the same model, as weights memory is shared across streams, reducing possible memory consumption. - -### Throughput on the CPU: Internals -In order to best serve multiple inference requests simultaneously, the inference threads are grouped/pinned to the particular CPU cores, constituting the CPU streams. -This provides much better performance for the networks than batching especially for the many-core machines: -![](../img/cpu_streams_explained_1.png) - -Compared with the batching, the parallelism is somewhat transposed (i.e. performed over inputs, with much less synchronization within CNN ops): -![](../img/cpu_streams_explained.png) - -Notice that [high-level performance hints](../OV_Runtime_UG/performance_hints.md) allows the implementation to select the optimal number of the streams, _depending on the model compute demands_ and CPU capabilities (including [int8 inference](../OV_Runtime_UG/Int8Inference.md) hardware acceleration, number of cores, etc). - -### Automatic Batching Internals -While the GPU plugin fully supports general notion of the streams, the associated performance (throughput) improvements are usually modest. -The primary reason is that, while the streams allow to hide the communication overheads and hide certain bubbles in device utilization, running multiple OpenCL kernels on the GPU simultaneously is less efficient, compared to calling a kernel on the multiple inputs at once. - -When the parallel slack is small (e.g. only 2-4 requests executed simultaneously), then using the streams for the GPU may suffice. Also streams are fully compatible with [dynamically-shaped inputs](../OV_Runtime_UG/ov_dynamic_shapes.md) when individual requests can have different shapes. -Typically, for 4 and more requests the batching delivers better throughput for the GPUs. Using the [High-Level Performance Hints](../OV_Runtime_UG/performance_hints.md) is the most portable and future-proof option, allowing the OpenVINO to find best combination of streams and batching for a given scenario. -As explained in the section on the [automatic batching](../OV_Runtime_UG/automatic_batching.md), the feature performs on-the-fly grouping of the inference requests to improve device utilization. -The Automatic Batching relaxes the requirement for an application to saturate devices like GPU by _explicitly_ using a large batch. It performs transparent inputs gathering from -individual inference requests followed by the actual batched execution, with no programming effort from the user: -![](../img/BATCH_device.PNG) - -Essentially, the Automatic Batching shifts the asynchronousity from the individual requests to the groups of requests that constitute the batches. Thus, for the execution to be efficient it is very important that the requests arrive timely, without causing a batching timeout. -Normally, the timeout should never be hit. 
It is rather a graceful way to handle the application exit (when the inputs are not arriving anymore, so the full batch is not possible to collect). - -So if your workload experiences the timeouts (resulting in the performance drop, as the timeout value adds itself to the latency of every request), consider balancing the timeout value vs the batch size. For example in many cases having smaller timeout value and batch size may yield better performance than large batch size, but coupled with the timeout value that cannot guarantee accommodating the full number of the required requests. - -Finally, following the "get_tensor idiom" section from the [general optimizations](./dldt_deployment_optimization_common.md) helps the Automatic Batching to save on inputs/outputs copies. Thus, in your application always prefer the "get" versions of the tensor data access APIs. +Notice that the resulting performance is usually a fraction of the “ideal” (plain sum) value, when the devices compete for a certain resources, like the memory-bandwidth which is shared between CPU and iGPU. +> **NOTE**: While the legacy approach of optimizing the parameters of each device separately works, the [OpenVINO performance hints](./dldt_deployment_optimization_hints.md) allow to configure all devices (that are part of the specific multi-device configuration) at once. diff --git a/docs/optimization_guide/dldt_optimization_guide.md b/docs/optimization_guide/dldt_optimization_guide.md index a90f744ff2b..dbb7c1fad82 100644 --- a/docs/optimization_guide/dldt_optimization_guide.md +++ b/docs/optimization_guide/dldt_optimization_guide.md @@ -9,15 +9,16 @@ Generally, performance means how fast the model processes the live data. Two key ![](../img/LATENCY_VS_THROUGHPUT.svg) -Latency measures inference time (ms) required to process a single input. When it comes to the executing multiple inputs executed simultaneously (e.g. via batching) then the overall throughput (inferences per second, or frames per second, FPS, in the specific case of visual processing) is usually of more concern. -To calculate throughput, divide number of frames that were processed by the processing time. +**Latency** measures inference time (ms) required to process a single input. When it comes to the executing multiple inputs executed simultaneously (e.g. via batching) then the overall throughput (inferences per second, or frames per second, FPS, in the specific case of visual processing) is usually of more concern. +To calculate **throughput**, divide number of inputs that were processed by the processing time. +## End-to-End Application Performance It is important to separate the "pure" inference time of a neural network and the end-to-end application performance. For example data transfers between the host and a device may unintentionally affect the performance when a host input tensor is processed on the accelerator like dGPU. Similarly, the image-preprocessing may also contribute significantly to the to inference time. As detailed in the [getting performance numbers](../MO_DG/prepare_model/Getting_performance_numbers.md) section, when drilling into _inference_ performance, one option is to measure all such items separately. -For the end-to-end scenario though, consider the image pre-processing thru the OpenVINO and the asynchronous execution is a way to amortize the communication costs like data transfers. You can find further details in the [general optimizations document](./dldt_deployment_optimization_common.md). 
+For the **end-to-end scenario** though, consider the image pre-processing through the OpenVINO and the asynchronous execution as a way to amortize the communication costs like data transfers. You can find further details in the [general optimizations document](./dldt_deployment_optimization_common.md). -"First-inference latency" is another specific case (e.g. when fast application start-up is required) where the resulting performance may be well dominated by the model loading time. Consider [model caching](../OV_Runtime_UG/Model_caching_overview.md) as a way to improve model loading/compilation time. +**First-inference latency** is another specific case (e.g. when fast application start-up is required) where the resulting performance may be well dominated by the model loading time. Consider [model caching](../OV_Runtime_UG/Model_caching_overview.md) as a way to improve model loading/compilation time. -Finally, memory footprint restrictions is another possible concern when designing an application. While this is a motivation for the _model_ optimization techniques referenced in the next section, notice that the the throughput-oriented execution is usually much more memory-hungry, as detailed in the [Deployment Optimization Guide](../optimization_guide/dldt_deployment_optimization_guide.md). +Finally, **memory footprint** restrictions are another possible concern when designing an application. While this is a motivation for the _model_ optimization techniques referenced in the next section, notice that the throughput-oriented execution is usually much more memory-hungry, as detailed in the [Runtime Inference Optimizations](../optimization_guide/dldt_deployment_optimization_guide.md). > **NOTE**: To get performance numbers for OpenVINO, as well as tips how to measure it and compare with native framework, check [Getting performance numbers](../MO_DG/prepare_model/Getting_performance_numbers.md) page. @@ -28,9 +29,9 @@ Finally, memory footprint restrictions is another possible concern when designin With the OpenVINO there are two primary ways of improving the inference performance, namely model- and runtime-level optimizations. **These two optimizations directions are fully compatible**. -- **Model optimization** includes model modification, such as quantization, pruning, optimization of preprocessing, etc. Fore more details, refer to this [document](./model_optimization_guide.md). +- **Model optimizations** include model modifications, such as quantization, pruning, optimization of preprocessing, etc. For more details, refer to this [document](./model_optimization_guide.md). -- **Runtime (Deployment) optimization** includes tuning of model _execution_ parameters. To read more visit [Deployment Optimization Guide](../optimization_guide/dldt_deployment_optimization_guide.md). +- **Runtime (Deployment) optimizations** include tuning of model _execution_ parameters. To read more, visit the [Runtime Inference Optimizations](../optimization_guide/dldt_deployment_optimization_guide.md). ## Performance benchmarks To estimate the performance and compare performance numbers, measured on various supported devices, a wide range of public models are available at [Performance benchmarks](../benchmarks/performance_benchmarks.md) section.
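To illustrate the model caching mentioned above for the first-inference latency case, here is a minimal, hedged C++ sketch; the cache directory, device and model path are placeholder values:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;

    // Enable the model cache; subsequent compile_model() calls for the same model and
    // device reuse the cached blob, cutting the load/compile time and, with it,
    // the first-inference latency
    core.set_property(ov::cache_dir("model_cache"));      // placeholder directory

    auto compiled_model = core.compile_model("model.xml", "GPU");   // hypothetical model
    return 0;
}
```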
\ No newline at end of file diff --git a/docs/snippets/CMakeLists.txt b/docs/snippets/CMakeLists.txt index b9908896c8b..dd128b85f10 100644 --- a/docs/snippets/CMakeLists.txt +++ b/docs/snippets/CMakeLists.txt @@ -4,6 +4,13 @@ set(TARGET_NAME ie_docs_snippets) +if(CMAKE_COMPILER_IS_GNUCXX OR OV_COMPILER_IS_CLANG) + ie_add_compiler_flags(-Wno-unused-variable) + if(CMAKE_COMPILER_IS_GNUCXX) + ie_add_compiler_flags(-Wno-unused-variable -Wno-unused-but-set-variable) + endif() +endif() + file(GLOB SOURCES "${CMAKE_CURRENT_SOURCE_DIR}/*.cpp" "${CMAKE_CURRENT_SOURCE_DIR}/gpu/*.cpp") @@ -57,9 +64,9 @@ endif() # remove OpenCV related sources if (ENABLE_OPENCV) - find_package(OpenCV QUIET) + find_package(OpenCV QUIET) else() - set(OpenCV_FOUND FALSE) + set(OpenCV_FOUND OFF) endif() if(NOT OpenCV_FOUND) @@ -102,30 +109,25 @@ if(ENABLE_OV_ONNX_FRONTEND) target_link_libraries(${TARGET_NAME} PRIVATE openvino_onnx_frontend) endif() -if(NOT MSVC) - target_compile_options(${TARGET_NAME} PRIVATE -Wno-unused-variable) - if(CMAKE_COMPILER_IS_GNUCXX) - target_compile_options(${TARGET_NAME} PRIVATE -Wno-unused-but-set-variable) - endif() -endif() - target_link_libraries(${TARGET_NAME} PRIVATE openvino::runtime openvino::runtime::dev) +# ov_ncc_naming_style(FOR_TARGET "${TARGET_NAME}" +# SOURCE_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}" +# ADDITIONAL_INCLUDE_DIRECTORIES +# $) + +# +# Example +# + set(TARGET_NAME "ov_integration_snippet") # [cmake:integration_example] cmake_minimum_required(VERSION 3.10) set(CMAKE_CXX_STANDARD 11) - find_package(OpenVINO REQUIRED) add_executable(${TARGET_NAME} src/main.cpp) target_link_libraries(${TARGET_NAME} PRIVATE openvino::runtime) # [cmake:integration_example] -if(NOT MSVC) - target_compile_options(${TARGET_NAME} PRIVATE -Wno-unused-variable) - if(CMAKE_COMPILER_IS_GNUCXX) - target_compile_options(${TARGET_NAME} PRIVATE -Wno-unused-but-set-variable) - endif() -endif() diff --git a/docs/snippets/Graph_debug_capabilities0.cpp b/docs/snippets/Graph_debug_capabilities0.cpp deleted file mode 100644 index 02c6a2c153b..00000000000 --- a/docs/snippets/Graph_debug_capabilities0.cpp +++ /dev/null @@ -1,13 +0,0 @@ -#include -#include -#include - -int main() { -using namespace InferenceEngine; -//! [part0] -std::shared_ptr model; -// ... -ov::pass::VisualizeTree("after.png").run_on_model(model); // Visualize the nGraph function to an image -//! [part0] -return 0; -} diff --git a/docs/snippets/Graph_debug_capabilities1.cpp b/docs/snippets/Graph_debug_capabilities1.cpp deleted file mode 100644 index 5649ed5abfb..00000000000 --- a/docs/snippets/Graph_debug_capabilities1.cpp +++ /dev/null @@ -1,13 +0,0 @@ -#include -#include - -int main() { -using namespace InferenceEngine; -//! [part1] -std::shared_ptr nGraph; -// ... -CNNNetwork network(nGraph); -network.serialize("test_ir.xml", "test_ir.bin"); -//! [part1] -return 0; -} diff --git a/docs/snippets/InferenceEngine_QueryAPI0.cpp b/docs/snippets/InferenceEngine_QueryAPI0.cpp deleted file mode 100644 index ebd1fdf30a3..00000000000 --- a/docs/snippets/InferenceEngine_QueryAPI0.cpp +++ /dev/null @@ -1,10 +0,0 @@ -#include - -int main() { -using namespace InferenceEngine; -//! [part0] -InferenceEngine::Core core; -std::vector availableDevices = core.GetAvailableDevices(); -//! 
[part0] -return 0; -} diff --git a/docs/snippets/InferenceEngine_QueryAPI1.cpp b/docs/snippets/InferenceEngine_QueryAPI1.cpp deleted file mode 100644 index 26c75cab072..00000000000 --- a/docs/snippets/InferenceEngine_QueryAPI1.cpp +++ /dev/null @@ -1,10 +0,0 @@ -#include - -int main() { -using namespace InferenceEngine; -//! [part1] -InferenceEngine::Core core; -bool dumpDotFile = core.GetConfig("HETERO", HETERO_CONFIG_KEY(DUMP_GRAPH_DOT)).as(); -//! [part1] -return 0; -} diff --git a/docs/snippets/InferenceEngine_QueryAPI2.cpp b/docs/snippets/InferenceEngine_QueryAPI2.cpp deleted file mode 100644 index 473f217fa1f..00000000000 --- a/docs/snippets/InferenceEngine_QueryAPI2.cpp +++ /dev/null @@ -1,10 +0,0 @@ -#include - -int main() { -using namespace InferenceEngine; -//! [part2] -InferenceEngine::Core core; -std::string cpuDeviceName = core.GetMetric("GPU", METRIC_KEY(FULL_DEVICE_NAME)).as(); -//! [part2] -return 0; -} diff --git a/docs/snippets/InferenceEngine_QueryAPI3.cpp b/docs/snippets/InferenceEngine_QueryAPI3.cpp deleted file mode 100644 index afd9f36948d..00000000000 --- a/docs/snippets/InferenceEngine_QueryAPI3.cpp +++ /dev/null @@ -1,12 +0,0 @@ -#include - -int main() { -using namespace InferenceEngine; -//! [part3] -InferenceEngine::Core core; -auto network = core.ReadNetwork("sample.xml"); -auto exeNetwork = core.LoadNetwork(network, "CPU"); -auto nireq = exeNetwork.GetMetric(METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as(); -//! [part3] -return 0; -} diff --git a/docs/snippets/InferenceEngine_QueryAPI4.cpp b/docs/snippets/InferenceEngine_QueryAPI4.cpp deleted file mode 100644 index ee7476a76ee..00000000000 --- a/docs/snippets/InferenceEngine_QueryAPI4.cpp +++ /dev/null @@ -1,12 +0,0 @@ -#include - -int main() { -using namespace InferenceEngine; -//! [part4] -InferenceEngine::Core core; -auto network = core.ReadNetwork("sample.xml"); -auto exeNetwork = core.LoadNetwork(network, "MYRIAD"); -float temperature = exeNetwork.GetMetric(METRIC_KEY(DEVICE_THERMAL)).as(); -//! [part4] -return 0; -} diff --git a/docs/snippets/InferenceEngine_QueryAPI5.cpp b/docs/snippets/InferenceEngine_QueryAPI5.cpp deleted file mode 100644 index 4297c886699..00000000000 --- a/docs/snippets/InferenceEngine_QueryAPI5.cpp +++ /dev/null @@ -1,12 +0,0 @@ -#include - -int main() { -using namespace InferenceEngine; -//! [part5] -InferenceEngine::Core core; -auto network = core.ReadNetwork("sample.xml"); -auto exeNetwork = core.LoadNetwork(network, "CPU"); -auto ncores = exeNetwork.GetConfig(PluginConfigParams::KEY_CPU_THREADS_NUM).as(); -//! [part5] -return 0; -} diff --git a/docs/snippets/dldt_optimization_guide1.cpp b/docs/snippets/dldt_optimization_guide1.cpp deleted file mode 100644 index 91b44081351..00000000000 --- a/docs/snippets/dldt_optimization_guide1.cpp +++ /dev/null @@ -1,16 +0,0 @@ -#include - -int main() { -using namespace InferenceEngine; -//! [part1] -Core ie; -auto netReader = ie.ReadNetwork("sample.xml"); -InferenceEngine::InputsDataMap info(netReader.getInputsInfo()); -auto& inputInfoFirst = info.begin()->second; -for (auto& it : info) { - it.second->setPrecision(Precision::U8); -} -//! [part1] - -return 0; -} diff --git a/docs/snippets/dldt_optimization_guide2.cpp b/docs/snippets/dldt_optimization_guide2.cpp deleted file mode 100644 index 97f6e28e3ee..00000000000 --- a/docs/snippets/dldt_optimization_guide2.cpp +++ /dev/null @@ -1,14 +0,0 @@ -#include - -int main() { -using namespace InferenceEngine; -//! 
[part2] -//Lock Intel MSS surface -mfxFrameSurface1 *frame_in; //Input MSS surface. -mfxFrameAllocator* pAlloc = &m_mfxCore.FrameAllocator(); -pAlloc->Lock(pAlloc->pthis, frame_in->Data.MemId, &frame_in->Data); -//Inference Engine code -//! [part2] - -return 0; -} diff --git a/docs/snippets/dldt_optimization_guide3.cpp b/docs/snippets/dldt_optimization_guide3.cpp deleted file mode 100644 index e3be0da706d..00000000000 --- a/docs/snippets/dldt_optimization_guide3.cpp +++ /dev/null @@ -1,22 +0,0 @@ -#include - -int main() { -using namespace InferenceEngine; -//! [part3] -InferenceEngine::SizeVector dims_src = { - 1 /* batch, N*/, - (size_t) frame_in->Info.Height /* Height */, - (size_t) frame_in->Info.Width /* Width */, - 3 /*Channels,*/, - }; -InferenceEngine::TensorDesc desc(InferenceEngine::Precision::U8, dims_src, InferenceEngine::NHWC); -/* wrapping the surface data, as RGB is interleaved, need to pass only ptr to the R, notice that this wouldn’t work with planar formats as these are 3 separate planes/pointers*/ -InferenceEngine::TBlob::Ptr p = InferenceEngine::make_shared_blob( desc, (uint8_t*) frame_in->Data.R); -inferRequest.SetBlob("input", p); -inferRequest.Infer(); -//Make sure to unlock the surface upon inference completion, to return the ownership back to the Intel MSS -pAlloc->Unlock(pAlloc->pthis, frame_in->Data.MemId, &frame_in->Data); -//! [part3] - -return 0; -} diff --git a/docs/snippets/dldt_optimization_guide4.cpp b/docs/snippets/dldt_optimization_guide4.cpp deleted file mode 100644 index 52396aa268b..00000000000 --- a/docs/snippets/dldt_optimization_guide4.cpp +++ /dev/null @@ -1,20 +0,0 @@ -#include - -int main() { -using namespace InferenceEngine; -//! [part4] -InferenceEngine::SizeVector dims_src = { - 1 /* batch, N*/, - 3 /*Channels,*/, - (size_t) frame_in->Info.Height /* Height */, - (size_t) frame_in->Info.Width /* Width */, - }; -TensorDesc desc(InferenceEngine::Precision::U8, dims_src, InferenceEngine::NCHW); -/* wrapping the RGBP surface data*/ -InferenceEngine::TBlob::Ptr p = InferenceEngine::make_shared_blob( desc, (uint8_t*) frame_in->Data.R); -inferRequest.SetBlob("input", p); -// … -//! [part4] - -return 0; -} diff --git a/docs/snippets/dldt_optimization_guide5.cpp b/docs/snippets/dldt_optimization_guide5.cpp deleted file mode 100644 index a9226ecce3a..00000000000 --- a/docs/snippets/dldt_optimization_guide5.cpp +++ /dev/null @@ -1,30 +0,0 @@ -#include -#include - -int main() { -InferenceEngine::InferRequest inferRequest; -//! 
[part5] -cv::Mat frame(cv::Size(100, 100), CV_8UC3); // regular CV_8UC3 image, interleaved -// creating blob that wraps the OpenCV’s Mat -// (the data it points should persists until the blob is released): -InferenceEngine::SizeVector dims_src = { - 1 /* batch, N*/, - (size_t)frame.rows /* Height */, - (size_t)frame.cols /* Width */, - (size_t)frame.channels() /*Channels,*/, - }; -InferenceEngine::TensorDesc desc(InferenceEngine::Precision::U8, dims_src, InferenceEngine::NHWC); -InferenceEngine::TBlob::Ptr p = InferenceEngine::make_shared_blob( desc, (uint8_t*)frame.data, frame.step[0] * frame.rows); -inferRequest.SetBlob("input", p); -inferRequest.Infer(); -// … -// similarly, you can wrap the output tensor (let’s assume it is FP32) -// notice that the output should be also explicitly stated as NHWC with setLayout -auto output_blob = inferRequest.GetBlob("output"); -const float* output_data = output_blob->buffer().as(); -auto dims = output_blob->getTensorDesc().getDims(); -cv::Mat res (dims[2], dims[3], CV_32FC3, (void *)output_data); -//! [part5] - -return 0; -} diff --git a/docs/snippets/dldt_optimization_guide6.cpp b/docs/snippets/dldt_optimization_guide6.cpp deleted file mode 100644 index 5e5cd2de485..00000000000 --- a/docs/snippets/dldt_optimization_guide6.cpp +++ /dev/null @@ -1,24 +0,0 @@ -#include - -int main() { -using namespace InferenceEngine; -//! [part6] -InferenceEngine::Core ie; -auto network = ie.ReadNetwork("Model.xml", "Model.bin"); -InferenceEngine::InputsDataMap input_info(network.getInputsInfo()); - -auto executable_network = ie.LoadNetwork(network, "GPU"); -auto infer_request = executable_network.CreateInferRequest(); - -for (auto & item : input_info) { - std::string input_name = item.first; - auto input = infer_request.GetBlob(input_name); - /** Lock/Fill input tensor with data **/ - unsigned char* data = input->buffer().as::value_type*>(); - // ... -} - -infer_request.Infer(); -//! [part6] -return 0; -} diff --git a/docs/snippets/dldt_optimization_guide7.cpp b/docs/snippets/dldt_optimization_guide7.cpp deleted file mode 100644 index c2fdc529c37..00000000000 --- a/docs/snippets/dldt_optimization_guide7.cpp +++ /dev/null @@ -1,15 +0,0 @@ -#include - -int main() { -InferenceEngine::Core core; -auto network0 = core.ReadNetwork("sample.xml"); -auto network1 = core.ReadNetwork("sample.xml"); -//! [part7] -//these two networks go thru same plugin (aka device) and their requests will not overlap. -auto executable_network0 = core.LoadNetwork(network0, "CPU", - {{InferenceEngine::PluginConfigParams::KEY_EXCLUSIVE_ASYNC_REQUESTS, InferenceEngine::PluginConfigParams::YES}}); -auto executable_network1 = core.LoadNetwork(network1, "GPU", - {{InferenceEngine::PluginConfigParams::KEY_EXCLUSIVE_ASYNC_REQUESTS, InferenceEngine::PluginConfigParams::YES}}); -//! 
[part7] -return 0; -} diff --git a/docs/snippets/dldt_optimization_guide9.cpp b/docs/snippets/dldt_optimization_guide9.cpp index bdab20e7326..2efbde1d69b 100644 --- a/docs/snippets/dldt_optimization_guide9.cpp +++ b/docs/snippets/dldt_optimization_guide9.cpp @@ -6,7 +6,8 @@ while(true) { // capture frame // populate NEXT InferRequest // start NEXT InferRequest //this call is async and returns immediately - // wait for the CURRENT InferRequest //processed in a dedicated thread + + // wait for the CURRENT InferRequest // display CURRENT result // swap CURRENT and NEXT InferRequests } diff --git a/docs/snippets/example_async_infer_request.cpp b/docs/snippets/example_async_infer_request.cpp index 782182f5caa..ed3f880e4a8 100644 --- a/docs/snippets/example_async_infer_request.cpp +++ b/docs/snippets/example_async_infer_request.cpp @@ -12,11 +12,11 @@ class AcceleratorSyncRequest : public IInferRequestInternal { public: using Ptr = std::shared_ptr; - void Preprocess(); - void WriteToDevice(); - void RunOnDevice(); - void ReadFromDevice(); - void PostProcess(); + void preprocess(); + void write_to_device(); + void run_on_device(); + void read_from_device(); + void post_process(); }; // ! [async_infer_request:define_pipeline] @@ -40,19 +40,19 @@ class AcceleratorAsyncInferRequest : public AsyncInferRequestThreadSafeDefault { // Five pipeline stages of synchronous infer request are run by different executors _pipeline = { { _preprocessExecutor , [this] { - _accSyncRequest->Preprocess(); + _accSyncRequest->preprocess(); }}, { _writeToDeviceExecutor , [this] { - _accSyncRequest->WriteToDevice(); + _accSyncRequest->write_to_device(); }}, { _runOnDeviceExecutor , [this] { - _accSyncRequest->RunOnDevice(); + _accSyncRequest->run_on_device(); }}, { _readFromDeviceExecutor , [this] { - _accSyncRequest->ReadFromDevice(); + _accSyncRequest->read_from_device(); }}, { _postProcessExecutor , [this] { - _accSyncRequest->PostProcess(); + _accSyncRequest->post_process(); }}, }; } diff --git a/docs/snippets/movidius-programming-guide.cpp b/docs/snippets/movidius-programming-guide.cpp deleted file mode 100644 index 39f28ae254d..00000000000 --- a/docs/snippets/movidius-programming-guide.cpp +++ /dev/null @@ -1,36 +0,0 @@ -#include - -int main() { -InferenceEngine::Core core; -int numRequests = 42; -int i = 1; -auto network = core.ReadNetwork("sample.xml"); -auto executable_network = core.LoadNetwork(network, "CPU"); -//! [part0] -struct Request { - InferenceEngine::InferRequest inferRequest; - int frameidx; -}; -//! [part0] - -//! [part1] -// numRequests is the number of frames (max size, equal to the number of VPUs in use) -std::vector request(numRequests); -//! [part1] - -//! [part2] -// initialize infer request pointer – Consult IE API for more detail. -request[i].inferRequest = executable_network.CreateInferRequest(); -//! [part2] - -//! [part3] -// Run inference -request[i].inferRequest.StartAsync(); -//! [part3] - -//! [part4] -request[i].inferRequest.SetCompletionCallback([] () {}); -//! [part4] - -return 0; -} diff --git a/docs/snippets/nGraphTutorial.cpp b/docs/snippets/nGraphTutorial.cpp deleted file mode 100644 index e39e783d5eb..00000000000 --- a/docs/snippets/nGraphTutorial.cpp +++ /dev/null @@ -1,38 +0,0 @@ -#include -#include "ngraph/opsets/opset.hpp" -#include "ngraph/opsets/opset3.hpp" - - -int main() { -//! 
[part0] - -using namespace std; -using namespace ngraph; - -auto arg0 = make_shared(element::f32, Shape{7}); -auto arg1 = make_shared(element::f32, Shape{7}); -// Create an 'Add' operation with two inputs 'arg0' and 'arg1' -auto add0 = make_shared(arg0, arg1); -auto abs0 = make_shared(add0); -// Create a node whose inputs/attributes will be specified later -auto acos0 = make_shared(); -// Create a node using opset factories -auto add1 = shared_ptr(get_opset3().create("Add")); -// Set inputs to nodes explicitly -acos0->set_argument(0, add0); -add1->set_argument(0, acos0); -add1->set_argument(1, abs0); - -// Create a graph with one output (add1) and four inputs (arg0, arg1) -auto ng_function = make_shared(OutputVector{add1}, ParameterVector{arg0, arg1}); -// Run shape inference on the nodes -ng_function->validate_nodes_and_infer_types(); - -//! [part0] - -//! [part1] -InferenceEngine::CNNNetwork net (ng_function); -//! [part1] - -return 0; -} diff --git a/docs/snippets/ov_extensions.cpp b/docs/snippets/ov_extensions.cpp index 0abab9d3bfa..fbf5fa01635 100644 --- a/docs/snippets/ov_extensions.cpp +++ b/docs/snippets/ov_extensions.cpp @@ -2,20 +2,123 @@ // SPDX-License-Identifier: Apache-2.0 // #include -#include +//! [add_extension_header] +//#include +//! [add_extension_header] +//! [add_frontend_extension_header] +#include +//! [add_frontend_extension_header] + +//! [frontend_extension_Identity_header] +#include +//! [frontend_extension_Identity_header] + +//! [frontend_extension_ThresholdedReLU_header] +#include +//! [frontend_extension_ThresholdedReLU_header] + #include +//! [frontend_extension_CustomOperation] +class CustomOperation : public ov::op::Op { + + std::string attr1; + int attr2; + +public: + + OPENVINO_OP("CustomOperation"); + + bool visit_attributes(ov::AttributeVisitor& visitor) override { + visitor.on_attribute("attr1", attr1); + visitor.on_attribute("attr2", attr2); + return true; + } + + // ... implement other required methods + //! [frontend_extension_CustomOperation] + std::shared_ptr clone_with_new_inputs(const ov::OutputVector&) const override { return nullptr; } +}; + int main() { { //! [add_extension] ov::Core core; -// Use operation type to add operation extension + +// Use operation type to add operation extension core.add_extension(); -// or you can add operation extension to this method + +// or you can add operation extension object which is equivalent form core.add_extension(ov::OpExtension()); //! [add_extension] } { +ov::Core core; + +//! [add_frontend_extension] +// Register mapping for new frontends: FW's "TemplateIdentity" operation to TemplateExtension::Identity +core.add_extension(ov::frontend::OpExtension("Identity")); + +// Register more sophisticated mapping with decomposition +core.add_extension(ov::frontend::ConversionExtension( + "Identity", + [](const ov::frontend::NodeContext& context) { + // Arbitrary decomposition code here + // Return a vector of operation outputs + return ov::OutputVector{ std::make_shared(context.get_input(0)) }; + })); +//! [add_frontend_extension] +} +{ +//! [frontend_extension_Identity] +auto extension1 = ov::frontend::OpExtension("Identity"); + +// or even simpler if original FW type and OV type of operations match, that is "Identity" +auto extension2 = ov::frontend::OpExtension(); +//! [frontend_extension_Identity] + +//! 
[frontend_extension_read_model] +ov::Core core; +// Add arbitrary number of extensions before calling read_model method +core.add_extension(ov::frontend::OpExtension()); +core.read_model("/path/to/model.onnx"); +//! [frontend_extension_read_model] + +//! [frontend_extension_MyRelu] +core.add_extension(ov::frontend::OpExtension<>("Relu", "MyRelu")); +//! [frontend_extension_MyRelu] + +//! [frontend_extension_CustomOperation_as_is] +core.add_extension(ov::frontend::OpExtension()); +//! [frontend_extension_CustomOperation_as_is] + +//! [frontend_extension_CustomOperation_rename] +core.add_extension(ov::frontend::OpExtension( + { {"attr1", "fw_attr1"}, {"attr2", "fw_attr2"} }, + {} +)); +//! [frontend_extension_CustomOperation_rename] + +//! [frontend_extension_CustomOperation_rename_set] +core.add_extension(ov::frontend::OpExtension( + { {"attr1", "fw_attr1"} }, + { {"attr2", 5} } +)); +//! [frontend_extension_CustomOperation_rename_set] + +//! [frontend_extension_ThresholdedReLU] +core.add_extension(ov::frontend::ConversionExtension( + "ThresholdedReLU", + [](const ov::frontend::NodeContext& node) { + auto greater = std::make_shared( + node.get_input(0), + ov::opset8::Constant::create(ov::element::f32, {}, {node.get_attribute("alpha")})); + auto casted = std::make_shared(greater, ov::element::f32); + return ov::OutputVector{ std::make_shared(node.get_input(0), casted) }; + })); +//! [frontend_extension_ThresholdedReLU] +} +{ //! [add_extension_lib] ov::Core core; // Load extensions library to ov::Core diff --git a/docs/snippets/ov_extensions.py b/docs/snippets/ov_extensions.py index 4f53700c746..8694f08442d 100644 --- a/docs/snippets/ov_extensions.py +++ b/docs/snippets/ov_extensions.py @@ -8,8 +8,13 @@ import openvino.runtime as ov # Not implemented #! [add_extension] +#! [add_frontend_extension] +# Not implemented +#! [add_frontend_extension] + #! [add_extension_lib] core = ov.Core() -# Load extensions library to ov::Core +# Load extensions library to ov.Core core.add_extension("openvino_template_extension.so") #! 
[add_extension_lib] + diff --git a/docs/snippets/ov_properties_api.py b/docs/snippets/ov_properties_api.py index 232a52974a8..a501c0d960b 100644 --- a/docs/snippets/ov_properties_api.py +++ b/docs/snippets/ov_properties_api.py @@ -9,6 +9,10 @@ core = Core() available_devices = core.available_devices # [get_available_devices] +# [hetero_priorities] +device_priorities = core.get_property("HETERO", "MULTI_DEVICE_PRIORITIES") +# [hetero_priorities] + # [cpu_device_name] cpu_device_name = core.get_property("CPU", "FULL_DEVICE_NAME") # [cpu_device_name] @@ -22,7 +26,7 @@ compiled_model = core.compile_model(model, "CPU", config) # [optimal_number_of_infer_requests] compiled_model = core.compile_model(model, "CPU") -nireq = compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS"); +nireq = compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS") # [optimal_number_of_infer_requests] diff --git a/docs/template_extension/new/CMakeLists.txt b/docs/template_extension/new/CMakeLists.txt index 10371e33072..d461025c7f7 100644 --- a/docs/template_extension/new/CMakeLists.txt +++ b/docs/template_extension/new/CMakeLists.txt @@ -15,10 +15,4 @@ add_library(${TARGET_NAME} MODULE ${SRC}) target_compile_definitions(${TARGET_NAME} PRIVATE IMPLEMENT_OPENVINO_EXTENSION_API) target_link_libraries(${TARGET_NAME} PRIVATE openvino::runtime) - -# To map custom operation to framework -if(OpenVINO_Frontend_ONNX_FOUND) - target_link_libraries(${TARGET_NAME} PRIVATE openvino::frontend::onnx) - target_compile_definitions(${TARGET_NAME} PRIVATE OPENVINO_ONNX_FRONTEND_ENABLED) -endif() # [cmake:extension] diff --git a/docs/template_extension/new/identity.hpp b/docs/template_extension/new/identity.hpp index b8c5160014d..47a3efefcf9 100644 --- a/docs/template_extension/new/identity.hpp +++ b/docs/template_extension/new/identity.hpp @@ -7,11 +7,6 @@ //! [op:common_include] #include //! [op:common_include] -//! [op:frontend_include] -#ifdef OPENVINO_ONNX_FRONTEND_ENABLED -# include -#endif -//! [op:frontend_include] //! [op:header] namespace TemplateExtension { @@ -20,10 +15,6 @@ class Identity : public ov::op::Op { public: OPENVINO_OP("Identity"); -#ifdef OPENVINO_ONNX_FRONTEND_ENABLED - OPENVINO_FRAMEWORK_MAP(onnx) -#endif - Identity() = default; Identity(const ov::Output& arg); void validate_and_infer_types() override; diff --git a/docs/template_extension/new/ov_extension.cpp b/docs/template_extension/new/ov_extension.cpp index d2fa1e35361..770b57b02a7 100644 --- a/docs/template_extension/new/ov_extension.cpp +++ b/docs/template_extension/new/ov_extension.cpp @@ -4,6 +4,7 @@ #include #include +#include #include "identity.hpp" @@ -11,7 +12,12 @@ //! [ov_extension:entry_point] OPENVINO_CREATE_EXTENSIONS( std::vector({ - std::make_shared>() + + // Register the operation itself, required so it can be read from IR + std::make_shared>(), + + // Register the operation mapping, required when converting a model from the original framework format + std::make_shared>() })); //! [ov_extension:entry_point] // clang-format on diff --git a/samples/c/hello_classification/README.md b/samples/c/hello_classification/README.md index f4456353671..86cffce2c2f 100644 --- a/samples/c/hello_classification/README.md +++ b/samples/c/hello_classification/README.md @@ -41,7 +41,7 @@ To run the sample, you need specify a model and image: > **NOTES**: > -> - By default, OpenVINO™ Toolkit Samples and Demos expect input with BGR channels order. 
If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](@ref openvino_docs_MO_DG_Additional_Optimization_Use_Cases). +> - By default, OpenVINO™ Toolkit Samples and Demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model.md). > > - Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). > diff --git a/samples/c/hello_nv12_input_classification/README.md b/samples/c/hello_nv12_input_classification/README.md index af0898330ce..3e5ac12a564 100644 --- a/samples/c/hello_nv12_input_classification/README.md +++ b/samples/c/hello_nv12_input_classification/README.md @@ -57,7 +57,7 @@ ffmpeg -i cat.jpg -pix_fmt nv12 cat.yuv > model to work with RGB order, you need to reconvert your model using the Model Optimizer tool > with `--reverse_input_channels` argument specified. For more information about the argument, > refer to **When to Reverse Input Channels** section of -> [Embedding Preprocessing Computation](@ref openvino_docs_MO_DG_Additional_Optimization_Use_Cases). +> [Embedding Preprocessing Computation](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model.md). > - Before running the sample with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). > > - The sample accepts models in ONNX format (.onnx) that do not require preprocessing. diff --git a/samples/cpp/CMakeLists.txt b/samples/cpp/CMakeLists.txt index 3433dee0e5e..9a73582a73e 100644 --- a/samples/cpp/CMakeLists.txt +++ b/samples/cpp/CMakeLists.txt @@ -233,7 +233,7 @@ macro(ie_add_sample) endif() if(COMMAND ov_ncc_naming_style AND NOT c_sample) ov_ncc_naming_style(FOR_TARGET "${IE_SAMPLE_NAME}" - SOURCE_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}") + SOURCE_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}") endif() endmacro() diff --git a/samples/cpp/benchmark_app/README.md b/samples/cpp/benchmark_app/README.md index 921a6c87798..1e9bc3345de 100644 --- a/samples/cpp/benchmark_app/README.md +++ b/samples/cpp/benchmark_app/README.md @@ -10,11 +10,7 @@ Performance can be measured for two inference modes: latency- and throughput-ori Upon start-up, the application reads command-line parameters and loads a network and inputs (images/binary files) to the specified device. - **NOTE**: By default, OpenVINO™ Toolkit Samples, Tools and Demos expect input with BGR channels order. 
- If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application - or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. - For more information about the argument, refer to **When to Reverse Input Channels** section of - [Embedding Preprocessing Computation](@ref openvino_docs_MO_DG_Additional_Optimization_Use_Cases). +> **NOTE**: By default, OpenVINO™ Toolkit Samples, Tools and Demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model.md). Device-specific execution parameters (number of streams, threads, and so on) can be either explicitly specified through the command line or left default. In the last case, the sample logic will select the values for the optimal throughput. @@ -150,7 +146,7 @@ If a model has mixed input types, input folder should contain all required files To run the tool, you can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader). -> **NOTE**: Before running the tool with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +> **NOTE**: Before running the tool with a trained model, make sure the model is converted to the OpenVINO IR (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). > > The sample accepts models in ONNX format (.onnx) that do not require preprocessing. @@ -171,8 +167,8 @@ This section provides step-by-step instructions on how to run the Benchmark Tool ```sh omz_downloader --name googlenet-v1 -o <models_dir> ``` 3. Convert the model to the OpenVINO IR format. Run the Model Optimizer using the `mo` command with the path to the model, model format and output directory to generate the IR files:
```sh mo --input_model <models_dir>/public/googlenet-v1/googlenet-v1.caffemodel --data_type FP32 --output_dir <ir_dir> ``` @@ -243,6 +243,6 @@ Below are fragments of sample output static and dynamic networks: ``` ## See Also -* [Using Inference Engine Samples](../../../docs/OV_Runtime_UG/Samples_Overview.md) +* [Using OpenVINO Runtime Samples](../../../docs/OV_Runtime_UG/Samples_Overview.md) * [Model Optimizer](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) * [Model Downloader](@ref omz_tools_downloader) diff --git a/samples/cpp/benchmark_app/main.cpp b/samples/cpp/benchmark_app/main.cpp index 057296c2a45..21402a16202 100644 --- a/samples/cpp/benchmark_app/main.cpp +++ b/samples/cpp/benchmark_app/main.cpp @@ -88,7 +88,7 @@ static void next_step(const std::string additional_info = "") { static size_t step_id = 0; static const std::map step_names = { {1, "Parsing and validating input arguments"}, - {2, "Loading Inference Engine"}, + {2, "Loading OpenVINO Runtime"}, {3, "Setting device configuration"}, {4, "Reading network files"}, {5, "Resizing network to match image sizes and given batch"}, @@ -203,7 +203,7 @@ int main(int argc, char* argv[]) { /** This vector stores paths to the processed images with input names**/ auto inputFiles = parse_input_arguments(gflags::GetArgvs()); - // ----------------- 2. Loading the Inference Engine + // ----------------- 2. Loading the OpenVINO Runtime // ----------------------------------------------------------- next_step(); @@ -1098,7 +1098,7 @@ int main(int argc, char* argv[]) { if (!FLAGS_dump_config.empty()) { dump_config(FLAGS_dump_config, config); - slog::info << "Inference Engine configuration settings were dumped to " << FLAGS_dump_config << slog::endl; + slog::info << "OpenVINO Runtime configuration settings were dumped to " << FLAGS_dump_config << slog::endl; } if (!FLAGS_exec_graph_path.empty()) { diff --git a/samples/cpp/build_samples.sh b/samples/cpp/build_samples.sh index e3e97c7d599..c1c9536b5e1 100755 --- a/samples/cpp/build_samples.sh +++ b/samples/cpp/build_samples.sh @@ -4,7 +4,7 @@ # SPDX-License-Identifier: Apache-2.0 usage() { - echo "Build inference engine samples" + echo "Build OpenVINO Runtime samples" echo echo "Options:" echo " -h Print the help message" @@ -70,7 +70,7 @@ else fi if ! command -v cmake &>/dev/null; then - printf "\n\nCMAKE is not installed. It is required to build Inference Engine samples. Please install it. \n\n" + printf "\n\nCMAKE is not installed. It is required to build OpenVINO Runtime samples. Please install it. \n\n" exit 1 fi diff --git a/samples/cpp/build_samples_msvc.bat b/samples/cpp/build_samples_msvc.bat index bfa707d958c..259906a9420 100644 --- a/samples/cpp/build_samples_msvc.bat +++ b/samples/cpp/build_samples_msvc.bat @@ -52,7 +52,7 @@ if exist "%SAMPLE_BUILD_DIR%\CMakeCache.txt" del "%SAMPLE_BUILD_DIR%\CMakeCache. cd /d "%ROOT_DIR%" && cmake -E make_directory "%SAMPLE_BUILD_DIR%" && cd /d "%SAMPLE_BUILD_DIR%" && cmake -G "Visual Studio 16 2019" -A %PLATFORM% "%ROOT_DIR%" echo. -echo ###############^|^| Build Inference Engine samples using MS Visual Studio (MSBuild.exe) ^|^|############### +echo ###############^|^| Build OpenVINO Runtime samples using MS Visual Studio (MSBuild.exe) ^|^|############### echo. echo cmake --build . --config Release @@ -65,7 +65,7 @@ echo Done. exit /b :usage -echo Build inference engine samples +echo Build OpenVINO Runtime samples echo. 
echo Options: echo -h Print the help message diff --git a/samples/cpp/classification_sample_async/README.md b/samples/cpp/classification_sample_async/README.md index a126f1401bb..db6f261f3be 100644 --- a/samples/cpp/classification_sample_async/README.md +++ b/samples/cpp/classification_sample_async/README.md @@ -74,7 +74,7 @@ To run the sample, you need specify a model and image: > **NOTES**: > -> - By default, OpenVINO™ Toolkit Samples and Demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](@ref openvino_docs_MO_DG_Additional_Optimization_Use_Cases). +> - By default, OpenVINO™ Toolkit Samples and Demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model.md). > > - Before running the sample with a trained model, make sure the model is converted to the intermediate representation (IR) format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). > diff --git a/samples/cpp/hello_classification/README.md b/samples/cpp/hello_classification/README.md index 61106b20807..e0d18b8f117 100644 --- a/samples/cpp/hello_classification/README.md +++ b/samples/cpp/hello_classification/README.md @@ -45,7 +45,7 @@ To run the sample, you need specify a model and image: > **NOTES**: > -> - By default, OpenVINO™ Toolkit Samples and Demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](@ref openvino_docs_MO_DG_Additional_Optimization_Use_Cases). +> - By default, OpenVINO™ Toolkit Samples and Demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model.md). > > - Before running the sample with a trained model, make sure the model is converted to the intermediate representation (IR) format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). 
> diff --git a/samples/cpp/hello_nv12_input_classification/README.md b/samples/cpp/hello_nv12_input_classification/README.md index dffe7dcca46..de1a0e64987 100644 --- a/samples/cpp/hello_nv12_input_classification/README.md +++ b/samples/cpp/hello_nv12_input_classification/README.md @@ -61,7 +61,7 @@ ffmpeg -i cat.jpg -pix_fmt nv12 car.yuv > model to work with RGB order, you need to reconvert your model using the Model Optimizer tool > with `--reverse_input_channels` argument specified. For more information about the argument, > refer to **When to Reverse Input Channels** section of -> [Embedding Preprocessing Computation](@ref openvino_docs_MO_DG_Additional_Optimization_Use_Cases). +> [Embedding Preprocessing Computation](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model.md). > - Before running the sample with a trained model, make sure the model is converted to the intermediate representation (IR) format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). > > - The sample accepts models in ONNX format (.onnx) that do not require preprocessing. diff --git a/samples/cpp/hello_reshape_ssd/README.md b/samples/cpp/hello_reshape_ssd/README.md index 583d8608d44..2e81eb4f0a9 100644 --- a/samples/cpp/hello_reshape_ssd/README.md +++ b/samples/cpp/hello_reshape_ssd/README.md @@ -46,7 +46,7 @@ To run the sample, you need specify a model and image: > **NOTES**: > -> - By default, OpenVINO™ Toolkit Samples and Demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](@ref openvino_docs_MO_DG_Additional_Optimization_Use_Cases). +> - By default, OpenVINO™ Toolkit Samples and Demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model.md). > > - Before running the sample with a trained model, make sure the model is converted to the intermediate representation (IR) format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). > diff --git a/samples/cpp/hello_reshape_ssd/main.cpp b/samples/cpp/hello_reshape_ssd/main.cpp index 2ae7d457cc2..0ada1b50cd2 100644 --- a/samples/cpp/hello_reshape_ssd/main.cpp +++ b/samples/cpp/hello_reshape_ssd/main.cpp @@ -36,7 +36,7 @@ int main(int argc, char* argv[]) { const std::string device_name{argv[3]}; // ------------------------------------------------------------------- - // Step 1. Initialize inference engine core + // Step 1. 
Initialize OpenVINO Runtime core ov::Core core; // ------------------------------------------------------------------- diff --git a/samples/cpp/speech_sample/main.cpp b/samples/cpp/speech_sample/main.cpp index 7ebc4adde8c..861058bf94c 100644 --- a/samples/cpp/speech_sample/main.cpp +++ b/samples/cpp/speech_sample/main.cpp @@ -32,13 +32,13 @@ using namespace ov::preprocess; /** - * @brief The entry point for inference engine automatic speech recognition sample + * @brief The entry point for OpenVINO Runtime automatic speech recognition sample * @file speech_sample/main.cpp * @example speech_sample/main.cpp */ int main(int argc, char* argv[]) { try { - // ------------------------------ Get Inference Engine version ---------------------------------------------- + // ------------------------------ Get OpenVINO Runtime version ---------------------------------------------- slog::info << "OpenVINO runtime: " << ov::get_openvino_version() << slog::endl; // ------------------------------ Parsing and validation of input arguments --------------------------------- @@ -79,7 +79,7 @@ int main(int argc, char* argv[]) { } size_t numInputFiles(inputFiles.size()); - // --------------------------- Step 1. Initialize inference engine core and read model + // --------------------------- Step 1. Initialize OpenVINO Runtime core and read model // ------------------------------------- ov::Core core; slog::info << "Loading model files:" << slog::endl << FLAGS_m << slog::endl; diff --git a/samples/python/classification_sample_async/README.md b/samples/python/classification_sample_async/README.md index 02be3098f3f..2dc96d598c8 100644 --- a/samples/python/classification_sample_async/README.md +++ b/samples/python/classification_sample_async/README.md @@ -60,7 +60,7 @@ To run the sample, you need specify a model and image: > **NOTES**: > -> - By default, OpenVINO™ Toolkit Samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](@ref openvino_docs_MO_DG_Additional_Optimization_Use_Cases). +> - By default, OpenVINO™ Toolkit Samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model.md). > > - Before running the sample with a trained model, make sure the model is converted to the intermediate representation (IR) format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). 
> diff --git a/samples/python/hello_classification/README.md b/samples/python/hello_classification/README.md index 671a14ed9ea..26810748899 100644 --- a/samples/python/hello_classification/README.md +++ b/samples/python/hello_classification/README.md @@ -38,7 +38,7 @@ To run the sample, you need specify a model and image: > **NOTES**: > -> - By default, OpenVINO™ Toolkit Samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](@ref openvino_docs_MO_DG_Additional_Optimization_Use_Cases). +> - By default, OpenVINO™ Toolkit Samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model.md). > > - Before running the sample with a trained model, make sure the model is converted to the intermediate representation (IR) format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). > diff --git a/samples/python/hello_reshape_ssd/README.md b/samples/python/hello_reshape_ssd/README.md index 6fbad8b1cf6..bc7e2400325 100644 --- a/samples/python/hello_reshape_ssd/README.md +++ b/samples/python/hello_reshape_ssd/README.md @@ -38,7 +38,7 @@ To run the sample, you need specify a model and image: > **NOTES**: > -> - By default, OpenVINO™ Toolkit Samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](@ref openvino_docs_MO_DG_Additional_Optimization_Use_Cases). +> - By default, OpenVINO™ Toolkit Samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Embedding Preprocessing Computation](../../../docs/MO_DG/prepare_model/convert_model/Converting_Model.md). > > - Before running the sample with a trained model, make sure the model is converted to the intermediate representation (IR) format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). 
> diff --git a/samples/python/model_creation_sample/model_creation_sample.py b/samples/python/model_creation_sample/model_creation_sample.py index b652f06fae3..33b796d83da 100755 --- a/samples/python/model_creation_sample/model_creation_sample.py +++ b/samples/python/model_creation_sample/model_creation_sample.py @@ -133,7 +133,7 @@ def main(): device_name = sys.argv[2] labels = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'] number_top = 1 - # ---------------------------Step 1. Initialize inference engine core-------------------------------------------------- + # ---------------------------Step 1. Initialize OpenVINO Runtime Core-------------------------------------------------- log.info('Creating OpenVINO Runtime Core') core = Core() diff --git a/src/core/tests/extension.cpp b/src/core/tests/extension.cpp index e1974c42033..187fa3e9481 100644 --- a/src/core/tests/extension.cpp +++ b/src/core/tests/extension.cpp @@ -21,7 +21,7 @@ TEST(extension, load_extension) { TEST(extension, load_extension_and_cast) { std::vector so_extensions = ov::detail::load_extensions(get_extension_path()); - ASSERT_EQ(1, so_extensions.size()); + ASSERT_LE(1, so_extensions.size()); std::vector extensions; std::vector> so; for (const auto& ext : so_extensions) { @@ -31,7 +31,7 @@ TEST(extension, load_extension_and_cast) { } } so_extensions.clear(); - EXPECT_EQ(1, extensions.size()); + EXPECT_LE(1, extensions.size()); EXPECT_NE(nullptr, dynamic_cast(extensions[0].get())); EXPECT_NE(nullptr, std::dynamic_pointer_cast(extensions[0])); extensions.clear(); diff --git a/tools/compile_tool/README.md b/tools/compile_tool/README.md index 787a25a5648..fe8351a898c 100644 --- a/tools/compile_tool/README.md +++ b/tools/compile_tool/README.md @@ -1,19 +1,23 @@ # Compile Tool {#openvino_inference_engine_tools_compile_tool_README} -Compile tool is a C++ application that enables you to compile a network for inference on a specific device and export it to a binary file. -With the Compile Tool, you can compile a network using supported Inference Engine plugins on a machine that doesn't have the physical device connected and then transfer a generated file to any machine with the target inference device available. +Compile tool is a C++ application that enables you to compile a model for inference on a specific device and export the compiled representation to a binary file. +With the Compile Tool, you can compile a model using supported OpenVINO Runtime devices on a machine that doesn't have the physical device connected and then transfer a generated file to any machine with the target inference device available. See the [Features support matrix](../../docs/OV_Runtime_UG/supported_plugins/Device_Plugins.md) to understand which device support import / export functionality. -The tool compiles networks for the following target devices using corresponding Inference Engine plugins: +The tool compiles networks for the following target devices using corresponding OpenVINO Runtime plugins: * Intel® Neural Compute Stick 2 (MYRIAD plugin) - The tool is delivered as an executable file that can be run on both Linux* and Windows*. The tool is located in the `/tools/compile_tool` directory. -The workflow of the Compile tool is as follows: +## Workflow of the Compile tool -1. First, the application reads command-line parameters and loads a network to the Inference Engine device. -2. The application exports a blob with the compiled network and writes it to the output file. +1. 
First, the application reads command-line parameters and loads a model to the OpenVINO Runtime device. +2. Then the application exports a blob with the compiled model and writes it to the output file. + +The compile_tool also supports the following capabilities: +- Embedding [layout](../../docs/OV_Runtime_UG/layout_overview.md) and precision conversions (see [Optimize Preprocessing](../../docs/OV_Runtime_UG/preprocessing_overview.md)). To compile the model with advanced preprocessing capabilities, refer to [Use Case - Integrate and Save Preprocessing Steps Into IR](../../docs/OV_Runtime_UG/preprocessing_usecase_save.md), which shows how to embed all the preprocessing into the compiled blob. +- Compiling blobs for the OpenVINO Runtime API 2.0 (the default) or for the Inference Engine API when the explicit `-ov_api_1_0` option is specified +- Accepting device-specific options that customize the compilation process ## Run the Compile Tool @@ -85,5 +89,5 @@ To import a blob with the network from a generated file into your application, u ```cpp ov::Core ie; std::ifstream file{"model_name.blob"}; -ov::CompiledModel compiled_model = ie.import_model(file, "MYRIAD", {}); +ov::CompiledModel compiled_model = ie.import_model(file, "MYRIAD"); ``` diff --git a/tools/deployment_manager/configs/darwin.json b/tools/deployment_manager/configs/darwin.json index e4be1041860..72f8468fa0b 100644 --- a/tools/deployment_manager/configs/darwin.json +++ b/tools/deployment_manager/configs/darwin.json @@ -25,6 +25,7 @@ "runtime/lib/intel64/Release/libopenvino_ir_frontend.dylib", "runtime/lib/intel64/Release/libopenvino_onnx_frontend.dylib", "runtime/lib/intel64/Release/libopenvino_paddle_frontend.dylib", + "runtime/lib/intel64/Release/libopenvino_tensorflow_fe.dylib", "runtime/lib/intel64/Release/plugins.xml", "runtime/3rdparty/tbb" ] diff --git a/tools/deployment_manager/configs/linux.json b/tools/deployment_manager/configs/linux.json index 995fd0975ae..ca8c8ab5273 100644 --- a/tools/deployment_manager/configs/linux.json +++ b/tools/deployment_manager/configs/linux.json @@ -31,6 +31,7 @@ "runtime/lib/intel64/libopenvino_ir_frontend.so", "runtime/lib/intel64/libopenvino_onnx_frontend.so", "runtime/lib/intel64/libopenvino_paddle_frontend.so", + "runtime/lib/intel64/libopenvino_tensorflow_fe.so", "runtime/lib/intel64/plugins.xml", "runtime/3rdparty/tbb" ] diff --git a/tools/deployment_manager/configs/windows.json b/tools/deployment_manager/configs/windows.json index c7e677a51a6..8369fa042cd 100644 --- a/tools/deployment_manager/configs/windows.json +++ b/tools/deployment_manager/configs/windows.json @@ -25,6 +25,7 @@ "runtime/bin/intel64/Release/openvino_ir_frontend.dll", "runtime/bin/intel64/Release/openvino_onnx_frontend.dll", "runtime/bin/intel64/Release/openvino_paddle_frontend.dll", + "runtime/bin/intel64/Release/openvino_tensorflow_fe.dll", "runtime/bin/intel64/Release/plugins.xml", "runtime/3rdparty/tbb" ]
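The frontend-extension snippets added to `docs/snippets/ov_extensions.cpp` above can be stitched together into one self-contained program. The sketch below is an illustration only, not part of the patch: the include paths and the `float` type used for the `alpha` attribute are assumptions (the snippet markers above omit them), while the `add_extension` calls mirror the snippets as shown.

```cpp
// Minimal sketch: registering frontend extensions before reading a framework model.
// Header names and the <float> attribute type are assumptions; the calls follow ov_extensions.cpp.
#include <openvino/openvino.hpp>            // assumed umbrella header for ov::Core
#include <openvino/frontend/extension.hpp>  // assumed header for OpExtension/ConversionExtension
#include <openvino/opsets/opset8.hpp>       // assumed header for ov::opset8

int main() {
    ov::Core core;

    // One-to-one mapping: the framework's "MyRelu" is translated to the standard OpenVINO "Relu".
    core.add_extension(ov::frontend::OpExtension<>("Relu", "MyRelu"));

    // Decomposition: "ThresholdedReLU" has no direct counterpart, so it is expressed
    // with opset8 operations as x * (x > alpha).
    core.add_extension(ov::frontend::ConversionExtension(
        "ThresholdedReLU",
        [](const ov::frontend::NodeContext& node) {
            auto greater = std::make_shared<ov::opset8::Greater>(
                node.get_input(0),
                ov::opset8::Constant::create(ov::element::f32, {}, {node.get_attribute<float>("alpha")}));
            auto casted = std::make_shared<ov::opset8::Convert>(greater, ov::element::f32);
            return ov::OutputVector{std::make_shared<ov::opset8::Multiply>(node.get_input(0), casted)};
        }));

    // All extensions must be registered before read_model parses the original framework file.
    auto model = core.read_model("/path/to/model.onnx");
    return 0;
}
```

As the documentation changes above suggest, `OpExtension` is enough when the framework operation maps one-to-one onto an existing OpenVINO operation, while `ConversionExtension` is the fallback whenever a decomposition or attribute re-interpretation is required.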
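The edit to `docs/snippets/dldt_optimization_guide9.cpp` above keeps the capture/display loop that overlaps the CURRENT and NEXT infer requests. A minimal sketch of that double-buffering pattern with the OpenVINO Runtime 2.0 API is shown below, under the assumption of a placeholder model path and application-specific frame handling that are not part of the patch.

```cpp
// Minimal sketch of double-buffered ("CURRENT"/"NEXT") asynchronous inference.
#include <openvino/openvino.hpp>
#include <utility>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");        // placeholder model path
    auto compiled = core.compile_model(model, "CPU");

    ov::InferRequest current = compiled.create_infer_request();
    ov::InferRequest next = compiled.create_infer_request();

    // Prime the pipeline: fill the inputs of `current` with the first frame
    // (application-specific capture code, not shown), then start it asynchronously.
    current.start_async();
    for (int frame = 1; frame < 100; ++frame) {
        // Populate `next` with the newly captured frame, then start it; the call returns immediately.
        next.start_async();
        // Block only on the request started in the previous iteration.
        current.wait();
        // Display or post-process the results of `current` here.
        std::swap(current, next);
    }
    current.wait();  // drain the last in-flight request
    return 0;
}
```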