From 88cb428763ec436fdaa6fef50cef419c6ad40265 Mon Sep 17 00:00:00 2001
From: Jan Iwaszkiewicz
Date: Fri, 21 Apr 2023 15:22:33 +0200
Subject: [PATCH] [PyOV][DOCS] Added Python advanced inference documentation
 (#17090)

Co-authored-by: Karol Blaszczak
---
 docs/OV_Runtime_UG/Python_API_exclusives.md | 54 ++++++++---
 docs/OV_Runtime_UG/Python_API_inference.md  | 89 +++++++++++++++++++
 .../integrate_with_your_application.md      |  1 +
 docs/snippets/ov_python_exclusives.py       | 20 ++++-
 docs/snippets/ov_python_inference.py        | 69 ++++++++++++++
 src/bindings/python/README.md               |  1 +
 6 files changed, 219 insertions(+), 15 deletions(-)
 create mode 100644 docs/OV_Runtime_UG/Python_API_inference.md
 create mode 100644 docs/snippets/ov_python_inference.py

diff --git a/docs/OV_Runtime_UG/Python_API_exclusives.md b/docs/OV_Runtime_UG/Python_API_exclusives.md
index 06d8bce1ddf..7d15c474d35 100644
--- a/docs/OV_Runtime_UG/Python_API_exclusives.md
+++ b/docs/OV_Runtime_UG/Python_API_exclusives.md
@@ -1,5 +1,7 @@
 # OpenVINO™ Python API Exclusives {#openvino_docs_OV_UG_Python_API_exclusives}
 
+@sphinxdirective
+
 OpenVINO™ Runtime Python API offers additional features and helpers to enhance user experience. The main goal of the Python API is to provide a user-friendly and simple yet powerful tool for Python users.
 
 Easier Model Compilation
@@ -9,7 +11,7 @@ Easier Model Compilation
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [auto_compilation]
 
 
@@ -20,7 +22,7 @@ Besides functions aligned to C++ API, some of them have their Python counterpart
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [properties_example]
 
 
@@ -33,7 +35,7 @@ Python API allows passing data as tensors. The ``Tensor`` object holds a copy of
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [tensor_basics]
 
 
@@ -44,7 +46,7 @@ Shared Memory Mode
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [tensor_shared_mode]
 
 
@@ -57,7 +59,7 @@ All infer methods allow users to pass data as popular *numpy* arrays, gathered i
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [passing_numpy_array]
 
 
@@ -65,7 +67,7 @@ Results from inference can be obtained in various ways:
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [getting_results]
 
 
@@ -76,10 +78,34 @@ Python API provides different synchronous calls to infer model, which block the
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [sync_infer]
 
 
+Inference Results - OVDict
+++++++++++++++++++++++++++
+
+
+Synchronous calls return a special data structure called ``OVDict``. It can be compared to a "frozen dictionary". There are various ways of accessing the object's elements:
+
+
+.. doxygensnippet:: docs/snippets/ov_python_exclusives.py
+   :language: python
+   :fragment: [ov_dict]
+
+
+.. note::
+
+   It is possible to convert ``OVDict`` to a native dictionary using the ``to_dict()`` method.
+
+
+.. warning::
+
+   Using ``to_dict()`` results in losing access via strings and integers. Additionally,
+   it performs a shallow copy, thus any modifications may affect the original
+   object as well.
+
+
 AsyncInferQueue
 ++++++++++++++++++++
 
@@ -91,7 +117,7 @@ The ``start_async`` function call is not required to be synchronized - it waits
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [asyncinferqueue]
 
 
@@ -102,7 +128,7 @@ After the call to ``wait_all``, jobs and their data can be safely accessed. Acqu
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [asyncinferqueue_access]
 
 
@@ -115,7 +141,7 @@ The callback of ``AsyncInferQueue`` is uniform for every job. When executed, GIL
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [asyncinferqueue_set_callback]
 
 
@@ -127,7 +153,7 @@ To create an input tensor with such element types, you may need to pack your dat
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [packing_data]
 
 
@@ -135,7 +161,7 @@ To extract low precision values from a tensor into the *numpy* array, you can us
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [unpacking]
 
 
@@ -146,7 +172,7 @@ Some functions in Python API release the Global Interpreter Lock (GIL) while run
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [releasing_gil]
 
 
@@ -178,3 +204,5 @@ List of Functions that Release the GIL
 * openvino.runtime.InferRequest.query_state
 * openvino.runtime.Model.reshape
 * openvino.preprocess.PrePostProcessor.build
+
+@endsphinxdirective
diff --git a/docs/OV_Runtime_UG/Python_API_inference.md b/docs/OV_Runtime_UG/Python_API_inference.md
new file mode 100644
index 00000000000..965f56d6ed7
--- /dev/null
+++ b/docs/OV_Runtime_UG/Python_API_inference.md
@@ -0,0 +1,89 @@
+# OpenVINO™ Runtime Python API Advanced Inference {#openvino_docs_OV_UG_Python_API_inference}
+
+@sphinxdirective
+
+.. warning::
+
+   All of the methods mentioned here depend heavily on the specific hardware and software setup.
+   Consider conducting your own experiments with various models and different input/output
+   sizes. The methods presented here are not universal - they may or may not apply to a
+   specific pipeline. Please consider all tradeoffs and avoid premature optimizations.
+
+
+Direct Inference with ``CompiledModel``
+#######################################
+
+The ``CompiledModel`` class provides the ``__call__`` method that runs a single synchronous inference using the given model. In addition to keeping the code compact, all future calls to ``CompiledModel.__call__`` will result in less overhead, as the object reuses the already created ``InferRequest``.
+
+
+.. doxygensnippet:: docs/snippets/ov_python_inference.py
+   :language: python
+   :fragment: [direct_inference]
+
+
+Shared Memory on Inputs
+#######################
+
+While using ``CompiledModel``, ``InferRequest`` and ``AsyncInferQueue``,
+OpenVINO™ Runtime Python API provides an additional mode - "Shared Memory".
+Specify the ``shared_memory`` flag to enable or disable this feature.
+The "Shared Memory" mode may be beneficial when inputs are large and copying
+data is considered an expensive operation. This feature creates shared ``Tensor``
+instances with the "zero-copy" approach, reducing the overhead of setting inputs
+to a minimum. Example usage:
+
+
+.. doxygensnippet:: docs/snippets/ov_python_inference.py
+   :language: python
+   :fragment: [shared_memory_inference]
+
+
+.. note::
+
+   "Shared Memory" is enabled by default in ``CompiledModel.__call__``.
+   For other methods, like ``InferRequest.infer`` or ``InferRequest.start_async``,
+   it is required to set the flag to ``True`` manually.
+
+.. warning::
+
+   When data is being shared, all modifications may affect the inputs of the inference!
+   Use this feature with caution, especially in multi-threaded/parallel code,
+   where data can be modified outside of the function's control flow.
+
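+As a rough sketch of the caveat above - borrowing the ``request`` object and the
+``data_0``/``data_1`` arrays from the snippet file, and assuming a contiguous
+``float32`` input that can be wrapped without a copy - sharing works in both directions:
+
+.. code-block:: python
+
+   # With shared_memory=True the input Tensor aliases the numpy array
+   request.infer({"input_0": data_0, "input_1": data_1}, shared_memory=True)
+
+   # An in-place edit of the array is visible through the request's
+   # input Tensor as well and would be picked up by the next inference
+   data_0[0] = 1.0
+   assert request.get_input_tensor(0).data[0] == 1.0
+
+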
+Hiding Latency with Asynchronous Calls
+######################################
+
+Asynchronous calls make it possible to hide latency and optimize the overall runtime of a codebase.
+For example, ``InferRequest.start_async`` releases the GIL and provides a non-blocking call.
+It is beneficial to process other calls while waiting for a compute-intensive inference to finish.
+Example usage:
+
+.. doxygensnippet:: docs/snippets/ov_python_inference.py
+   :language: python
+   :fragment: [hiding_latency]
+
+
+.. note::
+
+   It is up to the user/developer to optimize the flow in a codebase to benefit from potential parallelization.
+
+
+"Postponed Return" with Asynchronous Calls
+##########################################
+
+"Postponed Return" is a practice that avoids the overhead of ``OVDict``, which is always returned from
+synchronous calls. "Postponed Return" could be applied when:
+
+* only a part of the output data is required. For example, only one specific output is significant
+  in a given pipeline step, and all outputs are large, thus, expensive to copy.
+* data is not required "now". For example, it can be extracted later inside the pipeline as
+  a part of latency hiding.
+* data return is not required at all. For example, models are being chained with the pure ``Tensor`` interface.
+
+
+.. doxygensnippet:: docs/snippets/ov_python_inference.py
+   :language: python
+   :fragment: [no_return_inference]
+
+@endsphinxdirective
diff --git a/docs/OV_Runtime_UG/integrate_with_your_application.md b/docs/OV_Runtime_UG/integrate_with_your_application.md
index 858c0525f9b..80a439d99a1 100644
--- a/docs/OV_Runtime_UG/integrate_with_your_application.md
+++ b/docs/OV_Runtime_UG/integrate_with_your_application.md
@@ -8,6 +8,7 @@
 
    openvino_docs_OV_UG_Model_Representation
    openvino_docs_OV_UG_Infer_request
+   openvino_docs_OV_UG_Python_API_inference
    openvino_docs_OV_UG_Python_API_exclusives
    openvino_docs_MO_DG_TensorFlow_Frontend
 
diff --git a/docs/snippets/ov_python_exclusives.py b/docs/snippets/ov_python_exclusives.py
index a816acc9d4e..a8c35285969 100644
--- a/docs/snippets/ov_python_exclusives.py
+++ b/docs/snippets/ov_python_exclusives.py
@@ -12,10 +12,11 @@ compiled_model = ov.compile_model("model.xml")
 
 #! [properties_example]
 core = ov.Core()
 
-input_a = ov.opset8.parameter([8])
-res = ov.opset8.absolute(input_a)
+input_a = ov.opset11.parameter([8], name="input_a")
+res = ov.opset11.absolute(input_a)
 model = ov.Model(res, [input_a])
+model.outputs[0].tensor.set_names({"result_0"})  # Add a name for the output
 compiled = core.compile_model(model, "CPU")
 
 print(model.inputs)
 print(model.outputs)
@@ -78,6 +79,21 @@ results = infer_request.infer(inputs={0: data})
 results = compiled_model(inputs={0: data})
 #! [sync_infer]
 
+#! [ov_dict]
+results = compiled_model(inputs={0: data})
+
+# Access via string
+_ = results["result_0"]
+# Access via index
+_ = results[0]
+# Access via output port
+_ = results[compiled_model.outputs[0]]
+# Use iterator over keys
+_ = results[next(iter(results))]
+# Iterate over values
+_ = next(iter(results.values()))
+#! [ov_dict]
+
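+# A small, assumption-based illustration of the to_dict() caveat described
+# in the docs: the native dict is a shallow copy keyed by output ports only,
+# so access via strings and integers is no longer available.
+native_dict = results.to_dict()
+_ = native_dict[compiled_model.outputs[0]]  # Port-based access still works
+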
 #! [asyncinferqueue]
 core = ov.Core()
 
diff --git a/docs/snippets/ov_python_inference.py b/docs/snippets/ov_python_inference.py
new file mode 100644
index 00000000000..8801271e2a4
--- /dev/null
+++ b/docs/snippets/ov_python_inference.py
@@ -0,0 +1,69 @@
+# Copyright (C) 2018-2023 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+import numpy as np
+import openvino.runtime as ov
+
+INPUT_SIZE = 1_000_000  # Use bigger values if necessary, e.g.: 300_000_000
+
+input_0 = ov.opset11.parameter([INPUT_SIZE], name="input_0")
+input_1 = ov.opset11.parameter([INPUT_SIZE], name="input_1")
+add_inputs = ov.opset11.add(input_0, input_1)
+res = ov.opset11.reduce_sum(add_inputs, reduction_axes=0, name="reduced")
+model = ov.Model(res, [input_0, input_1], name="my_model")
+model.outputs[0].tensor.set_names({"reduced_result"})  # Add a name for the output
+
+core = ov.Core()
+compiled_model = core.compile_model(model, device_name="CPU")
+
+data_0 = np.array([0.1] * INPUT_SIZE, dtype=np.float32)
+data_1 = np.array([-0.1] * INPUT_SIZE, dtype=np.float32)
+
+data_2 = np.array([0.2] * INPUT_SIZE, dtype=np.float32)
+data_3 = np.array([-0.2] * INPUT_SIZE, dtype=np.float32)
+
+#! [direct_inference]
+# Calling CompiledModel creates and saves an InferRequest object
+results_0 = compiled_model({"input_0": data_0, "input_1": data_1})
+# The second call reuses the previously created InferRequest object
+results_1 = compiled_model({"input_0": data_2, "input_1": data_3})
+#! [direct_inference]
+
+request = compiled_model.create_infer_request()
+
+#! [shared_memory_inference]
+# Data can be shared
+_ = compiled_model({"input_0": data_0, "input_1": data_1}, shared_memory=True)
+_ = request.infer({"input_0": data_0, "input_1": data_1}, shared_memory=True)
+#! [shared_memory_inference]
+
+time_in_sec = 2.0
+
+#! [hiding_latency]
+import time
+
+# Long-running function
+def run(time_in_sec):
+    time.sleep(time_in_sec)
+
+# No latency hiding
+results = request.infer({"input_0": data_0, "input_1": data_1})[0]
+run(time_in_sec)
+
+# Hiding latency
+request.start_async({"input_0": data_0, "input_1": data_1})
+run(time_in_sec)
+request.wait()
+results = request.get_output_tensor(0).data  # Gather data from InferRequest
+#! [hiding_latency]
+
+#! [no_return_inference]
+# Standard approach
+results = request.infer({"input_0": data_0, "input_1": data_1})[0]
+
+# "Postponed Return" approach
+request.start_async({"input_0": data_0, "input_1": data_1})
+request.wait()
+results = request.get_output_tensor(0).data  # Gather data "on demand" from InferRequest
+#! [no_return_inference]
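+
+# An extra, assumption-based sketch (not referenced by any documentation
+# fragment): with "Postponed Return", a single output can also be fetched
+# "on demand" by the tensor name registered above.
+request.start_async({"input_0": data_2, "input_1": data_3})
+request.wait()
+results = request.get_tensor("reduced_result").data
+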
diff --git a/src/bindings/python/README.md b/src/bindings/python/README.md
index 8ecbd9dbbfb..76a6f646f81 100644
--- a/src/bindings/python/README.md
+++ b/src/bindings/python/README.md
@@ -43,6 +43,7 @@ If you want to contribute to OpenVINO Python API, here is the list of learning m
 * [OpenVINO™ README](../../../README.md)
 * [OpenVINO™ Core Components](../../README.md)
 * [OpenVINO™ Python API Reference](https://docs.openvino.ai/latest/api/ie_python_api/api.html)
+* [OpenVINO™ Python API Advanced Inference](https://docs.openvino.ai/latest/openvino_docs_OV_UG_Python_API_inference.html)
 * [OpenVINO™ Python API Exclusives](https://docs.openvino.ai/latest/openvino_docs_OV_UG_Python_API_exclusives.html)
 * [pybind11 repository](https://github.com/pybind/pybind11)
 * [pybind11 documentation](https://pybind11.readthedocs.io/en/stable/)