[PyOV][DOCS] Added Python advanced inference documentation (#17090)
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
commit 88cb428763 (parent c4b155edc2)
docs/OV_Runtime_UG/Python_API_exclusives.md
@@ -1,5 +1,7 @@
 # OpenVINO™ Python API Exclusives {#openvino_docs_OV_UG_Python_API_exclusives}

+@sphinxdirective
+
 OpenVINO™ Runtime Python API offers additional features and helpers to enhance the user experience. The main goal of the Python API is to provide a user-friendly and simple, yet powerful, tool for Python users.

 Easier Model Compilation
@@ -9,7 +11,7 @@ Easier Model Compilation

 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [auto_compilation]

@@ -20,7 +22,7 @@ Besides functions aligned to C++ API, some of them have their Python counterpart

 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [properties_example]

@@ -33,7 +35,7 @@ Python API allows passing data as tensors. The ``Tensor`` object holds a copy of

 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [tensor_basics]

@@ -44,7 +46,7 @@ Shared Memory Mode

 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [tensor_shared_mode]

@@ -57,7 +59,7 @@ All infer methods allow users to pass data as popular *numpy* arrays, gathered i

 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [passing_numpy_array]

@@ -65,7 +67,7 @@ Results from inference can be obtained in various ways:

 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [getting_results]

@@ -76,10 +78,34 @@ Python API provides different synchronous calls to infer model, which block the

 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [sync_infer]

+
+Inference Results - OVDict
+++++++++++++++++++++++++++
+
+Synchronous calls return a special data structure called ``OVDict``. It can be compared to a "frozen dictionary". There are various ways of accessing the object's elements:
+
+.. doxygensnippet:: docs/snippets/ov_python_exclusives.py
+   :language: python
+   :fragment: [ov_dict]
+
+.. note::
+
+   It is possible to convert ``OVDict`` to a native dictionary using the ``to_dict()`` method.
+
+.. warning::
+
+   Using ``to_dict()`` results in losing access via strings and integers. Additionally,
+   it performs a shallow copy, thus any modifications may affect the original
+   object as well.
+
+
 AsyncInferQueue
 ++++++++++++++++++++
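To make the ``OVDict`` access patterns and the ``to_dict()`` trade-off added above concrete, here is a minimal standalone sketch. The ``model.xml`` path, the ``[8]`` input shape, and the output name ``result_0`` are placeholder assumptions (mirroring the snippet file changed later in this diff):

```python
import numpy as np
import openvino.runtime as ov

# Assumption: "model.xml" exists, takes one [8] input, output named "result_0".
compiled_model = ov.compile_model("model.xml")
results = compiled_model(np.zeros(8, dtype=np.float32))

_ = results["result_0"]    # OVDict: access by output name
_ = results[0]             # OVDict: access by index

plain = results.to_dict()  # shallow copy, keyed by output ports only
_ = plain[compiled_model.outputs[0]]
# plain["result_0"] or plain[0] would raise KeyError - string/int access is lost
```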
@@ -91,7 +117,7 @@ The ``start_async`` function call is not required to be synchronized - it waits

 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [asyncinferqueue]

@@ -102,7 +128,7 @@ After the call to ``wait_all``, jobs and their data can be safely accessed. Acqu

 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [asyncinferqueue_access]

@@ -115,7 +141,7 @@ The callback of ``AsyncInferQueue`` is uniform for every job. When executed, GIL

 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [asyncinferqueue_set_callback]

@@ -127,7 +153,7 @@ To create an input tensor with such element types, you may need to pack your dat

 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [packing_data]

@@ -135,7 +161,7 @@ To extract low precision values from a tensor into the *numpy* array, you can us

 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [unpacking]

@@ -146,7 +172,7 @@ Some functions in Python API release the Global Interpreter Lock (GIL) while run

 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [releasing_gil]

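The ``releasing_gil`` fragment referenced above relies on inference releasing the GIL. A minimal sketch of what that enables - overlapping inference with pure-Python work on another thread - is shown below; the ``model.xml`` path and the ``[8]`` input shape are assumptions:

```python
import threading
import numpy as np
import openvino.runtime as ov

core = ov.Core()
compiled = core.compile_model("model.xml", "CPU")  # placeholder model path
data = np.zeros(8, dtype=np.float32)

def python_work():
    # Pure-Python work that needs the GIL.
    return sum(i * i for i in range(1_000_000))

# While inference runs (GIL released), python_work can progress in parallel.
worker = threading.Thread(target=compiled, args=(data,))
worker.start()
python_work()
worker.join()
```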
@@ -178,3 +204,5 @@ List of Functions that Release the GIL
 * openvino.runtime.InferRequest.query_state
 * openvino.runtime.Model.reshape
 * openvino.preprocess.PrePostProcessor.build
+
+@endsphinxdirective
docs/OV_Runtime_UG/Python_API_inference.md (new file)
@@ -0,0 +1,89 @@
# OpenVINO™ Runtime Python API Advanced Inference {#openvino_docs_OV_UG_Python_API_inference}

@sphinxdirective

.. warning::

   All of the methods mentioned here depend heavily on the specific hardware and software setup.
   Consider conducting your own experiments with various models and different input/output
   sizes. The methods presented here are not universal; they may or may not apply to a
   specific pipeline. Please consider all trade-offs and avoid premature optimization.


Direct Inference with ``CompiledModel``
#######################################

The ``CompiledModel`` class provides the ``__call__`` method that runs a single synchronous inference using the given model. In addition to more compact code, all future calls to ``CompiledModel.__call__`` will result in less overhead, as the object reuses the already created ``InferRequest``.


.. doxygensnippet:: docs/snippets/ov_python_inference.py
   :language: python
   :fragment: [direct_inference]

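As a rough way to observe the reused ``InferRequest`` described above, the sketch below times the first call against a subsequent one. The ``model.xml`` path and the ``[8]`` input shape are placeholder assumptions:

```python
import time
import numpy as np
import openvino.runtime as ov

core = ov.Core()
compiled_model = core.compile_model("model.xml", "CPU")  # placeholder model
data = np.zeros(8, dtype=np.float32)

start = time.perf_counter()
compiled_model(data)  # first call creates the internal InferRequest
first_call = time.perf_counter() - start

start = time.perf_counter()
compiled_model(data)  # subsequent calls reuse the same InferRequest
second_call = time.perf_counter() - start

print(f"first: {first_call:.6f}s, subsequent: {second_call:.6f}s")
```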
Shared Memory on Inputs
#######################

While using ``CompiledModel``, ``InferRequest`` and ``AsyncInferQueue``,
OpenVINO™ Runtime Python API provides an additional mode - "Shared Memory".
Specify the ``shared_memory`` flag to enable or disable this feature.
The "Shared Memory" mode may be beneficial when inputs are large and copying
data is considered an expensive operation. This feature creates shared ``Tensor``
instances with the "zero-copy" approach, reducing the overhead of setting inputs
to a minimum. Example usage:


.. doxygensnippet:: docs/snippets/ov_python_inference.py
   :language: python
   :fragment: [shared_memory_inference]


.. note::

   "Shared Memory" is enabled by default in ``CompiledModel.__call__``.
   For other methods, like ``InferRequest.infer`` or ``InferRequest.start_async``,
   the flag must be set to ``True`` manually.

.. warning::

   When data is being shared, all modifications may affect the inputs of the inference!
   Use this feature with caution, especially in multi-threaded/parallel code,
   where data can be modified outside of the function's control flow.

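The zero-copy behavior behind the warning above can be observed directly on a ``Tensor`` built in shared mode; a small sketch (the array contents are arbitrary):

```python
import numpy as np
import openvino.runtime as ov

data = np.ones(8, dtype=np.float32)

# Wrap the numpy buffer without copying it.
tensor = ov.Tensor(data, shared_memory=True)

data[0] = 42.0          # modifying the source array...
print(tensor.data[0])   # ...is visible through the shared tensor (prints 42.0)
```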
Hiding Latency with Asynchronous Calls
######################################

Asynchronous calls allow latency to be hidden, optimizing the overall runtime of a codebase.
For example, ``InferRequest.start_async`` releases the GIL and provides a non-blocking call.
It is beneficial to process other calls while waiting for a compute-intensive inference to finish.
Example usage:

.. doxygensnippet:: docs/snippets/ov_python_inference.py
   :language: python
   :fragment: [hiding_latency]


.. note::

   It is up to the user/developer to optimize the flow in a codebase to benefit from potential parallelization.

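To quantify the benefit described above, a sketch along these lines compares the sequential and the overlapped flow; the model path, input shape, and the 0.5 s stand-in workload are assumptions:

```python
import time
import numpy as np
import openvino.runtime as ov

core = ov.Core()
request = core.compile_model("model.xml", "CPU").create_infer_request()  # placeholder model
data = np.zeros(8, dtype=np.float32)

def other_work():
    time.sleep(0.5)  # stand-in for unrelated pipeline work

# Sequential: inference time + other work time
start = time.perf_counter()
request.infer(data)
other_work()
print("sequential:", time.perf_counter() - start)

# Overlapped: other work runs while inference is in flight
start = time.perf_counter()
request.start_async(data)
other_work()
request.wait()
print("overlapped:", time.perf_counter() - start)
```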
"Postponed Return" with Asynchronous Calls
|
||||||
|
##########################################
|
||||||
|
|
||||||
|
"Postponed Return" is a practice to omit overhead of ``OVDict``, which is always returned from
|
||||||
|
synchronous calls. "Postponed Return" could be applied when:
|
||||||
|
* only a part of output data is required. For example, only one specific output is significant
|
||||||
|
in a given pipeline step and all outputs are large, thus, expensive to copy.
|
||||||
|
* data is not required "now". For example, it can be later extracted inside the pipeline as
|
||||||
|
a part of latency hiding.
|
||||||
|
* data return is not required at all. For example, models are being chained with the pure ``Tensor`` interface.
|
||||||
|
|
||||||
|
|
||||||
|
.. doxygensnippet:: docs/snippets/ov_python_inference.py
|
||||||
|
:language: python
|
||||||
|
:fragment: [no_return_inference]
|
||||||
|
|
||||||
|
@endsphinxdirective
|
||||||
|
|
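The last bullet of the "Postponed Return" section - chaining models through the pure ``Tensor`` interface - could look roughly like the sketch below; the two model paths and the matching shapes are assumptions:

```python
import numpy as np
import openvino.runtime as ov

core = ov.Core()
request_a = core.compile_model("model_a.xml", "CPU").create_infer_request()  # placeholder models
request_b = core.compile_model("model_b.xml", "CPU").create_infer_request()

data = np.zeros(8, dtype=np.float32)

# No OVDict is materialized for the intermediate step; the output Tensor of
# the first request is fed directly into the second one.
request_a.infer(data)
request_b.infer(request_a.get_output_tensor(0))
final = request_b.get_output_tensor(0).data
```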
@@ -8,6 +8,7 @@

 openvino_docs_OV_UG_Model_Representation
 openvino_docs_OV_UG_Infer_request
+openvino_docs_OV_UG_Python_API_inference
 openvino_docs_OV_UG_Python_API_exclusives
 openvino_docs_MO_DG_TensorFlow_Frontend

docs/snippets/ov_python_exclusives.py
@@ -12,10 +12,11 @@ compiled_model = ov.compile_model("model.xml")
 #! [properties_example]
 core = ov.Core()

-input_a = ov.opset8.parameter([8])
-res = ov.opset8.absolute(input_a)
+input_a = ov.opset11.parameter([8], name="input_a")
+res = ov.opset11.absolute(input_a)
 model = ov.Model(res, [input_a])
 compiled = core.compile_model(model, "CPU")
+model.outputs[0].tensor.set_names({"result_0"})  # Add name for Output

 print(model.inputs)
 print(model.outputs)
@@ -78,6 +79,21 @@ results = infer_request.infer(inputs={0: data})
 results = compiled_model(inputs={0: data})
 #! [sync_infer]

+#! [ov_dict]
+results = compiled_model(inputs={0: data})
+
+# Access via string
+_ = results["result_0"]
+# Access via index
+_ = results[0]
+# Access via output port
+_ = results[compiled_model.outputs[0]]
+# Use iterator over keys
+_ = results[next(iter(results))]
+# Iterate over values
+_ = next(iter(results.values()))
+#! [ov_dict]
+
 #! [asyncinferqueue]
 core = ov.Core()

docs/snippets/ov_python_inference.py (new file)
@@ -0,0 +1,69 @@
# Copyright (C) 2018-2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import numpy as np
import openvino.runtime as ov

INPUT_SIZE = 1_000_000  # Use bigger values if necessary, e.g.: 300_000_000

input_0 = ov.opset11.parameter([INPUT_SIZE], name="input_0")
input_1 = ov.opset11.parameter([INPUT_SIZE], name="input_1")
add_inputs = ov.opset11.add(input_0, input_1)
res = ov.opset11.reduce_sum(add_inputs, reduction_axes=0, name="reduced")
model = ov.Model(res, [input_0, input_1], name="my_model")
model.outputs[0].tensor.set_names({"reduced_result"})  # Add name for Output

core = ov.Core()
compiled_model = core.compile_model(model, device_name="CPU")

data_0 = np.array([0.1] * INPUT_SIZE, dtype=np.float32)
data_1 = np.array([-0.1] * INPUT_SIZE, dtype=np.float32)

data_2 = np.array([0.2] * INPUT_SIZE, dtype=np.float32)
data_3 = np.array([-0.2] * INPUT_SIZE, dtype=np.float32)

#! [direct_inference]
# Calling CompiledModel creates and saves InferRequest object
results_0 = compiled_model({"input_0": data_0, "input_1": data_1})
# Second call reuses previously created InferRequest object
results_1 = compiled_model({"input_0": data_2, "input_1": data_3})
#! [direct_inference]

request = compiled_model.create_infer_request()

#! [shared_memory_inference]
# Data can be shared
_ = compiled_model({"input_0": data_0, "input_1": data_1}, shared_memory=True)
_ = request.infer({"input_0": data_0, "input_1": data_1}, shared_memory=True)
#! [shared_memory_inference]

time_in_sec = 2.0

#! [hiding_latency]
import time

# Long-running function
def run(time_in_sec):
    time.sleep(time_in_sec)

# No latency hiding
results = request.infer({"input_0": data_0, "input_1": data_1})[0]
run(time_in_sec)

# Hiding latency
request.start_async({"input_0": data_0, "input_1": data_1})
run(time_in_sec)
request.wait()
results = request.get_output_tensor(0).data  # Gather data from InferRequest
#! [hiding_latency]

#! [no_return_inference]
# Standard approach
results = request.infer({"input_0": data_0, "input_1": data_1})[0]

# "Postponed Return" approach
request.start_async({"input_0": data_0, "input_1": data_1})
request.wait()
results = request.get_output_tensor(0).data  # Gather data "on demand" from InferRequest
#! [no_return_inference]
@@ -43,6 +43,7 @@ If you want to contribute to OpenVINO Python API, here is the list of learning m
 * [OpenVINO™ README](../../../README.md)
 * [OpenVINO™ Core Components](../../README.md)
 * [OpenVINO™ Python API Reference](https://docs.openvino.ai/latest/api/ie_python_api/api.html)
+* [OpenVINO™ Python API Advanced Inference](https://docs.openvino.ai/latest/openvino_docs_OV_UG_Python_API_inference.html)
 * [OpenVINO™ Python API Exclusives](https://docs.openvino.ai/latest/openvino_docs_OV_UG_Python_API_exclusives.html)
 * [pybind11 repository](https://github.com/pybind/pybind11)
 * [pybind11 documentation](https://pybind11.readthedocs.io/en/stable/)