[PyOV][DOCS] Update inference documentation with shared memory flags (#18561)

Jan Iwaszkiewicz
2023-07-18 13:15:10 +02:00
committed by GitHub
parent d21296bcc1
commit ec26537b3e
2 changed files with 17 additions and 11 deletions


@@ -26,16 +26,17 @@ The ``CompiledModel`` class provides the ``__call__`` method that runs a single
:fragment: [direct_inference]
-Shared Memory on Inputs
-#######################
+Shared Memory on Inputs and Outputs
+###################################
 While using ``CompiledModel``, ``InferRequest`` and ``AsyncInferQueue``,
 OpenVINO™ Runtime Python API provides an additional mode - "Shared Memory".
-Specify the ``shared_memory`` flag to enable or disable this feature.
-The "Shared Memory" mode may be beneficial when inputs are large and copying
-data is considered an expensive operation. This feature creates shared ``Tensor``
+Specify the ``share_inputs`` and ``share_outputs`` flags to enable or disable this feature.
+The "Shared Memory" mode may be beneficial when inputs or outputs are large and copying data is considered an expensive operation.
+This feature creates shared ``Tensor``
 instances with the "zero-copy" approach, reducing overhead of setting inputs
-to minimum. Example usage:
+to minimum. For outputs, this feature creates numpy views on the data. Example usage:
 .. doxygensnippet:: docs/snippets/ov_python_inference.py
@@ -45,13 +46,14 @@ to minimum. Example usage:
 .. note::
-    "Shared Memory" is enabled by default in ``CompiledModel.__call__``.
+    "Shared Memory" on inputs is enabled by default in ``CompiledModel.__call__``.
     For other methods, like ``InferRequest.infer`` or ``InferRequest.start_async``,
     it is required to set the flag to ``True`` manually.
+    "Shared Memory" on outputs is disabled by default in all sequential inference methods (``CompiledModel.__call__`` and ``InferRequest.infer``). It is required to set the flag to ``True`` manually.
 .. warning::
-    When data is being shared, all modifications may affect inputs of the inference!
+    When data is being shared, all modifications (including subsequent inference calls) may affect inputs and outputs of the inference!
     Use this feature with caution, especially in multi-threaded/parallel code,
     where data can be modified outside of the function's control flow.
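The hazard the warning describes comes from numpy view semantics: a "zero-copy" shared buffer means writes through one array object are visible through every other object backed by that buffer. A minimal sketch using plain numpy (OpenVINO itself is not needed for the illustration):

```python
import numpy as np

# "Zero-copy" sharing means two array objects are backed by the same buffer,
# so a write through one is visible through the other.
data = np.zeros(4, dtype=np.float32)
view = data[:]    # a numpy view: no data is copied

view[0] = 42.0    # modifying the view...
print(data[0])    # ...also changes the original array
```

This is why modifying an input buffer after `share_inputs=True` inference (or writing into a `share_outputs=True` result) can silently corrupt data used elsewhere.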


@@ -32,9 +32,13 @@ results_1 = compiled_model({"input_0": data_2, "input_1": data_3})
 request = compiled_model.create_infer_request()
 #! [shared_memory_inference]
-# Data can be shared
-_ = compiled_model({"input_0": data_0, "input_1": data_1}, shared_memory=True)
-_ = request.infer({"input_0": data_0, "input_1": data_1}, shared_memory=True)
+# Data can be shared only on inputs
+_ = compiled_model({"input_0": data_0, "input_1": data_1}, share_inputs=True)
+_ = request.infer({"input_0": data_0, "input_1": data_1}, share_inputs=True)
+# Data can be shared only on outputs
+_ = request.infer({"input_0": data_0, "input_1": data_1}, share_outputs=True)
+# Or both flags can be combined to achieve desired behavior
+_ = compiled_model({"input_0": data_0, "input_1": data_1}, share_inputs=False, share_outputs=True)
 #! [shared_memory_inference]
 time_in_sec = 2.0
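Since outputs returned under `share_outputs=True` are numpy views, whether two arrays actually alias the same buffer can be checked with `np.shares_memory`. A numpy-only sketch of that check (no OpenVINO required):

```python
import numpy as np

# np.shares_memory reports whether two arrays use the same underlying buffer,
# which is the property the share_inputs / share_outputs flags control.
a = np.arange(6, dtype=np.float32)
b = a.reshape(2, 3)   # a view: shares memory with `a`
c = a.copy()          # a copy: owns a separate buffer

print(np.shares_memory(a, b))  # True
print(np.shares_memory(a, c))  # False
```

Copying the result (`result.copy()`) is the usual way to detach a shared output before mutating it.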