[PyOV][DOCS] Update inference documentation with shared memory flags (#18561)
@@ -26,16 +26,17 @@ The ``CompiledModel`` class provides the ``__call__`` method that runs a single
    :fragment: [direct_inference]
 
 
-Shared Memory on Inputs
-#######################
+Shared Memory on Inputs and Outputs
+###################################
 
 While using ``CompiledModel``, ``InferRequest`` and ``AsyncInferQueue``,
 OpenVINO™ Runtime Python API provides an additional mode - "Shared Memory".
-Specify the ``shared_memory`` flag to enable or disable this feature.
-The "Shared Memory" mode may be beneficial when inputs are large and copying
-data is considered an expensive operation. This feature creates shared ``Tensor``
+Specify the ``share_inputs`` and ``share_outputs`` flags to enable or disable this feature.
+The "Shared Memory" mode may be beneficial when inputs or outputs are large and copying data is considered an expensive operation.
+
+This feature creates shared ``Tensor``
 instances with the "zero-copy" approach, reducing overhead of setting inputs
-to minimum. Example usage:
+to a minimum. For outputs, this feature creates numpy views on the data. Example usage:
 
 
 .. doxygensnippet:: docs/snippets/ov_python_inference.py
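
Reviewer note: the "numpy views" behavior described above can be observed directly. A minimal sketch, assuming a model compiled elsewhere with a single static-shape input named ``input_0`` (the names mirror the snippet below; checking ``OWNDATA`` is just one way to see that the returned array is a view rather than a copy)::

    import numpy as np
    from openvino.runtime import Core

    # Hypothetical setup: any model with one static-shape input works here.
    core = Core()
    compiled_model = core.compile_model("model.xml", "CPU")
    data_0 = np.zeros(compiled_model.input(0).shape, dtype=np.float32)

    # With share_outputs=True the result arrays are numpy views over the
    # runtime's tensor memory, so they do not own their buffers.
    results = compiled_model({"input_0": data_0}, share_outputs=True)
    out = results[0]
    print(out.flags["OWNDATA"])  # expected False when the output is shared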
|
||||
@@ -45,13 +46,14 @@ to minimum. Example usage:
 
 .. note::
 
-    "Shared Memory" is enabled by default in ``CompiledModel.__call__``.
+    "Shared Memory" on inputs is enabled by default in ``CompiledModel.__call__``.
     For other methods, like ``InferRequest.infer`` or ``InferRequest.start_async``,
     it is required to set the flag to ``True`` manually.
+    "Shared Memory" on outputs is disabled by default in all sequential inference methods (``CompiledModel.__call__`` and ``InferRequest.infer``). It is required to set the flag to ``True`` manually.
 
 .. warning::
 
-    When data is being shared, all modifications may affect inputs of the inference!
+    When data is being shared, all modifications (including subsequent inference calls) may affect the inputs and outputs of the inference!
     Use this feature with caution, especially in multi-threaded/parallel code,
     where data can be modified outside of the function's control flow.
 
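
Reviewer note: the warning is worth illustrating, since shared outputs alias the request's internal memory and a later inference can overwrite them. A minimal sketch in the spirit of the snippet below (``request`` and the ``data_*`` arrays are assumed to exist as in that snippet)::

    # First inference with shared outputs: `out_view` is a numpy view,
    # not an independent copy of the results.
    result = request.infer({"input_0": data_0, "input_1": data_1}, share_outputs=True)
    out_view = result[0]
    snapshot = out_view.copy()  # take a copy before reusing the request

    # Reusing the same request may overwrite the memory behind `out_view`,
    # so the view can silently start showing the new results.
    request.infer({"input_0": data_1, "input_1": data_0}, share_outputs=True)

The referenced snippet file, docs/snippets/ov_python_inference.py, is updated accordingly:
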
@@ -32,9 +32,13 @@ results_1 = compiled_model({"input_0": data_2, "input_1": data_3})
 request = compiled_model.create_infer_request()
 
 #! [shared_memory_inference]
-# Data can be shared
-_ = compiled_model({"input_0": data_0, "input_1": data_1}, shared_memory=True)
-_ = request.infer({"input_0": data_0, "input_1": data_1}, shared_memory=True)
+# Data can be shared only on inputs
+_ = compiled_model({"input_0": data_0, "input_1": data_1}, share_inputs=True)
+_ = request.infer({"input_0": data_0, "input_1": data_1}, share_inputs=True)
+# Data can be shared only on outputs
+_ = request.infer({"input_0": data_0, "input_1": data_1}, share_outputs=True)
+# Or both flags can be combined to achieve the desired behavior
+_ = compiled_model({"input_0": data_0, "input_1": data_1}, share_inputs=False, share_outputs=True)
 #! [shared_memory_inference]
 
 time_in_sec = 2.0
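
Reviewer note: per the docs above, ``AsyncInferQueue`` also supports this mode, with the flag set manually. A hedged sketch, assuming ``start_async`` accepts the same ``share_inputs`` keyword as ``infer`` (the note added in this PR implies it does) and reusing ``compiled_model`` and the ``data_*`` arrays from the snippet::

    from openvino.runtime import AsyncInferQueue

    queue = AsyncInferQueue(compiled_model, jobs=2)
    queue.set_callback(lambda infer_request, userdata: print(userdata, "done"))

    # Inputs can be shared for asynchronous calls too; do not modify the
    # arrays until the job that borrowed them has completed.
    queue.start_async({"input_0": data_0, "input_1": data_1}, userdata=0, share_inputs=True)
    queue.start_async({"input_0": data_1, "input_1": data_0}, userdata=1, share_inputs=True)
    queue.wait_all()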