[PyOV][DOCS] Update inference documentation with shared memory flags (#18561)
@@ -26,16 +26,17 @@ The ``CompiledModel`` class provides the ``__call__`` method that runs a single
    :fragment: [direct_inference]
 
 
-Shared Memory on Inputs
-#######################
+Shared Memory on Inputs and Outputs
+###################################
 
 While using ``CompiledModel``, ``InferRequest`` and ``AsyncInferQueue``,
 OpenVINO™ Runtime Python API provides an additional mode - "Shared Memory".
-Specify the ``shared_memory`` flag to enable or disable this feature.
-The "Shared Memory" mode may be beneficial when inputs are large and copying
-data is considered an expensive operation. This feature creates shared ``Tensor``
+Specify the ``share_inputs`` and ``share_outputs`` flags to enable or disable this feature.
+The "Shared Memory" mode may be beneficial when inputs or outputs are large and copying data is considered an expensive operation.
+
+This feature creates shared ``Tensor``
 instances with the "zero-copy" approach, reducing overhead of setting inputs
-to minimum. Example usage:
+to a minimum. For outputs, this feature creates numpy views on the data. Example usage:
 
 
 .. doxygensnippet:: docs/snippets/ov_python_inference.py
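
Reviewer note: the "numpy views" behavior described above can be observed directly. A minimal sketch, assuming a model compiled elsewhere with a single static-shape input named ``input_0`` (the names mirror the snippet below; checking ``OWNDATA`` is just one way to see that the returned array is a view rather than a copy)::

    import numpy as np
    from openvino.runtime import Core

    # Hypothetical setup: any model with one static-shape input works here.
    core = Core()
    compiled_model = core.compile_model("model.xml", "CPU")
    data_0 = np.zeros(compiled_model.input(0).shape, dtype=np.float32)

    # With share_outputs=True the result arrays are numpy views over the
    # runtime's tensor memory, so they do not own their buffers.
    results = compiled_model({"input_0": data_0}, share_outputs=True)
    out = results[0]
    print(out.flags["OWNDATA"])  # expected False when the output is shared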
|
||||
@@ -45,13 +46,14 @@ to minimum. Example usage:
 
 .. note::
 
-    "Shared Memory" is enabled by default in ``CompiledModel.__call__``.
+    "Shared Memory" on inputs is enabled by default in ``CompiledModel.__call__``.
     For other methods, like ``InferRequest.infer`` or ``InferRequest.start_async``,
     it is required to set the flag to ``True`` manually.
+    "Shared Memory" on outputs is disabled by default in all sequential inference methods (``CompiledModel.__call__`` and ``InferRequest.infer``). It is required to set the flag to ``True`` manually.
 
 .. warning::
 
-    When data is being shared, all modifications may affect inputs of the inference!
+    When data is being shared, all modifications (including subsequent inference calls) may affect the inputs and outputs of the inference!
     Use this feature with caution, especially in multi-threaded/parallel code,
     where data can be modified outside of the function's control flow.
 
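
Reviewer note: the warning is worth illustrating, since shared outputs alias the request's internal memory and a later inference can overwrite them. A minimal sketch in the spirit of the snippet below (``request`` and the ``data_*`` arrays are assumed to exist as in that snippet)::

    # First inference with shared outputs: `out_view` is a numpy view,
    # not an independent copy of the results.
    result = request.infer({"input_0": data_0, "input_1": data_1}, share_outputs=True)
    out_view = result[0]
    snapshot = out_view.copy()  # take a copy before reusing the request

    # Reusing the same request may overwrite the memory behind `out_view`,
    # so the view can silently start showing the new results.
    request.infer({"input_0": data_1, "input_1": data_0}, share_outputs=True)

The referenced snippet file, docs/snippets/ov_python_inference.py, is updated accordingly:
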
@@ -32,9 +32,13 @@ results_1 = compiled_model({"input_0": data_2, "input_1": data_3})
 request = compiled_model.create_infer_request()
 
 #! [shared_memory_inference]
-# Data can be shared
-_ = compiled_model({"input_0": data_0, "input_1": data_1}, shared_memory=True)
-_ = request.infer({"input_0": data_0, "input_1": data_1}, shared_memory=True)
+# Data can be shared only on inputs
+_ = compiled_model({"input_0": data_0, "input_1": data_1}, share_inputs=True)
+_ = request.infer({"input_0": data_0, "input_1": data_1}, share_inputs=True)
+# Data can be shared only on outputs
+_ = request.infer({"input_0": data_0, "input_1": data_1}, share_outputs=True)
+# Or both flags can be combined to achieve the desired behavior
+_ = compiled_model({"input_0": data_0, "input_1": data_1}, share_inputs=False, share_outputs=True)
 #! [shared_memory_inference]
 
 time_in_sec = 2.0
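
Reviewer note: per the docs above, ``AsyncInferQueue`` also supports this mode, with the flag set manually. A hedged sketch, assuming ``start_async`` accepts the same ``share_inputs`` keyword as ``infer`` (the note added in this PR implies it does) and reusing ``compiled_model`` and the ``data_*`` arrays from the snippet::

    from openvino.runtime import AsyncInferQueue

    queue = AsyncInferQueue(compiled_model, jobs=2)
    queue.set_callback(lambda infer_request, userdata: print(userdata, "done"))

    # Inputs can be shared for asynchronous calls too; do not modify the
    # arrays until the job that borrowed them has completed.
    queue.start_async({"input_0": data_0, "input_1": data_1}, userdata=0, share_inputs=True)
    queue.start_async({"input_0": data_1, "input_1": data_0}, userdata=1, share_inputs=True)
    queue.wait_all()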