From 88cb428763ec436fdaa6fef50cef419c6ad40265 Mon Sep 17 00:00:00 2001
From: Jan Iwaszkiewicz
Date: Fri, 21 Apr 2023 15:22:33 +0200
Subject: [PATCH] [PyOV][DOCS] Added Python advanced inference documentation
 (#17090)

Co-authored-by: Karol Blaszczak
---
 docs/OV_Runtime_UG/Python_API_exclusives.md | 54 ++++++++---
 docs/OV_Runtime_UG/Python_API_inference.md  | 89 +++++++++++++++++++
 .../integrate_with_your_application.md      |  1 +
 docs/snippets/ov_python_exclusives.py       | 20 ++++-
 docs/snippets/ov_python_inference.py        | 69 ++++++++++++++
 src/bindings/python/README.md               |  1 +
 6 files changed, 219 insertions(+), 15 deletions(-)
 create mode 100644 docs/OV_Runtime_UG/Python_API_inference.md
 create mode 100644 docs/snippets/ov_python_inference.py

diff --git a/docs/OV_Runtime_UG/Python_API_exclusives.md b/docs/OV_Runtime_UG/Python_API_exclusives.md
index 06d8bce1ddf..7d15c474d35 100644
--- a/docs/OV_Runtime_UG/Python_API_exclusives.md
+++ b/docs/OV_Runtime_UG/Python_API_exclusives.md
@@ -1,5 +1,7 @@
 # OpenVINO™ Python API Exclusives {#openvino_docs_OV_UG_Python_API_exclusives}
 
+@sphinxdirective
+
 OpenVINO™ Runtime Python API offers additional features and helpers to enhance user experience. The main goal of the Python API is to provide a user-friendly and simple yet powerful tool for Python users.
 
 Easier Model Compilation
@@ -9,7 +11,7 @@ Easier Model Compilation
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [auto_compilation]
 
 
@@ -20,7 +22,7 @@ Besides functions aligned to C++ API, some of them have their Python counterpart
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [properties_example]
 
 
@@ -33,7 +35,7 @@ Python API allows passing data as tensors. The ``Tensor`` object holds a copy of
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [tensor_basics]
 
 
@@ -44,7 +46,7 @@ Shared Memory Mode
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [tensor_shared_mode]
 
 
@@ -57,7 +59,7 @@ All infer methods allow users to pass data as popular *numpy* arrays, gathered i
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [passing_numpy_array]
 
 
@@ -65,7 +67,7 @@ Results from inference can be obtained in various ways:
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [getting_results]
 
 
@@ -76,10 +78,34 @@ Python API provides different synchronous calls to infer model, which block the
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [sync_infer]
 
 
+Inference Results - OVDict
+++++++++++++++++++++++++++
+
+
+Synchronous calls return a special data structure called ``OVDict``. It can be compared to a "frozen dictionary". There are various ways of accessing the object's elements:
+
+
+.. doxygensnippet:: docs/snippets/ov_python_exclusives.py
+   :language: python
+   :fragment: [ov_dict]
+
+
+.. note::
+
+   It is possible to convert ``OVDict`` to a native dictionary using the ``to_dict()`` method.
+
+
+.. warning::
+
+   Using ``to_dict()`` results in losing access via strings and integers. Additionally,
+   it performs a shallow copy, thus any modifications may affect the original
+   object as well.
+
+
 AsyncInferQueue
 ++++++++++++++++++++
 
@@ -91,7 +117,7 @@ The ``start_async`` function call is not required to be synchronized - it waits
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [asyncinferqueue]
 
 
@@ -102,7 +128,7 @@ After the call to ``wait_all``, jobs and their data can be safely accessed. Acqu
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [asyncinferqueue_access]
 
 
@@ -115,7 +141,7 @@ The callback of ``AsyncInferQueue`` is uniform for every job. When executed, GIL
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [asyncinferqueue_set_callback]
 
 
@@ -127,7 +153,7 @@ To create an input tensor with such element types, you may need to pack your dat
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [packing_data]
 
 
@@ -135,7 +161,7 @@ To extract low precision values from a tensor into the *numpy* array, you can us
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [unpacking]
 
 
@@ -146,7 +172,7 @@ Some functions in Python API release the Global Interpreter Lock (GIL) while run
 
 .. doxygensnippet:: docs/snippets/ov_python_exclusives.py
-   :language: cpp
+   :language: python
    :fragment: [releasing_gil]
 
 
@@ -178,3 +204,5 @@ List of Functions that Release the GIL
 * openvino.runtime.InferRequest.query_state
 * openvino.runtime.Model.reshape
 * openvino.preprocess.PrePostProcessor.build
+
+@endsphinxdirective
diff --git a/docs/OV_Runtime_UG/Python_API_inference.md b/docs/OV_Runtime_UG/Python_API_inference.md
new file mode 100644
index 00000000000..965f56d6ed7
--- /dev/null
+++ b/docs/OV_Runtime_UG/Python_API_inference.md
@@ -0,0 +1,89 @@
+# OpenVINO™ Runtime Python API Advanced Inference {#openvino_docs_OV_UG_Python_API_inference}
+
+@sphinxdirective
+
+.. warning::
+
+   All of the methods mentioned here depend heavily on the specific hardware and software setup.
+   Consider conducting your own experiments with various models and different input/output
+   sizes. The methods presented here are not universal - they may or may not apply to a
+   specific pipeline. Please consider all tradeoffs and avoid premature optimizations.
+
+
+Direct Inference with ``CompiledModel``
+#######################################
+
+The ``CompiledModel`` class provides the ``__call__`` method that runs a single synchronous inference using the given model. In addition to keeping the code compact, all future calls to ``CompiledModel.__call__`` will result in less overhead, as the object reuses the already created ``InferRequest``.
+
+
+.. doxygensnippet:: docs/snippets/ov_python_inference.py
+   :language: python
+   :fragment: [direct_inference]
+
+
+Shared Memory on Inputs
+#######################
+
+While using ``CompiledModel``, ``InferRequest`` and ``AsyncInferQueue``,
+OpenVINO™ Runtime Python API provides an additional mode - "Shared Memory".
+Specify the ``shared_memory`` flag to enable or disable this feature.
+The "Shared Memory" mode may be beneficial when inputs are large and copying
+data is considered an expensive operation. This feature creates shared ``Tensor``
+instances with the "zero-copy" approach, reducing the overhead of setting inputs
+to a minimum. Example usage:
+
+
+.. doxygensnippet:: docs/snippets/ov_python_inference.py
+   :language: python
+   :fragment: [shared_memory_inference]
+
+
+.. note::
+
+   "Shared Memory" is enabled by default in ``CompiledModel.__call__``.
+   For other methods, like ``InferRequest.infer`` or ``InferRequest.start_async``,
+   it is required to set the flag to ``True`` manually.
+
+.. warning::
+
+   When data is being shared, all modifications may affect the inputs of the inference!
+   Use this feature with caution, especially in multi-threaded/parallel code,
+   where data can be modified outside of the function's control flow.
+
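+As a rough sketch of the caveat above - borrowing the ``request`` object and the
+``data_0``/``data_1`` arrays from the snippet file, and assuming a contiguous
+``float32`` input that can be wrapped without a copy - sharing works in both directions:
+
+.. code-block:: python
+
+   # With shared_memory=True the input Tensor aliases the numpy array
+   request.infer({"input_0": data_0, "input_1": data_1}, shared_memory=True)
+
+   # An in-place edit of the array is visible through the request's
+   # input Tensor as well and would be picked up by the next inference
+   data_0[0] = 1.0
+   assert request.get_input_tensor(0).data[0] == 1.0
+
+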
+Hiding Latency with Asynchronous Calls
+######################################
+
+Asynchronous calls make it possible to hide latency and optimize the overall runtime of a codebase.
+For example, ``InferRequest.start_async`` releases the GIL and provides a non-blocking call.
+It is beneficial to process other calls while waiting for a compute-intensive inference to finish.
+Example usage:
+
+.. doxygensnippet:: docs/snippets/ov_python_inference.py
+   :language: python
+   :fragment: [hiding_latency]
+
+
+.. note::
+
+   It is up to the user/developer to optimize the flow in a codebase to benefit from potential parallelization.
+
+
+"Postponed Return" with Asynchronous Calls
+##########################################
+
+"Postponed Return" is a practice that avoids the overhead of ``OVDict``, which is always returned from
+synchronous calls. "Postponed Return" could be applied when:
+
+* only a part of the output data is required. For example, only one specific output is significant
+  in a given pipeline step, and all outputs are large, thus, expensive to copy.
+* data is not required "now". For example, it can be extracted later inside the pipeline as
+  a part of latency hiding.
+* data return is not required at all. For example, models are being chained with the pure ``Tensor`` interface.
+
+
+.. doxygensnippet:: docs/snippets/ov_python_inference.py
+   :language: python
+   :fragment: [no_return_inference]
+
+@endsphinxdirective
diff --git a/docs/OV_Runtime_UG/integrate_with_your_application.md b/docs/OV_Runtime_UG/integrate_with_your_application.md
index 858c0525f9b..80a439d99a1 100644
--- a/docs/OV_Runtime_UG/integrate_with_your_application.md
+++ b/docs/OV_Runtime_UG/integrate_with_your_application.md
@@ -8,6 +8,7 @@
 
    openvino_docs_OV_UG_Model_Representation
    openvino_docs_OV_UG_Infer_request
+   openvino_docs_OV_UG_Python_API_inference
    openvino_docs_OV_UG_Python_API_exclusives
    openvino_docs_MO_DG_TensorFlow_Frontend
 
diff --git a/docs/snippets/ov_python_exclusives.py b/docs/snippets/ov_python_exclusives.py
index a816acc9d4e..a8c35285969 100644
--- a/docs/snippets/ov_python_exclusives.py
+++ b/docs/snippets/ov_python_exclusives.py
@@ -12,10 +12,11 @@ compiled_model = ov.compile_model("model.xml")
 
 #! [properties_example]
 core = ov.Core()
 
-input_a = ov.opset8.parameter([8])
-res = ov.opset8.absolute(input_a)
+input_a = ov.opset11.parameter([8], name="input_a")
+res = ov.opset11.absolute(input_a)
 model = ov.Model(res, [input_a])
+model.outputs[0].tensor.set_names({"result_0"})  # Add a name for the output
 compiled = core.compile_model(model, "CPU")
 
 print(model.inputs)
 print(model.outputs)
@@ -78,6 +79,21 @@ results = infer_request.infer(inputs={0: data})
 results = compiled_model(inputs={0: data})
 #! [sync_infer]
 
+#! [ov_dict]
+results = compiled_model(inputs={0: data})
+
+# Access via string
+_ = results["result_0"]
+# Access via index
+_ = results[0]
+# Access via output port
+_ = results[compiled_model.outputs[0]]
+# Use iterator over keys
+_ = results[next(iter(results))]
+# Iterate over values
+_ = next(iter(results.values()))
+#! [ov_dict]
+
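+# A small, assumption-based illustration of the to_dict() caveat described
+# in the docs: the native dict is a shallow copy keyed by output ports only,
+# so access via strings and integers is no longer available.
+native_dict = results.to_dict()
+_ = native_dict[compiled_model.outputs[0]]  # Port-based access still works
+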
 #! [asyncinferqueue]
 core = ov.Core()
 
diff --git a/docs/snippets/ov_python_inference.py b/docs/snippets/ov_python_inference.py
new file mode 100644
index 00000000000..8801271e2a4
--- /dev/null
+++ b/docs/snippets/ov_python_inference.py
@@ -0,0 +1,69 @@
+# Copyright (C) 2018-2023 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+import numpy as np
+import openvino.runtime as ov
+
+INPUT_SIZE = 1_000_000  # Use bigger values if necessary, e.g.: 300_000_000
+
+input_0 = ov.opset11.parameter([INPUT_SIZE], name="input_0")
+input_1 = ov.opset11.parameter([INPUT_SIZE], name="input_1")
+add_inputs = ov.opset11.add(input_0, input_1)
+res = ov.opset11.reduce_sum(add_inputs, reduction_axes=0, name="reduced")
+model = ov.Model(res, [input_0, input_1], name="my_model")
+model.outputs[0].tensor.set_names({"reduced_result"})  # Add a name for the output
+
+core = ov.Core()
+compiled_model = core.compile_model(model, device_name="CPU")
+
+data_0 = np.array([0.1] * INPUT_SIZE, dtype=np.float32)
+data_1 = np.array([-0.1] * INPUT_SIZE, dtype=np.float32)
+
+data_2 = np.array([0.2] * INPUT_SIZE, dtype=np.float32)
+data_3 = np.array([-0.2] * INPUT_SIZE, dtype=np.float32)
+
+#! [direct_inference]
+# Calling CompiledModel creates and saves an InferRequest object
+results_0 = compiled_model({"input_0": data_0, "input_1": data_1})
+# The second call reuses the previously created InferRequest object
+results_1 = compiled_model({"input_0": data_2, "input_1": data_3})
+#! [direct_inference]
+
+request = compiled_model.create_infer_request()
+
+#! [shared_memory_inference]
+# Data can be shared
+_ = compiled_model({"input_0": data_0, "input_1": data_1}, shared_memory=True)
+_ = request.infer({"input_0": data_0, "input_1": data_1}, shared_memory=True)
+#! [shared_memory_inference]
+
+time_in_sec = 2.0
+
+#! [hiding_latency]
+import time
+
+# Long-running function
+def run(time_in_sec):
+    time.sleep(time_in_sec)
+
+# No latency hiding
+results = request.infer({"input_0": data_0, "input_1": data_1})[0]
+run(time_in_sec)
+
+# Hiding latency
+request.start_async({"input_0": data_0, "input_1": data_1})
+run(time_in_sec)
+request.wait()
+results = request.get_output_tensor(0).data  # Gather data from InferRequest
+#! [hiding_latency]
+
+#! [no_return_inference]
+# Standard approach
+results = request.infer({"input_0": data_0, "input_1": data_1})[0]
+
+# "Postponed Return" approach
+request.start_async({"input_0": data_0, "input_1": data_1})
+request.wait()
+results = request.get_output_tensor(0).data  # Gather data "on demand" from InferRequest
+#! [no_return_inference]
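+
+# An extra, assumption-based sketch (not referenced by any documentation
+# fragment): with "Postponed Return", a single output can also be fetched
+# "on demand" by the tensor name registered above.
+request.start_async({"input_0": data_2, "input_1": data_3})
+request.wait()
+results = request.get_tensor("reduced_result").data
+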
diff --git a/src/bindings/python/README.md b/src/bindings/python/README.md
index 8ecbd9dbbfb..76a6f646f81 100644
--- a/src/bindings/python/README.md
+++ b/src/bindings/python/README.md
@@ -43,6 +43,7 @@ If you want to contribute to OpenVINO Python API, here is the list of learning m
 * [OpenVINO™ README](../../../README.md)
 * [OpenVINO™ Core Components](../../README.md)
 * [OpenVINO™ Python API Reference](https://docs.openvino.ai/latest/api/ie_python_api/api.html)
+* [OpenVINO™ Python API Advanced Inference](https://docs.openvino.ai/latest/openvino_docs_OV_UG_Python_API_inference.html)
 * [OpenVINO™ Python API Exclusives](https://docs.openvino.ai/latest/openvino_docs_OV_UG_Python_API_exclusives.html)
 * [pybind11 repository](https://github.com/pybind/pybind11)
 * [pybind11 documentation](https://pybind11.readthedocs.io/en/stable/)