[Auto PLUGIN] update Auto docs (#10889)

* update Auto docs
* update Python snippets
* remove VPU, fix a mistake in Python code
* update MYRIAD device full name
* update API names: the old API is named Inference Engine API, the new API is named OpenVINO Runtime API 2.0
* update tab names and code format
* fix AUTO4 format issue
* update set_property code
* auto draft
* move code into .cpp and .py files, modify the device list part according to the review
* remove priority list in code and document, modify the beginning of the document, remove performance data, remove old API, use compile_model instead of set_property, add an image about CPU acceleration
* fix misprint and code not matching the document
* try to fix doc build issue
* fix snippets code compile issue

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
2022-03-19 23:25:35 +08:00
parent 76fde1f7b0
commit 72e8661157
10 changed files with 346 additions and 332 deletions
--- a/docs/OV_Runtime_UG/auto_device_selection.md
+++ b/docs/OV_Runtime_UG/auto_device_selection.md
@@ -10,41 +10,41 @@

@endsphinxdirective

-The Auto-Device plugin, or AUTO, is a virtual device which automatically selects the processing unit to use for inference with OpenVINO™. It chooses from a list of available devices defined by the user and aims at finding the most suitable hardware for the given model. The best device is chosen using the following logic: 
+Auto Device (or `AUTO` for short) is a new, special "virtual" or "proxy" device in the OpenVINO toolkit that does not bind to a specific type of hardware device. AUTO removes the need for applications to implement their own hardware device selection logic and then deduce the best optimization settings for the selected device. It does this by discovering all accelerators available in the system and their capabilities, then matching them to the user's performance requirements, which are expressed through the new “hints” configuration API to dynamically optimize for latency or throughput. Developers can write an application once and deploy it anywhere.
+Developers who want to limit inference to specific hardware candidates can set the device priority list, an optional property. When the list is set, AUTO does not discover all accelerators available in the system; it only tries the devices in the list, in priority order.

-1. Check which supported devices are available. 
-2. Check the precision of the input model (for detailed information on precisions read more on the [OPTIMIZATION_CAPABILITIES metric](../IE_PLUGIN_DG/Plugin.md)) 
-3. From the priority list, select the first device capable of supporting the given precision. 
-4. If the network’s precision is FP32 but there is no device capable of supporting it, offload the network to a device supporting FP16. 
+AUTO always chooses the best device. If compiling the model fails on that device, AUTO tries to compile it on the next best device, until one of them succeeds.
+If the priority list is set, AUTO selects devices only from that list.

-@sphinxdirective
-+----------+-------------------------------------------------+-------------------------------------+
-| Choice   | | Supported                                     | | Supported                         |
-| Priority | | Device                                        | | model precision                   |
-+==========+=================================================+=====================================+
-| 1        | | dGPU                                          | FP32, FP16, INT8, BIN               |
-|          | | (e.g. Intel® Iris® Xe MAX)                    |                                     |
-+----------+-------------------------------------------------+-------------------------------------+
-| 2        | | VPUX                                          | INT8                                |
-|          | | (e.g. Intel® Movidius® VPU 3700VE)            |                                     |
-+----------+-------------------------------------------------+-------------------------------------+
-| 3        | | iGPU                                          | FP32, FP16, BIN,                    |
-|          | | (e.g. Intel® UHD Graphics 620 (iGPU))         |                                     |
-+----------+-------------------------------------------------+-------------------------------------+
-| 4        | | Intel® Neural Compute Stick 2 (Intel® NCS2)   | FP16                                |
-|          |                                                 |                                     |
-+----------+-------------------------------------------------+-------------------------------------+
-| 5        | | Intel® CPU                                    | FP32, FP16, INT8, BIN               |
-|          | | (e.g. Intel® Core™ i7-1165G7)                 |                                     |
-+----------+-------------------------------------------------+-------------------------------------+
-@endsphinxdirective
+The best device is chosen using the following logic:

-To put it simply, when loading the network to the first device on the list fails, AUTO will try to load it to the next device in line, until one of them succeeds. For example: 
-If you have dGPU in your system, it will be selected for most jobs (first on the priority list and supports multiple precisions). But if you want to run a WINOGRAD-enabled IR, your CPU will be selected (WINOGRAD optimization is not supported by dGPU). If you have Myriad and IA CPU in your system, Myriad will be selected for FP16 models, but IA CPU will be chosen for FP32 ones.  
+1. Check which supported devices are available.
+2. Check the precision of the input model (for detailed information on precisions, see `ov::device::capabilities`).
+3. Select the first device capable of supporting the given precision, as presented in the table below.
+4. If the model’s precision is FP32 but there is no device capable of supporting it, offload the model to a device supporting FP16.

-What is important, **AUTO always starts inference with the CPU**. CPU provides very low latency and can start inference with no additional delays. While it performs inference, the Auto-Device plugin continues to load the model to the device best suited for the purpose and transfers the task to it when ready. This way, the devices which are much slower in loading the network, GPU being the best example, do not impede inference at its initial stages. 
+@sphinxdirective
+----------+------------------------------------------------------+-------------------------------------+
+| Choice   || Supported                                           || Supported                          |
+| Priority || Device                                              || model precision                    |
+==========+======================================================+=====================================+
+| 1        || dGPU                                                | FP32, FP16, INT8, BIN               |
+|          || (e.g. Intel® Iris® Xe MAX)                          |                                     |
+----------+------------------------------------------------------+-------------------------------------+
+| 2        || iGPU                                                | FP32, FP16, BIN                     |
+|          || (e.g. Intel® UHD Graphics 620 (iGPU))               |                                     |
+----------+------------------------------------------------------+-------------------------------------+
+| 3        || Intel® Movidius™ Myriad™ X VPU                      | FP16                                |
+|          || (e.g. Intel® Neural Compute Stick 2 (Intel® NCS2))  |                                     |
+----------+------------------------------------------------------+-------------------------------------+
+| 4        || Intel® CPU                                          | FP32, FP16, INT8, BIN               |
+|          || (e.g. Intel® Core™ i7-1165G7)                       |                                     |
+----------+------------------------------------------------------+-------------------------------------+
+@endsphinxdirective
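The four selection steps and the table can be modelled as a small lookup. This is a simplified sketch, not the plugin's code. `select_device` is a hypothetical helper. `dGPU` and `iGPU` are illustrative labels rather than OpenVINO device names. The precision sets are copied from the table above.

```python
from typing import List, Optional, Set, Tuple

# Priority order and supported precisions, copied from the table above.
# "dGPU" and "iGPU" are illustrative labels, not OpenVINO device names.
SELECTION_TABLE: List[Tuple[str, Set[str]]] = [
    ("dGPU", {"FP32", "FP16", "INT8", "BIN"}),
    ("iGPU", {"FP32", "FP16", "BIN"}),
    ("MYRIAD", {"FP16"}),
    ("CPU", {"FP32", "FP16", "INT8", "BIN"}),
]


def select_device(available: Set[str], precision: str) -> Optional[str]:
    """Pick a device for a model of the given precision.

    Step 2 (checking the model precision) is represented by the `precision` argument.
    """
    # Step 1: consider only devices present in the system, in table order.
    ranked = [(name, precs) for name, precs in SELECTION_TABLE if name in available]
    # Step 3: the first device that supports the model precision wins.
    for name, precs in ranked:
        if precision in precs:
            return name
    # Step 4: an FP32 model with no FP32-capable device is offloaded to an FP16 device.
    if precision == "FP32":
        return select_device(available, "FP16")
    return None


print(select_device({"MYRIAD", "CPU"}, "FP16"))  # MYRIAD
print(select_device({"MYRIAD", "CPU"}, "FP32"))  # CPU
```

For example, on a system with a Myriad X VPU and a CPU, the sketch sends an FP16 model to `MYRIAD` and an FP32 model to `CPU`.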

-This mechanism can be easily observed in our Benchmark Application sample ([see here](#Benchmark App Info)), showing how the first-inference latency (the time it takes to load the network and perform the first inference) is reduced when using AUTO. For example: 
+Importantly, **AUTO starts inference with the CPU by default, unless the priority list is set and does not include the CPU**. The CPU provides very low latency and can start inference with no additional delay. While the CPU performs inference, the Auto-Device plugin continues to compile the model on the device best suited for the purpose and transfers the task to it when ready. This way, devices that are much slower to compile the model, GPU being the best example, do not impede inference at its initial stages.
+
+![autoplugin_accelerate]
+
+This mechanism can be easily observed in our Benchmark Application sample ([see here](#Benchmark App Info)), showing how the first-inference latency (the time it takes to compile the model and perform the first inference) is reduced when using AUTO. For example: 

@sphinxdirective
 .. code-block:: sh
@@ -52,15 +52,13 @@ This mechanism can be easily observed in our Benchmark Application sample ([see
   ./benchmark_app -m ../public/alexnet/FP32/alexnet.xml -d GPU -niter 128
@endsphinxdirective 

-first-inference latency: **2594.29 ms + 9.21 ms** 
-
@sphinxdirective
 .. code-block:: sh

-   ./benchmark_app -m ../public/alexnet/FP32/alexnet.xml -d AUTO:CPU,GPU -niter 128
+   ./benchmark_app -m ../public/alexnet/FP32/alexnet.xml -d AUTO -niter 128
@endsphinxdirective 

-first-inference latency: **173.13 ms + 13.20 ms**
+Assuming both a CPU and a GPU are present on the machine, the first-inference latency with "AUTO" will be lower than with "GPU".

@sphinxdirective
 .. note::
@@ -69,45 +67,32 @@ first-inference latency: **173.13 ms + 13.20 ms**

 ## Using the Auto-Device Plugin 

-Inference with AUTO is configured similarly to other plugins: first you configure devices, then load a network to the plugin, and finally, execute inference. 
+Inference with AUTO is configured similarly to other plugins: compile the model on the plugin with the desired configuration, and then execute inference.

 Following the OpenVINO™ naming convention, the Auto-Device plugin is assigned the label of “AUTO.” It may be defined with no additional parameters, resulting in defaults being used, or configured further with the following setup options: 

@sphinxdirective
-+-------------------------+-----------------------------------------------+-----------------------------------------------------------+
-| Property                | Property values                               | Description                                               |
-+=========================+===============================================+===========================================================+
-| <device candidate list> | | AUTO: <device names>                        | | Lists the devices available for selection.              |
-|                         | | comma-separated, no spaces                  | | The device sequence will be taken as priority           |
-|                         | |                                             | | from high to low.                                       |
-|                         | |                                             | | If not specified, “AUTO” will be used as default        |
-|                         | |                                             | | and all devices will be included.                       |
-+-------------------------+-----------------------------------------------+-----------------------------------------------------------+
-| ov::device:priorities   | | device names                                | | Specifies the devices for Auto-Device plugin to select. |
-|                         | | comma-separated, no spaces                  | | The device sequence will be taken as priority           |
-|                         | |                                             | | from high to low.                                       |
-|                         | |                                             | | This configuration is optional.                         |
-+-------------------------+-----------------------------------------------+-----------------------------------------------------------+
-| ov::hint                | | THROUGHPUT                                  | | Specifies the performance mode preferred                |
-|                         | | LATENCY                                     | | by the application.                                     |
-+-------------------------+-----------------------------------------------+-----------------------------------------------------------+
-| ov::hint:model_priority | | MODEL_PRIORITY_HIGH                         | | Indicates the priority for a network.                   |
-|                         | | MODEL_PRIORITY_MED                          | | Importantly!                                            |
-|                         | | MODEL_PRIORITY_LOW                          | | This property is still not fully supported              |
-+-------------------------+-----------------------------------------------+-----------------------------------------------------------+
-@endsphinxdirective
-
-@sphinxdirective
-.. dropdown:: Click for information on Legacy APIs 
-
-   For legacy APIs like LoadNetwork/SetConfig/GetConfig/GetMetric:
-   
-   - replace {ov::device:priorities, "GPU,CPU"} with {"MULTI_DEVICE_PRIORITIES", "GPU,CPU"}
-   - replace {ov::hint:model_priority, "LOW"} with {"MODEL_PRIORITY", "LOW"}
-   - InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES is defined as same string "MULTI_DEVICE_PRIORITIES"
-   - CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU is equal to "GPU,CPU"
-   - InferenceEngine::PluginConfigParams::KEY_MODEL_PRIORITY is defined as same string "MODEL_PRIORITY"
-   - InferenceEngine::PluginConfigParams::MODEL_PRIORITY_LOW is defined as same string "LOW"
+---------------------------+-----------------------------------------------+-----------------------------------------------------------+
+| Property                  | Property values                               | Description                                               |
+===========================+===============================================+===========================================================+
+| <device candidate list>   | | AUTO:<device names>                         | | Lists the devices available for selection.              |
+|                           | | comma-separated, no spaces                  | | The device sequence will be taken as priority           |
+|                           | |                                             | | from high to low.                                       |
+|                           | |                                             | | If not specified, “AUTO” will be used as default        |
+|                           | |                                             | | and all devices will be included.                       |
+---------------------------+-----------------------------------------------+-----------------------------------------------------------+
+| ov::device::priorities    | | device names                                | | Specifies the devices for Auto-Device plugin to select. |
+|                           | | comma-separated, no spaces                  | | The device sequence will be taken as priority           |
+|                           | |                                             | | from high to low.                                       |
+|                           | |                                             | | This configuration is optional.                         |
+---------------------------+-----------------------------------------------+-----------------------------------------------------------+
+| ov::hint::performance_mode| | ov::hint::PerformanceMode::LATENCY          | | Specifies the performance mode preferred                |
+|                           | | ov::hint::PerformanceMode::THROUGHPUT       | | by the application.                                     |
+---------------------------+-----------------------------------------------+-----------------------------------------------------------+
+| ov::hint::model_priority  | | ov::hint::Priority::HIGH                    | | Indicates the priority for a model.                     |
+|                           | | ov::hint::Priority::MEDIUM                  | | Importantly!                                            |
+|                           | | ov::hint::Priority::LOW                     | | This property is still not fully supported              |
+---------------------------+-----------------------------------------------+-----------------------------------------------------------+
@endsphinxdirective
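The device candidate list syntax can be illustrated with a small parser. `parse_auto_device` is a hypothetical helper written for this page, not an OpenVINO function. It only checks the format described in the table: `AUTO` alone, or `AUTO:` followed by comma-separated device names with no spaces, highest priority first. The same comma-separated format is used for the value of `ov::device::priorities`.

```python
from typing import List, Optional


def parse_auto_device(device_name: str) -> Optional[List[str]]:
    """Return the candidate list encoded in an AUTO device string.

    "AUTO"         -> None (no list: all available devices are candidates)
    "AUTO:GPU,CPU" -> ["GPU", "CPU"] (highest priority first)
    """
    prefix, sep, devices = device_name.partition(":")
    if prefix != "AUTO":
        raise ValueError(f"not an AUTO device string: {device_name!r}")
    if not sep:
        return None
    names = devices.split(",")
    # The list must be comma-separated, with no spaces and no empty entries.
    if any(not name or " " in name for name in names):
        raise ValueError(f"malformed device list: {devices!r}")
    return names


print(parse_auto_device("AUTO"))          # None
print(parse_auto_device("AUTO:GPU,CPU"))  # ['GPU', 'CPU']
```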

 ### Device candidate list
@@ -115,117 +100,31 @@ The device candidate list allows users to customize the priority and limit the c
 The following commands are accepted by the API: 

@sphinxdirective
-.. tab:: C++ API

-   .. code-block:: cpp
+.. tab:: C++

-      /*** With Inference Engine 2.0 API ***/
-      ov::Core core; 
+    .. doxygensnippet:: docs/snippets/AUTO0.cpp
+       :language: cpp
+       :fragment: [part0]

-      // Read a network in IR, PaddlePaddle, or ONNX format:
-      std::shared_ptr<ov::Model> model = core.read_model("sample.xml");    
+.. tab:: Python

-      // Load a network to AUTO using the default list of device candidates.
-      // The following lines are equivalent:
-      ov::CompiledModel model0 = core.compile_model(model);
-      ov::CompiledModel model1 = core.compile_model(model, "AUTO");
-      ov::CompiledModel model2 = core.compile_model(model, "AUTO", {});      
-
-      // You can also specify the devices to be used by AUTO in its selection process.
-      // The following lines are equivalent:
-      ov::CompiledModel model3 = core.compile_model(model, "AUTO:GPU,CPU");
-	   ov::CompiledModel model4 = core.compile_model(model, "AUTO", {{ov::device::priorities.name(), "GPU,CPU"}});
-
-      // the AUTO plugin is pre-configured (globally) with the explicit option:
-      core.set_property("AUTO", ov::device::priorities("GPU,CPU"));       
-
-.. tab:: C++ legacy API
-
-   .. code-block:: cpp
-
-      /*** With API Prior to 2022.1 Release ***/
-      InferenceEngine::Core ie;      
-
-      // Read a network in IR, PaddlePaddle, or ONNX format:
-      InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");  
-
-      // Load a network to AUTO using the default list of device candidates.
-      // The following lines are equivalent:
-      InferenceEngine::ExecutableNetwork exec0 = ie.LoadNetwork(network);
-      InferenceEngine::ExecutableNetwork exec1 = ie.LoadNetwork(network, "AUTO");
-      InferenceEngine::ExecutableNetwork exec2 = ie.LoadNetwork(network, "AUTO", {});      
-      
-      // You can also specify the devices to be used by AUTO in its selection process.
-      // The following lines are equivalent:
-      InferenceEngine::ExecutableNetwork exec3 = ie.LoadNetwork(network, "AUTO:GPU,CPU");
-	   InferenceEngine::ExecutableNetwork exec4 = ie.LoadNetwork(network, "AUTO", {{"MULTI_DEVICE_PRIORITIES", "GPU,CPU"}});      
-      
-      // the AUTO plugin is pre-configured (globally) with the explicit option:
-      ie.SetConfig({{"MULTI_DEVICE_PRIORITIES", "GPU,CPU"}}, "AUTO");
-
-.. tab:: Python API
-
-   .. code-block:: python
-
-      ### New IE 2.0 API ###
-	  
-      from openvino.runtime import Core
-      core = Core()
-      
-      # Read a network in IR, PaddlePaddle, or ONNX format:
-      model = core.read_model(model_path)
-      
-      # Load a network to AUTO using the default list of device candidates.
-      # The following lines are equivalent:
-      model = core.compile_model(model=model) 
-      compiled_model = core.compile_model(model=model, device_name="AUTO")
-      compiled_model = core.compile_model(model=model, device_name="AUTO", config={})
-      
-      # You can also specify the devices to be used by AUTO in its selection process.
-      # The following lines are equivalent:
-      compiled_model = core.compile_model(model=model, device_name="AUTO:CPU,GPU")
-      compiled_model = core.compile_model(model=model, device_name="AUTO", config={"MULTI_DEVICE_PRIORITIES": "CPU,GPU"})
-      
-      # the AUTO plugin is pre-configured (globally) with the explicit option:
-      core.set_config(config={"MULTI_DEVICE_PRIORITIES":"CPU,GPU"}, device_name="AUTO")
-    
-.. tab:: Python legacy API
-
-   .. code-block:: python
-
-      ### API before 2022.1 ###
-      from openvino.inference_engine import IECore
-      ie = IECore()
-      
-      # Read a network in IR, PaddlePaddle, or ONNX format:
-      net = ie.read_network(model=path_to_model)
-      
-      # Load a network to AUTO using the default list of device candidates.
-      # The following lines are equivalent:
-      exec_net = ie.load_network(network=net)
-      exec_net = ie.load_network(network=net, device_name="AUTO")
-      exec_net = ie.load_network(network=net, device_name="AUTO", config={})
-      
-      # You can also specify the devices to be used by AUTO in its selection process.
-      # The following lines are equivalent:
-      exec_net = ie.load_network(network=net, device_name="AUTO:CPU,GPU")
-      exec_net = ie.load_network(network=net, device_name="AUTO", config={"MULTI_DEVICE_PRIORITIES": "CPU,GPU"})
-      
-      # the AUTO plugin is pre-configured (globally) with the explicit option:
-      ie.SetConfig(config={"MULTI_DEVICE_PRIORITIES", "CPU,GPU"}, device_name="AUTO");
+    .. doxygensnippet:: docs/snippets/ov_auto.py
+       :language: python
+       :fragment: [part0]

@endsphinxdirective

-To check what devices are present in the system, you can use Device API:
+To check what devices are present in the system, you can use the Device API. For information on how to do it, see [Query device properties and configuration](supported_plugins/config_properties.md).

-For C++ API
+For C++
@sphinxdirective
 .. code-block:: sh

  ov::Core::get_available_devices() (see Hello Query Device C++ Sample)
@endsphinxdirective

-For Python API
+For Python
@sphinxdirective
 .. code-block:: sh

@@ -234,7 +133,7 @@ For Python API


 ### Performance Hints
-The `ov::hint` property enables you to specify a performance mode for the plugin to be more efficient for particular use cases.
+The `ov::hint::performance_mode` property enables you to specify a performance mode for the plugin to be more efficient for particular use cases.

 #### ov::hint::PerformanceMode::THROUGHPUT
 This mode prioritizes high throughput, balancing between latency and power. It is best suited for tasks involving multiple jobs, like inference of video feeds or large numbers of images.
@@ -243,140 +142,59 @@ This mode prioritizes high throughput, balancing between latency and power. It i
 This mode prioritizes low latency, providing short response time for each inference job. It performs best for tasks where inference is required for a single input image, like a medical analysis of an ultrasound scan image. It also fits the tasks of real-time or nearly real-time applications, such as an industrial robot's response to actions in its environment or obstacle avoidance for autonomous vehicles.
 Note that currently the `ov::hint::performance_mode` property is supported by CPU and GPU devices only.

-To enable Performance Hints for your application, use the following code: 
+To enable performance hints for your application, use the following code: 
@sphinxdirective
-.. tab:: C++ API

-   .. code-block:: cpp
+.. tab:: C++

-      ov::Core core;
-
-      // Read a network in IR, PaddlePaddle, or ONNX format:
-      std::shared_ptr<ov::Model> model = core.read_model("sample.xml");      
-      
-      // Load a network to AUTO with Performance Hints enabled:
-      // To use the “throughput” mode:
-      ov::CompiledModel compiled_model = core.compile_model(model, "AUTO:GPU,CPU", {{ov::hint::performance_mode.name(), "THROUGHPUT"}});
-      
-      // or the “latency” mode:
-      ov::CompiledModel compiledModel1 = core.compile_model(model, "AUTO:GPU,CPU", {{ov::hint::performance_mode.name(), "LATENCY"}});
+    .. doxygensnippet:: docs/snippets/AUTO3.cpp
+       :language: cpp
+       :fragment: [part3]
 
-.. tab:: Python API
+.. tab:: Python

-   .. code-block:: python
+    .. doxygensnippet:: docs/snippets/ov_auto.py
+       :language: python
+       :fragment: [part3]

-      from openvino.runtime import Core
-      
-      core = Core()
-      
-      # Read a network in IR, PaddlePaddle, or ONNX format:
-      model = core.read_model(model_path)
-      
-      # Load a network to AUTO with Performance Hints enabled:
-      # To use the “throughput” mode:
-      compiled_model = core.compile_model(model=model, device_name="AUTO:CPU,GPU", config={"PERFORMANCE_HINT":"THROUGHPUT"})
-      
-      # or the “latency” mode:
-      compiled_model = core.compile_model(model=model, device_name="AUTO:CPU,GPU", config={"PERFORMANCE_HINT":"LATENCY"})
@endsphinxdirective

 ### ov::hint::model_priority
-The property enables you to control the priorities of networks in the Auto-Device plugin. A high-priority network will be loaded to a supported high-priority device. A lower-priority network will not be loaded to a device that is occupied by a higher-priority network.
+The property enables you to control the priorities of models in the Auto-Device plugin. A high-priority model will be loaded to a supported high-priority device. A lower-priority model will not be loaded to a device that is occupied by a higher-priority model.
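The rule above can be modelled with a small simulation. This is a simplified sketch that assumes every device can run every model and that models are compiled one after another. `assign_devices` and the numeric priority encoding are hypothetical. The real scheduling in AUTO may differ; the property table above notes that model priority is not yet fully supported. With the candidate list `GPU,MYRIAD,CPU`, the sketch yields the placements shown in the comments for two sequences of priorities.

```python
from typing import Dict, List, Optional

# Hypothetical numeric encoding: a smaller number means a higher priority.
PRIORITY: Dict[str, int] = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def assign_devices(devices: List[str], model_priorities: List[str]) -> List[Optional[str]]:
    """Place models (in compile order) on the first device in `devices`
    that is not occupied by a model of strictly higher priority."""
    occupants: Dict[str, List[int]] = {device: [] for device in devices}
    placements: List[Optional[str]] = []
    for name in model_priorities:
        prio = PRIORITY[name]
        chosen: Optional[str] = None
        for device in devices:
            # A device is usable unless some model already on it has a higher priority.
            if all(other >= prio for other in occupants[device]):
                chosen = device
                break
        if chosen is not None:
            occupants[chosen].append(prio)
        placements.append(chosen)
    return placements


candidates = ["GPU", "MYRIAD", "CPU"]
print(assign_devices(candidates, ["HIGH", "MEDIUM", "LOW"]))  # ['GPU', 'MYRIAD', 'CPU']
print(assign_devices(candidates, ["LOW", "MEDIUM", "LOW"]))   # ['GPU', 'GPU', 'MYRIAD']
```

A greedy first-fit over the candidate list keeps the sketch short. It does not model models being released or re-balanced later.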

@sphinxdirective
-.. tab:: C++ API

-   .. code-block:: cpp
+.. tab:: C++

-      // Example 1
-      // Compile and load networks:
-      ov::CompiledModel compiled_model0 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "HIGH"}});
-	   ov::CompiledModel compiled_model1 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "MEDIUM"}});
-	   ov::CompiledModel compiled_model2 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "LOW"}});
-      
-      /************
-        Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks.
-        	  Result: compiled_model0 will use GPU, compiled_model1 will use MYRIAD, compiled_model2 will use CPU.
-       ************/
-      
-      // Example 2
-      // Compile and load networks:
-      ov::CompiledModel compiled_model3 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "LOW"}});
-	   ov::CompiledModel compiled_model4 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "MEDIUM"}});
-	   ov::CompiledModel compiled_model5 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "LOW"}});
-      
-      /************
-        Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks.
-        Result: compiled_model3 will use GPU, compiled_model4 will use GPU, compiled_model5 will use MYRIAD.
-       ************/
-      
-.. tab:: Python API
+    .. doxygensnippet:: docs/snippets/AUTO4.cpp
+       :language: cpp
+       :fragment: [part4]
+ 
+.. tab:: Python

-   .. code-block:: python
+    .. doxygensnippet:: docs/snippets/ov_auto.py
+       :language: python
+       :fragment: [part4]

-      # Example 1
-      # Compile and load networks:
-      compiled_model0 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"0"})
-      compiled_model1 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"1"})
-      compiled_model2 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"2"})
-
-      # Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks.
-      # Result: compiled_model0 will use GPU, compiled_model1 will use MYRIAD, compiled_model3 will use CPU.
-      
-      # Example 2
-      # Compile and load networks:
-      compiled_model0 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"2"})
-      compiled_model1 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"1"})
-      compiled_model2 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"2"})
-
-      # Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks.
-      # Result: compiled_model0 will use GPU, compiled_model1 will use GPU, compiled_model3 will use MYRIAD.
@endsphinxdirective

 ## Configuring Individual Devices and Creating the Auto-Device plugin on Top
-Although the methods described above are currently the preferred way to execute inference with AUTO, the following steps can be also used as an alternative. It is currently available as a legacy feature and used if the device candidate list includes VPUX or Myriad (devices uncapable of utilizing the Performance Hints option). 
+Although the methods described above are currently the preferred way to execute inference with AUTO, the following steps can also be used as an alternative. This approach is currently available as a legacy feature and is used if the device candidate list includes Myriad (a device that cannot use the Performance Hints option).

@sphinxdirective
-.. tab:: C++ API

-   .. code-block:: cpp
+.. tab:: C++

-      ovCore core;
+    .. doxygensnippet:: docs/snippets/AUTO5.cpp
+       :language: cpp
+       :fragment: [part5]
+ 
+.. tab:: Python

-      // Read a network in IR, PaddlePaddle, or ONNX format
-      stdshared_ptrovModel model = core.read_model(sample.xml);
+    .. doxygensnippet:: docs/snippets/ov_auto.py
+       :language: python
+       :fragment: [part5]

-      // Configure the VPUX and the Myriad devices separately and load the network to the Auto-Device plugin
-      set VPU config
-      core.set_property(VPUX, {});
-
-      // set MYRIAD config
-      core.set_property(MYRIAD, {});
-      ovCompiledModel compiled_model = core.compile_model(model, AUTO);
-
-.. tab:: Python API
-
-   .. code-block:: python
-
-      from openvino.runtime import Core
-      
-      core = Core()
-      
-      # Read a network in IR, PaddlePaddle, or ONNX format:
-      model = core.read_model(model_path)
-      
-      # Configure the VPUX and the Myriad devices separately and load the network to the Auto-Device plugin:
-      core.set_config(config=vpux_config, device_name="VPUX")
-      core.set_config (config=vpux_config, device_name="MYRIAD")
-      compiled_model = core.compile_model(model=model)
-      
-      # Alternatively, you can combine the individual device settings into one configuration and load the network.
-      # The AUTO plugin will parse and apply the settings to the right devices.
-      # The 'device_name' of "AUTO:VPUX,MYRIAD" will configure auto-device to use devices.
-      compiled_model = core.compile_model(model=model, device_name=device_name, config=full_config)
-      
-      # To query the optimization capabilities:
-      device_cap = core.get_metric("CPU", "OPTIMIZATION_CAPABILITIES")
@endsphinxdirective

 <a name="Benchmark App Info"></a>
@@ -388,8 +206,8 @@ For unlimited device choice:
 .. code-block:: sh

   ./benchmark_app -d AUTO -m <model> -i <input> -niter 1000
-@endsphinxdirective 
-  
+@endsphinxdirective
+
 For limited device choice:
@sphinxdirective
 .. code-block:: sh
@@ -398,13 +216,16 @@ For limited device choice:
@endsphinxdirective

 For more information, refer to the [C++](../../samples/cpp/benchmark_app/README.md) or [Python](../../tools/benchmark_tool/README.md) version instructions.
-	
+
@sphinxdirective
 .. note::

   The default number of CPU streams is 1 when using “-d AUTO”.

   You can use the FP16 IR to work with auto-device.
-   
+
   No demos are yet fully optimized for AUTO, for example in terms of selecting the most suitable device or using GPU streams/throttling.
@endsphinxdirective
+
+
+[autoplugin_accelerate]: ../img/autoplugin_accelerate.png
--- a/docs/img/autoplugin_accelerate.png
+++ b/docs/img/autoplugin_accelerate.png
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ba092e65d9c5c6fb585c4a394ba3e6a913bf4f129a386b7a8664b94aeb47878b
+size 61218
--- a/docs/snippets/AUTO0.cpp
+++ b/docs/snippets/AUTO0.cpp
@@ -1,12 +1,28 @@
-#include <ie_core.hpp>
+#include <openvino/openvino.hpp>

 int main() {
+{
 //! [part0]
-    InferenceEngine::Core ie;
-    InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
-    // these 2 lines below are equivalent
-    InferenceEngine::ExecutableNetwork exec0 = ie.LoadNetwork(network, "AUTO");
-    InferenceEngine::ExecutableNetwork exec1 = ie.LoadNetwork(network, "");
+ov::Core core;
+
+// Read a network in IR, PaddlePaddle, or ONNX format:
+std::shared_ptr<ov::Model> model = core.read_model("sample.xml");
+
+// compile a model on AUTO using the default list of device candidates.
+// The following lines are equivalent:
+ov::CompiledModel model0 = core.compile_model(model);
+ov::CompiledModel model1 = core.compile_model(model, "AUTO");
+
+// Optional
+// You can also specify the devices to be used by AUTO.
+// The following lines are equivalent:
+ov::CompiledModel model3 = core.compile_model(model, "AUTO:GPU,CPU");
+ov::CompiledModel model4 = core.compile_model(model, "AUTO", ov::device::priorities("GPU,CPU"));
+
+// Optional
+// Pre-configure the AUTO plugin globally with an explicit device priority list:
+core.set_property("AUTO", ov::device::priorities("GPU,CPU"));
 //! [part0]
-return 0;
+}
+    return 0;
 }
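The AUTO0 snippet treats `"AUTO:GPU,CPU"` and `"AUTO"` combined with `ov::device::priorities("GPU,CPU")` as equivalent. The minimal Python sketch below illustrates that equivalence. It assumes the candidates follow a single colon as a comma-separated list in priority order. `parse_auto_device_name` and `to_priorities_config` are hypothetical helpers written for illustration; they are not OpenVINO APIs and are not the plugin's actual parser.

```python
from typing import List, Optional, Tuple


def parse_auto_device_name(device_name: str) -> Tuple[str, Optional[List[str]]]:
    """Split "AUTO:GPU,CPU" into ("AUTO", ["GPU", "CPU"]).

    Hypothetical helper for illustration only. Returns None as the candidate
    list when nothing follows the colon, meaning "use the default candidates".
    """
    base, sep, candidates = device_name.partition(":")
    if not sep or not candidates.strip():
        return base, None
    # Keep the user-given order: earlier entries have higher priority.
    return base, [c.strip() for c in candidates.split(",") if c.strip()]


def to_priorities_config(device_name: str) -> Tuple[str, dict]:
    """Rewrite "AUTO:GPU,CPU" as ("AUTO", {"MULTI_DEVICE_PRIORITIES": "GPU,CPU"})."""
    base, candidates = parse_auto_device_name(device_name)
    if candidates is None:
        return base, {}
    return base, {"MULTI_DEVICE_PRIORITIES": ",".join(candidates)}


if __name__ == "__main__":
    print(to_priorities_config("AUTO:GPU,CPU"))  # ('AUTO', {'MULTI_DEVICE_PRIORITIES': 'GPU,CPU'})
    print(to_priorities_config("AUTO"))          # ('AUTO', {})
```

Both forms carry the same information (the virtual device plus an ordered candidate list), which is why the snippet can list them as interchangeable.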
--- a/docs/snippets/AUTO1.cpp
+++ b/docs/snippets/AUTO1.cpp
@@ -1,15 +1,30 @@
 #include <ie_core.hpp>

 int main() {
+{
 //! [part1]
-    InferenceEngine::Core ie;
-    InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
-    // "AUTO" plugin is (globally) pre-configured with the explicit option:
-    ie.SetConfig({{"MULTI_DEVICE_PRIORITIES", "CPU,GPU"}}, "AUTO");
-    // the below 3 lines are equivalent (the first line leverages the pre-configured AUTO, while second and third explicitly pass the same settings)
-    InferenceEngine::ExecutableNetwork exec0 = ie.LoadNetwork(network, "AUTO", {});
-    InferenceEngine::ExecutableNetwork exec1 = ie.LoadNetwork(network, "AUTO", {{"MULTI_DEVICE_PRIORITIES", "CPU,GPU"}});
-    InferenceEngine::ExecutableNetwork exec2 = ie.LoadNetwork(network, "AUTO:CPU,GPU");
+// Inference Engine API
+InferenceEngine::Core ie;
+
+// Read a network in IR, PaddlePaddle, or ONNX format:
+InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
+
+// Load a network to AUTO using the default list of device candidates.
+// The following lines are equivalent:
+InferenceEngine::ExecutableNetwork exec0 = ie.LoadNetwork(network);
+InferenceEngine::ExecutableNetwork exec1 = ie.LoadNetwork(network, "AUTO");
+InferenceEngine::ExecutableNetwork exec2 = ie.LoadNetwork(network, "AUTO", {});
+
+// Optional
+// You can also specify the devices to be used by AUTO in its selection process.
+// The following lines are equivalent:
+InferenceEngine::ExecutableNetwork exec3 = ie.LoadNetwork(network, "AUTO:GPU,CPU");
+InferenceEngine::ExecutableNetwork exec4 = ie.LoadNetwork(network, "AUTO", {{"MULTI_DEVICE_PRIORITIES", "GPU,CPU"}});
+
+// Optional
+// Pre-configure the AUTO plugin globally with an explicit device priority list:
+ie.SetConfig({{"MULTI_DEVICE_PRIORITIES", "GPU,CPU"}}, "AUTO");
 //! [part1]
-return 0;
+}
+    return 0;
 }
--- a/docs/snippets/AUTO2.cpp
+++ b/docs/snippets/AUTO2.cpp
@@ -1,10 +1,12 @@
 #include <ie_core.hpp>

 int main() {
+{
 //! [part2]
-    InferenceEngine::Core ie;
-    InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
-    InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, "AUTO");
+InferenceEngine::Core ie;
+InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
+InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, "AUTO");
 //! [part2]
-return 0;
+}
+    return 0;
 }
--- a/docs/snippets/AUTO3.cpp
+++ b/docs/snippets/AUTO3.cpp
@@ -1,10 +1,22 @@
-#include <ie_core.hpp>
+#include <openvino/openvino.hpp>

 int main() {
+{
+
 //! [part3]
-    InferenceEngine::Core ie;
-    InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
-    InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, "AUTO:CPU,GPU");
+ov::Core core;
+
+// Read a network in IR, PaddlePaddle, or ONNX format:
+std::shared_ptr<ov::Model> model = core.read_model("sample.xml");
+
+// compile a model on AUTO with Performance Hints enabled:
+// To use the “throughput” mode:
+ov::CompiledModel compiled_model = core.compile_model(model, "AUTO",
+    ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
+// or the “latency” mode:
+ov::CompiledModel compiled_model2 = core.compile_model(model, "AUTO",
+    ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
 //! [part3]
-return 0;
+}
+    return 0;
 }
--- a/docs/snippets/AUTO4.cpp
+++ b/docs/snippets/AUTO4.cpp
@@ -1,19 +1,36 @@
-#include <ie_core.hpp>
+#include <openvino/openvino.hpp>

 int main() {
-    const std::map<std::string, std::string> cpu_config  = { { InferenceEngine::PluginConfigParams::KEY_PERF_COUNT, InferenceEngine::PluginConfigParams::YES } };
-    const std::map<std::string, std::string> gpu_config = { { InferenceEngine::PluginConfigParams::KEY_PERF_COUNT, InferenceEngine::PluginConfigParams::YES } };
-    //! [part4]
-    InferenceEngine::Core ie; 
-    InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
-    // configure the CPU device first
-    ie.SetConfig(cpu_config, "CPU"); 
-    // configure the GPU device
-    ie.SetConfig(gpu_config, "GPU"); 
-    // load the network to the auto-device
-    InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, "AUTO");
-    // new metric allows to query the optimization capabilities
-    std::vector<std::string> device_cap = exeNetwork.GetMetric(METRIC_KEY(OPTIMIZATION_CAPABILITIES));
-    //! [part4]
+    ov::Core core;
+
+    // Read a network in IR, PaddlePaddle, or ONNX format:
+    std::shared_ptr<ov::Model> model = core.read_model("sample.xml");
+{
+//! [part4]
+// Example 1
+ov::CompiledModel compiled_model0 = core.compile_model(model, "AUTO",
+    ov::hint::model_priority(ov::hint::Priority::HIGH));
+ov::CompiledModel compiled_model1 = core.compile_model(model, "AUTO",
+    ov::hint::model_priority(ov::hint::Priority::MEDIUM));
+ov::CompiledModel compiled_model2 = core.compile_model(model, "AUTO",
+    ov::hint::model_priority(ov::hint::Priority::LOW));
+/************
+  Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks.
+  Result: compiled_model0 will use GPU, compiled_model1 will use MYRIAD, compiled_model2 will use CPU.
+ ************/
+
+// Example 2
+ov::CompiledModel compiled_model3 = core.compile_model(model, "AUTO",
+    ov::hint::model_priority(ov::hint::Priority::LOW));
+ov::CompiledModel compiled_model4 = core.compile_model(model, "AUTO",
+    ov::hint::model_priority(ov::hint::Priority::MEDIUM));
+ov::CompiledModel compiled_model5 = core.compile_model(model, "AUTO",
+    ov::hint::model_priority(ov::hint::Priority::LOW));
+/************
+  Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks.
+  Result: compiled_model3 will use GPU, compiled_model4 will use GPU, compiled_model5 will use MYRIAD.
+ ************/
+//! [part4]
+}
    return 0;
 }
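A simple placement rule reproduces both results stated in the AUTO4 comments. Under this rule, models are compiled in order, and each one goes to the best-ranked device that is not already used by a model of strictly higher priority. This rule is inferred from those two examples only and is an assumption, not the plugin's documented algorithm. The names `assign_devices`, `DEVICE_RANKING`, and `PRIORITY_LEVEL` are hypothetical, and the GPU > MYRIAD > CPU ranking is assumed to match the snippet's scenario.

```python
from typing import Dict, List

# Assumed device ranking for this sketch, best first (not queried from a real system).
DEVICE_RANKING = ["GPU", "MYRIAD", "CPU"]
PRIORITY_LEVEL = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def assign_devices(priorities: List[str], ranking: List[str] = DEVICE_RANKING) -> List[str]:
    """Toy model of AUTO's model-priority placement (hypothetical, for illustration).

    Each model, in compile order, takes the best-ranked device not already used
    by a model with a strictly higher priority; the last device is the fallback.
    """
    # Highest priority level placed on each device so far (-1 means unused).
    highest_on_device: Dict[str, int] = {}
    placement: List[str] = []
    for name in priorities:
        level = PRIORITY_LEVEL[name]
        chosen = ranking[-1]
        for device in ranking:
            if highest_on_device.get(device, -1) <= level:
                chosen = device
                break
        highest_on_device[chosen] = max(highest_on_device.get(chosen, -1), level)
        placement.append(chosen)
    return placement


if __name__ == "__main__":
    print(assign_devices(["HIGH", "MEDIUM", "LOW"]))  # ['GPU', 'MYRIAD', 'CPU']
    print(assign_devices(["LOW", "MEDIUM", "LOW"]))   # ['GPU', 'GPU', 'MYRIAD']
```

Under this toy rule, the compile order matters. The order LOW, MEDIUM, LOW yields GPU, GPU, MYRIAD. The order HIGH, MEDIUM, LOW yields the Example 1 placement instead, so the two orders cannot share a single result comment.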
--- a/docs/snippets/AUTO5.cpp
+++ b/docs/snippets/AUTO5.cpp
@@ -1,15 +1,18 @@
-#include <ie_core.hpp>
+#include <openvino/openvino.hpp>

 int main() {
-    std::string device_name = "AUTO:CPU,GPU";
-    const std::map< std::string, std::string > full_config = {};
-    //! [part5]
-    InferenceEngine::Core ie; 
-    InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
-    // 'device_name' can be "AUTO:CPU,GPU" to configure the auto-device to use CPU and GPU
-    InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, device_name, full_config);
-    // new metric allows to query the optimization capabilities
-    std::vector<std::string> device_cap = exeNetwork.GetMetric(METRIC_KEY(OPTIMIZATION_CAPABILITIES));
-    //! [part5]
+ov::AnyMap cpu_config = {};
+ov::AnyMap myriad_config = {};
+//! [part5]
+ov::Core core;
+
+// Read a network in IR, PaddlePaddle, or ONNX format:
+std::shared_ptr<ov::Model> model = core.read_model("sample.xml");
+
+// Configure the CPU and MYRIAD devices when compiling the model:
+ov::CompiledModel compiled_model = core.compile_model(model, "AUTO",
+    ov::device::properties("CPU", cpu_config),
+    ov::device::properties("MYRIAD", myriad_config));
+//! [part5]
    return 0;
 }
--- a/docs/snippets/AUTO6.cpp
+++ b/docs/snippets/AUTO6.cpp
@@ -0,0 +1,20 @@
+#include <openvino/openvino.hpp>
+
+int main() {
+{
+//! [part6]
+ov::Core core;
+
+// read a network in IR, PaddlePaddle, or ONNX format
+std::shared_ptr<ov::Model> model = core.read_model("sample.xml");
+
+// Compile the model on AUTO with the log level set to DEBUG:
+ov::CompiledModel compiled_model = core.compile_model(model, "AUTO", ov::log::level(ov::log::Level::DEBUG));
+
+// Or set the log level globally with set_property, then compile the model:
+core.set_property("AUTO", ov::log::level(ov::log::Level::DEBUG));
+ov::CompiledModel compiled_model2 = core.compile_model(model, "AUTO");
+//! [part6]
+}
+    return 0;
+}
--- a/docs/snippets/ov_auto.py
+++ b/docs/snippets/ov_auto.py
@@ -0,0 +1,105 @@
+import sys
+from openvino.runtime import Core
+from openvino.inference_engine import IECore
+model_path = "/openvino_CI_CD/result/install_pkg/tests/test_model_zoo/core/models/ir/add_abc.xml"
+path_to_model = "/openvino_CI_CD/result/install_pkg/tests/test_model_zoo/core/models/ir/add_abc.xml"
+
+def part0():
+#! [part0]
+    core = Core()
+
+    # Read a network in IR, PaddlePaddle, or ONNX format:
+    model = core.read_model(model_path)
+
+    #  compile a model on AUTO using the default list of device candidates.
+    #  The following lines are equivalent:
+    compiled_model = core.compile_model(model=model)
+    compiled_model = core.compile_model(model=model, device_name="AUTO")
+
+    # Optional
+    # You can also specify the devices to be used by AUTO.
+    # The following lines are equivalent:
+    compiled_model = core.compile_model(model=model, device_name="AUTO:GPU,CPU")
+    compiled_model = core.compile_model(model=model, device_name="AUTO", config={"MULTI_DEVICE_PRIORITIES": "GPU,CPU"})
+
+    # Optional
+    # Pre-configure the AUTO plugin globally with an explicit device priority list:
+    core.set_property(device_name="AUTO", properties={"MULTI_DEVICE_PRIORITIES":"GPU,CPU"})
+#! [part0]
+
+def part1():
+#! [part1]
+    ### IE API ###
+    ie = IECore()
+
+    # Read a network in IR, PaddlePaddle, or ONNX format:
+    net = ie.read_network(model=path_to_model)
+
+    # Load a network to AUTO using the default list of device candidates.
+    # The following lines are equivalent:
+    exec_net = ie.load_network(network=net)
+    exec_net = ie.load_network(network=net, device_name="AUTO")
+    exec_net = ie.load_network(network=net, device_name="AUTO", config={})
+
+    # Optional
+    # You can also specify the devices to be used by AUTO in its selection process.
+    # The following lines are equivalent:
+    exec_net = ie.load_network(network=net, device_name="AUTO:GPU,CPU")
+    exec_net = ie.load_network(network=net, device_name="AUTO", config={"MULTI_DEVICE_PRIORITIES": "GPU,CPU"})
+
+    # Optional
+    # Pre-configure the AUTO plugin globally with an explicit device priority list:
+    ie.set_config(config={"MULTI_DEVICE_PRIORITIES":"GPU,CPU"}, device_name="AUTO")
+#! [part1]
+
+def part3():
+#! [part3]
+    core = Core()
+    # Read a network in IR, PaddlePaddle, or ONNX format:
+    model = core.read_model(model_path)
+    # compile a model on AUTO with Performance Hints enabled:
+    # To use the “throughput” mode:
+    compiled_model = core.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT":"THROUGHPUT"})
+    # or the “latency” mode:
+    compiled_model = core.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT":"LATENCY"})
+#! [part3]
+
+def part4():
+#! [part4]
+    core = Core()
+    model = core.read_model(model_path)
+
+    # Example 1
+    compiled_model0 = core.compile_model(model=model, device_name="AUTO", config={"MODEL_PRIORITY":"HIGH"})
+    compiled_model1 = core.compile_model(model=model, device_name="AUTO", config={"MODEL_PRIORITY":"MEDIUM"})
+    compiled_model2 = core.compile_model(model=model, device_name="AUTO", config={"MODEL_PRIORITY":"LOW"})
+    # Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks.
+    # Result: compiled_model0 will use GPU, compiled_model1 will use MYRIAD, compiled_model2 will use CPU.
+
+    # Example 2
+    compiled_model3 = core.compile_model(model=model, device_name="AUTO", config={"MODEL_PRIORITY":"LOW"})
+    compiled_model4 = core.compile_model(model=model, device_name="AUTO", config={"MODEL_PRIORITY":"MEDIUM"})
+    compiled_model5 = core.compile_model(model=model, device_name="AUTO", config={"MODEL_PRIORITY":"LOW"})
+    # Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks.
+    # Result: compiled_model3 will use GPU, compiled_model4 will use GPU, compiled_model5 will use MYRIAD.
+#! [part4]
+
+def part5():
+#! [part5]
+    core = Core()
+    model = core.read_model(model_path)
+    core.set_property(device_name="CPU", properties={})
+    core.set_property(device_name="MYRIAD", properties={})
+    compiled_model = core.compile_model(model=model)
+    compiled_model = core.compile_model(model=model, device_name="AUTO")
+#! [part5]
+
+def main():
+    part0()
+    part1()
+    part3()
+    part4()
+    part5()
+
+if __name__ == '__main__':
+    sys.exit(main())