[docs] python snippets for devices master (#11176)

* Update CPU docs

* update GPU docs

* update with sphinxtab

* Fix docs

* Add preprocessing snippet

* Fix path

Co-authored-by: Anastasia Kuporosova <anastasia.kuporosova@intel.com>
Alexey Lebedev 2022-03-29 11:01:53 +03:00 committed by GitHub
parent bc9f140bb4
commit 8f88889876
10 changed files with 314 additions and 11 deletions


@@ -9,7 +9,17 @@ There are two options for using the custom operation configuration file:
* Include a section with your kernels into the automatically-loaded `<lib_path>/cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml` file.
* Call the `ov::Core::set_property()` method from your application with the `"CONFIG_FILE"` key and the configuration file name as a value before loading the network that uses custom operations to the plugin:
@snippet snippets/gpu/custom_kernels_api.cpp part0
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/gpu/custom_kernels_api.cpp part0
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/gpu/custom_kernels_api.py part0
@endsphinxtab
@endsphinxtabset
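For quick reference, the Python tab above reduces to the following minimal sketch (it mirrors `docs/snippets/gpu/custom_kernels_api.py`, added later in this commit; the XML path is a placeholder):

```python
from openvino.runtime import Core

core = Core()
# point the GPU plugin to the custom-kernels configuration file (placeholder path)
core.set_property("GPU", {"CONFIG_FILE": "<path_to_the_xml_file>"})
```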
All OpenVINO samples, except the trivial `hello_classification`, and most Open Model Zoo demos
feature a dedicated command-line option `-c` to load custom kernels. For example, to load custom operations for the classification sample, run the command below:


@@ -15,7 +15,17 @@ For the CPU plugin `"CPU"` device name is used, and even though there can be mor
On multi-socket platforms, load balancing and memory usage distribution between NUMA nodes are handled automatically.
In order to use the CPU for inference, the device name should be passed to the `ov::Core::compile_model()` method:
@snippet snippets/cpu/compile_model.cpp compile_model_default
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/cpu/compile_model.cpp compile_model_default
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/cpu/compile_model.py compile_model_default
@endsphinxtab
@endsphinxtabset
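For quick reference, the Python tab above amounts to a minimal sketch like the one below (mirroring `docs/snippets/cpu/compile_model.py`, added later in this commit; `model.xml` is a placeholder path):

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")               # placeholder model path
compiled_model = core.compile_model(model, "CPU")  # select the CPU device
```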
## Supported inference data types
CPU plugin supports the following data types as inference precision of internal primitives:
@@ -55,7 +65,17 @@ Using bf16 precision provides the following performance benefits:
To check if the CPU device can support the bfloat16 data type, use the [query device properties interface](./config_properties.md) to query the ov::device::capabilities property, which should contain `BF16` in the list of CPU capabilities:
@snippet snippets/cpu/Bfloat16Inference0.cpp part0
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/cpu/Bfloat16Inference0.cpp part0
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/cpu/Bfloat16Inference.py part0
@endsphinxtab
@endsphinxtabset
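A minimal Python sketch of this capability check, based on `docs/snippets/cpu/Bfloat16Inference.py` (added later in this commit); the explicit membership test and the `bf16_supported` name are illustrative additions:

```python
from openvino.runtime import Core

core = Core()
capabilities = core.get_property("CPU", "OPTIMIZATION_CAPABILITIES")
# on CPUs with native bfloat16 support the returned list contains "BF16"
bf16_supported = "BF16" in capabilities
```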
If the model was converted to bf16, ov::hint::inference_precision is set to ov::element::bf16 and can be checked via the ov::CompiledModel::get_property call. The code below demonstrates how to get the element type:
@@ -63,7 +83,17 @@ In case if the model was converted to bf16, ov::hint::inference_precision is set
To infer the model in f32 precision instead of bf16 on targets with native bf16 support, set the ov::hint::inference_precision to ov::element::f32.
@snippet snippets/cpu/Bfloat16Inference2.cpp part2
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/cpu/Bfloat16Inference2.cpp part2
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/cpu/Bfloat16Inference.py part2
@endsphinxtab
@endsphinxtabset
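In Python the override is a single property call; a minimal sketch (mirroring part2 of `docs/snippets/cpu/Bfloat16Inference.py`, added later in this commit):

```python
from openvino.runtime import Core

core = Core()
# force f32 inference precision even on CPUs with native bf16 support
core.set_property("CPU", {"INFERENCE_PRECISION_HINT": "f32"})
```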
Bfloat16 software simulation mode is available on CPUs with the Intel® AVX-512 instruction set that do not support the native `avx512_bf16` instruction. This mode is used for development purposes and it does not guarantee good performance.
To enable the simulation, one has to explicitly set ov::hint::inference_precision to ov::element::bf16.
@@ -78,7 +108,17 @@ To enable the simulation, one have to explicitly set ov::hint::inference_precisi
If a machine has OpenVINO-supported devices other than the CPU (for example, an integrated GPU), then any supported model can be executed on the CPU and all the other devices simultaneously.
This can be achieved by specifying `"MULTI:CPU,GPU.0"` as the target device in case of simultaneous usage of CPU and GPU.
@snippet snippets/cpu/compile_model.cpp compile_model_multi
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/cpu/compile_model.cpp compile_model_multi
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/cpu/compile_model.py compile_model_multi
@endsphinxtab
@endsphinxtabset
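In Python this only changes the device string passed to `compile_model()`; a minimal sketch (mirroring `docs/snippets/cpu/compile_model.py`):

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder model path
# run the model on the CPU and the first GPU simultaneously
compiled_model = core.compile_model(model, "MULTI:CPU,GPU.0")
```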
See [Multi-device execution page](../multi_device.md) for more details.
@@ -103,7 +143,17 @@ The most flexible configuration is the fully undefined shape, when we do not app
However, reducing the level of uncertainty brings performance gains.
If we explicitly set dynamic shapes with defined upper bounds, memory consumption can be reduced through memory reuse, which results in better cache locality and, in turn, better inference performance.
@snippet snippets/cpu/dynamic_shape.cpp defined_upper_bound
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/cpu/dynamic_shape.cpp defined_upper_bound
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/cpu/dynamic_shape.py defined_upper_bound
@endsphinxtab
@endsphinxtabset
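A minimal Python sketch of reshaping with defined upper bounds (mirroring `docs/snippets/cpu/dynamic_shape.py`, added later in this commit); the concrete bounds are illustrative:

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder model path
# each (min, max) tuple defines a lower and an upper bound for that dimension
model.reshape([(1, 10), (1, 20), (1, 30), (1, 40)])
```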
> **NOTE**: Using fully undefined shapes may result in significantly higher memory consumption compared to inferring the same model with static shapes.
> If the memory consumption is unacceptable but dynamic shapes are still required, one can reshape the model using shapes with defined upper bound to reduce memory footprint.
@@ -111,7 +161,17 @@ We can reduce memory consumption through memory reuse, and as a result achieve b
Some runtime optimizations work better if the model shapes are known in advance.
Therefore, if the input data shape does not change between inference calls, it is recommended to use a model with static shapes or to reshape the existing model to a static input shape to get the best performance.
@snippet snippets/cpu/dynamic_shape.cpp static_shape
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/cpu/dynamic_shape.cpp static_shape
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/cpu/dynamic_shape.py static_shape
@endsphinxtab
@endsphinxtabset
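Correspondingly, reshaping to a fully static shape in Python is a single call; a minimal sketch (mirroring `docs/snippets/cpu/dynamic_shape.py`):

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder model path
model.reshape([10, 20, 30, 40])       # fully static input shape
```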
See [dynamic shapes guide](../ov_dynamic_shapes.md) for more details.
@@ -211,4 +271,3 @@ For some performance-critical DL operations, the CPU plugin uses optimized imple
* [Supported Devices](Supported_Devices.md)
* [Optimization guide](@ref openvino_docs_optimization_guide_dldt_optimization_guide)
* [CPU plugin developers documentation](https://github.com/openvinotoolkit/openvino/wiki/CPUPluginDevelopersDocs)


@@ -48,19 +48,49 @@ Then device name can be passed to `ov::Core::compile_model()` method:
@sphinxtab{Running on default device}
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/gpu/compile_model.cpp compile_model_default_gpu
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/gpu/compile_model.py compile_model_default_gpu
@endsphinxtab
@endsphinxtabset
@endsphinxtab
@sphinxtab{Running on specific GPU}
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/gpu/compile_model.cpp compile_model_gpu_with_id
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/gpu/compile_model.py compile_model_gpu_with_id
@endsphinxtab
@endsphinxtabset
@endsphinxtab
@sphinxtab{Running on specific tile}
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/gpu/compile_model.cpp compile_model_gpu_with_id_and_tile
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/gpu/compile_model.py compile_model_gpu_with_id_and_tile
@endsphinxtab
@endsphinxtabset
@endsphinxtab
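For reference, the three Python tabs above differ only in the device string; a condensed sketch (mirroring `docs/snippets/gpu/compile_model.py`, added later in this commit; the variable names are illustrative):

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")                  # placeholder model path
compiled_default = core.compile_model(model, "GPU")      # default GPU device
compiled_gpu1 = core.compile_model(model, "GPU.1")       # specific GPU by index
compiled_tile = core.compile_model(model, "GPU.1.0")     # specific tile of that GPU
```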
@@ -93,7 +123,17 @@ Floating-point precision of a GPU primitive is selected based on operation preci
If a machine has multiple GPUs (for example, an integrated GPU and a discrete Intel GPU), then any supported model can be executed on all GPUs simultaneously.
This can be achieved by specifying `"MULTI:GPU.1,GPU.0"` as a target device.
@snippet snippets/gpu/compile_model.cpp compile_model_multi
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/gpu/compile_model.cpp compile_model_multi
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/gpu/compile_model.py compile_model_multi
@endsphinxtab
@endsphinxtabset
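The Python variant is analogous to the CPU case, with only the device string changed; a minimal sketch (mirroring `docs/snippets/gpu/compile_model.py`):

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder model path
compiled_model = core.compile_model(model, "MULTI:GPU.1,GPU.0")
```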
See [Multi-device execution page](../multi_device.md) for more details.
@@ -106,13 +146,33 @@ Alternatively it can be enabled explicitly via the device notion, e.g. `"BATCH:G
@sphinxtab{Batching via BATCH plugin}
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/gpu/compile_model.cpp compile_model_batch_plugin
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/gpu/compile_model.py compile_model_batch_plugin
@endsphinxtab
@endsphinxtabset
@endsphinxtab
@sphinxtab{Batching via throughput hint}
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/gpu/compile_model.cpp compile_model_auto_batch
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/gpu/compile_model.py compile_model_auto_batch
@endsphinxtab
@endsphinxtabset
@endsphinxtab
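Both batching options reduce to short Python calls; a condensed sketch of the two tabs above (mirroring `docs/snippets/gpu/compile_model.py`, added later in this commit; the variable names are illustrative):

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder model path

# explicit batching via the BATCH virtual device
compiled_batch = core.compile_model(model, "BATCH:GPU")

# implicit batching via the throughput performance hint
compiled_tput = core.compile_model(model, "GPU", {"PERFORMANCE_HINT": "THROUGHPUT"})
```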
@@ -141,7 +201,17 @@ For example, batch size 33 may be executed via 2 internal networks with batch si
The code snippet below demonstrates how to use dynamic batch in simple scenarios:
@snippet snippets/gpu/dynamic_batch.cpp dynamic_batch
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/gpu/dynamic_batch.cpp dynamic_batch
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/gpu/dynamic_batch.py dynamic_batch
@endsphinxtab
@endsphinxtabset
See [dynamic shapes guide](../ov_dynamic_shapes.md) for more details.
@@ -149,7 +219,17 @@ See [dynamic shapes guide](../ov_dynamic_shapes.md) for more details.
The GPU plugin has the following additional preprocessing options:
- `ov::intel_gpu::memory_type::surface` and `ov::intel_gpu::memory_type::buffer` values for the `ov::preprocess::InputTensorInfo::set_memory_type()` preprocessing method. These values hint the plugin about the type of input tensors that will be set at runtime, so that it can generate the proper kernels.
@snippet snippets/gpu/preprocessing.cpp init_preproc
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/gpu/preprocessing.cpp init_preproc
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/gpu/preprocessing.py init_preproc
@endsphinxtab
@endsphinxtabset
With such preprocessing, the GPU plugin will expect an `ov::intel_gpu::ocl::ClImage2DTensor` (or derived) to be passed for each NV12 plane via the `ov::InferRequest::set_tensor()` or `ov::InferRequest::set_tensors()` methods.


@@ -0,0 +1,23 @@
# Copyright (C) 2022 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
from openvino.runtime import Core
#! [part0]
core = Core()
cpu_optimization_capabilities = core.get_property("CPU", "OPTIMIZATION_CAPABILITIES")
#! [part0]
# TODO: enable part1 when the property API is supported in Python
#! [part1]
core = Core()
model = core.read_model("model.xml")
compiled_model = core.compile_model(model, "CPU")
inference_precision = core.get_property("CPU", "INFERENCE_PRECISION_HINT")
#! [part1]
#! [part2]
core = Core()
core.set_property("CPU", {"INFERENCE_PRECISION_HINT": "f32"})
#! [part2]


@@ -0,0 +1,17 @@
# Copyright (C) 2022 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#! [compile_model_default]
from openvino.runtime import Core
core = Core()
model = core.read_model("model.xml")
compiled_model = core.compile_model(model, "CPU")
#! [compile_model_default]
#! [compile_model_multi]
core = Core()
model = core.read_model("model.xml")
compiled_model = core.compile_model(model, "MULTI:CPU,GPU.0")
#! [compile_model_multi]


@@ -0,0 +1,17 @@
# Copyright (C) 2022 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
from openvino.runtime import Core
#! [defined_upper_bound]
core = Core()
model = core.read_model("model.xml")
# each (min, max) tuple defines a lower and an upper bound for that dimension
model.reshape([(1, 10), (1, 20), (1, 30), (1, 40)])
#! [defined_upper_bound]
#! [static_shape]
core = Core()
model = core.read_model("model.xml")
model.reshape([10, 20, 30, 40])
#! [static_shape]


@@ -0,0 +1,41 @@
# Copyright (C) 2022 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
from openvino.runtime import Core
#! [compile_model_default_gpu]
core = Core()
model = core.read_model("model.xml")
compiled_model = core.compile_model(model, "GPU")
#! [compile_model_default_gpu]
#! [compile_model_gpu_with_id]
core = Core()
model = core.read_model("model.xml")
compiled_model = core.compile_model(model, "GPU.1")
#! [compile_model_gpu_with_id]
#! [compile_model_gpu_with_id_and_tile]
core = Core()
model = core.read_model("model.xml")
compiled_model = core.compile_model(model, "GPU.1.0")
#! [compile_model_gpu_with_id_and_tile]
#! [compile_model_multi]
core = Core()
model = core.read_model("model.xml")
compiled_model = core.compile_model(model, "MULTI:GPU.1,GPU.0")
#! [compile_model_multi]
#! [compile_model_batch_plugin]
core = Core()
model = core.read_model("model.xml")
compiled_model = core.compile_model(model, "BATCH:GPU")
#! [compile_model_batch_plugin]
#! [compile_model_auto_batch]
core = Core()
model = core.read_model("model.xml")
compiled_model = core.compile_model(model, "GPU", {"PERFORMANCE_HINT": "THROUGHPUT"})
#! [compile_model_auto_batch]


@@ -0,0 +1,10 @@
# Copyright (C) 2022 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
from openvino.runtime import Core
#! [part0]
core = Core()
core.set_property("GPU", {"CONFIG_FILE": "<path_to_the_xml_file>"})
#! [part0]


@@ -0,0 +1,28 @@
# Copyright (C) 2022 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
import openvino.runtime as ov
#! [dynamic_batch]
core = ov.Core()
C = 3
H = 224
W = 224
model = core.read_model("model.xml")
# set the batch dimension as a range [1..10], keeping the other dimensions static
model.reshape([(1, 10), C, H, W])
# compile model and create infer request
compiled_model = core.compile_model(model, "GPU")
infer_request = compiled_model.create_infer_request()
# create input tensor with specific batch size
input_tensor = ov.Tensor(model.input().element_type, [2, C, H, W])
# ...
infer_request.infer([input_tensor])
#! [dynamic_batch]


@@ -0,0 +1,18 @@
# Copyright (C) 2022 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#! [init_preproc]
from openvino.runtime import Core, Type, Layout
from openvino.preprocess import PrePostProcessor, ColorFormat
core = Core()
model = core.read_model("model.xml")
p = PrePostProcessor(model)
p.input().tensor().set_element_type(Type.u8) \
    .set_color_format(ColorFormat.NV12_TWO_PLANES, ["y", "uv"]) \
    .set_memory_type("GPU_SURFACE")
p.input().preprocess().convert_color(ColorFormat.BGR)
p.input().model().set_layout(Layout("NCHW"))
model_with_preproc = p.build()
#! [init_preproc]