update NV12 (#15370)

* update NV12 docs and snippets

add single-plane input information
create single-plane cpp snippet
menu fix
update formatting for sphinx directives

Co-Authored-By: Ilya Churaev <ilyachur@gmail.com>
Co-Authored-By: Vladimir Paramuzov <vladimir.paramuzov@intel.com>

* additional snippet fixes

---------

Co-authored-by: Ilya Churaev <ilyachur@gmail.com>
Co-authored-by: Vladimir Paramuzov <vladimir.paramuzov@intel.com>
This commit is contained in:
Karol Blaszczak 2023-02-28 14:58:08 +01:00 committed by GitHub
parent 4dff2d1c60
commit f9a8d9132d
8 changed files with 292 additions and 187 deletions

View File

@ -222,11 +222,11 @@ The GPU plugin has the following additional preprocessing options:
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/gpu/preprocessing_nv12_two_planes.cpp init_preproc
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/gpu/preprocessing_nv12_two_planes.py init_preproc
@endsphinxtab
@endsphinxtabset

View File

@ -3,8 +3,11 @@
The `ov::RemoteContext` and `ov::RemoteTensor` interface implementation targets the need for memory sharing and
interoperability with existing native APIs, such as OpenCL, Microsoft DirectX, and VAAPI.
They allow you to avoid any memory copy overhead when plugging OpenVINO™ inference
into an existing GPU pipeline. They also enable OpenCL kernels to participate in the pipeline to become
native buffer consumers or producers of the OpenVINO™ inference.
There are two interoperability scenarios supported by the Remote Tensor API:
@ -23,7 +26,7 @@ and functions that consume or produce native handles directly.
## Context Sharing Between Application and GPU Plugin
GPU plugin classes that implement the `ov::RemoteContext` interface are responsible for context sharing.
Obtaining a context object is the first step in sharing pipeline objects.
The context object of the GPU plugin directly wraps OpenCL context, setting a scope for sharing the
`ov::CompiledModel` and `ov::RemoteTensor` objects. The `ov::RemoteContext` object can be either created on top of
an existing handle from a native API or retrieved from the GPU plugin.
@ -37,60 +40,49 @@ additional parameter.
To create the `ov::RemoteContext` object for user context, explicitly provide the context to the plugin, using the constructor of one
of the `ov::RemoteContext` derived classes.
@sphinxdirective

.. tab:: Linux

   .. tab:: Create from cl_context

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: context_from_cl_context

   .. tab:: Create from cl_queue

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: context_from_cl_queue

   .. tab:: Create from VADisplay

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: context_from_va_display

.. tab:: Windows

   .. tab:: Create from cl_context

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: context_from_cl_context

   .. tab:: Create from cl_queue

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: context_from_cl_queue

   .. tab:: Create from ID3D11Device

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: context_from_d3d_device

@endsphinxdirective
### Getting RemoteContext from the Plugin
If you do not provide any user context, the plugin uses its default internal context.
@ -100,21 +92,21 @@ Once the plugin options have been changed, the internal context is replaced by t
To request the current default context of the plugin, use one of the following methods:
@sphinxdirective

.. tab:: Get context from Core

   .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
      :language: cpp
      :fragment: default_context_from_core

.. tab:: Get context from compiled model

   .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
      :language: cpp
      :fragment: default_context_from_model

@endsphinxdirective
## Memory Sharing Between Application and GPU Plugin
@ -126,108 +118,153 @@ of the `ov::RemoteContext` sub-classes.
`ov::intel_gpu::ocl::ClContext` has multiple overloads of the `create_tensor` method, which allow wrapping pre-allocated native handles with an `ov::RemoteTensor`
object or requesting the plugin to allocate specific device memory. For more details, see the code snippets below:
@sphinxdirective

.. tab:: Wrap native handles

   .. tab:: USM pointer

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: wrap_usm_pointer

   .. tab:: cl_mem

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: wrap_cl_mem

   .. tab:: cl::Buffer

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: wrap_cl_buffer

   .. tab:: cl::Image2D

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: wrap_cl_image

   .. tab:: biplanar NV12 surface

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: wrap_nv12_surface

.. tab:: Allocate device memory

   .. tab:: USM host memory

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: allocate_usm_host

   .. tab:: USM device memory

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: allocate_usm_device

   .. tab:: cl::Buffer

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: allocate_cl_buffer

@endsphinxdirective
The `ov::intel_gpu::ocl::D3DContext` and `ov::intel_gpu::ocl::VAContext` classes are derived from `ov::intel_gpu::ocl::ClContext`.
Therefore, they provide the functionality described above and extend it
to allow creation of `ov::RemoteTensor` objects from `ID3D11Buffer`, `ID3D11Texture2D` pointers or the `VASurfaceID` handle respectively.
## Direct NV12 Video Surface Input
To support the direct consumption of a hardware video decoder output, the GPU plugin accepts:

* Two-plane NV12 video surface input - calling the `create_tensor_nv12()` function creates
  a pair of `ov::RemoteTensor` objects, representing the Y and UV planes.
* Single-plane NV12 video surface input - calling the `create_tensor()` function creates one
  `ov::RemoteTensor` object, representing the Y and UV planes at once (Y elements before UV elements).
* NV12 to Grey video surface input conversion - calling the `create_tensor()` function creates one
  `ov::RemoteTensor` object, representing only the Y plane.

To ensure that the plugin generates a correct execution graph, static preprocessing
should be added before model compilation:
@sphinxdirective

.. tab:: two-plane

   .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_two_planes.cpp
      :language: cpp
      :fragment: [init_preproc]

.. tab:: single-plane

   .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_single_plane.cpp
      :language: cpp
      :fragment: [init_preproc]

.. tab:: NV12 to Grey

   .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_to_gray.cpp
      :language: cpp
      :fragment: [init_preproc]

@endsphinxdirective
Since the `ov::intel_gpu::ocl::ClImage2DTensor` and its derived classes do not support batched surfaces,
if batching and surface sharing are required at the same time,
inputs need to be set via the `ov::InferRequest::set_tensors` method with a vector of shared surfaces for each plane:
@sphinxdirective

.. tab:: Single Batch

   .. tab:: two-plane

      .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_two_planes.cpp
         :language: cpp
         :fragment: single_batch

   .. tab:: single-plane

      .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_single_plane.cpp
         :language: cpp
         :fragment: single_batch

   .. tab:: NV12 to Grey

      .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_to_gray.cpp
         :language: cpp
         :fragment: single_batch

.. tab:: Multiple Batches

   .. tab:: two-plane

      .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_two_planes.cpp
         :language: cpp
         :fragment: batched_case

   .. tab:: single-plane

      .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_single_plane.cpp
         :language: cpp
         :fragment: batched_case

   .. tab:: NV12 to Grey

      .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_to_gray.cpp
         :language: cpp
         :fragment: batched_case

@endsphinxdirective
The I420 color format can be processed in a similar way.
## Context & Queue Sharing
@ -242,18 +279,12 @@ This sharing mechanism allows performing pipeline synchronization on the app sid
on waiting for the completion of inference. The pseudo-code may look as follows:
@sphinxdirective

.. dropdown:: Queue and context sharing example

   .. doxygensnippet:: docs/snippets/gpu/queue_sharing.cpp
      :language: cpp
      :fragment: queue_sharing

@endsphinxdirective
@ -282,60 +313,34 @@ For possible low-level properties and their description, refer to the `openvino/
To see pseudo-code of usage examples, refer to the sections below.
@sphinxdirective

.. NOTE::

   For low-level parameter usage examples, see the source code of user-side wrappers from the include files mentioned above.

.. dropdown:: OpenCL Kernel Execution on a Shared Buffer

   This example uses the OpenCL context obtained from a compiled model object.

   .. doxygensnippet:: docs/snippets/gpu/context_sharing.cpp
      :language: cpp
      :fragment: context_sharing_get_from_ov

.. dropdown:: Running GPU Plugin Inference within User-Supplied Shared Context

   .. doxygensnippet:: docs/snippets/gpu/context_sharing.cpp
      :language: cpp
      :fragment: context_sharing_user_handle

.. dropdown:: Direct Consuming of the NV12 VAAPI Video Decoder Surface on Linux

   .. doxygensnippet:: docs/snippets/gpu/context_sharing_va.cpp
      :language: cpp
      :fragment: context_sharing_va

@endsphinxdirective
## See Also

View File

@ -79,7 +79,7 @@ html_theme = "openvino_sphinx_theme"
html_theme_path = ['_themes']
html_theme_options = {
"navigation_depth": 8,
"show_nav_level": 2,
"use_edit_page_button": True,
"github_url": "https://github.com/openvinotoolkit/openvino",

View File

@ -24,7 +24,9 @@ file(GLOB SOURCES "${CMAKE_CURRENT_SOURCE_DIR}/*.cpp"
if (NOT TARGET OpenCL::OpenCL)
list(REMOVE_ITEM SOURCES "${CMAKE_CURRENT_SOURCE_DIR}/gpu/context_sharing_va.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/gpu/context_sharing.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/gpu/preprocessing_nv12_two_planes.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/gpu/preprocessing_nv12_single_plane.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/gpu/preprocessing_nv12_to_gray.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/gpu/queue_sharing.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/gpu/remote_objects_creation.cpp")
endif()

View File

@ -0,0 +1,48 @@
#include <openvino/runtime/core.hpp>
#define OV_GPU_USE_OPENCL_HPP
#include <openvino/runtime/intel_gpu/ocl/ocl.hpp>
#include <openvino/runtime/intel_gpu/properties.hpp>
#include <openvino/core/preprocess/pre_post_process.hpp>

ov::intel_gpu::ocl::ClImage2DTensor get_yuv_tensor();

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");

    //! [init_preproc]
    using namespace ov::preprocess;
    auto p = PrePostProcessor(model);
    p.input().tensor().set_element_type(ov::element::u8)
                      .set_color_format(ColorFormat::NV12_SINGLE_PLANE)
                      .set_memory_type(ov::intel_gpu::memory_type::surface);
    p.input().preprocess().convert_color(ov::preprocess::ColorFormat::BGR);
    p.input().model().set_layout("NCHW");
    auto model_with_preproc = p.build();
    //! [init_preproc]

    auto compiled_model = core.compile_model(model_with_preproc, "GPU");
    auto context = compiled_model.get_context().as<ov::intel_gpu::ocl::ClContext>();
    auto infer_request = compiled_model.create_infer_request();

    {
        //! [single_batch]
        auto input_yuv = model_with_preproc->input(0);
        ov::intel_gpu::ocl::ClImage2DTensor yuv_tensor = get_yuv_tensor();
        infer_request.set_tensor(input_yuv.get_any_name(), yuv_tensor);
        infer_request.infer();
        //! [single_batch]
    }

    {
        auto yuv_tensor_0 = get_yuv_tensor();
        auto yuv_tensor_1 = get_yuv_tensor();
        //! [batched_case]
        auto input_yuv = model_with_preproc->input(0);
        std::vector<ov::Tensor> yuv_tensors = {yuv_tensor_0, yuv_tensor_1};
        infer_request.set_tensors(input_yuv.get_any_name(), yuv_tensors);
        infer_request.infer();
        //! [batched_case]
    }
    return 0;
}

View File

@ -0,0 +1,50 @@
#include <openvino/runtime/core.hpp>
#define OV_GPU_USE_OPENCL_HPP
#include <openvino/runtime/intel_gpu/ocl/ocl.hpp>
#include <openvino/runtime/intel_gpu/properties.hpp>
#include <openvino/core/preprocess/pre_post_process.hpp>

ov::intel_gpu::ocl::ClImage2DTensor get_y_tensor();
ov::intel_gpu::ocl::ClImage2DTensor get_uv_tensor();

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");

    //! [init_preproc]
    using namespace ov::preprocess;
    auto p = PrePostProcessor(model);
    p.input().tensor().set_element_type(ov::element::u8)
                      .set_layout("NHWC")
                      .set_memory_type(ov::intel_gpu::memory_type::surface);
    p.input().model().set_layout("NCHW");
    auto model_with_preproc = p.build();
    //! [init_preproc]

    auto compiled_model = core.compile_model(model_with_preproc, "GPU");
    auto remote_context = compiled_model.get_context().as<ov::intel_gpu::ocl::ClContext>();
    auto input = model->input(0);
    auto infer_request = compiled_model.create_infer_request();

    {
        //! [single_batch]
        cl::Image2D img_y_plane;
        auto input_y = model_with_preproc->input(0);
        auto remote_y_tensor = remote_context.create_tensor(input_y.get_element_type(), input.get_shape(), img_y_plane);
        infer_request.set_tensor(input_y.get_any_name(), remote_y_tensor);
        infer_request.infer();
        //! [single_batch]
    }

    {
        //! [batched_case]
        cl::Image2D img_y_plane_0, img_y_plane_1;
        auto input_y = model_with_preproc->input(0);
        auto remote_y_tensor_0 = remote_context.create_tensor(input_y.get_element_type(), input.get_shape(), img_y_plane_0);
        auto remote_y_tensor_1 = remote_context.create_tensor(input_y.get_element_type(), input.get_shape(), img_y_plane_1);
        std::vector<ov::Tensor> y_tensors = {remote_y_tensor_0, remote_y_tensor_1};
        infer_request.set_tensors(input_y.get_any_name(), y_tensors);
        infer_request.infer();
        //! [batched_case]
    }
    return 0;
}