update NV12 (#15370)

* update NV12 docs and snippets

add single-plane input information
create single-plane cpp snippet
menu fix
update formatting for sphinx directives

Co-Authored-By: Ilya Churaev <ilyachur@gmail.com>
Co-Authored-By: Vladimir Paramuzov <vladimir.paramuzov@intel.com>

* additional snippet fixes

---------

Co-authored-by: Ilya Churaev <ilyachur@gmail.com>
Co-authored-by: Vladimir Paramuzov <vladimir.paramuzov@intel.com>
This commit is contained in:
Karol Blaszczak 2023-02-28 14:58:08 +01:00 committed by GitHub
parent 4dff2d1c60
commit f9a8d9132d
8 changed files with 292 additions and 187 deletions

View File

@ -222,11 +222,11 @@ The GPU plugin has the following additional preprocessing options:
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/gpu/preprocessing_nv12_two_planes.cpp init_preproc
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/gpu/preprocessing_nv12_two_planes.py init_preproc
@endsphinxtab
@endsphinxtabset

View File

@ -3,8 +3,11 @@
The `ov::RemoteContext` and `ov::RemoteTensor` interface implementation targets the need for memory sharing and
interoperability with existing native APIs, such as OpenCL, Microsoft DirectX, and VAAPI.
They allow you to avoid any memory copy overhead when plugging OpenVINO™ inference
into an existing GPU pipeline. They also enable OpenCL kernels to participate in the pipeline to become
native buffer consumers or producers of the OpenVINO™ inference.
There are two interoperability scenarios supported by the Remote Tensor API:
@ -23,7 +26,7 @@ and functions that consume or produce native handles directly.
## Context Sharing Between Application and GPU Plugin
GPU plugin classes that implement the `ov::RemoteContext` interface are responsible for context sharing.
Obtaining a context object is the first step in sharing pipeline objects.
The context object of the GPU plugin directly wraps OpenCL context, setting a scope for sharing the
`ov::CompiledModel` and `ov::RemoteTensor` objects. The `ov::RemoteContext` object can be either created on top of
an existing handle from a native API or retrieved from the GPU plugin.
@ -37,60 +40,49 @@ additional parameter.
To create the `ov::RemoteContext` object for user context, explicitly provide the context to the plugin, using the constructor of one
of the `ov::RemoteContext` derived classes.
@sphinxdirective

.. tab:: Linux

   .. tab:: Create from cl_context

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: context_from_cl_context

   .. tab:: Create from cl_queue

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: context_from_cl_queue

   .. tab:: Create from VADisplay

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: context_from_va_display

.. tab:: Windows

   .. tab:: Create from cl_context

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: context_from_cl_context

   .. tab:: Create from cl_queue

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: context_from_cl_queue

   .. tab:: Create from ID3D11Device

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: context_from_d3d_device

@endsphinxdirective
### Getting RemoteContext from the Plugin
If you do not provide any user context, the plugin uses its default internal context.
@ -100,21 +92,21 @@ Once the plugin options have been changed, the internal context is replaced by t
To request the current default context of the plugin, use one of the following methods:
@sphinxdirective

.. tab:: Get context from Core

   .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
      :language: cpp
      :fragment: default_context_from_core

.. tab:: Get context from compiled model

   .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
      :language: cpp
      :fragment: default_context_from_model

@endsphinxdirective
## Memory Sharing Between Application and GPU Plugin
@ -126,108 +118,153 @@ of the `ov::RemoteContext` sub-classes.
`ov::intel_gpu::ocl::ClContext` has multiple overloads of the `create_tensor` method, which allow wrapping pre-allocated native handles with an `ov::RemoteTensor`
object or requesting the plugin to allocate specific device memory. For more details, see the code snippets below:
@sphinxdirective

.. tab:: Wrap native handles

   .. tab:: USM pointer

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: wrap_usm_pointer

   .. tab:: cl_mem

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: wrap_cl_mem

   .. tab:: cl::Buffer

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: wrap_cl_buffer

   .. tab:: cl::Image2D

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: wrap_cl_image

   .. tab:: biplanar NV12 surface

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: wrap_nv12_surface

.. tab:: Allocate device memory

   .. tab:: USM host memory

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: allocate_usm_host

   .. tab:: USM device memory

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: allocate_usm_device

   .. tab:: cl::Buffer

      .. doxygensnippet:: docs/snippets/gpu/remote_objects_creation.cpp
         :language: cpp
         :fragment: allocate_cl_buffer

@endsphinxdirective
The `ov::intel_gpu::ocl::D3DContext` and `ov::intel_gpu::ocl::VAContext` classes are derived from `ov::intel_gpu::ocl::ClContext`.
Therefore, they provide the functionality described above and extend it
to allow creation of `ov::RemoteTensor` objects from `ID3D11Buffer`, `ID3D11Texture2D` pointers or the `VASurfaceID` handle respectively.
## Direct NV12 Video Surface Input
To support the direct consumption of a hardware video decoder output, the GPU plugin accepts:

* Two-plane NV12 video surface input - calling the `create_tensor_nv12()` function creates
  a pair of `ov::RemoteTensor` objects, representing the Y and UV planes.
* Single-plane NV12 video surface input - calling the `create_tensor()` function creates one
  `ov::RemoteTensor` object, representing the Y and UV planes at once (Y elements before UV elements).
* NV12 to Grey video surface input conversion - calling the `create_tensor()` function creates one
  `ov::RemoteTensor` object, representing only the Y plane.

To ensure that the plugin generates a correct execution graph, static preprocessing
should be added before model compilation:
@sphinxdirective

.. tab:: two-plane

   .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_two_planes.cpp
      :language: cpp
      :fragment: [init_preproc]

.. tab:: single-plane

   .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_single_plane.cpp
      :language: cpp
      :fragment: [init_preproc]

.. tab:: NV12 to Grey

   .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_to_gray.cpp
      :language: cpp
      :fragment: [init_preproc]

@endsphinxdirective
Since the `ov::intel_gpu::ocl::ClImage2DTensor` and its derived classes do not support batched surfaces,
if batching and surface sharing are required at the same time,
inputs need to be set via the `ov::InferRequest::set_tensors` method with a vector of shared surfaces for each plane:
@sphinxdirective

.. tab:: Single Batch

   .. tab:: two-plane

      .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_two_planes.cpp
         :language: cpp
         :fragment: single_batch

   .. tab:: single-plane

      .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_single_plane.cpp
         :language: cpp
         :fragment: single_batch

   .. tab:: NV12 to Grey

      .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_to_gray.cpp
         :language: cpp
         :fragment: single_batch

.. tab:: Multiple Batches

   .. tab:: two-plane

      .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_two_planes.cpp
         :language: cpp
         :fragment: batched_case

   .. tab:: single-plane

      .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_single_plane.cpp
         :language: cpp
         :fragment: batched_case

   .. tab:: NV12 to Grey

      .. doxygensnippet:: docs/snippets/gpu/preprocessing_nv12_to_gray.cpp
         :language: cpp
         :fragment: batched_case

@endsphinxdirective
The I420 color format can be processed in a similar way.
## Context & Queue Sharing
@ -242,18 +279,12 @@ This sharing mechanism allows performing pipeline synchronization on the app sid
on waiting for the completion of inference. The pseudo-code may look as follows:
@sphinxdirective

.. dropdown:: Queue and context sharing example

   .. doxygensnippet:: docs/snippets/gpu/queue_sharing.cpp
      :language: cpp
      :fragment: queue_sharing

@endsphinxdirective
@ -282,60 +313,34 @@ For possible low-level properties and their description, refer to the `openvino/
To see pseudo-code of usage examples, refer to the sections below.
@sphinxdirective

.. NOTE::

   For low-level parameter usage examples, see the source code of user-side wrappers from the include files mentioned above.

.. dropdown:: OpenCL Kernel Execution on a Shared Buffer

   This example uses the OpenCL context obtained from a compiled model object.

   .. doxygensnippet:: docs/snippets/gpu/context_sharing.cpp
      :language: cpp
      :fragment: context_sharing_get_from_ov

.. dropdown:: Running GPU Plugin Inference within User-Supplied Shared Context

   .. doxygensnippet:: docs/snippets/gpu/context_sharing.cpp
      :language: cpp
      :fragment: context_sharing_user_handle

.. dropdown:: Direct Consuming of the NV12 VAAPI Video Decoder Surface on Linux

   .. doxygensnippet:: docs/snippets/gpu/context_sharing_va.cpp
      :language: cpp
      :fragment: context_sharing_va

@endsphinxdirective
## See Also

View File

@ -79,7 +79,7 @@ html_theme = "openvino_sphinx_theme"
html_theme_path = ['_themes']
html_theme_options = {
"navigation_depth": 8,
"show_nav_level": 2,
"use_edit_page_button": True,
"github_url": "https://github.com/openvinotoolkit/openvino",

View File

@ -24,7 +24,9 @@ file(GLOB SOURCES "${CMAKE_CURRENT_SOURCE_DIR}/*.cpp"
if (NOT TARGET OpenCL::OpenCL)
list(REMOVE_ITEM SOURCES "${CMAKE_CURRENT_SOURCE_DIR}/gpu/context_sharing_va.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/gpu/context_sharing.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/gpu/preprocessing_nv12_two_planes.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/gpu/preprocessing_nv12_single_plane.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/gpu/preprocessing_nv12_to_gray.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/gpu/queue_sharing.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/gpu/remote_objects_creation.cpp")
endif()

View File

@ -0,0 +1,48 @@
#include <openvino/runtime/core.hpp>
#define OV_GPU_USE_OPENCL_HPP
#include <openvino/runtime/intel_gpu/ocl/ocl.hpp>
#include <openvino/runtime/intel_gpu/properties.hpp>
#include <openvino/core/preprocess/pre_post_process.hpp>

ov::intel_gpu::ocl::ClImage2DTensor get_yuv_tensor();

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");

    //! [init_preproc]
    using namespace ov::preprocess;
    auto p = PrePostProcessor(model);
    p.input().tensor().set_element_type(ov::element::u8)
                      .set_color_format(ColorFormat::NV12_SINGLE_PLANE)
                      .set_memory_type(ov::intel_gpu::memory_type::surface);
    p.input().preprocess().convert_color(ov::preprocess::ColorFormat::BGR);
    p.input().model().set_layout("NCHW");
    auto model_with_preproc = p.build();
    //! [init_preproc]

    auto compiled_model = core.compile_model(model_with_preproc, "GPU");
    auto context = compiled_model.get_context().as<ov::intel_gpu::ocl::ClContext>();
    auto infer_request = compiled_model.create_infer_request();

    {
        //! [single_batch]
        auto input_yuv = model_with_preproc->input(0);
        ov::intel_gpu::ocl::ClImage2DTensor yuv_tensor = get_yuv_tensor();
        infer_request.set_tensor(input_yuv.get_any_name(), yuv_tensor);
        infer_request.infer();
        //! [single_batch]
    }

    {
        auto yuv_tensor_0 = get_yuv_tensor();
        auto yuv_tensor_1 = get_yuv_tensor();
        //! [batched_case]
        auto input_yuv = model_with_preproc->input(0);
        std::vector<ov::Tensor> yuv_tensors = {yuv_tensor_0, yuv_tensor_1};
        infer_request.set_tensors(input_yuv.get_any_name(), yuv_tensors);
        infer_request.infer();
        //! [batched_case]
    }
    return 0;
}

View File

@ -0,0 +1,50 @@
#include <openvino/runtime/core.hpp>
#define OV_GPU_USE_OPENCL_HPP
#include <openvino/runtime/intel_gpu/ocl/ocl.hpp>
#include <openvino/runtime/intel_gpu/properties.hpp>
#include <openvino/core/preprocess/pre_post_process.hpp>

ov::intel_gpu::ocl::ClImage2DTensor get_y_tensor();
ov::intel_gpu::ocl::ClImage2DTensor get_uv_tensor();

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");

    //! [init_preproc]
    using namespace ov::preprocess;
    auto p = PrePostProcessor(model);
    p.input().tensor().set_element_type(ov::element::u8)
                      .set_layout("NHWC")
                      .set_memory_type(ov::intel_gpu::memory_type::surface);
    p.input().model().set_layout("NCHW");
    auto model_with_preproc = p.build();
    //! [init_preproc]

    auto compiled_model = core.compile_model(model_with_preproc, "GPU");
    auto remote_context = compiled_model.get_context().as<ov::intel_gpu::ocl::ClContext>();
    auto input = model->input(0);
    auto infer_request = compiled_model.create_infer_request();

    {
        //! [single_batch]
        cl::Image2D img_y_plane;
        auto input_y = model_with_preproc->input(0);
        auto remote_y_tensor = remote_context.create_tensor(input_y.get_element_type(), input.get_shape(), img_y_plane);
        infer_request.set_tensor(input_y.get_any_name(), remote_y_tensor);
        infer_request.infer();
        //! [single_batch]
    }

    {
        //! [batched_case]
        cl::Image2D img_y_plane_0, img_y_plane_1;
        auto input_y = model_with_preproc->input(0);
        auto remote_y_tensor_0 = remote_context.create_tensor(input_y.get_element_type(), input.get_shape(), img_y_plane_0);
        auto remote_y_tensor_1 = remote_context.create_tensor(input_y.get_element_type(), input.get_shape(), img_y_plane_1);
        std::vector<ov::Tensor> y_tensors = {remote_y_tensor_0, remote_y_tensor_1};
        infer_request.set_tensors(input_y.get_any_name(), y_tensors);
        infer_request.infer();
        //! [batched_case]
    }
    return 0;
}