To enable operations not supported by OpenVINO™ out of the box, you may need an extension for the OpenVINO operation set and a custom kernel for the device you target. This article describes custom kernel support for the GPU device.
The GPU codepath abstracts away many OpenCL details. You need to provide the kernel code in OpenCL C and an XML configuration file that connects the kernel and its parameters to the parameters of the operation.
* Include a section with your kernels into the automatically-loaded ``<lib_path>/cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml`` file.
* Call the :ref:`ov::Core::set_property() <doxid-classov_1_1_core_1aa953cb0a1601dbc9a34ef6ba82b8476e>` method from your application with the ``"CONFIG_FILE"`` key and the configuration file name as the value, before loading the network that uses custom operations to the plugin:
All OpenVINO samples, except the trivial ``hello_classification``, and most Open Model Zoo demos
feature a dedicated command-line option ``-c`` to load custom kernels. For example, to load custom operations for the classification sample, run the command below:
- Name of the file containing OpenCL source code. The path is relative to your executable. Multiple source nodes will have their sources concatenated in order.
The ``Tensor`` node configures a single input or output tensor.
.. list-table::
   :header-rows: 1

   * - Attribute Name
     - #
     - Description
   * - ``arg-index``
     - (1)
     - 0-based index in the entry function arguments to be bound to.
   * - ``type``
     - (1)
     - ``input`` or ``output``
   * - ``port-index``
     - (1)
     - 0-based index in the operation input/output ports in the OpenVINO IR.
   * - ``format``
     - (0/1)
     - Data layout declaration for the tensor. Accepted values: ``BFYX``, ``BYXF``, ``YXFB``, ``FYXB``, and the same values in lowercase. Default value: ``BFYX``.

CompilerOptions Node and Sub-Node Structure
+++++++++++++++++++++++++++++++++++++++++++
The ``CompilerOptions`` node configures the compilation flags for the OpenCL
- An array of up to three integers or formulas for defining OpenCL work-sizes to be used during execution. The formulas can use the values of the B, F, Y, and X dimensions and the operators ``+``, ``-``, ``/``, ``*``, and ``%``. All operators are evaluated in integer arithmetic. Default value: ``global="B*F*Y*X" local=""``
* - ``dim``
- (0/1)
- A tensor to take the work-size from. Accepted values: ``input N`` , ``output`` , where ``N`` is an index of input tensor starting with 0. Default value: ``output``
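
To make the integer-arithmetic rule concrete, the following C sketch evaluates by hand what two work-size declarations would produce for a given tensor. The ``dims_t`` type, both helper functions, and the formula list ``X/2,Y,B*F`` are hypothetical and exist only for illustration; they are not part of OpenVINO, and the plugin evaluates the formulas itself.

```c
/* Hypothetical container for the B, F, Y, X dimensions of the tensor
   selected by the "dim" attribute (not an OpenVINO type). */
typedef struct { int b, f, y, x; } dims_t;

/* Hand-evaluated equivalent of the default global="B*F*Y*X":
   one work-item per tensor element. */
int default_global_size(dims_t d) {
    return d.b * d.f * d.y * d.x;
}

/* Hand-evaluated equivalent of an assumed three-entry declaration
   "X/2,Y,B*F". Integer division truncates, so X = 7 gives 3. */
void example_global_sizes(dims_t d, int out[3]) {
    out[0] = d.x / 2;
    out[1] = d.y;
    out[2] = d.b * d.f;
}
```

Because division truncates, a formula such as ``X/2`` leaves the last column unprocessed when ``X`` is odd, so a kernel that relies on it has to handle the remainder explicitly.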
For an example, see `Example Kernel <#example-kernel>`__.
.. list-table::
   :header-rows: 1

   * - Name
     - Value
   * - ``NUM_INPUTS``
     - Number of the input tensors bound to this kernel
   * - ``GLOBAL_WORKSIZE``
     - An array of global work sizes used to execute this kernel
   * - ``GLOBAL_WORKSIZE_SIZE``
     - The size of the ``GLOBAL_WORKSIZE`` array
   * - ``LOCAL_WORKSIZE``
     - An array of local work sizes used to execute this kernel
   * - ``LOCAL_WORKSIZE_SIZE``
     - The size of the ``LOCAL_WORKSIZE`` array
   * - ``<TENSOR>_DIMS``
     - An array of the tensor dimension sizes. Always ordered as ``BFYX``.
   * - ``<TENSOR>_DIMS_SIZE``
     - The size of the ``<TENSOR>_DIMS`` array
   * - ``<TENSOR>_TYPE``
     - The datatype of the tensor: ``float``, ``half``, or ``char``
   * - ``<TENSOR>_FORMAT_<TENSOR_FORMAT>``
     - The format of the tensor: ``BFYX``, ``BYXF``, ``YXFB``, ``FYXB``, or ``ANY``. The format is concatenated to the defined name. You can use the tensor format to define codepaths in your code with ``#ifdef/#endif``.
   * - ``<TENSOR>_LOWER_PADDING``
     - An array of padding elements used for the tensor dimensions before they start. Always ordered as ``BFYX``.
   * - ``<TENSOR>_LOWER_PADDING_SIZE``
     - The size of the ``<TENSOR>_LOWER_PADDING`` array
   * - ``<TENSOR>_UPPER_PADDING``
     - An array of padding elements used for the tensor dimensions after they end. Always ordered as ``BFYX``.
   * - ``<TENSOR>_UPPER_PADDING_SIZE``
     - The size of the ``<TENSOR>_UPPER_PADDING`` array
   * - ``<TENSOR>_PITCHES``
     - The offset (in elements) between adjacent elements in each dimension. Always ordered as ``BFYX``.
   * - ``<TENSOR>_PITCHES_SIZE``
     - The size of the ``<TENSOR>_PITCHES`` array
   * - ``<TENSOR>_OFFSET``
     - The number of elements from the start of the tensor to the first valid element, bypassing the lower padding.

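
The relationship between the dimension, padding, pitch, and offset definitions can be sketched in plain C. The example below assumes a densely packed ``BFYX`` buffer in which ``X`` is the innermost dimension; the function names and the ``NDIMS`` constant are hypothetical, and in a real kernel the values arrive as the ``<TENSOR>_PITCHES`` and ``<TENSOR>_OFFSET`` definitions rather than being computed by hand.

```c
#include <stddef.h>

enum { NDIMS = 4 };  /* B, F, Y, X (hypothetical constant for this sketch) */

/* Pitches (in elements) for a densely packed, padded BFYX buffer:
   X is innermost (pitch 1), B is outermost. This layout is an assumption. */
void compute_pitches(const size_t dims[NDIMS], const size_t lower[NDIMS],
                     const size_t upper[NDIMS], size_t pitches[NDIMS]) {
    size_t stride = 1;
    for (int i = NDIMS - 1; i >= 0; --i) {
        pitches[i] = stride;
        stride *= lower[i] + dims[i] + upper[i];
    }
}

/* Number of elements from the buffer start to the first valid element,
   that is, the elements skipped by the lower padding in every dimension. */
size_t compute_offset(const size_t lower[NDIMS], const size_t pitches[NDIMS]) {
    size_t offset = 0;
    for (int i = 0; i < NDIMS; ++i) {
        offset += lower[i] * pitches[i];
    }
    return offset;
}

/* Linear index of element (b, f, y, x); pitches are in elements, not bytes. */
size_t element_index(size_t b, size_t f, size_t y, size_t x,
                     const size_t pitches[NDIMS], size_t offset) {
    return offset + b * pitches[0] + f * pitches[1]
                  + y * pitches[2] + x * pitches[3];
}
```

For a 1x2x3x4 tensor padded by one element on each side of ``Y`` and ``X``, this sketch yields pitches of 60, 30, 6, and 1 and an offset of 7.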
All ``<TENSOR>`` values are automatically defined for every tensor
bound to this operation, such as ``INPUT0``, ``INPUT1``, and ``OUTPUT0``, as shown
// neg_slope (non-zero for leaky ReLU) is injected automatically as a #define; see the configuration XML
output[out_id] = value < 0 ? value * neg_slope : value;
}
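
When validating a custom kernel, it can help to compare its output against a simple host-side reference. The C function below is a minimal sketch of such a reference for the element-wise expression used in the kernel; the name ``leaky_relu_reference`` is hypothetical, and the sketch assumes dense, unpadded ``float`` arrays, so pitches and offsets are not modeled.

```c
#include <stddef.h>

/* Host-side reference for the kernel's element-wise expression.
   Assumes dense float arrays without padding (a simplification). */
void leaky_relu_reference(const float *input, float *output,
                          size_t count, float neg_slope) {
    for (size_t i = 0; i < count; ++i) {
        float value = input[i];
        output[i] = value < 0.0f ? value * neg_slope : value;
    }
}
```

With ``neg_slope`` set to 0, the function reduces to a plain ReLU, which mirrors how the kernel's ``neg_slope`` definition selects between the two behaviors.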
.. note::
   As described in the previous section, items such as ``INPUT0_TYPE`` are passed by OpenVINO to the OpenCL compiler as preprocessor definitions, for efficiency reasons. See the `Debugging Tips <#debugging-tips>`__ below for information on debugging the results.

.. _debugging-tips:

Debugging Tips
##############
**Using printf in the OpenCL™ Kernels**

To inspect specific values while debugging, call ``printf`` in your kernels.