Convolution
Versioned name: Convolution-1
Category: Convolution
Short description: Computes 1D, 2D or 3D convolution (cross-correlation to be precise) of input and kernel tensors.
Detailed description: The basic building block of convolution is a dot product of an input patch and a kernel. The whole operation consists of multiple such computations over multiple input patches and kernels. A more thorough explanation can be found in Convolutional Neural Networks and Convolution operation.
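The difference between convolution and cross-correlation mentioned in the short description comes down to whether the kernel is flipped. The following minimal Python sketch (single channel, no padding or stride; the function names are hypothetical) computes each output as a dot product of an input patch and the kernel, and shows how flipping the kernel changes the result.

```python
from typing import List


def cross_correlate_1d(x: List[float], w: List[float]) -> List[float]:
    """Slide w over x without flipping it: each output is a dot product of a patch and the kernel."""
    k = len(w)
    return [sum(w[i] * x[start + i] for i in range(k)) for start in range(len(x) - k + 1)]


def convolve_1d(x: List[float], w: List[float]) -> List[float]:
    """Mathematical convolution: the same sliding dot product, but with the kernel flipped."""
    return cross_correlate_1d(x, w[::-1])


x = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 0.0, -1.0]
print(cross_correlate_1d(x, w))  # [-2.0, -2.0]
print(convolve_1d(x, w))         # [2.0, 2.0]
```

For a symmetric kernel the two results coincide, which is why the distinction rarely matters in practice; for trained weights it only changes how the kernel values are laid out.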
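For the convolutional layer, the number of output features along each spatial axis is calculated using the formula:
\f[
n_{out} = \left \lfloor \frac{n_{in} + 2p - k}{s} \right \rfloor + 1
\f]
where \f$n_{in}\f$ is the input size, \f$p\f$ is the padding on each side, \f$k\f$ is the kernel size and \f$s\f$ is the stride along that axis. With asymmetric padding, \f$2p\f$ is replaced by the sum of the pads_begin and pads_end values for that axis, and with a dilation \f$d\f$ the kernel size \f$k\f$ is replaced by the effective kernel size \f$d(k - 1) + 1\f$.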
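As a quick check of the formula, the sketch below computes the output length along one spatial axis using integer floor division. It assumes the standard dilation rule (effective kernel extent d(k - 1) + 1) and takes pads_begin and pads_end separately instead of 2p; the function name is hypothetical. The calls reproduce the output sizes of the 1D, 2D and 3D examples at the end of this section.

```python
def conv_output_size(n_in: int, k: int, s: int = 1, pad_begin: int = 0,
                     pad_end: int = 0, d: int = 1) -> int:
    """Output length along one spatial axis for explicit padding."""
    effective_k = d * (k - 1) + 1  # extent of the dilated kernel
    # Floor division implements the floor brackets of the formula.
    return (n_in + pad_begin + pad_end - effective_k) // s + 1


print(conv_output_size(128, 4, s=2))                     # 1D example: 63
print(conv_output_size(224, 5, pad_begin=2, pad_end=2))  # 2D example: 224
print(conv_output_size(320, 3, s=3, d=2))                # 3D example: 106
```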
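The receptive field in each layer is calculated using the following formulas:
- Jump in the output feature map:
\f[ j_{out} = j_{in} \cdot s \f]
- Size of the receptive field of an output feature:
\f[ r_{out} = r_{in} + ( k - 1 ) \cdot j_{in} \f]
- Center position of the receptive field of the first output feature:
\f[ start_{out} = start_{in} + \left( \frac{k - 1}{2} - p \right) \cdot j_{in} \f]

Each output value is a dot product of the kernel weights \f$w_{i}\f$ and the corresponding input patch values \f$x_{i}\f$, where the sum runs over all kernel elements (all input channels and all kernel spatial positions):
\f[ out = \sum_{i = 0}^{n} w_{i} x_{i} + b \f]
Here \f$b\f$ is an optional bias term; Convolution-1 itself takes no bias input, so \f$b = 0\f$ for this operation.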
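The three receptive-field recurrences can be applied layer by layer to track how the receptive field grows through a stack of convolutions. This is a minimal sketch assuming the common convention that a single input pixel has jump 1, size 1 and center 0.5; the (k, s, p) layer tuples and the function name are hypothetical. Note that start and r are updated with j_in before j is advanced.

```python
from typing import List, Tuple


def receptive_fields(layers: List[Tuple[int, int, int]]) -> List[Tuple[float, float, float]]:
    """Apply the jump / size / start recurrences for each (k, s, p) layer.

    Returns (j, r, start) after every layer.
    """
    j, r, start = 1.0, 1.0, 0.5  # assumed convention for a single input pixel
    result = []
    for k, s, p in layers:
        start = start + ((k - 1) / 2 - p) * j  # uses j_in
        r = r + (k - 1) * j                    # uses j_in
        j = j * s                              # j_out is computed last
        result.append((j, r, start))
    return result


# Two unpadded 3-wide layers with stride 2.
print(receptive_fields([(3, 2, 0), (3, 2, 0)]))  # [(2.0, 3.0, 1.5), (4.0, 7.0, 3.5)]
```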
Attributes:
- strides
- Description: strides is the distance (in pixels) to slide the filter on the feature map over the (z, y, x) axes for 3D convolutions, the (y, x) axes for 2D convolutions and the x axis for 1D convolutions. For example, strides equal 4,2,1 means sliding the filter 4 pixels at a time over the depth dimension, 2 over the height dimension and 1 over the width dimension.
- Range of values: positive integer values
- Type: int[]
- Default value: None
- Required: yes
- pads_begin
- Description: pads_begin is the number of pixels to add to the beginning along each axis. For example, for a 2D convolution, pads_begin equal 1,2 means adding 1 pixel to the top of the input and 2 to the left of the input.
- Range of values: integer values starting from 0
- Type: int[]
- Default value: None
- Required: yes
- Note: the attribute is ignored when auto_pad is set to a value other than explicit.
- pads_end
- Description: pads_end is the number of pixels to add to the end along each axis. For example, for a 2D convolution, pads_end equal 1,2 means adding 1 pixel to the bottom of the input and 2 to the right of the input.
- Range of values: integer values starting from 0
- Type: int[]
- Default value: None
- Required: yes
- Note: the attribute is ignored when auto_pad is set to a value other than explicit.
- dilations
- Description: dilations denotes the distance between elements (weights) of the filter along each spatial axis. For example, dilations equal 1,1 means that all the elements in the filter are neighbors, which is the usual convolution. dilations equal 2,2 means that the filter elements are matched not to adjacent elements of the input but to elements two positions apart, that is, with one input element skipped between them.
- Range of values: positive integer values
- Type: int[]
- Default value: None
- Required: yes
- auto_pad
- Description: auto_pad defines how the padding is calculated. Possible values:
- explicit - use explicit padding values from pads_begin and pads_end.
- same_upper - the input is padded to match the output size. In case of an odd total padding value, the extra pixel is added at the end.
- same_lower - the input is padded to match the output size. In case of an odd total padding value, the extra pixel is added at the beginning.
- valid - do not use padding.
- Type: string
- Default value: explicit
- Required: no
- Note: pads_begin and pads_end attributes are ignored when auto_pad is set to same_upper, same_lower or valid.
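For same_upper and same_lower the text above only states that the input is padded to match the output size. The sketch below assumes the widely used SAME convention (as in TensorFlow and ONNX), where the output length is ceil(n_in / s) and the total padding is split in half, with the odd extra pixel placed at the end (same_upper) or at the beginning (same_lower). The function name is hypothetical, and the exact rule used by a given plugin should be checked against its implementation.

```python
from typing import Tuple


def auto_pads(n_in: int, k: int, s: int, d: int, auto_pad: str) -> Tuple[int, int]:
    """Return (pad_begin, pad_end) for one spatial axis under the assumed SAME rule."""
    if auto_pad == "valid":
        return 0, 0
    if auto_pad not in ("same_upper", "same_lower"):
        raise ValueError("explicit padding is taken from pads_begin / pads_end")
    n_out = -(-n_in // s)  # ceil(n_in / s) with integer arithmetic
    effective_k = d * (k - 1) + 1
    total = max((n_out - 1) * s + effective_k - n_in, 0)
    small, large = total // 2, total - total // 2
    # same_upper puts the odd extra pixel at the end, same_lower at the beginning.
    return (small, large) if auto_pad == "same_upper" else (large, small)


print(auto_pads(7, 4, 1, 1, "same_upper"))  # (1, 2)
print(auto_pads(7, 4, 1, 1, "same_lower"))  # (2, 1)
```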
Inputs:
- 1: Input tensor of type T and rank 3, 4 or 5. Layout is NCZYX (number of batches, number of channels, spatial axes Z, Y, X). Required.
- 2: Kernel tensor of type T and the same rank as input 1 (3, 4 or 5). Layout is OIZYX (number of output channels, number of input channels, spatial axes Z, Y, X); the number of input channels must match the channel dimension C of input 1. Required.
- Note: Type of the convolution (1D, 2D or 3D) is derived from the rank of the input tensors and not specified by any attribute:
- 1D convolution (input tensors rank 3) means that there is only one spatial axis X
- 2D convolution (input tensors rank 4) means that there are two spatial axes Y, X
- 3D convolution (input tensors rank 5) means that there are three spatial axes Z, Y, X
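To make the layouts concrete, here is a naive reference sketch of the 1D case in pure Python: data in N x C x X layout, kernel in O x C x K layout (the OIX layout with I = C), with zero padding, stride and dilation along the single spatial axis. It is written for clarity rather than speed, and the function name and nested-list tensor representation are hypothetical. The 2D and 3D cases add one loop per extra spatial axis.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # nested lists indexed [N][C][X]


def convolution_1d(data: Tensor3, kernel: Tensor3, stride: int = 1, pad_begin: int = 0,
                   pad_end: int = 0, dilation: int = 1) -> Tensor3:
    """Naive rank-3 convolution: data is N x C x X, kernel is O x C x K, result is N x O x X_out."""
    n_batch, c_in, x_in = len(data), len(data[0]), len(data[0][0])
    c_out, k = len(kernel), len(kernel[0][0])
    assert len(kernel[0]) == c_in, "kernel input channels must match data channels"
    x_out = (x_in + pad_begin + pad_end - (dilation * (k - 1) + 1)) // stride + 1
    out = [[[0.0] * x_out for _ in range(c_out)] for _ in range(n_batch)]
    for n in range(n_batch):
        for o in range(c_out):
            for x in range(x_out):
                acc = 0.0
                for c in range(c_in):
                    for i in range(k):
                        pos = x * stride + i * dilation - pad_begin  # index into unpadded input
                        if 0 <= pos < x_in:  # zero padding: out-of-range taps contribute nothing
                            acc += data[n][c][pos] * kernel[o][c][i]
                out[n][o][x] = acc  # cross-correlation: the kernel is not flipped
    return out


# Shapes from the 1D example: data 1 x 5 x 128, kernel 16 x 5 x 4, stride 2.
data = [[[1.0] * 128 for _ in range(5)]]
kernel = [[[1.0] * 4 for _ in range(5)] for _ in range(16)]
result = convolution_1d(data, kernel, stride=2)
print(len(result), len(result[0]), len(result[0][0]))  # 1 16 63
```

With an all-ones input and kernel, every element of the 1 x 16 x 63 result equals 5 * 4 = 20, the number of weights in one output channel's dot product.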
Outputs:
- 1: Output tensor of type T and rank 3, 4 or 5. Layout is NOZYX (number of batches, number of kernel output channels, spatial axes Z, Y, X).
Types:
- T: any floating point type.
Example:
1D Convolution
<layer type="Convolution" ...>
<data dilations="1" pads_begin="0" pads_end="0" strides="2" auto_pad="valid"/>
<input>
<port id="0">
<dim>1</dim>
<dim>5</dim>
<dim>128</dim>
</port>
<port id="1">
<dim>16</dim>
<dim>5</dim>
<dim>4</dim>
</port>
</input>
<output>
<port id="2" precision="FP32">
<dim>1</dim>
<dim>16</dim>
<dim>63</dim>
</port>
</output>
</layer>
2D Convolution
<layer type="Convolution" ...>
<data dilations="1,1" pads_begin="2,2" pads_end="2,2" strides="1,1" auto_pad="explicit"/>
<input>
<port id="0">
<dim>1</dim>
<dim>3</dim>
<dim>224</dim>
<dim>224</dim>
</port>
<port id="1">
<dim>64</dim>
<dim>3</dim>
<dim>5</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2" precision="FP32">
<dim>1</dim>
<dim>64</dim>
<dim>224</dim>
<dim>224</dim>
</port>
</output>
</layer>
3D Convolution
<layer type="Convolution" ...>
<data dilations="2,2,2" pads_begin="0,0,0" pads_end="0,0,0" strides="3,3,3" auto_pad="explicit"/>
<input>
<port id="0">
<dim>1</dim>
<dim>7</dim>
<dim>320</dim>
<dim>320</dim>
<dim>320</dim>
</port>
<port id="1">
<dim>32</dim>
<dim>7</dim>
<dim>3</dim>
<dim>3</dim>
<dim>3</dim>
</port>
</input>
<output>
<port id="2" precision="FP32">
<dim>1</dim>
<dim>32</dim>
<dim>106</dim>
<dim>106</dim>
<dim>106</dim>
</port>
</output>
</layer>