One of the networks contained the following pipeline:
```
FullyConnected -> Reshape -> FullyConnected
```
The output of the Reshape was not in the same element order as its input.
I found that the problem was connected with the formats of the layers.
During optimization passes this pipeline was transformed to the
following:
```
FullyConnected -> Reorder -> Reshape -> Reorder -> FullyConnected
```
Both `FullyConnected` layers work with the `yxfb` format. This is why the
Reorder layer after the Reshape has an output layout in the `yxfb` format,
and `reshape_in_layout.format` returns `yxfb`. But here the Reshape has to
be converted to the `bfyx` format, because only then does the reshape leave
the element order unchanged.
I replaced `reshape_in_layout.format` (which returns `yxfb`) with an
explicitly set `bfyx` format.
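Below is a minimal NumPy sketch of the element-order problem (an
illustration of the layouts, not the clDNN code): a plain reshape keeps the
logical order only when the buffer is stored in the order the reshape
assumes.
```
import numpy as np

b, f, y, x = 2, 4, 1, 1
t = np.arange(b * f * y * x).reshape(b, f, y, x)   # logical bfyx tensor

bfyx_buffer = t.reshape(-1)                        # memory order: b, f, y, x
yxfb_buffer = t.transpose(2, 3, 1, 0).reshape(-1)  # memory order: y, x, f, b

new_shape = (b, f * y * x)                         # flatten the feature dims
print(bfyx_buffer.reshape(new_shape))  # [[0 1 2 3] [4 5 6 7]] - order kept
print(yxfb_buffer.reshape(new_shape))  # [[0 4 1 5] [2 6 3 7]] - shuffled
```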
JIRA: 35288
This extends the resample optimization for 8-bit types, which uses a
feature-packed mode to process multiple features in one work-item, to
feature counts that are not a multiple of the packing factor.
For nearest resampling it is safe to copy the extra feature padding of
blocked formats, so this change only removes that restriction.
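As a rough sketch of the dispatch math (the packing factor and helper below
are assumptions for illustration, not the actual kernel code):
```
PACK = 4  # assumed packing factor, e.g. four int8 features per work-item

def packed_work_items(feature_count):
    # Round up so the trailing partial pack still gets a work-item; for
    # nearest resampling the extra lanes copy the feature padding of the
    # blocked format, which is safe.
    return (feature_count + PACK - 1) // PACK

print(packed_work_items(16))  # 4 work-items, all packs full
print(packed_work_items(18))  # 5 work-items, last pack is only half used
```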
The ExtractImagePatches operation collects patches from the input
tensor, as if applying a convolution. All extracted patches are stacked
in the depth dimension of the output.
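A minimal NumPy sketch of the semantics, assuming rate 1 and valid padding;
the ordering of patch elements inside the depth dimension is illustrative:
```
import numpy as np

def extract_image_patches(x, size, stride):
    # x: [N, C, H, W]; size and stride: (h, w). Rate 1, valid padding.
    n, c, h, w = x.shape
    sh, sw = size
    out_h = (h - sh) // stride[0] + 1
    out_w = (w - sw) // stride[1] + 1
    out = np.empty((n, c * sh * sw, out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            ys, xs = i * stride[0], j * stride[1]
            patch = x[:, :, ys:ys + sh, xs:xs + sw]
            out[:, :, i, j] = patch.reshape(n, -1)  # stack patch in depth
    return out

x = np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4)
print(extract_image_patches(x, (2, 2), (2, 2)).shape)  # (1, 4, 2, 2)
```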
JIRA: 30055
* Removed back phase transformations related to IRv7
* Fixed setting a value for the input port using the 'set_value' method
* Removed front and middle phase transformations related to IRv7
* Cleaned up the remaining Model Optimizer transformations, removing the IRv7-specific ones
* Final cleanup of the deprecated IR v7 related code
* Removed 'blobs_as_input' usage in the Model Optimizer.
* Removed function '_fuse_add' from the Model Optimizer since it is not used anymore.
* Removed 'keep_in_IR' node attribute for FakeQuantize ops in the MO
* Disabled failing gpu_engine.user_context test
For the b_fs_yx_fsv16 format, the number of features for dispatch in the
reference kernel is rounded up to a multiple of 16. This change adds a
correct check in the kernel so that work-items that fall inside this
dispatch padding return early.
Previously those work-items could corrupt memory that was expected to be
filled with zeros, and for parameterized activation, because the bounds
check used the modulo operator, they could have been corrupting the actual
layer output.
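A hedged Python sketch of the fixed dispatch logic (the real code is an
OpenCL kernel; the names below are illustrative):
```
FSV = 16  # feature slice size of the b_fs_yx_fsv16 format

def dispatch_size(feature_count):
    # Features are rounded up to a multiple of 16 for dispatch.
    return (feature_count + FSV - 1) // FSV * FSV

def kernel_body(f, feature_count, output):
    # The added check: work-items inside the dispatch padding return early
    # instead of writing through a modulo-wrapped index.
    if f >= feature_count:
        return
    output[f] = f * 2.0  # placeholder for the real per-feature computation

features = 20
out = [0.0] * features
for f in range(dispatch_size(features)):  # 32 work-items, 12 in padding
    kernel_body(f, features, out)
```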
Issue: CVS-27672
This change:
- extends the concat in-place optimization to resample on input (see the sketch after this list)
- adds int8 support to the resample primitive for bilinear mode
- fixes some potential issues with offset calculations for int8
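A minimal sketch of the in-place concat idea (illustrative, not the clDNN
implementation): producers write directly into the shared output buffer at
precomputed offsets, and with int8 one element is one byte, so element and
byte offsets coincide.
```
import numpy as np

def concat_offsets(feature_counts):
    # Each input starts where the previous inputs end.
    offsets, total = [], 0
    for fc in feature_counts:
        offsets.append(total)
        total += fc
    return offsets, total

inputs = [np.ones(4, dtype=np.int8), np.full(2, 2, dtype=np.int8)]
offsets, total = concat_offsets([a.size for a in inputs])
out = np.empty(total, dtype=np.int8)   # the shared output buffer
for a, off in zip(inputs, offsets):
    out[off:off + a.size] = a          # producers write in place, no copy
print(out)  # [1 1 1 1 2 2]
```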
Implemented three operations: EmbeddingBagPackedSum,
EmbeddingBagOffsetsSum, and EmbeddingSegmentsSum. These operations perform
the same computation but accept inputs in different formats.
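A NumPy sketch of the assumed semantics, showing the three input formats
producing the same per-bag sums:
```
import numpy as np

emb = np.arange(10, dtype=np.float32).reshape(5, 2)  # embedding table

# PackedSum: a dense [num_bags, indices_per_bag] matrix of indices.
packed = np.array([[0, 2], [1, 3]])
print(emb[packed].sum(axis=1))                       # [[4 6] [8 10]]

# OffsetsSum: flat indices plus the start offset of each bag.
indices = np.array([0, 2, 1, 3])
offsets = np.array([0, 2])
bounds = np.append(offsets, len(indices))
print(np.stack([emb[indices[s:e]].sum(axis=0)
                for s, e in zip(bounds, bounds[1:])]))

# SegmentsSum: flat indices plus a segment id per index.
segment_ids = np.array([0, 0, 1, 1])
print(np.stack([emb[indices[segment_ids == s]].sum(axis=0)
                for s in range(2)]))
```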
This change:
- adds fusing support to all available pooling kernels
- tests all possible input type/output type configurations
- fixes a minor bug in max pooling in pooling_gpu_test.cpp
- fixes a minor bug with the yxbf format in the pooling_gpu_ref and pooling_gpu_int8_ref kernels
- fixes a bug with the b_fs_yx_fsv32 format in the pooling_gpu kernel
- resolves a max pooling accuracy mismatch in case of a non-zero pad-end layer parameter
- resolves an average pooling accuracy mismatch in case of a non-zero pad-end layer parameter (see the sketch after this list)
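A hedged NumPy sketch of the average-pooling corner case (1-D for brevity,
not the kernel code): with a non-zero pad end the last window overlaps the
padding, and the divisor has to count only the elements inside the input.
```
import numpy as np

def avg_pool_1d(x, kernel, stride, pad_end):
    out = []
    for start in range(0, len(x) + pad_end - kernel + 1, stride):
        window = x[start:start + kernel]        # clipped at the input edge
        out.append(window.sum() / len(window))  # divide by valid count only
    return np.array(out)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(avg_pool_1d(x, kernel=2, stride=2, pad_end=1))  # [1.5 3.5 5.0]
```
Dividing by the full kernel size would give 2.5 for the last window here,
which is the kind of mismatch listed above.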
CumSum performs cumulative summation of the input elements along the given axis.
Details:
By default, it performs the summation inclusively, meaning the first
element is copied as is. The "exclusive" attribute changes this behavior to
exclude the first element. The summation can also run in the opposite
direction of the axis; for that, set the "reverse" attribute to true.
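A NumPy sketch of these semantics (the attribute handling is emulated here;
`np.cumsum` itself covers only the inclusive case):
```
import numpy as np

x = np.array([1, 2, 3, 4])
print(np.cumsum(x))                             # inclusive: [ 1  3  6 10]

# exclusive=True: shift right, the first element becomes 0.
print(np.concatenate(([0], np.cumsum(x)[:-1])))  # [0 1 3 6]

# reverse=True: sum in the opposite direction of the axis.
print(np.cumsum(x[::-1])[::-1])                  # [10  9  7  4]
```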
JIRA: 29994