If the gemm input dimensions are not multiples of 16 and either the transpose_a or transpose_b attribute is set, cldnn picks the 'gemm_ref' kernel instead of the faster 'gemm_tiled_opt'. By inserting an explicit permute operation on each gemm input that requires a transpose, we make cldnn pick 'gemm_tiled_opt', which improves performance. For some input shapes, transpose(s) + gemm_tiled_opt can be slower than gemm_ref alone. Based on benchmarks, the cutoff was set so that the change applies only to input shapes > (64, 64).

Ticket: 67271
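The selection rule described above can be sketched roughly as follows. This is only an illustration, not the code from this change: the function name `needs_explicit_transpose` and the constants are hypothetical, and two points the description leaves open are filled in by assumption. The check is applied per input to its last two dimensions, and "> (64, 64)" is read as both of those dimensions exceeding 64.

```python
from typing import Sequence

ALIGNMENT = 16        # from the description: dimensions must be multiples of 16
CUTOFF = (64, 64)     # from the description: benchmark-derived cutoff


def needs_explicit_transpose(shape: Sequence[int], transpose_flag: bool) -> bool:
    """Hypothetical helper: decide whether to insert an explicit permute.

    Assumptions (not confirmed by the description):
      * only the last two dimensions of the input are inspected;
      * the cutoff requires BOTH of those dimensions to exceed 64.
    """
    if not transpose_flag or len(shape) < 2:
        # No transpose requested for this input, so nothing to replace.
        return False

    rows, cols = shape[-2], shape[-1]

    # With a transpose attribute set, unaligned inputs are the case where
    # the slower reference kernel would otherwise be selected.
    unaligned = rows % ALIGNMENT != 0 or cols % ALIGNMENT != 0

    # Below the cutoff, transpose + tiled kernel may be slower than gemm_ref.
    above_cutoff = rows > CUTOFF[0] and cols > CUTOFF[1]

    return unaligned and above_cutoff
```

Under these assumptions, a transposed input of shape (100, 100) would get an explicit permute, while (128, 128) (already aligned) and (30, 30) (below the cutoff) would be left unchanged.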
OpenVINO Plugins
OpenVINO Plugins provide support for hardware devices.
The list of supported plugins: