* [GPU] Use 4-dim directly for onednn in gemm
We were collapsing the n-dim into 3D for the oneDNN gemm, but that is not necessary: oneDNN supports up to 4D directly.
Signed-off-by: hyunback <hyunback.kim@intel.com>
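The change above can be illustrated with a hedged NumPy sketch (plain NumPy, not the oneDNN API): a 4D batched matmul gives the same result as first collapsing the two leading batch dims into one 3D batch, which is why the collapse is unnecessary once the library accepts 4D shapes directly. All shapes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((2, 3, 4, 5))  # 4D input A
b = rng.standard_normal((2, 3, 5, 6))  # 4D input B

# Direct 4D path: matmul broadcasts over the two leading batch dims.
out_4d = a @ b  # shape (2, 3, 4, 6)

# Collapsed 3D path: fold the leading dims, multiply, restore the shape.
out_3d = (a.reshape(6, 4, 5) @ b.reshape(6, 5, 6)).reshape(2, 3, 4, 6)

assert np.allclose(out_4d, out_3d)
```

Both paths are numerically identical; passing 4D shapes through simply drops the reshape bookkeeping.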
* [GPU] Added shape agnostic kernel for fully_connected_gpu_imad
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add fully_connected_gpu_imad shape agnostic TCs for ov_gpu_unit_tests
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Apply comments
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
* fix 1
* fix 2-10
* fixed code style
* fixed win plugin
* fixed linux plugin
* fixed a part of tests
* fixed test for linux
* fixed pooling_gpu_test for linux
* fix after review and enable wd4267 in makefile
* fix after review
* fixed unit test errors
* IR serialization for dynamic models
* added ShapeOf1To3 transformation pass
* fixed input output type mismatch
* removed unnecessary codes
* moved ConvertShapeOf1To3 from common to GPU plugin
* updated copyright year
* fixed build errors
* Reduce the number of validate and infer types in ConvertPrecision
Currently, the ConvertPrecision pass runs validate and infer types frequently:
it iterates over every precision pair, then over the whole model, and runs
validate and infer types after each pass.
The proposed solution is to iterate over the model once: for each node, iterate
over the precisions array, update the node if required, and then run validate
and infer types.
Ticket: 81311
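The loop restructuring described above can be sketched as follows. This is a hypothetical model, not the OpenVINO API: nodes are plain dicts, precision pairs are tuples, and `validate()` stands in for `validate_and_infer_types()`.

```python
calls = {"validate": 0}

def validate(node):
    # Stand-in for node->validate_and_infer_types(); counts invocations.
    calls["validate"] += 1

def convert_old(model, pairs):
    # Old scheme: one pass over the whole model per precision pair,
    # each pass followed by revalidating every node.
    for src, dst in pairs:
        for node in model:
            if node["type"] == src:
                node["type"] = dst
        for node in model:
            validate(node)

def convert_new(model, pairs):
    # Proposed scheme: a single pass over the model; for each node try
    # every pair, and revalidate only the nodes that actually changed.
    for node in model:
        changed = False
        for src, dst in pairs:
            if node["type"] == src:
                node["type"] = dst
                changed = True
        if changed:
            validate(node)
```

With N nodes and P precision pairs, the old scheme performs N·P validations, while the new one performs at most N.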
* use map
* clang format
* move enum hasher
* fix gpu
* revalidate
* reinvalidate if node has changed
* remove validate for input prec changes
* fix gpu
* review
* find
* fix pytorch case
* revalidate
---------
Co-authored-by: Michal Lukaszewski <michal.lukaszewski@intel.com>
* Ability to provide several source dirs for ncc-style checks
* Fixed include headers; added NCC to TF common
* Fixed NCC for frontends
* Fixed NCC for frontends
* Extra fixes
* Fixest push --f
* Clang-format
* Apply comments
* Add an option to specify required clang-format version
* Update src/frontends/tensorflow/src/decoder_proto.cpp
* Update src/frontends/tensorflow/src/decoder_proto.cpp
* [GPU] Change lws to avoid synchronization issue in nonzero_count (#16116)
* [GPU] Add unit test (#16116)
* [GPU] update count_nonzero_ref kernel (#16116)
- Support the case where the total data size exceeds the max work-group size
- Add dynamic shape test case
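The first bullet above amounts to a two-level reduction: when the input no longer fits in one work-group, each work-group counts nonzeros in its own chunk and the partial counts are summed afterwards. A minimal Python sketch (the `MAX_WG_SIZE` limit is a hypothetical device constant, not taken from the kernel):

```python
import numpy as np

MAX_WG_SIZE = 256  # hypothetical device limit on work-group size

def count_nonzero_two_level(data):
    # Each "work-group" counts nonzeros in a chunk no larger than the
    # limit; the partial counts are then reduced with a final sum.
    flat = np.asarray(data).ravel()
    partials = [
        int(np.count_nonzero(flat[i:i + MAX_WG_SIZE]))
        for i in range(0, flat.size, MAX_WG_SIZE)
    ]
    return sum(partials)
```

The result matches a single global count regardless of how the chunks split the data, which is what makes the per-work-group decomposition safe.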
* [GPU] Change input indexing calculation and add random input generator in unit test (#16116)
* [GPU] update random input generation function in nonzero_count (#16116)
* [GPU] update unit test (#16116)
* [GPU] cldnn unit test: update random generation function for other test failure (fusings_gpu/conv_fp32_multi_eltwise_quantization.basic/0) (#16116)
* [GPU] Enabled ComparisonLayerTest in single layer tests.
It seems these tests were previously disabled because of some failures. Now I cannot see any errors, so I just enabled all of them.
* [GPU] Run clang format for comparison single layer tests.
* [GPU] Added handling of f16 type to IsInfLayerTest.
* [GPU] Added single-layer tests for IsFinite and IsNaN operations.
* [GPU] Added single-layer test for IsInf operation.
* [GPU] Implemented IsFinite, IsInf, and IsNaN operations as activation functions.
Note that currently the activation kernel supports only an output data type equal to the input data type, so an additional reorder would be needed to convert to the correct output data type for these ops. Also worth noting: activation functions are fused into the reorder kernel, but for now that does not work for these ops, because the reorder's activation call hard-converts the input data to the output data type before running the activation. It is unclear why that conversion is there, but it breaks the fusion, so we need to either fix this activation fusion or disable it for these ops.
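The dtype limitation described above can be modeled with a hypothetical NumPy sketch (illustrative only; `isnan_as_activation` and `reorder` are stand-ins, not kernel names): because the activation kernel can only emit the input data type, IsNaN on f16 data first produces an f16 0.0/1.0 mask, and a separate reorder is needed to reach an integer output type.

```python
import numpy as np

def isnan_as_activation(x):
    # Hypothetical model of the limitation: the activation kernel can
    # only emit the input data type, so an f16 input yields an f16
    # 0.0/1.0 mask instead of an integer/boolean one.
    return np.isnan(x).astype(x.dtype)

def reorder(x, dtype):
    # The extra reorder needed to reach the proper output precision.
    return x.astype(dtype)

x = np.array([1.0, np.nan, np.inf], dtype=np.float16)
mask_f16 = isnan_as_activation(x)      # dtype float16
mask_u8 = reorder(mask_f16, np.uint8)  # dtype uint8
```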
* Revert "[GPU] Implemented IsFinite, IsInf, and IsNaN operations as activation functions."
This reverts commit 3f9ffe617ecddce6dbbcdeab9584a7ddeb6d1845.
* [GPU] Implemented IsFinite, IsInf, and IsNaN operations as eltwise op.
* [GPU] Changed CLDNN_ERROR_MESSAGE to OPENVINO_ASSERT in check_inputs_count method.
* [GPU] Minor fix for dynamic bert-base-uncased-qqp
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix to check the full tensor only for static shapes when creating onednn gemm
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
- Previously, PR15386 changed the allocation of memory for primitives used as shape-infer dependencies to host memory, for better shape-infer performance.
- However, this causes a cache coherence issue on dGPU.
- Revert this change so that the memory is allocated on the device.
* [dGPU] Enable stable diffusion
+ Prevent fusing swish into a oneDNN reorder.
+ Make concat explicit if the batch size is greater than 1 and the siblings are oneDNN impls.
* [GPU] Added shape agnostic optimized SoftMax kernel
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Update SoftmaxKernelBaseBF::Validate policy for shape agnostic kernel
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add softmax_gpu_bf shape agnostic TC for ov_gpu_unit_tests
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix failed TCs for ie-tests-linux-ubuntu20-gpu
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Update to use stack array instead of global buffer
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Remove global buffer usage completely
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add #undef directive
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>