* [GPU] Change lws to avoid synchronization issue in nonzero_count (#16116)
* [GPU] Add unit test (#16116)
* [GPU] update count_nonzero_ref kernel (#16116)
- Support the case where the total data size exceeds the max work group size
- Add dynamic shape test case
* [GPU] Change input indexing calculation and add random input generator in unit test (#16116)
* [GPU] update random input generation function in nonzero_count (#16116)
* [GPU] update unit test (#16116)
* [GPU] cldnn unit test: update random generation function to fix another test failure (fusings_gpu/conv_fp32_multi_eltwise_quantization.basic/0) (#16116)
* [TF FE] Convert a model with Framework nodes
Now the conversion pipeline converts all unsupported operations to Framework nodes,
in the hope that sub-graphs containing Framework nodes will be cut out at later stages
such as auto-pruning.
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix build issue
* Fix dynamic element type for FusedBatchNorm
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix build issue
* Fix build issue
* Continue translation in case of a translator limitation
* Change undefined to dynamic type
* One more change to dynamic type
* Change undefined to dynamic in Const translator
* Expect MO to handle dynamic type
* Exclude TransposeSinking pass if the model contains Framework nodes (see the sketch after this group)
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
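To make the TransposeSinking exclusion above concrete, here is a minimal sketch, assuming the usual OpenVINO C++ API, of how a pipeline could detect leftover Framework nodes in a converted model. It is not the actual frontend code, and the FrameworkNode header path may differ between releases.

```cpp
// Minimal sketch: detect unconverted Framework nodes so that shape-sensitive
// passes such as TransposeSinking can be skipped. Not the actual frontend code;
// header paths follow recent OpenVINO layouts and may differ between releases.
#include <memory>

#include "openvino/core/model.hpp"
#include "openvino/op/util/framework_node.hpp"

// Returns true if any operation in the model is an ov::op::util::FrameworkNode,
// i.e. an operation the frontend could not translate.
bool has_framework_nodes(const std::shared_ptr<ov::Model>& model) {
    for (const auto& op : model->get_ordered_ops()) {
        if (std::dynamic_pointer_cast<ov::op::util::FrameworkNode>(op))
            return true;
    }
    return false;
}
```

A pass manager could then register TransposeSinking only when such a check returns false.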
* switched public Azure Linux build to clang
* Fixed GNA compilation
* Suppressed warning in GNA tests
* More fixes
* Skip test on CPU
* Update template plugin main documentation pages
* Update plugin documentation
* Add more documentation for method
* Register new doxygen groups
* Updated group
* Added ie group
* Fixed comments
* Reuse new implementation inside the old one
* Try to fix titles
* Fix class fields level
* [GPU] Enabled ComparisonLayerTest in single layer tests.
It seems that these tests were previously disabled because of some failures. I no longer see any errors, so I just enabled all of them.
* [GPU] Run clang format for comparison single layer tests.
* [GPU] Added handling of f16 type to IsInfLayerTest.
* [GPU] Added single-layer tests for IsFinite and IsNaN operations.
* [GPU] Added single-layer test for IsInf operation.
* [GPU] Implemented IsFinite, IsInf, and IsNaN operations as activation functions.
But note that currently the activation kernel supports only an output data type equal to the input data type, so an additional reorder is needed to convert to the correct output data type for these ops. Also worth noting: activation functions are normally fused into the reorder kernel, but for now this does not work for these ops, because the reorder's activation call hard-converts the input data to the output data type before the activation runs. I don't know why that conversion is there, but it breaks the fusion, so we either need to fix this activation fusion or disable it for these ops (see the sketch after this group).
* Revert "[GPU] Implemented IsFinite, IsInf, and IsNaN operations as activation functions."
This reverts commit 3f9ffe617ecddce6dbbcdeab9584a7ddeb6d1845.
* [GPU] Implemented IsFinite, IsInf, and IsNaN operations as eltwise op.
* [GPU] Changed CLDNN_ERROR_MESSAGE to OPENVINO_ASSERT in check_inputs_count method.
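As a conceptual illustration of the fusion problem described in the IsInf/IsNaN activation commit above, the sketch below (plain C++, not the actual OpenCL kernels) assumes a boolean-like u8 output type and a saturating convert: converting the input to the output type before the predicate runs destroys exactly the values IsInf is supposed to detect.

```cpp
// Conceptual illustration of why converting input data to the output type
// before the activation breaks IsInf-style predicates. Assumes a u8 output
// type and a saturating convert similar to convert_uchar_sat.
#include <cmath>
#include <cstdint>
#include <iostream>
#include <limits>

// Simulates a saturating conversion from the input type (f32) to the output type (u8).
static uint8_t to_u8_sat(float v) {
    if (std::isnan(v) || v <= 0.0f) return 0;
    if (v >= 255.0f) return 255;
    return static_cast<uint8_t>(v);
}

int main() {
    const float x = std::numeric_limits<float>::infinity();

    // Correct order: run the predicate on the original value, then convert the result.
    const uint8_t correct = static_cast<uint8_t>(std::isinf(x));  // 1

    // Order used by the fused path: convert first, then run the predicate on the
    // already-converted value. Infinity saturates to 255, so IsInf no longer sees it.
    const uint8_t broken =
        static_cast<uint8_t>(std::isinf(static_cast<float>(to_u8_sat(x))));  // 0

    std::cout << "correct=" << int(correct) << ", broken=" << int(broken) << "\n";
}
```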
* [GPU] Minor fix for dynamic bert-base-uncased-qqp
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix to check the full tensor only for static shapes when creating oneDNN gemm
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
- Previously, PR15386 changed the memory allocation for primitives used as shape-infer dependencies to host memory, for better shape-infer performance.
- However, this causes a cache coherence issue on dGPU.
- Revert this change so that the memory is allocated on the device.
* [TF FE] Support EmptyTensorList and TensorListPushBack operations
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Rename a script to generate the test model
* Correct test model generating script
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* flush fp32 subnormals to zero in IR
* style fix in test_offline_api.py
* simplified call of FlushFP32SubnormalsToZero: it is now called from offline_transformations.cpp
* reverted offline_transformations.py
* use fpclassify (see the sketch after this group)
* style-fix
* Update src/common/transformations/tests/common_optimizations/flush_fp32_subnormals_to_zero_test.cpp
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
---------
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
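For the subnormal-flushing commits above, here is a minimal sketch of the core check using std::fpclassify. It is not the actual FlushFP32SubnormalsToZero pass, which rewrites fp32 Constant data in the IR; it only shows the per-value logic.

```cpp
// Minimal sketch of flushing fp32 subnormals (denormals) to zero with
// std::fpclassify. The real FlushFP32SubnormalsToZero pass operates on fp32
// Constant nodes in the IR; this shows only the per-value check.
#include <cmath>
#include <vector>

void flush_fp32_subnormals_to_zero(std::vector<float>& data) {
    for (float& v : data) {
        if (std::fpclassify(v) == FP_SUBNORMAL) {
            // Preserve the sign of the flushed value.
            v = std::signbit(v) ? -0.0f : 0.0f;
        }
    }
}
```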
* initial version of implementation
* styles applied
* fixed and registration
* add more unit tests
* fixed and in legacy opset
* review remarks
* refactor of version name range
* [dGPU] Enable stable diffusion
+ Prevent fusing swish into oneDNN reorder.
+ Make concat explicit if the batch size is greater than 1 and the siblings are oneDNN impls.