* Applied a workaround to resolve a Softmax accuracy issue
The original implementation produced incorrect results when the leftover is not aligned with the subgroup size
(e.g., for shape [1024, 306] where lws = 32, itemsNum = 9, leftover = 18, and subgroup size = 16).
In such a case, the results were wrong when subgroup block read/write was used.
As a workaround, do not use subgroup block read/write if the leftover is not aligned with the subgroup size.
However, we can come up with better itemsNum sizing / leftover handling in follow-up work.
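The arithmetic behind the workaround can be sketched as follows. This is a minimal Python model of the decision, not the actual clDNN kernel-selection code; the helper name `can_use_block_io` is hypothetical.

```python
SUBGROUP_SIZE = 16  # assumed subgroup size from the example above

def can_use_block_io(data_size: int, lws: int) -> bool:
    """Allow subgroup block read/write only when the leftover tail is a
    multiple of the subgroup size, so block accesses never cover invalid
    lanes."""
    items_num = data_size // lws             # full iterations per work-item
    leftover = data_size - items_num * lws   # tail elements
    return leftover % SUBGROUP_SIZE == 0

# The failing shape from the description: axis size 306 with lws = 32
# gives items_num = 9 and leftover = 18; 18 % 16 != 0, so block IO
# must be disabled for this case.
```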
* Fix build error & minor revise
* Fix condition
* [GPU] Fix sub kernel ordering issue in kernels_cache (#16746)
* [GPU] Add unit test for sub kernel idx (#16746)
* [GPU]Follow up code review (#16746)
* [GPU] Skip kernel compilation when current node is optimized out in update_impl (#16746)
* [GPU]Code refactoring (#16746)
+ Bugfix for the bfyx_to_blocked_format kernel of the reorder primitive for double-blocked formats
+ The affected format is bs_fs_yx_bsv16_fsv32. Added test cases.
+ Fixed accuracy issue found by check_accuracy_issue
Signed-off-by: Min, Byungil <byungil.min@intel.com>
* improve SoftMax fusion
* style and unit-test fix
* more precise SoftMax unit-tests
* rewritten SoftMaxFusion with single matcher
* fixes for align_mixed_fp32_fp16_types_test.cpp and mark_subgraph_to_keep_in_mixed_precision_test.cpp
* add include for pass/pattern/op/or.hpp
* get rank only when necessary
* style-fix
* add comment why SoftmaxFusion is called manually
* fix copy_runtime_info
* [GPU] Add clDNN shape agnostic kernels usage as an initial impls for dGPU
* [GPU] Use layout as a key of weights cache, implement logic for weights cache capacity calculation based on available memory
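The weights-cache idea above can be modeled as a small LRU cache keyed by a layout descriptor, with capacity derived from an available-memory budget. This is a toy sketch; the class name, the 10% budget ratio, and the fixed entry size are assumptions, not the plugin's actual policy.

```python
from collections import OrderedDict

class WeightsCache:
    """Toy LRU cache of reordered weights, keyed by layout. Capacity is
    computed from an available-memory budget (hypothetical 10% ratio)."""

    def __init__(self, available_mem_bytes: int, entry_size_bytes: int,
                 budget_ratio: float = 0.1):
        budget = int(available_mem_bytes * budget_ratio)
        self.capacity = max(1, budget // entry_size_bytes)
        self._entries = OrderedDict()

    def get(self, layout_key):
        if layout_key in self._entries:
            self._entries.move_to_end(layout_key)  # mark as recently used
            return self._entries[layout_key]
        return None

    def put(self, layout_key, reordered_weights):
        self._entries[layout_key] = reordered_weights
        self._entries.move_to_end(layout_key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
```

Using the layout as the key means two nodes sharing the same weights layout reuse one reordered copy instead of reordering twice.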
* Remove suppression Wno-delete-non-abstract-non-virtual-dtor
* Fixed Allocator warning
* Suppress warning for GPU plugin
* Skip warning for GNA
* Fixed preprocessing
* Added virtual constructor for base plugin class
* Some fix for CPU
* Suppress for CPU
* Fixed any
* Fixed meta
* Disable warning for paddle
* Fixed Allocator tests
* Move suppress to paddle
* Fixed benchmark_app
* Fix failed unit-tests on dGPU
+ modified fully_connected_random_test_i8_3d not to have ambiguous
+ oneDNN does NOT support i64 type for reorder. Added exception.
+ Bugfix in prepare_primitive_fusing regarding the activation function exception
+ Added exception logic for dynamic shapes to select the OCL impl type in is_node_for_onednn
Signed-off-by: Min, Byungil <byungil.min@intel.com>
* Review adaptive max pool shape inference
* Review AvgPool and MaxPool
* Review convolution operator
* Review GroupConvolution shape inference
* Review ConvolutionBackpropData operator
* Review GroupConvolutionBackpropData op
* Review BinaryConvolution operator
- add common bases for convolution ops
- refactor convolution ops
* Review DeformableConvolution operator
* Use new convolution shape_infer in GPU
* Fix build and test issues
* Correctly set the output spatial shape
in default-constructed backprop convolutions
* The convolution shape_infer uses pads as parameters
The external padding can come from operators or other classes' padding properties; shape_infer should not modify the operator's padding when
called from a plugin.
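The point about pads as parameters can be illustrated with the standard convolution output-size formula: when padding is passed in as plain arguments, shape inference stays a pure function and never mutates operator state. A minimal sketch (the function name is illustrative, not the OpenVINO API):

```python
def conv_output_dim(in_dim: int, kernel: int, stride: int,
                    dilation: int, pad_begin: int, pad_end: int) -> int:
    """Standard convolution output-size formula. Pads are explicit inputs,
    so the caller (e.g. a plugin) controls them and nothing is written back
    to the operator."""
    dilated_kernel = (kernel - 1) * dilation + 1
    return (in_dim + pad_begin + pad_end - dilated_kernel) // stride + 1

# 'Same'-style padding: a 3x3 kernel with pads (1, 1) preserves a size-5 axis.
```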
* Apply code formatting
* Fix padding validation and update
* Use shape inference with padding instead of the fallback
for DeformableConvolution from opset1
* Update convertPadding function to be template
* Update kernel_ids using hash value
* Change set to unordered_map for kernels_code
* replace unique_id to hash value
* Remove hash_val params
* Remove redundant code (#16262)
** Remove unique_id in program_node
** Remove gen_kernel_id
** Remove set_kernels_source
** Remove remove_kernels
** Remove kernel_idx in kernels_cache
* Use kernel_impl_params instead of kernel_id
* Divide the batch when entry_points are duplicated
* rollback removing unique_id
* Fix get_kernel failure issue (#102467)
- Modify the hash function of custom_gpu_primitive and generic_layer
- Add an == operator to generic_layer for the _kernels map in kernels_cache
- Fix invalid kernel_impl_params caused by a unique_ptr lifecycle issue
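The hash-plus-equality pairing above is not optional: hash-keyed maps resolve collisions with the equality operator, so a key type used in a `std::unordered_map` (or a Python dict) needs both. A toy stand-in for kernel_impl_params (field names are assumptions for illustration):

```python
class ImplParamsKey:
    """Toy stand-in for kernel_impl_params used as a map key. Like
    std::unordered_map, Python's dict needs both __hash__ and __eq__:
    the hash picks the bucket, equality disambiguates collisions."""

    def __init__(self, prim_type, shapes, entry_point):
        self.prim_type = prim_type
        self.shapes = tuple(tuple(s) for s in shapes)  # hashable copy
        self.entry_point = entry_point

    def __hash__(self):
        return hash((self.prim_type, self.shapes, self.entry_point))

    def __eq__(self, other):
        return (isinstance(other, ImplParamsKey)
                and self.prim_type == other.prim_type
                and self.shapes == other.shapes
                and self.entry_point == other.entry_point)
```

Two logically identical keys now address the same cache slot, which is exactly what lets the cache deduplicate compiled kernels.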
* Improve kernels_cache (#102467)
* Move add_kernels_source step to build_implementations
* Replace the kernels_code key with kernel_impl_params
* Return kernel vector in get_kernels
* Modify function name to get_kernels (#102467)
* Fix functions related to graph serialization (#102467)
* Fix failure to run dynamic model (#102467)
* Add unit test
* Code review follow-up
- Add const to input params
- Add missing code to check kernel duplication in kernels_cache
* Add const to input params (#102467)
* [GPU] update hash and ==operator for generic_layer and custom_gpu_primitive (#102467)
* [GPU] override get_kernels_source in generic_layer and custom_gpu_primitive (#102467)
* [GPU] Fix onednn build error (#102467)
* [GPU] Fix Linux build error (#102467)
* [GPU] kernels_cache::get_kernels return vector of clone of cldnn::kernel (#102467)
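Returning clones from `get_kernels`, as in the change above, prevents callers from mutating shared cached kernel objects. A minimal Python sketch of that design choice (class and field names are hypothetical):

```python
import copy

class KernelsCache:
    """Toy kernels cache whose get_kernels hands out deep copies, so a
    caller that sets per-instance state (e.g. kernel arguments) cannot
    corrupt the shared cached instances."""

    def __init__(self):
        self._kernels = {}

    def add(self, key, kernels):
        self._kernels[key] = kernels

    def get_kernels(self, key):
        # Clone each cached kernel instead of exposing the originals.
        return [copy.deepcopy(k) for k in self._kernels.get(key, [])]
```

The cost is an extra copy per lookup; the benefit is that concurrent users of the same cached entry stay isolated.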
* Updated serialization logics for improved kernel caches (#16262)
* primitive key kernel cache for serialization
* kernel serialization with binaries hash
* fix kernel cache init function for deserialization
* Removed unnecessary code
* [GPU] Update comment and fix test failure (#16262)
* [GPU] Fix custom_gpu_primitive unit test failures (#16262)
* [GPU] Improved kernels cache serialization (#16262)
* removed hash in serialization logic
* update not to create a new kernels_cache for serialization
* code refactoring in serialization logic
* [GPU] Follow-up code review (#16262)
* [GPU] Modify lock (#16262)
* [GPU] Fix custom_gpu_primitive unit test failure (#16262)
---------
Co-authored-by: Eddy Kim <eddy.kim@intel.com>
* [GPU] Added shape agnostic TopK kernel implementation
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Update kernel to use internal buffers for shape agnostic kernel
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add WA to compile_graph for shape agnostic arg_max_min_axis with non-const k input
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix the is_dynamic parameter for FillCLKernelData in the case where the output has a static shape
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix corner case where inbuf size becomes 0 when ops_size is 1
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
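The corner case above, where the internal buffer size collapses to 0 when ops_size is 1, is the classic zero-size-allocation pitfall; the fix is to clamp to a minimum of one element. A sketch under the assumption that the element count is derived from `ops_size - 1` (the exact formula in the kernel is not stated here):

```python
def internal_buffer_size(ops_size: int, elem_bytes: int) -> int:
    """Compute the internal (intermediate) buffer size in bytes, clamped so
    that ops_size == 1 never requests a zero-byte allocation. The
    (ops_size - 1) element count is an assumed formula for illustration."""
    elems = max(1, ops_size - 1)
    return elems * elem_bytes
```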
* Align plugins in caching properties
* Fixed caching mock tests
* Added new TestNoCachingProperties test
* Fixed test
* Added ov::caching_properties to API 1.0 metrics as well
* Prevent memory reset at runtime allocation for dynamic shape
* Set default alloc to reset mem
* Additional fixes:
- If there are any convolution/deconvolution users that require padded input, enqueue a buffer reset when reusing the buffer.
- Removed clFinish from gpu_buffer::fill. (It should be waited on only when needed; otherwise synchronization is done via an event.)
- Removed the buffer reset from on_execute of nonzero count, which is no longer needed.
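The reuse-time reset rule above can be sketched as a toy allocator: a reused buffer is zeroed only when some conv/deconv consumer reads padded regions, because stale bytes in the padding would otherwise leak into the convolution window. The pool structure and function name below are hypothetical.

```python
def allocate(pool, size, users_require_padded_input):
    """Toy memory pool: reuse a free buffer when one fits, and reset
    (zero-fill) it only when a consumer reads padded regions. Fresh
    allocations come back zeroed either way."""
    for buf in pool:
        if buf["free"] and buf["size"] >= size:
            buf["free"] = False
            if users_require_padded_input:
                buf["data"] = bytearray(buf["size"])  # reset on reuse
            return buf
    buf = {"size": size, "free": False, "data": bytearray(size)}
    pool.append(buf)
    return buf
```

Skipping the reset on the common (padding-free) path is what makes avoiding the unconditional memset worthwhile.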
* Remove unused API
* Fix tensor offset to project the padding
* Added unittest
* Applied review comment
* Add GatherV7 and GatherV8 to the convert_gather_0d pattern
* Update the output_shape using reorder/reshape for scalar indices instead of using the ConvertGather0D pass
* Add WA for the NMS-Gather8 pattern
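The scalar-index handling above boils down to one trick: a 0-D index is promoted to a one-element vector so a rank-preserving Gather can run, and the result is then squeezed back so the output rank drops by one, as a true scalar-index Gather requires. A minimal sketch (plain lists stand in for tensors; the function name is illustrative):

```python
def gather_axis0_scalar(data, scalar_index):
    """Emulate Gather with a 0-D (scalar) index on axis 0: reshape the index
    to shape [1], gather normally, then squeeze the extra axis away."""
    indices = [scalar_index]                # 0-D index -> shape [1]
    gathered = [data[i] for i in indices]   # rank-preserving gather
    return gathered[0]                      # squeeze: rank drops by one
```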