* Update kernel_ids using hash values
* Change kernels_code from a set to an unordered_map
* Replace unique_id with a hash value
* Remove hash_val params
* Remove redundant code (#16262)
** Remove unique_id in program_node
** Remove gen_kernel_id
** Remove set_kernels_source
** Remove remove_kernels
** Remove kernel_idx in kernels_cache
* Use kernel_impl_params instead of kernel_id
* Divide batches when entry_points are duplicated
* Roll back the removal of unique_id
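The changes above — replacing a per-node unique_id with a hash derived from the kernel parameters, and turning kernels_code into an unordered_map — can be sketched roughly as follows. All type and field names here are hypothetical stand-ins, not the actual cldnn types:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the kernel parameters; only the fields needed
// to derive a stable cache key are shown.
struct impl_params {
    std::string entry_point;
    std::vector<int> input_shape;

    bool operator==(const impl_params& other) const {
        return entry_point == other.entry_point && input_shape == other.input_shape;
    }
};

// Combine field hashes (boost::hash_combine style) instead of relying on a
// per-node unique_id.
inline void hash_combine(std::size_t& seed, std::size_t value) {
    seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

struct impl_params_hash {
    std::size_t operator()(const impl_params& p) const {
        std::size_t seed = std::hash<std::string>{}(p.entry_point);
        for (int d : p.input_shape)
            hash_combine(seed, std::hash<int>{}(d));
        return seed;
    }
};

// kernels_code as an unordered_map keyed by the hashed parameters rather
// than a set of (unique_id, source) entries.
using kernels_code_t = std::unordered_map<impl_params, std::string, impl_params_hash>;
```

With this keying, two nodes that share identical parameters map to the same entry, so duplicate kernel sources are stored (and compiled) only once.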
* Fix get_kernel failure issue (#102467)
- Modify hash function of custom_gpu_primitive and generic_layer
- Add == operator to generic_layer for the _kernels map in kernels_cache
- Fix invalid kernel_impl_params caused by a unique_ptr lifecycle issue
* Improve kernels_cache (#102467)
* Move add_kernels_source step to build_implementations
* Change the kernels_code key to kernel_impl_params
* Return kernel vector in get_kernels
* Rename function to get_kernels (#102467)
* Fix functions related to graph serialization (#102467)
* Fix failure to run dynamic model (#102467)
* Add unit test
* Code review follow-up
- Add const to input params
- Add missing code to check kernel duplication in kernels_cache
* Add const to input params (#102467)
* [GPU] Update hash and == operator for generic_layer and custom_gpu_primitive (#102467)
* [GPU] override get_kernels_source in generic_layer and custom_gpu_primitive (#102467)
* [GPU] Fix onednn build error (#102467)
* [GPU] Fix Linux build error (#102467)
* [GPU] kernels_cache::get_kernels returns a vector of clones of cldnn::kernel (#102467)
* Updated serialization logic for improved kernel caches (#16262)
* primitive key kernel cache for serialization
* kernel serialization with binaries hash
* fix kernel cache init function for deserialization
* removed unnecessary code
* [GPU] Update comment and fix test failure (#16262)
* [GPU] Fix custom_gpu_primitive unit test failures (#16262)
* [GPU] Improved kernels cache serialization (#16262)
* removed hash from serialization logic
* updated to not create a new kernels_cache for serialization
* code refactoring in serialization logic
* [GPU] Follow-up code review (#16262)
* [GPU] Modify lock (#16262)
* [GPU] Fix custom_gpu_primitive unit test failure (#16262)
---------
Co-authored-by: Eddy Kim <eddy.kim@intel.com>
* Review ROIPooling class
- check interval shape and label propagation
- add template shape_infer
- add shape infer into cpu plugin
- add test with StaticShape
* Use get_output_roi instead of get_output_size
* Add missing includes
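As a rough illustration of the template shape_infer work above: ROIPooling produces an output of shape [NUM_ROIS, C, pooled_h, pooled_w]. A minimal static-shape-only sketch follows; the real OpenVINO implementation also handles partial shapes, interval propagation, and labels, and the helper name here is hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, static-shape-only sketch of ROIPooling shape inference.
// feat_shape: [N, C, H, W]; rois_shape: [NUM_ROIS, 5].
// Output: [NUM_ROIS, C, pooled_h, pooled_w].
std::vector<int64_t> roi_pooling_shape_infer(const std::vector<int64_t>& feat_shape,
                                             const std::vector<int64_t>& rois_shape,
                                             int64_t pooled_h,
                                             int64_t pooled_w) {
    assert(feat_shape.size() == 4 && rois_shape.size() == 2);
    // Batch dimension of the output is the number of ROIs; channels come
    // from the feature map; spatial dims come from the pooled attributes.
    return {rois_shape[0], feat_shape[1], pooled_h, pooled_w};
}
```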
* Review PSROIPooling operator
- review interval and label propagation
- add template shape_infer implementation
- add shape_infer to cpu plugin
* Add snippets dependency
* - Removed the dependency again
- Added an INTEL_CPU condition on snippets configuring -> no dependency when configured w/o CPU
* Disable snippets_ngraph_functions conditionally if inference_engine_snippets is not configured
---------
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
Move all openvino_conversion routines into utils. Avoid using Squeeze without an axis,
which can create a dynamic output rank
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
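The Squeeze note above can be illustrated with two toy helpers (hypothetical, not OpenVINO API): squeezing without an axis drops every size-1 dimension, so when dimension values are only known at runtime the output rank is not statically known, whereas squeezing an explicit axis always lowers the rank by exactly one:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Squeeze without an axis drops every dimension equal to 1, so the output
// rank depends on the runtime values of the dimensions.
std::vector<int64_t> squeeze_all(const std::vector<int64_t>& shape) {
    std::vector<int64_t> out;
    for (int64_t d : shape)
        if (d != 1)
            out.push_back(d);
    return out;
}

// Squeeze with an explicit axis removes exactly one dimension, so the
// output rank is statically known even when some dimensions are dynamic.
std::vector<int64_t> squeeze_axis(std::vector<int64_t> shape, std::size_t axis) {
    shape.erase(shape.begin() + axis);
    return shape;
}
```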
* [GPU] Added shape agnostic TopK kernel implementation
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Update kernel to use internal buffers for shape agnostic kernel
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add WA to compile_graph for shape agnostic arg_max_min_axis with non-const k input
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix is_dynamic parameter for FillCLKernelData in the case where the output has a static shape
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix corner case where inbuf size becomes 0 when ops_size is 1
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
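The ops_size == 1 corner case above can be illustrated with a hypothetical buffer-sizing helper; the formula and names are illustrative only, not the actual TopK kernel code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Illustrative only: if an internal buffer size is derived by dividing work
// across ops_size stages, ops_size == 1 combined with a small total can
// produce a zero-sized buffer, so clamp to at least one element.
std::size_t internal_buffer_elems(std::size_t total_elems, std::size_t ops_size) {
    std::size_t per_op = (ops_size == 0) ? total_elems : total_elems / ops_size;
    return std::max<std::size_t>(per_op, 1);  // never request a zero-sized allocation
}
```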
* Implement CTPUT in AUTO code logic
* Add logic to handle device loading failure
* add some code comments
* fix warning: conversion from size_t to int
* Updated code according to comments from bell and wanglei
* the preferred-device code path also needs to be updated for CTPUT
* add fallback logic for CTPUT
* Modify the code logic according to bell's suggestion
* Add prints for debugging
* throw an exception when there is no device to run the pipeline task
* initialize idleWorkerRequest for CTPUT
* fix getting properties
Signed-off-by: fishbell <bell.song@intel.com>
refine
Signed-off-by: fishbell <bell.song@intel.com>
* fix warning
Signed-off-by: fishbell <bell.song@intel.com>
* fix illegal character on windows
Signed-off-by: fishbell <bell.song@intel.com>
* fix illegal character
Signed-off-by: fishbell <bell.song@intel.com>
add missing include
Signed-off-by: fishbell <bell.song@intel.com>
* more code refinement
Signed-off-by: fishbell <bell.song@intel.com>
---------
Signed-off-by: fishbell <bell.song@intel.com>
Co-authored-by: fishbell <bell.song@intel.com>
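The device-loading fallback described in the commits above can be sketched as follows; all names are hypothetical, not the AUTO plugin's real API. Candidate devices are tried in priority order, load failures are skipped, and an exception is thrown only when no device is left to run the pipeline task:

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: try each candidate device in priority order and
// fall back to the next one when loading fails.
std::string load_with_fallback(const std::vector<std::string>& devices,
                               const std::function<bool(const std::string&)>& try_load) {
    for (const auto& dev : devices) {
        if (try_load(dev))
            return dev;  // first device that loads successfully wins
    }
    // No candidate could load the model: surface the failure to the caller.
    throw std::runtime_error("no device available to run the pipeline task");
}
```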
* Properties improvements: part 2
* Accurate config handling in HETERO / BATCH
* Align plugins in caching properties
* Fixed caching mock tests
* Added new TestNoCachingProperties test
* Fixed test
* Added ov::caching_properties to API 1.0 metrics as well
* Fixes for HETERO plugin
* Fixed tests
* Even more refactoring in HETERO plugin config management
* Prevent memory reset at runtime allocation for dynamic shape
* Set default alloc to reset memory
* Additional fixes:
- If there are any convolution/deconvolution users that require padded input, enqueue a buffer reset when reusing a buffer.
- Removed clFinish from gpu_buffer::fill. (Ideally it should be waited on only when needed; otherwise synchronization is done by an event.)
- Removed the buffer reset from on_execute of nonzero count, which is no longer needed.
* Remove unused API
* Fix tensor offset to project the padding
* Added unit test
* Applied review comment
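A minimal sketch of the reset policy described in this commit (field and function names are hypothetical): default allocations still reset memory, but runtime allocations for dynamic shapes skip the reset unless a convolution/deconvolution user reads padded input from the reused buffer:

```cpp
#include <cassert>

// Hypothetical summary of the allocation policy above.
struct alloc_request {
    bool runtime_dynamic_alloc;  // allocation happening at runtime for a dynamic shape
    bool has_padded_conv_user;   // some conv/deconv user requires padded input
};

bool needs_memory_reset(const alloc_request& r) {
    if (!r.runtime_dynamic_alloc)
        return true;               // default path: keep resetting memory
    return r.has_padded_conv_user; // reused buffer must be cleared for padded reads
}
```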