* Added -Wall for Clang and GCC
* Fixes
* Don't use /J
* Fixed warnings
* Fixed warnings
* More fixes
* Fixed for MSVC
* Fixed more warnings on Windows
* Suppressed some warnings in template plugin
* Update src/tests/functional/plugin/shared/include/behavior/plugin/caching_tests.hpp
* Added suppression for PT FE
* Suppressed warnings in TF FE
* Suppressed warnings on Core unit tests
* Suppress warnings in python
* Suppressed Windows warning for 3rd party modules
* Suppressed one more warning
* oneDNN only supports 2D/3D gemm, but the OpenVINO GPU plugin policy enforces 4D~6D.
This API mismatch causes problems with the post-op axis and would require massive code changes.
Therefore we decided to insert throw code for now (see the sketch below) and fix this issue later
if some models require non-(per tensor/full tensor) post-ops.
* Specifically, the per-channel (=f) axis in this test case becomes the y-axis
because oneDNN gemm merges the b and f axes into one batch axis.
* serialization of read_value and assign primitives
* lines should be <= 160 characters long
* added unit tests for read_value and assign
* updated to store is_output_event in primitive_inst
* removing _is_output_event in typed_primitive_impl_ocl
* added comments for mem_allocated and is_output_null
* 1. Correct the device list by priority order, from high to low.
2. Remove GNA, CUDA, HPU, HDDL, NVIDIA from the device list supported by AUTO/MULTI.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Filter out supported devices when no candidate devices are specified for the AUTO plugin.
* Add Debug MSG
* Update.
* Update AUTO mock test cases.
* Update.
* Update.
* Update code style.
---------
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
Co-authored-by: Chen Peter <peter.chen@intel.com>
* [GPU] Shape-agnostic optimized gemm kernel
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix CI failure
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Apply code review
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix dynamic shape accuracy drop on SQuAD v1.1
- F1: 91.81%, EM: 85.25% @bert-small-uncased-whole-word-masking-squad-0001
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Apply code review
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
* [GPU] Fix the functional issue using fc:onednn in the bert model.
* The issue occurred when the input has 3 dims with a post-op eltwise.
* oneDNN FC supports only a 2-dim output, so OV needs to update the output and post-ops too (see the sketch below).
* Fix accuracy issue in b16 oneDNN FC: cldnn switches to yxfb format for b16 in the opt kernel, but this is not needed for oneDNN.
* Remove the workaround code for running fc on cldnn.
* Support gemm primitive and multiple impl types in ForceImplTypes
* Change env name to OV_GPU_ForceImplTypes
* Do not change the eltwise post-op shape from the original node: it caused an accuracy issue when the node has multiple users.
Signed-off-by: hyunback <hyunback.kim@intel.com>
+ Bugfix of eltwise_b_fs_yx_fsv16 kernel for int saturation
+ Add optimization for fsv32, fsv16 using vload
+ Add optimization for double-blocked format eltwise
+ Support mixed format and broadcasting
+ Add test cases to eltwise_gpu_test
Signed-off-by: Min, Byungil <byungil.min@intel.com>
* Add shape_infer function for GatherND (output-shape rule sketched after this series of commits)
* GatherND shape infer improvements
* Align test to trigger correct error message
* Add new and improve GatherND type_prop tests
* Update tests to use ov namespace
* Add GatherND common shape_infer tests
* Init shape infer tests for not common cases
* Tests refactor
* Add default ctor tests
* Add more test cases
* Register shape_infer for GatherND V5 and V8
* Enable more tests and print params
* Move GatherNDTestParams
* Review ctc loss operator for
- partial shape and label propagation
- template implementation of shape_infer
- update/extend tests
* Use namespace ov in ctc loss operator
* [GPU] Optimize permute for acdb format
Target subgraphs to be optimized out:
- input(bfyx) - permute(byxf) - conv
- conv(byxf) - permute(bfyx) - output
+ Fix test_device_mem_usage_estimation unit test failure.
added 3-axis interpolation for linear-onnx mode
fixed resample_opt for onnx mode; it didn't work in the case of padding
added tests for the new implementation and fix
@OlehKravchyshyn
* [GPU] improved impl cache key (#14797)
- Add hash function for primitive and program_node
- Filter task before entering async compilation queue
* [GPU] improved impl cache key (#14797)
- Multiply a magic prime number into the input value of hash_combine to avoid hash collisions (see the sketch below)
* [GPU] Update codes to follow up review comments (#14797)
- Change func name from pop_front_task to erase_front_task
- Change func name from get_layout_key to get_impl_key
- Remove average_unpooling.hpp because the primitive was already removed
- Replace std::list with std::deque in compilation_context
- Modify layout::hash() to get hash of shape from partial shape
- Remove calculation code to get hash from static layout in program_node => layout hash is calculated outside of program_node
* [GPU] Update gpu functional test for improved impl key (#14797)
* [GPU] update compilation queue (#14797)
* [GPU] Move type_string hash to primitive (#14797)
- Add hash for num_outputs in program_node
* [GPU] update hash functions for program_node (#14797)
- add hash for number of inputs in program_node
- program_node::hash() was split into void program_node::calculate_hash() and size_t program_node::get_hash()
* [GPU] Fix gpu unit test failures (#14797)
- move the node hash calculation from compile_graph to the program ctor
* [GPU] Fix build issue after rebase (#14797)
* [GPU] Update impl if optimized kernel is in impl_cache even if the shape does not change. (#14797)
- Apply improved hash key to mem kernels cache in update_weight
- Add missing hash value for broadcast
- Add a simple unit test to check hash values for program_node, primitive, and primitive_inst
* [GPU] The draft for oneDNN 3.0 integration
Initial PR.
1. Support the oneDNN 3.0 API
2. Use a binary_mul post-op instead of the oscale channel-wise mask(2) (see the sketch below)
3. Disable some post-op fusings because there is no eltwise scale API:
eltw(non_linear)+eltw(linear), eltw+sum+eltw(linear)
Signed-off-by: hyunback <hyunback.kim@intel.com>
* Fix hardswish issue in 3.0
The hard-coded hardswish parameter (2.7) is replaced with alpha and beta taken from the user's input.
Signed-off-by: hyunback <hyunback.kim@intel.com>
* clean up code
Signed-off-by: hyunback <hyunback.kim@intel.com>
* Apply code review comment and fix ci issue
Signed-off-by: hyunback <hyunback.kim@intel.com>
* Remove setting dst scale
- Accuracy issue
- No perf gain compared to binary_mul
Signed-off-by: hyunback <hyunback.kim@intel.com>
* gpu serialization for onednn 3.0
* missed changes
* add onednn engine creator when loading model from cache
* fixed to use mem_dep index
* updated to save zero_point_mask for serialization
* fixed onednn fc serialization logic
* updated the logic to check if onednn is enabled
---------
Signed-off-by: hyunback <hyunback.kim@intel.com>
Co-authored-by: hyunback <hyunback.kim@intel.com>
* Optimize realloc for dynamic shape with:
- Pre-aligned alloc for bounded dynamic shape (sketched below)
- Reuse of the internal buffer
* Fix internal buffer of NMS kernel to be reused
- Fixed bug in NMS quick sort
* Additional fix for internal buffer reuse
* Fix legacy dynamic batch to be applied only for 0-th dim dynamic shape with upper bound
* Fix unittest error
* Apply nms fixes of padding -1 to all buffers only when internal buffer is reused
* Do not add a separate get_max_tensor, because currently there is no need for that separate API.
The max tensor is currently only needed for memory allocation, and there is no need for a minimum tensor size for now.
* Fix allocation of internal buffer to be done for each layout