* Fix C++ exception with description "write lock_type" thrown in the test body.
Use get_output_values_to_float() in:
* fusings_gpu/gemm_2in_act_scale_quantize_eltwise_i8.basic/2
* fusings_gpu/gemm_2in_act_scale_eltwise.basic/2
* Remove workaround (WA) test code from [GPU][DG2] Fix fusings_gpu/gemm_2in_scale.basic/7 #15353
* Non-full-tensor post-ops are now broadcast
* Fix remote blob creation to use original shape
* Revert "Fix remote blob creation to use original shape"
This reverts commit 35c674aa97.
* Fix cldnn tensor adjusted blob to be reinterpreted with actual input layout
* gpu model caching unit tests
* added serialization unit tests
* added save and load for quantize primitive_inst
* reduced the range of inputs for Gemm tests
* updated the copyright year
* [GPU] Fix a bug of permute optimization
For int8 models, if there is a FakeQuantize between a permute and a convolution, an operation such as a data type cast can be fused into the permute. In this case, do not optimize the permute out.
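The guard described above can be sketched as follows; the struct and function names are hypothetical, not the plugin's actual API:

```cpp
#include <vector>

// Hypothetical sketch: a permute may carry fused operations (e.g. a cast
// folded in from a FakeQuantize). If any fused op changes the data type,
// removing the permute as a no-op would drop that conversion, so the
// optimization must be skipped.
struct FusedOp {
    int in_type;   // input data type id
    int out_type;  // output data type id
};

bool can_optimize_permute(const std::vector<FusedOp>& fused_ops) {
    for (const auto& op : fused_ops) {
        if (op.in_type != op.out_type)
            return false;  // the permute carries a type cast: keep it
    }
    return true;  // pure reorder with no type change: safe to optimize out
}
```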
* [GPU] Added shape agnostic optimized MVN kernel
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Apply code review
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Replace built-in log2 with a function macro that computes the power-of-two exponent from an integer
Signed-off-by: Andrew Park <andrew.park@intel.com>
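A standalone C++ sketch of such an integer log2 helper (the macro in the commit lives in the kernel/JIT code; this illustrative function only shows the arithmetic it replaces the floating-point built-in with):

```cpp
#include <cstdint>

// Illustrative only: compute floor(log2(n)) for a positive integer by
// counting right shifts, avoiding the floating-point built-in log2().
// A function macro with the same shift-count arithmetic can serve
// generated kernel code where a library call is undesirable.
constexpr uint32_t ilog2(uint32_t n) {
    uint32_t exp = 0;
    while (n >>= 1)
        ++exp;
    return exp;
}
```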
* Move compile-time JIT constants to cl code
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
* benchmark_app: factor out advanced options
* cpp/benchmark_app: add usage word
* move api to advanced, group b/shape/data_shape/layout into Input shapes, factor out Statistics dumping options
* Factor out Device-specific performance options
* Factor out Preprocessing options
* Minor regroup
* serialization of proposal
* serialization of anchors in proposal
* added unit tests for gpu proposal
* updated the proposal primitive to be partially serialized
* serialization of primitive class
* removed unnecessary code
* removed white spaces
* serialization of loop primitive
* serialization of nms
* fixed implicit concat logic in serialization
* added RUN_ALL_MODEL_CACHING_TESTS directive
* fixed an error related to the boolean vector specialization
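A minimal sketch of why the boolean vector specialization needs special handling during serialization (`serialize_bools` is a hypothetical helper, not the plugin's API):

```cpp
#include <cstdint>
#include <vector>

// std::vector<bool> is a packed specialization without contiguous bool
// storage (no data() pointer to raw bools), so it cannot be written with
// a single memcpy like other vectors; elements are copied out one by one
// into a byte buffer instead.
std::vector<uint8_t> serialize_bools(const std::vector<bool>& v) {
    std::vector<uint8_t> out;
    out.reserve(v.size());
    for (bool b : v)
        out.push_back(b ? 1 : 0);
    return out;
}
```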
* Added -Wall for Clang and GCC
* Fixes
* Don't use /J
* Fixed warnings
* Fixed warnings
* More fixes
* Fixed for MSVC
* Fixed more warnings on Windows
* Suppressed some warnings in template plugin
* Update src/tests/functional/plugin/shared/include/behavior/plugin/caching_tests.hpp
* Added suppression for PT FE
* Suppressed warnings in TF FE
* Suppressed warnings on Core unit tests
* Suppress warnings in python
* Suppressed Windows warning for 3rd party modules
* Suppressed one more warning
* oneDNN only supports 2D/3D GEMM, but the OpenVINO GPU plugin policy enforces 4D~6D.
This API mismatch causes problems with the post-op axis and would require massive code changes.
Therefore we decided to insert throw code for now and fix this issue later
if some models require non-(per-tensor/full-tensor) post-ops.
* Specifically, the per-channel (=f) axis in this test case becomes the y-axis
because oneDNN GEMM merges the b and f axes into one batch axis.
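The axis merge described above can be illustrated as follows (the struct and helper names are assumptions for illustration, not the plugin's code):

```cpp
// Illustrative sketch: collapsing a 4-D bfyx GEMM shape into the 3-D
// (batch, M, N) view that a 2D/3D-only GEMM backend expects. Merging
// the b and f axes into one batch axis is why a per-channel (f-axis)
// post-op effectively lands on the y-axis afterwards.
struct GemmShape3D {
    long batch;  // b * f merged
    long m;      // formerly y
    long n;      // formerly x
};

GemmShape3D merge_bf_axes(long b, long f, long y, long x) {
    return {b * f, y, x};
}
```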
* serialization of read_value and assign primitives
* lines should be <= 160 characters long
* added unit tests for read_value and assign
* updated to store is_output_event in primitive_inst
* removing _is_output_event in typed_primitive_impl_ocl
* added comments for mem_allocated and is_output_null
* [GPU] Shape-agnostic optimized gemm kernel
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix CI failure
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Apply code review
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix dynamic shape accuracy drop on SQuAD v1.1
- F1: 91.81%, EM: 85.25% @bert-small-uncased-whole-word-masking-squad-0001
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Apply code review
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
* [GPU] Fix the functional issue when using fc:onednn in a BERT model.
* The issue happened when the input dims are 3 with a post-op eltwise.
* oneDNN FC supports only a 2-dim output, so OV needs to update the output and post-op shapes too.
* Fix accuracy issue in batch-16 oneDNN FC. cldnn switches to yxfb format at batch 16 for the optimized kernel, but this is not needed for oneDNN.
* Remove WA code for running FC on cldnn.
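The reshape implied by the 2-dim output constraint above can be sketched like this (`flatten_fc_dims` is a hypothetical helper, not the plugin's API):

```cpp
#include <utility>

// Sketch under the stated constraint: an FC backend that only produces
// a 2-D output forces a 3-D input [B, F, V] to be flattened to [B*F, V];
// any eltwise post-op operand must be reshaped the same way so the two
// tensors stay element-aligned.
std::pair<long, long> flatten_fc_dims(long b, long f, long v) {
    return {b * f, v};
}
```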
* Support gemm primitive and multiple types in ForceImplTypes
* Change env variable name to OV_GPU_ForceImplTypes
* Do not change the eltwise post-op shape from the original node: it caused an accuracy issue when there are multiple users.
Signed-off-by: hyunback <hyunback.kim@intel.com>