Commit Graph

1338 Commits

Author SHA1 Message Date
Roman Lyamin
23b863ffe8 [GPU] Added ScatterNDUpdate shape agnostic kernel (#15567) 2023-02-08 16:27:34 +01:00
Katarzyna Mitrus
76817f56c2 [ShapeInference] CTCGreedyDecoderSeqLen shape infer improvements (#15501)
* Shape infer improvements

* Add setter for merge repeated attribute

* Use new shape_infer in validate and infer types

* Add more type prop tests

* Add shape infer tests

* Align variable names in tests

* shape_infer call refactor

---------

Co-authored-by: Evgenya Stepyreva <evgenya.stepyreva@intel.com>
2023-02-08 12:24:48 +00:00
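The shape relationship that CTCGreedyDecoderSeqLen's shape_infer encodes can be sketched standalone: per the opset spec, a data input of shape [N, T, C] yields a decoded-classes output of [N, T] and a sequence-lengths output of [N]. The helper below is illustrative only (the function name is invented); it is not OpenVINO's template shape_infer implementation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Standalone sketch: CTCGreedyDecoderSeqLen takes data of shape [N, T, C]
// and produces decoded class indices [N, T] plus sequence lengths [N].
std::array<std::vector<std::int64_t>, 2>
ctc_greedy_decoder_seq_len_shapes(const std::vector<std::int64_t>& data_shape) {
    assert(data_shape.size() == 3 && "data input must be rank 3: [N, T, C]");
    const std::int64_t n = data_shape[0];
    const std::int64_t t = data_shape[1];
    return {std::vector<std::int64_t>{n, t}, std::vector<std::int64_t>{n}};
}
```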
Pawel Raasz
c3083589bd Review reverse class for shape inference aspects (#15426)
* Check partial shape and label propagation

* Add template shape_infer implementation
2023-02-08 16:10:51 +04:00
Ilya Lavrenov
1f3e469c5e Added -Wall for Clang and GCC (#15513)
* Added -Wall for Clang and GCC

* Fixes

* Don't use /J

* Fixed warnings

* Fixed warnings

* More fixes

* Fixed for MSVC

* Fixed more warnings on Windows

* Suppressed some warnings in template plugin

* Update src/tests/functional/plugin/shared/include/behavior/plugin/caching_tests.hpp

* Added suppression for PT FE

* Suppressed warnings in TF FE

* Suppressed warnings on Core unit tests

* Suppress warnings in python

* Suppressed Windows warning for 3rd party modules

* Suppressed one more warning
2023-02-08 15:01:00 +04:00
Katarzyna Mitrus
d94dae79d8 [ShapeInference] CTCGreedyDecoder shape infer improvements (#15474)
* Add setter for ctc_merge_repeated

* shape infer improvements

* Add type prop tests

* Add cpu shape infer tests

* Tests refactor
2023-02-08 11:41:14 +04:00
Sergey Shlyapnikov
7b649c4150 [GPU] Fix reset_execution() method with wait option enabled for in_order queue type (#15562) 2023-02-08 09:18:46 +04:00
Nadezhda Ageeva
421d791f58 [HETERO] Return only own properties for compiled model (#15547) 2023-02-07 23:54:08 +04:00
Ilya Churaev
26108b1b67 Deprecate clone_model API, use model->clone() instead (#15482)
* Deprecate clone_model API, use model->clone() instead

* Renamed clone_nodes function

* Fixed exception
2023-02-07 23:53:17 +04:00
Nikolay Shchegolev
188dda668f [CPU] Fix sporadic SIGFAULT in GridSample. (#15009) 2023-02-07 17:57:34 +04:00
Dohyun Kim (Felix)
7659551d71 [GPU][DG2] Fix fusings_gpu/gemm_2in_scale.basic/7 (#15353)
* oneDNN only supports 2D/3D gemm, but the OpenVINO GPU plugin policy enforces 4D~6D.
  This API mismatch causes problems in the post-op axis and would require massive code changes.
  Therefore we decided to insert throw code for now and fix this issue later
  if some models require non-(per tensor/full tensor) post-ops.
* Specifically, the per-channel (=f) axis in this test case becomes the y-axis
  because onednn gemm merges the b and f axes into one batch axis.
2023-02-07 16:37:26 +09:00
Sungeun Kim
00d9ed0da4 [GPU] fix bug on resample_opt (#15434)
* fix bug: wrong feature slice num
2023-02-07 16:29:18 +09:00
Mingyu Kim
6fa31fbed2 [GPU] Show num_ccs for RANGE_FOR_STREAMS (#15525) 2023-02-07 15:22:45 +09:00
Eddy Kim
8e84531b58 [GPU] Serialization of read_value and assign (#15007)
* serialization of read_value and assign primitives

* lines should be <= 160 characters long

* added unit tests for read_value and assign

* updated to store is_output_event in primitive_inst

* removing _is_output_event in typed_primitive_impl_ocl

* added comments for mem_allocated and is_output_null
2023-02-06 11:10:59 -08:00
Roman Lyamin
014a35c3ce [GPU] Added strided_slice shape agnostic kernel (#15477) 2023-02-06 13:03:00 +04:00
Sergey Shlyapnikov
e003bf3af7 [GPU] Shape agnostic FC opt tiled kernel (#15396) 2023-02-06 12:17:55 +04:00
Sergey Shlyapnikov
cd48d76009 [GPU] Limit legacy fusions usage in Convolution kernels (#15465) 2023-02-06 12:08:22 +04:00
Kelvin Choi
8ed71a22fa [GPU] Update ScatterNDUpdate Op to use ngraph shape infer (#15176) 2023-02-05 21:31:33 -08:00
Wang, Yang
3bfd07d535 [AUTO plugin] MULTI_DEVICE_PRIORITIES doesn't return device list by priority order from high to low (#14754)
* 1. Correct the device list by priority order from high to low.
2. Remove GNA, CUDA, HPU, HDDL, NVIDIA from device list supported by AUTO/MULTI.

Signed-off-by: Wang, Yang <yang4.wang@intel.com>

* Filter out supported devices when the candidate devices are not specified for the AUTO plugin.

* Add Debug MSG

* Update.

* Update AUTO mock test cases.

* Update.

* Update.

* Update code style.

---------

Signed-off-by: Wang, Yang <yang4.wang@intel.com>
Co-authored-by: Chen Peter <peter.chen@intel.com>
2023-02-06 10:09:54 +08:00
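The ordering fix above (return the device list by priority, high to low) can be sketched as a stable sort against a fixed priority table. The table, the prefix-matching rule, and the function name below are assumptions for demonstration, not the AUTO plugin's actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative sketch: order candidate devices by an assumed fixed
// priority, high to low; devices not in the table go last. A stable
// sort keeps the relative order of devices with equal priority.
std::vector<std::string> order_by_priority(std::vector<std::string> devices) {
    static const std::vector<std::string> priority = {"GPU", "CPU"};  // assumed order
    auto rank = [](const std::string& d) -> std::size_t {
        for (std::size_t i = 0; i < priority.size(); ++i)
            if (d.rfind(priority[i], 0) == 0)  // prefix match, e.g. "GPU.1"
                return i;
        return priority.size();  // unknown devices go last
    };
    std::stable_sort(devices.begin(), devices.end(),
                     [&](const std::string& a, const std::string& b) {
                         return rank(a) < rank(b);
                     });
    return devices;
}
```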
Ilya Churaev
21fc5fbd75 Enable clone test after PR 15220 (#15481)
* Enable deep copy template test

* Disable unroll-if transformation
2023-02-03 17:41:25 +01:00
Ilya Lavrenov
de1631d67d Generalized OpenCL handling (#15253)
* Squashed commit of the following:

commit 62c992f6a0bc3a2f559faac6912be9c5632a359f
Author: Ilya Lavrenov <ilya.lavrenov@intel.com>
Date:   Sun Jan 22 11:38:18 2023 +0400

    Generalized OpenCL handling

* Updates

* Fixes

* Update thirdparty/CMakeLists.txt

test

* Fixed build with CL/cl2.hpp

* Fixes

* Fixes

* Fixed compilation flags

* Fixed build with target OpenCL 120

* Don't use cache
2023-02-03 15:36:47 +00:00
Pavel Esir
4103a931c2 [FP16] call marking for mixed precision inside ConvertPrecision (#14965)
* call marking for mixed precision inside ConvertPrecision

* fix typo in precisions list; moved conversion from f64->f32 to the very beginning

* remove obsolete convert_compressed_to_mixed_precision_test.cpp

* typo fix after merge

* corrected namespace prefix

* fixed align_mixed_fp32_fp16_types_test.cpp by removing redundant ConvertPrecision

* updated ConvertPrecision tests for mixed precision

* style fix
2023-02-03 13:47:57 +04:00
Ilya Churaev
1bb8223ecd Template ov plugin (#15220)
* Initial migration of TemplatePlugin to ov::IPlugin interface

* Fixed segfault

* Fixed static build and some template tests

* Fixed code style

* Fixed some template tests

* Fixed scale tests

* Disabled transformations in the template plugin

* Fixed ONNX tests

* Fixed compilation

* Fixed core tests

* Fixed some crashes

* Small fixes

* Migrate to ICompiledModel

* Fixed some behaviour tests (add legacy names and supported_properties)

* Fixed output precisions

* Fixed some tests

* Changed parameter->result test

* Fixed some preprocessing tests

* Added mean image preprocessing

* Disabled some tests

* Fixed some template tests

* Try to fix not implemented false

* Try to fix template tests

* Fixed doc

* Catch ov::NotImplemented exception

* Small changes

* Fixed build

* Try to fix build

* Fixed some comments

* Use new properties

* Fixed documentation

* Fixed properties
2023-02-03 13:37:40 +04:00
Andrew Kwangwoong Park
ab509ce164 [GPU] Added shape agnostic optimized GEMM kernel (#15317)
* [GPU] Shape agnostic optimized gemm kernel

Signed-off-by: Andrew Park <andrew.park@intel.com>

* Fix CI failure

Signed-off-by: Andrew Park <andrew.park@intel.com>

* Apply code review

Signed-off-by: Andrew Park <andrew.park@intel.com>

* Fix dynamic shape accuracy drop on SQuAD v1.1

- F1: 91.81%, EM: 85.25% @bert-small-uncased-whole-word-masking-squad-0001

Signed-off-by: Andrew Park <andrew.park@intel.com>

* Apply code review

Signed-off-by: Andrew Park <andrew.park@intel.com>

---------

Signed-off-by: Andrew Park <andrew.park@intel.com>
2023-02-03 09:26:35 +04:00
hyunback kim
9d8532e998 [GPU] Use onednn fc/gemm in dGPU. (#15143)
* [GPU] Fix the functional issue using fc:onednn in the bert model.

* The issue happened when input dims are 3 with a post-op eltwise.
* oneDNN FC output supports 2 dims only, so OV needs to update the output and post-op too.
* Fix ACC issue in b16 onednn FC. cldnn updates yxfb format in b16 for the opt kernel, but this is not needed in onednn.
* Remove W.A. code for running fc cldnn.
* Support gemm primitive and multiple types in ForceImplTypes
* Change env name to OV_GPU_ForceImplTypes
* Do not change the eltwise post-op shape from the original node: it caused the ACC issue when there are multiple users.

Signed-off-by: hyunback <hyunback.kim@intel.com>
2023-02-03 09:58:00 +09:00
Mingyu Kim
e9a208501b [GPU] Add layout supports for shuffle_channels (#15400)
* [GPU] Add layout supports for shuffle_channels
2023-02-03 09:52:40 +09:00
Mang Guo
9e83b081f4 Add gather lpt transformation (#14597)
* Add gather lpt transformation

* Add per-channel gather lpt dequantization support

* Fix review comments

* Add GPU test case

* Fix clang-format error and GPU case build error

* Fix comments

* Fix clang-format check fail

* Update docs

* Fix comments

* Add Gather opset1 quantization support
2023-02-02 16:13:52 +01:00
Nikolay Shchegolev
d167f4c733 [CPU] MEMC loading failed RuntimeError: There should be only one inst… (#15380)
* [CPU] MEMC loading failed RuntimeError: There should be only one instance of RegistersPool per thread.

* Fixes as per comments.
2023-02-02 16:10:21 +04:00
Min, Byungil
7bdc9ec36b [GPU] Optimize eltwise kernel for onednn formats (#15087)
+ Bugfix of eltwise_b_fs_yx_fsv16 kernel for int saturation
+ Add optimization for fsv32, fsv16 using vload
+ Add optimization for double-blocked format eltwise
+ Support mixed format and broadcasting
+ Add test-cases to eltwise_gpu_test

Signed-off-by: Min, Byungil <byungil.min@intel.com>
2023-02-02 18:03:07 +09:00
Roman Lyamin
36df508baf [GPU] Added shape agnostic ref kernels for Select and Activation (#15016)
* [GPU] Added Select shape agnostic support

* [GPU] Added Activation shape agnostic support
2023-02-02 10:08:36 +04:00
Taylor Yeonbok Lee
864b5075b7 Allocate output to host if it is to be used by other node's shape infer dependency, because it requires copy to host in shape inference. (#15386) 2023-02-02 06:32:49 +01:00
Taylor Yeonbok Lee
3910e0b2d0 [GPU] Use ocl for i32 dtype concat / enable_profiling for dump_profiling (#15419)
* Use ocl for i32 concat

* Enabling dgpu profiling
2023-02-01 20:08:58 +01:00
Chenhu Wang
8fc4c2c6e1 [CPU] FC shape infer with a CPU shapeInfer object (#15092)
* with a shapeInfer object

* more efficient vector creation
2023-02-01 15:35:15 +01:00
Roman Lyamin
2d9a213ed6 [GPU] Added NV12toBGR single plane surface tests (#15417) 2023-02-01 18:26:39 +04:00
guozhong wang
1550a98bd2 Remove the redundant functions in the executable_network.cpp (#14909)
Co-authored-by: yanlan song <bell.song@intel.com>
2023-02-01 21:07:55 +08:00
Marcin Kusmierski
691ccad997 [GNA] Apply Style formatter (#15374)
* [GNA] Enable clang-format

* [GNA] apply fix for formatting

* [GNA] fixes to make code compilable after reformatting
2023-02-01 12:40:02 +01:00
Kelvin Choi
5bcfdf15df [GPU] Fix reduce fs_b_yx_fsv16 bug for MIN and MAX mode (#15060) 2023-01-31 16:02:42 -08:00
Katarzyna Mitrus
407590cfc2 [ShapeInference] GatherTree shape infer improvements (#15399)
* Shape infer improvements

* Add type_prop label and interval dims tests

* Update shape_infer tests

* Use new shape_infer

* Revert headers changes

* Rename test file
2023-01-31 14:04:19 +01:00
Katarzyna Mitrus
f342e5d208 [ShapeInference] Improve GatherND shape inference (#15378)
* Add shape_infer function for GatherND

* GatherND shape infer improvements

* Align test to trigger correct error message

* Add new and improve GatherND type_prop tests

* Update tests to use ov namespace

* Add GatherND common shape_infer tests

* Init shape infer tests for not common cases

* Tests refactor

* Add default ctor tests

* Add more test cases

* Register shape_infer for GatherND V5 and V8

* Enable more tests and print params

* Move GatherNDTestParams
2023-01-31 14:12:12 +04:00
Pawel Raasz
4ce3e9a88d Review CTCLoss class for shape inference aspects (#15375)
* Review ctc loss operator for
- partial shape and label propagation
- template implementation of shape_infer
- update/extend tests

* Use namespace ov in ctc loss operator
2023-01-31 14:10:30 +04:00
Pawel Raasz
3a8646215f Review roll class for shape inference aspects (#15295)
* Review Roll label and interval shape propagation

* Review Roll shape_infer template implementation

* Fix compilation issues
2023-01-31 14:05:23 +04:00
Jade Cho
06063201d5 [GPU] Optimize permute for acdb format (#15139)
* [GPU] Optimize permute for acdb format

Target subgraphs to be optimized-out
- input(bfyx) - permute(byxf) - conv
- conv(byxf) - permute(bfyx) - output
+ Fix test_device_mem_usage_estimation unit test failed.
2023-01-31 17:32:57 +09:00
OlehKravchyshyn
4700207af0 [GPU] Feature/intepolate 3 axes onnx 5d (#13796)
added 3-axis interpolation for linear-onnx mode
fixed resample_opt for onnx mode; it didn't work in case of padding
added tests for the new implementation and the fix

2023-01-30 22:45:53 -08:00
Vladislav Golubev
d1397b7b48 [LPT] Rank limitations removed (#14785)
* [LPT] LayerTransformation: removed legacy rank checks

* [LPT] Added test cases with 1D and 6D ranks & existing tests corrected
2023-01-31 00:26:59 +00:00
Paul Youngsoo Ahn
0b5603fa98 [GPU] improve primitive impl caching mechanism with new unified key (#14797)
* [GPU] improved impl cache key (#14797)
- Add hash function for primitive and program_node
- Filter task before entering async compilation queue

* [GPU] improved impl cache key (#14797)
- Multiply the input value of hash_combine by a magic prime number to avoid hash collisions

* [GPU] Update code to follow up on review comments (#14797)
- Change func name from pop_front_task to erase_front_task
- Change func name from get_layout_key to get_impl_key
- Remove average_unpooling.hpp because it was already removed
- Replace std::list with std::deque in compilation_context
- Modify layout::hash() to get hash of shape from partial shape
- Remove calculation code to get hash from static layout in program_node => layout hash is calculated outside of program_node

* [GPU] Update gpu functional test for improved impl key (#14797)

* [GPU] update compilation queue (#14797)

* [GPU] Move type_string hash to primitive (#14797)
- Add hash for num_outputs in program_node

* [GPU] update hash functions for program_node (#14797)
- add hash for number of inputs in program_node
- program_node::hash() was separated into void program_node::calculate_hash() and size_t program_node::get_hash()

* [GPU] Fix gpu unit test failures (#14797)
- move the location to calculate all nodes from compile_graph to program ctor

* [GPU] Fix build issue after rebase (#14797)

* [GPU] Update impl if optimized kernel is in impl_cache even if the shape does not change. (#14797)
- Apply improved hash key to mem kernels cache in update_weight
- Add missing hash value for broadcast
- Add simple unit test to check hash value for program_node, primitive and program_inst
2023-01-30 14:35:58 -08:00
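The collision-avoidance idea mentioned in this PR (mixing each field's hash into a combined seed with a magic constant) can be sketched with the widely used Boost-style hash_combine; the exact constant and mixing scheme in the GPU plugin may differ.

```cpp
#include <cstddef>
#include <functional>

// Boost-style hash_combine: folds a new value into an accumulated seed.
// The magic constant (derived from the golden ratio) plus the shifts
// spread bits so that permuted or similar inputs hash differently,
// reducing collisions compared to a plain XOR of field hashes.
// Sketch only; not necessarily the GPU plugin's exact scheme.
template <typename T>
void hash_combine(std::size_t& seed, const T& value) {
    seed ^= std::hash<T>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
```

Because the seed participates in every step, combining the same values in a different order yields a different hash, which a plain XOR would not guarantee.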
Eddy Kim
c2518f1e4a [GPU] Serialization logic updates for OneDNN 3.0 (#15182)
* [GPU] The draft for integrating oneDNN 3.0

Initial PR.

1. Support the oneDNN 3.0 API
2. Use binary_mul post-op instead of oscale channel-wise mask(2)
3. Disable some post-op fusing because there is no eltwise scale API:
    eltw(non_linear)+eltw(linear), eltw+sum+eltw(linear)

Signed-off-by: hyunback <hyunback.kim@intel.com>

* Fix hardswish issue in 3.0

The hard-coded hardswish parameter (2.7) is changed to take alpha and beta from the user's input.

Signed-off-by: hyunback <hyunback.kim@intel.com>

* clean up code

Signed-off-by: hyunback <hyunback.kim@intel.com>

* Apply code review comment and fix ci issue

Signed-off-by: hyunback <hyunback.kim@intel.com>

* Remove setting dst scale

- ACC issue
- No perf gain compared to binary_mul

Signed-off-by: hyunback <hyunback.kim@intel.com>

* gpu serialization for onednn 3.0

* missed changes

* add onednn engine creator when loading model from cache

* fixed to use mem_dep index

* updated to save zero_point_mask for serialization

* fixed onednn fc serialization logic

* updated the logic to check if onednn is enabled

---------

Signed-off-by: hyunback <hyunback.kim@intel.com>
Co-authored-by: hyunback <hyunback.kim@intel.com>
2023-01-30 09:41:25 -08:00
Nadezhda Ageeva
82a07845a5 [GPU]: Update device architecture to support other vendors (#15232) 2023-01-30 14:12:09 +04:00
Eddy Kim
3ace063040 [GPU] Adding copy functions for image2d memory (#15330)
* implemented copy functions for image2d

* updated to dump data using copy_to without lock

* calculating _bytes_count for image2d
2023-01-27 16:47:47 -08:00
Roman Lyamin
4089ee0899 [GPU] Added WA for currently unsupported scale_shift_opt agnostic kernel (#15341) 2023-01-27 16:47:35 +04:00
Sungeun Kim
9e54ac6518 [GPU] Add black list to avoid fusing crop before experimental_detectron_roi_feature_extractor (#15036)
* Add black list to avoid fusing crop before experimental_detectron_roi_feature_extractor
* propagate crop when crop is cascaded
2023-01-27 18:52:37 +09:00
Taylor Yeonbok Lee
cd9e772802 [GPU] Optimize realloc for dynamic shape (#15169)
* Optimize realloc for dynamic shape with
- Pre-aligned alloc for bounded dynamic shape
- Reuse internal buffer

* - Fix internal buffer of NMS kernel to be reused
- Fixed bug in nms quick sort

* Additional fix for internal buffer reuse

* Fix legacy dynamic batch to be applied only for 0-th dim dynamic shape with upper bound

* Fix unittest error

* Apply nms fixes of padding -1 to all buffers only when internal buffer is reused

* Do not add a separate get_max_tensor, because currently there is no need for that separate API.
Currently the max tensor is only needed for memory allocation, and there is no need for a minimum tensor size for now

* Fix allocation of internal buffer to be done for each layout
2023-01-27 00:40:31 -08:00
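The realloc-avoidance ideas in this commit (pre-aligned allocation for bounded dynamic shapes, reuse of the internal buffer) can be sketched as a grow-only buffer that rounds requests up to an aligned size, so small shape changes reuse the existing allocation. The class name and the 64-byte alignment below are illustrative assumptions, not the GPU plugin's actual memory pool.

```cpp
#include <cstddef>
#include <vector>

// Hedged sketch: a buffer whose capacity only grows, with requests
// rounded up to an alignment boundary so that slightly larger shapes
// still fit in the existing allocation instead of triggering a realloc.
class reusable_buffer {
public:
    explicit reusable_buffer(std::size_t alignment = 64) : alignment_(alignment) {}

    // Returns true when a real (re)allocation happened, false on reuse.
    bool request(std::size_t bytes) {
        if (bytes <= storage_.size())
            return false;  // fits in the existing allocation: reuse it
        const std::size_t padded = (bytes + alignment_ - 1) / alignment_ * alignment_;
        storage_.assign(padded, 0);  // grow to the pre-aligned size
        return true;
    }

    std::size_t capacity() const { return storage_.size(); }

private:
    std::size_t alignment_;
    std::vector<char> storage_;
};
```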