978 Commits

Author SHA1 Message Date
hyunback kim
2582f04c9c [GPU] Optimize stable diffusion perf igpu (#18200)
* [GPU] Optimize stable_diffusion performance in iGPU.

Change the existing heuristic shape condition to use permute followed by a non-transposed gemm in the case of a transposed gemm.

Signed-off-by: hyunback <hyunback.kim@intel.com>
2023-06-28 13:57:10 +02:00
Wilson Seok
1efb9eafae [GPU] Add condition check for dynamic shape and onednn_impl in concat_in_place_optimization::match() (#18034)
* add dynamic shape support for dgpu in prepare_buffer_fusing

* add unit test

* add space between test cases

* update condition of impl create() for concat dynamic shape

* update unit test

* add comment and update unit test

* add impl_param.is_type() function
2023-06-27 23:39:00 -07:00
Paul Youngsoo Ahn
50897e86e6 [GPU] Impl cldnn::condition to support dynamic shape (#18051)
* [GPU] Impl cldnn::condition to support dynamic shape (#18051)
* Impl CreateIfOp
* Update calc_output_layouts and execute_impl
* Enable gpu unit test
* Create gpu functional test

* [GPU] Follow-up code review (#18051)
* remove redundant code
* create custom execute method for condition_inst
* change name from update_loop_primitive_map to update_inner_program_io_map

* [GPU] Fix gpu func test failures for fp16

* Add more test-cases to support fp16 and nested if case

* [GPU] remove redundant code
* refactoring var names
* fix windows build error

* [GPU] Fix windows build issue

* [GPU] update calc_output_layouts

* [GPU] remove custom condition_inst::execute

* Remove virtual keyword from primitive_inst::execute()

* [GPU] Share single task executor between main program and inner program

* [GPU] Fix input rank issue for const inner network in condition op

* [GPU] apply calc_output_layouts for roi_align

Co-authored-by: Vladimir Paramuzov <vladimir.paramuzov@intel.com>

* [GPU] avoid checking allow_new_shape_infer for inner program

---------

Co-authored-by: Vladimir Paramuzov <vladimir.paramuzov@intel.com>
2023-06-27 17:05:26 +02:00
Andrew Kwangwoong Park
1566567ca4 [GPU] Fix output layout calculation for crop and fc (#18207)
* Fix get_partial_shape tensor API to access the correct index of dimensions

Signed-off-by: Andrew Park <andrew.park@intel.com>

* Update the rule specifying output_type to the legacy one by referring to calc_output_layout

Signed-off-by: Andrew Park <andrew.park@intel.com>

* Add reproducible TCs related to issues for ov_gpu_unit_tests

Signed-off-by: Andrew Park <andrew.park@intel.com>

* Fix failed fc dynamic i8 TCs for ov_gpu_unit_tests

Signed-off-by: Andrew Park <andrew.park@intel.com>

* Fix are_data_types_sutable_for_onednn not to invalidate output layout

Signed-off-by: Andrew Park <andrew.park@intel.com>

* Apply comment

Signed-off-by: Andrew Park <andrew.park@intel.com>

---------

Signed-off-by: Andrew Park <andrew.park@intel.com>
2023-06-27 11:30:30 +02:00
Mingyu Kim
61b15ce31a Revert "[GPU] Reorder weights refactoring (#17787)" (#18248)
This reverts commit d00c7d30f9.
2023-06-27 17:26:18 +09:00
Taylor Yeonbok Lee
bcf58344cc Fix crash for shape of subgraph due to missing mem_dep (#18246) 2023-06-26 16:48:10 -07:00
Wilson Seok
f306a11b82 [GPU] fix issues of MobileFaceNet for dynamic shape (#18171)
* fix issues of MobileFaceNet for dynamic shape

* update unit test
2023-06-26 17:22:15 +09:00
Taylor Yeonbok Lee
bf299c807e [GPU] Not to add sync if the node belongs to shape of subgraph (#18158)
* Do not add sync if the node is within the shape-of subgraph
The dependency is a CPU impl, so its execution has already finished.

* Fixed per review comment: skip clFinish only when the runtime dependency belongs to the shape-of subgraph, not when the current node does
2023-06-25 21:51:45 -07:00
Taylor Yeonbok Lee
22ef2f4e6a Fix bug in weight reorder. (#18224)
The original memory was overwritten unexpectedly because the check compared shared_ptr objects instead of the actual buffer addresses.
2023-06-24 00:35:07 -07:00
Irina Efode
31b07c40d9 Add global config for test infra (#17547)
* [IE TESTS] Add Global test config for Subgraph base test

* Replace using option by function redefinition

* fix build

* remove extra changes for gna/template

* code style

* add nvidia to devices

* Fix debian

* remove nvidia
2023-06-24 01:07:36 +04:00
Roman Lyamin
d00c7d30f9 [GPU] Reorder weights refactoring (#17787) 2023-06-23 16:01:55 +04:00
Roman Lyamin
cca8cf15ef [GPU] softmax_kernel_items_class_optimized fix (#18178) 2023-06-23 16:00:11 +04:00
hyunback kim
3c378eb7ac [GPU] Fix onednn implicit concat issue with reorder as input. (#18180)
* [GPU] Fix onednn implicit concat issue with reorder as input.

Fix for missed memory offset handling in onednn reorder.

Signed-off-by: hyunback <hyunback.kim@intel.com>
2023-06-23 10:46:50 +00:00
Steve Yoo
d13adf7ae8 Allow new shape infer of ShapeOf (#17912)
* Fixed to use input shape rank when calculating output layout, added unit test case

* Fixed to use input shape rank when creating shape_of primitive, added functional tests
2023-06-22 21:04:41 -07:00
Mingyu Kim
c40efac569 [GPU] Typo (#18167) 2023-06-22 17:49:34 +09:00
Pavel Durandin
a104e6218a [GPU] Fix windows debug fail in contexts (#18168) 2023-06-22 12:39:01 +04:00
Andrew Kwangwoong Park
52b9df4a6d [GPU] Dynamism support for ReadValue and Assign ops (#18086)
Signed-off-by: Andrew Park <andrew.park@intel.com>
2023-06-21 13:07:55 +04:00
Min, Byungil
96a0c539bd [GPU] Not to convert crop to implicit on dynamic (#18148)
Signed-off-by: Min, Byungil <byungil.min@intel.com>
2023-06-21 09:55:55 +02:00
hyunback kim
bcd2463813 [GPU] Fix skipped GemmBaseTests in iGPU. (#18001)
* [GPU] Fix skipped GemmBaseTests in iGPU.

The current GemmBaseTests on iGPU are skipped: they report a pass but are not actually run.

Signed-off-by: hyunback <hyunback.kim@intel.com>
2023-06-21 16:09:06 +09:00
yanlan song
05e8bd375e Bell/auto api 2.0 (#17805)
* 2.0 initial

Signed-off-by: fishbell <bell.song@intel.com>

* enable all tests

Signed-off-by: fishbell <bell.song@intel.com>

* remove unnecessary files

Signed-off-by: fishbell <bell.song@intel.com>

* move container header to auto folder, remove unnecessary macro definition

Signed-off-by: fishbell <bell.song@intel.com>

* enable caching

Signed-off-by: fishbell <bell.song@intel.com>

* enable query_model

Signed-off-by: fishbell <bell.song@intel.com>

* support loaded_from_cache property

Signed-off-by: fishbell <bell.song@intel.com>

* fix some build warning

Signed-off-by: fishbell <bell.song@intel.com>

* fake inputs/outputs if needed

Signed-off-by: fishbell <bell.song@intel.com>

* resolve conflict

Signed-off-by: fishbell <bell.song@intel.com>

* skip unsupported test

Signed-off-by: fishbell <bell.song@intel.com>

* use mock icore from common folder

Signed-off-by: fishbell <bell.song@intel.com>

* fix failure for remote tensors

Signed-off-by: fishbell <bell.song@intel.com>

* apply ppp related fix in auto

Signed-off-by: fishbell <bell.song@intel.com>

* fix build warning on windows

Signed-off-by: fishbell <bell.song@intel.com>

* fix ppp output layout issue

Signed-off-by: fishbell <bell.song@intel.com>

* fix ppp output layout issue

Signed-off-by: fishbell <bell.song@intel.com>

* clean up headers

Signed-off-by: fishbell <bell.song@intel.com>

* log formatting

Signed-off-by: fishbell <bell.song@intel.com>

* enable fps logging for binder mode

Signed-off-by: fishbell <bell.song@intel.com>

* apply review comments

apply review comments

Signed-off-by: fishbell <bell.song@intel.com>

* remove all legacy namings, exenetwork/network/metric/IE etc

Signed-off-by: fishbell <bell.song@intel.com>

* update readme

Signed-off-by: fishbell <bell.song@intel.com>

* fix LTO build issue

Signed-off-by: fishbell <bell.song@intel.com>

* minor wording

Signed-off-by: fishbell <bell.song@intel.com>

* case fix

Signed-off-by: fishbell <bell.song@intel.com>

---------

Signed-off-by: fishbell <bell.song@intel.com>
Co-authored-by: Chen Peter <peter.chen@intel.com>
2023-06-21 00:10:59 +08:00
Wilson Seok
3519050ef0 skip all user format check when dynamic shape in get_preferred_format() to avoid endless recursive call (#18096) 2023-06-19 18:52:58 -07:00
Patman11
b9575d9586 [GPU] Disable threaded kernel compilation when running in Windows Store app (#18062) 2023-06-19 17:55:47 +04:00
Min, Byungil
9943ffc259 [GPU] Fix unit-tests for dGPU (#18125)
+ Resolved unit-test failures on dGPU
+ Applied get_test_default_config for testing config

Signed-off-by: Min, Byungil <byungil.min@intel.com>
2023-06-19 11:41:47 +04:00
Min, Byungil
555c083336 [GPU] Optimize out Gather by converting to implicit crop (#17743)
+ Changed Gather when it splits the input tensor along the batch axis
+ Converted Gather to cldnn Crop in CreateGatherOpBase
+ Added implicit Crop condition for batch axis

Signed-off-by: Min, Byungil <byungil.min@intel.com>
2023-06-19 05:05:22 +00:00
Vladimir Paramuzov
3d79bd1ac5 [GPU] Minor layout optimizer refactoring (#17553) 2023-06-16 10:33:53 +04:00
Pavel Esir
aa32ff1df3 keep Const + DecompressionConvert for CPU (#15930)
* keep Const+DecompressionConvert pattern for CPU

* temporarily disabled failing unit-tests

* disable CF by modifying bounds evaluate as well; minor corrections

* added TODOs with ticket numbers

* join const+decompression markings

* minimized convert_precision.cpp changes

* minor corrections

* refactor fp16 transformations: moved into separate fp16_compression folder

* style-fix

* minor fixes

* do not disable evaluate and CF in shape path

* safer disabling of Const conversion

* style-fix and minor corrections

* restore original placement of ConvertPrecision
2023-06-15 13:07:22 +04:00
Andrei Gorbachev
52834659c4 [GPU] additional checks fixed for fully_connected (#18068) 2023-06-15 09:11:38 +04:00
Mykhailo Hnap
bae926de22 [GPU] Unique-10 operation implementation. (#16412)
* [GPU] Unique-10 operation implementation.

* Handled flattened case.

* Created results for all outputs in single layer test.

* Save total unique count as fifth output.

* Handled axis case.

* Added unique reshape kernel.

* Moved data types to unique primitive constructor.

* Added shape agnostic Unique ref kernel.

* Added blocked layout support to Unique-10.

* Use int in bubble sort.

* Added unit tests.

* Added support for blocked layouts to flattened mode.

* Fixed usage of shape_info in kernel.

* Use correct total data size for dynamic shapes.

* Commented some functional tests.

For some reason, big shapes cause std::bad_alloc.

* Initialize out_counts with zeros.

* Implemented new approach for reducing memory footprint.

Changed first kernel to only count unique values and changed second kernel to fill all outputs.

* Revert "Commented some functional tests."

This reverts commit a7f9763c575e71e14b85ee37adf1e98f10785c15.

* Fixed calc output layouts for flattened case when rank is greater than 4.

* Added temporary fix for axis case when rank is greater than 4.

* Revert "Added temporary fix for axis case when rank is greater than 4."

This reverts commit 236640d2f0e9d5b1f8dcbbf9482763badd7fde66.

* Renamed "unique" to "unique_count" and "unique_reshape" to "unique_gather" primitives.

* Quick fix for add_intermediate_node to consider dep_idx of multiple output

* Fix bug for multiple output:
1) get_reorder was getting reorder from cache regardless of the dep_idx.
2) remove_redundant_reorder was not considering original dep_idx

* Fixed conflicts.

* Fixed win build issue.

* Fixed build issue.

* Revert "Fix bug for multiple output:"

This reverts commit d4a2c4f32eabe9108df31d4837fed8995c93bd1c.

* Revert "Quick fix for add_intermediate_node to consider dep_idx of multiple output"

This reverts commit 2dfd2aaefdf32067a7469505b35f7096632ac5f2.

* Added some tests to skip config.

---------

Co-authored-by: Taylor Yeonbok Lee <taylor.lee@intel.com>
2023-06-14 10:41:51 -07:00
Andrei Gorbachev
1761427ab1 fixed fp16 x fp16 overflow in NonMaxSuppression (#18038) 2023-06-14 15:58:49 +04:00
Roman Lyamin
63a5ec5762 [GPU] Several fixes for format traits (#18018) 2023-06-14 14:33:58 +04:00
Sergey Shlyapnikov
e631f65a9b [GPU] Fix in-order queue synchronization issue related to OCL/OneDNN impls interaction with CPU impls (#17976) 2023-06-14 10:15:04 +09:00
Ilya Churaev
0743e9bfb5 Removed legacy methods SetBatch and SetBlob (#17984)
* Removed legacy methods SetBatch and SetBlob

* Fixed GPU plugin build

* Remove DYN_BATCH_LIMIT from tests

* Revert some changes in GPU plugin
2023-06-12 18:54:23 +00:00
Ilya Churaev
df44f92a97 Remove NV12 and I420 blobs and deprecate some legacy API (#17919)
* Remove NV12 and I420 blobs and deprecate some legacy API

* Fixed some errors

* Remove NV12 blobs

* Remove NV12 conversion

* Fixed other warnings

* Suppress version

* Fix some warnings

* Fixed version

* Try to fix some warnings

* Suppress warnings in C header

* Suppress warnings in C

* Fixed Windows exceptions

* Try to fix warnings

* Try to fix C bindings build

* Suppress InferRequest

* Fixed some build issues

* Fixed some errors
2023-06-12 21:15:02 +04:00
Sergey Shlyapnikov
70e0caca4f [GPU] Fix dynamic padding processing of static dimension (#17978) 2023-06-12 08:39:42 +04:00
Wilson Seok
cff083f83d [GPU] gather nd shape agnostic kernel implementation (#17940)
* gather nd shape agnostic kernel implementation

* add func test

* fix minor bugs

* minor bug fixes

* fix win build error
2023-06-10 00:28:00 -07:00
Andrew Kwangwoong Park
c413825845 [GPU] Fuse type conversion only reorders to the prev nodes (#17881)
* Fuse convert reorder to prev MVN/Concat node

Signed-off-by: Andrew Park <andrew.park@intel.com>

* Add dynamic TCs for ov_gpu_unit_test

Signed-off-by: Andrew Park <andrew.park@intel.com>

* Add descriptions for changes

Signed-off-by: Andrew Park <andrew.park@intel.com>

* Fix kernel selection failure

Signed-off-by: Andrew Park <andrew.park@intel.com>

* Add is_type_conversion_only function for reorder_node

Signed-off-by: Andrew Park <andrew.park@intel.com>

---------

Signed-off-by: Andrew Park <andrew.park@intel.com>
2023-06-09 16:07:01 -07:00
Ilya Lavrenov
a0119fe33c Android debug build (#17955) 2023-06-09 08:03:10 +04:00
Sergey Shlyapnikov
58d79aa3a6 [GPU] Add shape_of subgraphs markup and initial cpu implementations (#17762)
* [GPU] Add shape of subgraphs markup and initial cpu implementations for some of primitives

* Apply review comments

* Exclude eltwise with boolean mode types from shape of subgraphs and fix leftovers
2023-06-08 13:46:21 +04:00
Taylor Yeonbok Lee
f246015dd7 [GPU] Fix issue in runtime buffer fusing (#17909)
* There were two issues in runtime buffer fusing:
1) A missing condition in the matcher for dynamic tensors
2) If the node was marked can_be_optimized = true at build time but turned out to be false at runtime, kernel compilation was skipped because the check used node->can_be_optimized
=> To resolve this, added can_be_optimized to impl_param and made impl creation check can_be_optimized in impl_param instead of the one in node.

* Fixed primitive::can_be_optimize to be set through a function
2023-06-07 19:39:26 -07:00
hyunback kim
13028397b7 Optimize permute gemm onednn (#17621)
* [GPU] Optimized out permute in permute-gemm(onednn) pattern.

Permute can be optimized out when its input and output are compatible and the following gemm uses oneDNN.

Signed-off-by: hyunback <hyunback.kim@intel.com>
2023-06-07 16:20:59 +09:00
Ilya Churaev
36625404eb [GPU] Fix GPU remote context name initialization (#17850) 2023-06-05 12:00:04 +04:00
Sergey Shlyapnikov
db8d23231a [GPU] Change priority of CPU implementations (#17829) 2023-06-05 11:21:26 +04:00
Vladimir Paramuzov
1ce447674e [GPU] Better device input memory reuse (#17853) 2023-06-05 09:30:22 +04:00
Kelvin Choi
ec0daa5b10 [GPU] Apply m_pythondiv for fusing of eltwise div (#17590) 2023-06-02 17:29:02 -07:00
Yaroslav Torziuk
eb588f0336 Add subgroup block reading in softmax_gpu_items_class_optimized.cl (#16223) 2023-06-02 12:59:55 -07:00
Taylor Yeonbok Lee
f670dc5a0d [GPU] Enable runtime buffer fusing for dynamic shape (#17668)
* Initial impl for runtime buffer fusing
Passing unittest with static kernel

* pass unittest with dynamic impl

* Refactor allocate_output

* Separate header of buffer fusing

* Refactored buffer fusing :: matcher/optimize

* More cleanup

* Fix crash in dolly

* Reset can_be_optimized of primitive_inst when it is not optimizable

* Fix empty tensor : Primitive with empty data should be skipped

* Fix issue in dynamic padding : Static kernel should not contain dynamic padding dims
Fix missing reset of update_shape_done_by_other flag

* Do not add an empty kernel to the cache for an optimized-out inst

* Fix corner case error in buffer fusing
- Shapes of some preds may not change, but update_impl is still needed because 1) paddings have changed and 2) output memory must be updated
- optimizable impl should not be added to the cache

* Allow reorder & permute_ref to be optimized-out concat predecessors

* Some more fixes :
runtime buffer fusing is available only when all preds/concat are dynamic
runtime buffer fusing is to be executed only if the node is dynamic

* Fix allocate_output parameter called by get_estimated_device_mem_usage according to the new change

* Fixed error in cascaded concat

* Need to reinterpret even when the size is the same
2023-06-02 12:39:28 -07:00
Sergey Shlyapnikov
5afbd4cf92 [GPU] Remove clFinish call from USM memory lock function (#17830) 2023-06-02 16:17:05 +04:00
Andrei Gorbachev
97113b317f [GPU] fix incorrect deformable_group_idx calculation (#17759) 2023-06-01 10:51:48 +04:00
Vladimir Paramuzov
ac26216869 [GPU] Functional fixes for nvidia (#17735) 2023-06-01 09:45:30 +04:00
Maciej Smyk
dc36ec11b5 [DOCS] Link adjustment for dev docs + fix to build.md CPU link for master (#17744)
* link-update-1

* link update

* Update build.md

* dl workbench

* Update README.md
2023-05-31 13:27:20 +04:00