+ Optimize out Reshape if only the batch axis has padding
+ Do not add Reorder before oneDNN if only the batch axis has padding
+ Re-calculate the output layout's padding if Reshape is optimized out
+ Do not apply the stable diffusion iGPU perf optimization to dGPU (#18200)
Signed-off-by: Min, Byungil <byungil.min@intel.com>
* [GPU] Use real layout for cpu impls instead of memory's
* [GPU] Add memory tracking and pre allocation mechanism
* Tests and minor code refactoring
* Apply review comments
* Remove unused include
* Add two-plane YUV to grey conversion
* Add i420 to grey conversion
* Add yuv to grey conversion for GPU
* Fix cmakes
* Remove static from local function
* Remove opencv dependency from tests
* Put grey_from_yuv_single_plane into namespace
* [GPU] disable blocked format for dynamic shape model (#18448)
* [GPU] Return default format for output layout rank when user node is reshape in get_preferred_format
- Roll back code that disabled blocked format for dynamic shape
* [GPU] Add unit test checking has_reshape_user
* [GPU] remove redundant comments
* Update DepthToSpace to use ngraph shape infer
* Remove legacy block_size limitation for static shape
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add TCs for ov_gpu_func_tests and ov_gpu_unit_tests
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Initial commit for proxy plugin
* Include proxy to openvino library
* Fixed some tests
* Added plugin properties
* Added readme
* Support Compile model for the proxy plugin
* Fixed python
* Remove gmock
* Fixed some tests
* Fixed load plugin tests
* Removed internal PROXY_FALLBACK_PRIORITIES initialization property
* Fixed code style
* Added new tests
* Create architecture.md
* Fixed some proxy tests
* Implement not implemented functions
* Fixed ICore mock
* Remove redundant code
* Added import/export tests
* Fixed hetero import/export tests
* Fixed ICore mock
* Fixed import tests
* Fixed build
* Remove redundant parse config
* Fixed some comments
* Try to fix Windows build
* Fixed incorrect logic
* Small fix in tests
* Fixed python API
* Fixed typo
* Try to fix python
* Switch GPU under proxy
* Fixed GPU name
* Revert GPU plugin under proxy
* Small changes in CMake files
* Temp commit
* Build without proxy
* Revert "Temp commit"
This reverts commit 1ac9824bdf.
* Fixed test linking
* Removed tests from ncc check
* Add option to disable proxy plugin
* Fixed minor comments
* Disable some proxy tests if IR frontend is disabled
* Enable Intel GPU under the proxy
* Fixed typo
* Fixed segfault in tests
* Small fix for case if only GPU is registered
* Fixed code style
* Added remote context tests
* Added proxy tests to CI
* Fixed mock engine
* Test change
* Revert "Test change"
This reverts commit 2d1d67766f.
* Add new tests
* Removed some tests
* Revert "Removed some tests"
This reverts commit 090398652f.
* Revert incorrect logic
* Removed unused variables
* Use original name for the GPU plugin
* Fixed CMake
* Do not show hidden devices
* Try to fix GPU remote context
* Fixed GPU plugin build
* Added interface for proxy remote context
* Remove local properties
* Remove redundant API
* Fixed typo
* Added remote tensors
* Fixed code style
* Fixed some comments
* Unwrap remote tensors before conversion to Blob
* Added cast for legacy API
* Fixed some cldnn tests
* Do not add preprocessing for proxy plugin
* Enabled more tests and wrap tensors in infer request
* Use hardware request inside conversion wrapper
* Fixed hang on cache calculation
* Try to fix some tests
* Support set tensor for remote tensors in proxy plugin
* Revert "Support set tensor for remote tensors in proxy plugin"
This reverts commit 5a927de590.
* Remove redundant friend from compiled model and fix lifetime for infer request
* Fixed code style
* Add additional so pointers to the tensor
* Rewrite approach for compile model and tensor
* Removed API from proxy
* Fixed is/as Blob for wrapped Blobs
* Wrap tensor when set tensor to plugin
* Fixed recursive call
* Don't unwrap tensors for AUTO plugin
* Fixed some Multi tests with remote blob for proxy
* Align context name with tests
* Fixed code style
* Try to fix more tests
* Some minor changes
* Try to fix OVHoldersTests
* Try to save low level SO in high level wrappers
* Revert "Try to save low level SO in high level wrappers"
This reverts commit 430ff8a526.
* Revert "Try to fix OVHoldersTests"
This reverts commit 32604f0d3e.
* Disable some tests
* Fixed DynamicBatchShapeTests
* Fixed caching tests and skip OVHoldersTest
* Small refactoring
* Fixed import model
* Small fix
* Fix typo which causes failures of caching tests
* Disabled AUTO BATCH for proxy device
* Support Export in Auto batch
* Small changes
* Fixed initialization fallback to plugin with proxy name
* Added more values for tests
* Ask all devices and create context if no device id
* Support export in auto batch
* Fixed some comments
* Fixed some comments and removed auto batch
* Fixed some comments
* Fixed auto batch test and some comments
* Fixed build
* Removed proxy plugin class from dev api
* Fixed code style
* Fixed disabled tests
* [GPU] Add roi_align get_shape_infer_dependencies (#18345)
* [GPU] Fix concat cpu impl for buffer fusing case
* [GPU] Add roi_align shape_infer unit tests
* [GPU] Fix windows build issue
* [GPU] add unit test
* Add auto pad attribute support for conv
* Fix to let concat onednn impl check can_be_optimized in impl_param instead of that in node
* Apply auto padding to kernel param for conv ocl impl
* conv shape agnostic kernel is not selected if conv does not use explicit padding
* Fix failed TCs for ov_gpu_unit_tests
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
* skip fuse_quantize_f if input or output layout is dynamic
* Update condition of can_fuse_reorder_to_prev for concat in shapeof subgraph
* skip concat_in_place_optimization if concat is in a shapeof subgraph
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add reorder if eltwise is in a shapeof subgraph and its input and output data types differ
* Skip reorder optimization if reorder has dynamic shape on remove_redundant_reorders
* Add reproducible TCs for ov_gpu_unit_tests
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
* enable memory reuse for dynamic models
* updated to return dependent events for the shape_of primitive
* fixed memory_pool.release_memory()
* fixed a lint error
* fixed missing default value
* updated to use reset flag for dynamic models
* changed to use is_dynamic_output_layout instead of is_dynamic
* updated to use get_internal_params instead of buffer_ptr
* added a memory reuse test for dynamic models
* [GPU] Optimize stable_diffusion performance in iGPU.
Change the existing heuristic shape condition for replacing a transposed gemm with a permute followed by a non-transposed gemm.
Signed-off-by: hyunback <hyunback.kim@intel.com>
* add dynamic shape support for dgpu in prepare_buffer_fusing
* add unit test
* add space between test cases
* update condition of impl create() for concat dynamic shape
* update unit test
* add comment and update unit test
* add impl_param.is_type() function
* [GPU] Impl cldnn::condition to support dynamic shape (#18051)
* Impl CreateIfOp
* Update calc_output_layouts and execute_impl
* Enable gpu unit test
* Create gpu functional test
* [GPU] Follow-up code review (#18051)
* remove redundant codes
* create custom execute method for condition_inst
* change name from update_loop_primitive_map to update_inner_program_io_map
* [GPU] Fix gpu func test failures for fp16
* Add more test-cases to support fp16 and nested if case
* [GPU] remove redundant codes
* refactoring var names
* fix windows build error
* [GPU] Fix windows build issue
* [GPU] update calc_output_layouts
* [GPU] remove custom condition_inst::execute
* Remove virtual keyword from primitive_inst::execute()
* [GPU] Share single task executor between main program and inner program
* [GPU] Fix input rank issue for const inner network in condition op
* [GPU] apply calc_output_layouts for roi_align
Co-authored-by: Vladimir Paramuzov <vladimir.paramuzov@intel.com>
* [GPU] avoid checking allow_new_shape_infer for inner program
---------
Co-authored-by: Vladimir Paramuzov <vladimir.paramuzov@intel.com>
* Fix get_partial_shape tensor API to access the correct index of dimensions
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Update the rule specifying output_type to the legacy one by referring to calc_output_layout
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add reproducible TCs related to issues for ov_gpu_unit_tests
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix failed fc dynamic i8 TCs for ov_gpu_unit_tests
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix are_data_types_sutable_for_onednn not to invalidate output layout
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Apply comment
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Do not add sync if the node is within a shapeof subgraph
The dependency is a cpu impl, so its execution has already finished.
* Fixed per review comment: skip clFinish only when the runtime dep is in a shapeof subgraph, not the current node
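
A minimal sketch of the rule in the last two entries, for illustration only. It is not the plugin code: `Node`, `in_shape_of_subgraph`, `runtime_deps` and `needs_sync` are hypothetical names, and the sketch assumes that a dependency inside a shapeof subgraph has already finished on the CPU, so only the remaining dependencies require a wait.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical stand-in for a primitive instance; not an OpenVINO type.
    name: str
    in_shape_of_subgraph: bool = False
    runtime_deps: List["Node"] = field(default_factory=list)


def needs_sync(node: Node) -> bool:
    """Return True if at least one runtime dependency may still be executing.

    The check looks at each runtime dependency, not at `node` itself:
    a dependency in a shapeof subgraph is assumed to run as a cpu impl
    and to be complete already, so it never forces a sync.
    """
    return any(not dep.in_shape_of_subgraph for dep in node.runtime_deps)
```

With this rule, a node whose only runtime dependency is in a shapeof subgraph skips the sync, while a node that is itself in a shapeof subgraph but depends on a regular node still waits.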