* keep Const+DecompressionConvert pattern for CPU
* temporarily disabled failing unit tests
* disable CF by modifying bounds evaluate as well; minor corrections
* added TODOs with ticket numbers
* join const+decompression markings
* minimized convert_precision.cpp changes
* minor corrections
* refactor fp16 transformations: moved into separate fp16_compression folder
* style-fix
* minor fixes
* do not disable evaluate and CF in shape path
* safer disabling of Const conversion
* style-fix and minor corrections
* restore original placement of ConvertPrecision
* [GPU] Unique-10 operation implementation.
* Handled flattened case.
* Created results for all outputs in single layer test.
* Save total unique count as fifth output.
* Handled axis case.
* Added unique reshape kernel.
* Moved data types to unique primitive constructor.
* Added shape agnostic Unique ref kernel.
* Added blocked layout support to Unique-10.
* Use int in bubble sort.
* Added unit tests.
* Added support for blocked layouts to flattened mode.
* Fixed usage of shape_info in kernel.
* Use correct total data size for dynamic shapes.
* Commented some functional tests.
For some reason, big shapes cause std::bad_alloc.
* Initialize out_counts with zeros.
* Implemented new approach for reducing memory footprint.
Changed first kernel to only count unique values and changed second kernel to fill all outputs.
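The two-kernel split described above can be sketched in plain Python (hypothetical helper names `unique_count` / `unique_gather`, mirroring the renamed primitives; the real kernels are OpenCL): a first pass only counts distinct values so the output buffers can be allocated exactly, and a second pass fills all outputs.

```python
def unique_count(data):
    """First pass: count distinct values so outputs can be sized exactly."""
    return len(set(data))

def unique_gather(data, count):
    """Second pass: fill all outputs (unique values, first-occurrence
    indices, inverse indices, and per-value counts)."""
    uniques, first_idx, inverse, counts = [], [], [], []
    lookup = {}
    for i, v in enumerate(data):
        if v not in lookup:
            lookup[v] = len(uniques)
            uniques.append(v)
            first_idx.append(i)
            counts.append(0)
        inverse.append(lookup[v])
        counts[lookup[v]] += 1
    assert len(uniques) == count  # second pass must agree with the count pass
    return uniques, first_idx, inverse, counts

data = [3, 1, 3, 2, 1]
n = unique_count(data)
u, fi, inv, c = unique_gather(data, n)
```

Counting first keeps the memory footprint at exactly `count` elements per output instead of worst-case `len(data)`.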
* Revert "Commented some functional tests."
This reverts commit a7f9763c575e71e14b85ee37adf1e98f10785c15.
* Fixed calc output layouts for flattened case when rank is greater than 4.
* Added temporary fix for axis case when rank is greater than 4.
* Revert "Added temporary fix for axis case when rank is greater than 4."
This reverts commit 236640d2f0e9d5b1f8dcbbf9482763badd7fde66.
* Renamed "unique" to "unique_count" and "unique_reshape" to "unique_gather" primitives.
* Quick fix for add_intermediate_node to consider dep_idx of multiple output
* Fix bug for multiple output:
1) get_reorder was fetching the reorder from the cache regardless of dep_idx.
2) remove_redundant_reorder did not consider the original dep_idx.
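The cache issue can be illustrated with a minimal sketch (hypothetical names, not the plugin's API): when the cache key omits dep_idx, two different output ports of the same node receive the same cached reorder.

```python
# Hypothetical sketch of the described bug: a reorder cache keyed only by
# the producing node returns the same reorder for every output port.
buggy_cache = {}

def get_reorder_buggy(node_id, dep_idx):
    key = node_id  # bug: dep_idx is ignored in the key
    if key not in buggy_cache:
        buggy_cache[key] = ("reorder", node_id, dep_idx)
    return buggy_cache[key]

fixed_cache = {}

def get_reorder_fixed(node_id, dep_idx):
    key = (node_id, dep_idx)  # fix: key includes the output port
    if key not in fixed_cache:
        fixed_cache[key] = ("reorder", node_id, dep_idx)
    return fixed_cache[key]
```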
* Fixed conflicts.
* Fixed win build issue.
* Fixed build issue.
* Revert "Fix bug for multiple output:"
This reverts commit d4a2c4f32eabe9108df31d4837fed8995c93bd1c.
* Revert "Quick fix for add_intermediate_node to consider dep_idx of multiple output"
This reverts commit 2dfd2aaefdf32067a7469505b35f7096632ac5f2.
* Added some tests to skip config.
---------
Co-authored-by: Taylor Yeonbok Lee <taylor.lee@intel.com>
* Remove NV12 and I420 blobs and deprecate some legacy API
* Fixed some errors
* Remove NV12 blobs
* Removed NV12 conversion
* Fixed other warnings
* Suppress version
* Fix some warnings
* Fixed version
* Try to fix some warnings
* Suppress warnings in C header
* Suppress warnings in C
* Fixed Windows exceptions
* Try to fix warnings
* Try to fix C bindings build
* Suppress InferRequest
* Fixed some build issues
* Fixed some errors
* Fuse convert reorder to prev MVN/Concat node
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add dynamic TCs for ov_gpu_unit_test
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add descriptions for changes
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix kernel selection failure
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add is_type_conversion_only function for reorder_node
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
* [GPU] Add shape of subgraphs markup and initial cpu implementations for some of primitives
* Apply review comments
* Exclude eltwise with boolean mode types from shape of subgraphs and fix leftovers
* There were two issues in runtime buffer fusing
1) Missing condition in the matcher for dynamic tensors
2) If the node was marked can_be_optimized = true at build time but turned out to be false at runtime, kernel compilation was skipped because the check used node->can_be_optimized
=> To resolve this issue, added can_be_optimized to impl_param and let impl creation check can_be_optimized in impl_param instead of the one in the node.
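A minimal sketch of that fix (hypothetical class names, not the plugin's types): impl creation reads the can_be_optimized flag from the per-call impl_params snapshot rather than from the node, so a build-time `true` that becomes `false` at runtime still triggers kernel compilation.

```python
class Node:
    def __init__(self):
        self.can_be_optimized = True  # decided at build time

class ImplParams:
    def __init__(self, node):
        # snapshot the flag's current (runtime) value instead of
        # pointing impl creation back at the node's build-time state
        self.can_be_optimized = node.can_be_optimized

def create_impl(params):
    # compile a kernel only when the instance is not optimized out
    if params.can_be_optimized:
        return None  # optimized out: no kernel needed
    return "compiled_kernel"

node = Node()
node.can_be_optimized = False  # turned out to be false at runtime
impl = create_impl(ImplParams(node))
```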
* Fixed primitive::can_be_optimized to be set through a function
* [GPU] Optimized out permute in permute-gemm(onednn) pattern.
Permute can be optimized out when its input and output are compatible and the gemm is an oneDNN gemm.
Signed-off-by: hyunback <hyunback.kim@intel.com>
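One way to see why this is safe, sketched with a hypothetical check (an assumption about this pattern, not the plugin's actual matcher): a permute that only swaps the two innermost axes can be absorbed by a gemm, since gemm libraries such as oneDNN can read that operand as transposed instead of running a separate permute kernel.

```python
def can_fuse_permute_into_gemm(order):
    """Hypothetical check: the permute is foldable into the following gemm
    when it leaves all batch dims in place and only swaps the two
    innermost axes (which the gemm can express as a transposed input)."""
    rank = len(order)
    batch_ok = all(order[i] == i for i in range(rank - 2))
    return batch_ok and order[-2:] == [rank - 1, rank - 2]
```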
* Initial impl for runtime buffer fusing
Passing unittest with static kernel
* pass unittest with dynamic impl
* Refactor allocate_output
* Separate header of buffer fusing
* Refactored buffer fusing :: matcher/optimize
* More cleanup
* Fix crash in dolly
* Reset can_be_optimized of primitive_inst when it is not
* Fix empty tensor : Primitive with empty data should be skipped
* Fix issue in dynamic padding : Static kernel should not contain dynamic padding dims
Fix missing reset of update_shape_done_by_other flag
* Do not add an empty kernel to the cache for an optimized-out inst
* Fix corner case error in buffer fusing
- Shapes of some preds may not change, but update_impl is still needed because 1) paddings changed 2) output memory should be updated
- optimizable impl should not be added to the cache
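Both corner-case rules can be sketched as small guards (hypothetical helper names, an illustration of the stated rules rather than the plugin code):

```python
def needs_update_impl(shape_changed, padding_changed):
    # update_impl is needed not only on shape changes: a padding change
    # also invalidates the compiled kernel's view of the buffer
    return shape_changed or padding_changed

def try_cache_impl(cache, key, impl, can_be_optimized):
    # an optimizable (no-op) impl must not enter the cache, or a later
    # non-optimizable execution would pick up a kernel-less entry
    if not can_be_optimized:
        cache[key] = impl

cache = {}
try_cache_impl(cache, "concat_1", "noop_impl", can_be_optimized=True)
try_cache_impl(cache, "concat_1", "real_impl", can_be_optimized=False)
```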
* Allow reorder & permute_ref to be optimized as concat predecessors
* Some more fixes:
runtime buffer fusing is available only when all preds and the concat are dynamic
runtime buffer fusing is executed only if the node is dynamic
* Fix the allocate_output parameters used by get_estimated_device_mem_usage according to the new change
* Fixed error in cascaded concat
* Need to reinterpret even though the size is the same
* Review interpolate shapes and label propagation
* Review shape_infer template implementation
* Update shape infer of interpolate in GPU plugin
- Add new tensor accessor for ov::Tensor map
* Correct casting in dim::scale function
* Remove validation of size of input 1 in v0
* Relax inputs check for interpolate v4
* Correct GPU shape inference
* Use ov::Tensors in interpolate's evaluate
- Remove some duplicated code
- Apply comments from review
* Set shape in interpolate's eval for output tensor
* primitive serialization
* updated primitive::desc() to use impl_param instead of program_node
* added hash caching unit tests
* added missed calls to save and load of parent
* updated copyright year
* [GPU] Added shape agnostic optimized Permute_tile_8x8_4x4 kernel
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add permute_gpu_tile_8x8_4x4 shape agnostic TCs for ov_gpu_unit_tests
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix calculation for required local mem size
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Update not to consider x and feature dimensions for tile size in the shape agnostic kernel case
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
+ Invalid calculation in reducing un-aligned feature axis for b_fs_yx_fsv16
+ Some reduce modes are not invariant when out-of-range elements are read as 0
+ Added jit ZERO_INVARIANT_REDUCTION
+ Enable blocked unit-tests on dGPU by PR#15873
Signed-off-by: Min, Byungil <byungil.min@intel.com>
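The invariance issue is easy to demonstrate: in a blocked layout like b_fs_yx_fsv16 the feature axis is padded to a multiple of 16, and padded lanes read as 0. Sum is invariant to extra zeros, but modes such as min (or prod) are not, so padded lanes must be masked out, which is what a jit flag like ZERO_INVARIANT_REDUCTION guards. A minimal sketch (hypothetical helper, not the kernel code):

```python
def reduce_with_padding(values, pad_to, mode, mask_padding):
    # simulate a blocked layout: the feature axis is zero-padded to pad_to
    padded = values + [0] * (pad_to - len(values))
    lanes = padded[:len(values)] if mask_padding else padded
    if mode == "sum":
        return sum(lanes)   # invariant: extra zeros change nothing
    if mode == "min":
        return min(lanes)   # not invariant: a padded 0 can win
    raise ValueError(mode)

vals = [5, 7, 3]  # un-aligned feature size 3, padded to 16
```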
* enable PaddlePaddle elementwise broadcast
* fix CI fail issue
* Apply suggestions from code review
* fix CI fail issue
* only B to A broadcast is supported for PDPD
* fix GPU plugin testcase fail issue
* keep PDPD broadcast_merge CPU plugin implementation aligned with ov core
* add type prop test case for pdpd broadcast dst shape smaller than src shape
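The "only B to A" restriction can be sketched as a shape-inference helper (an illustrative assumption about PaddlePaddle-style elementwise broadcast, not the plugin code): the second operand y (B) is aligned into x (A) starting at `axis`, every aligned dim must match or be 1, and the result always takes x's shape, so broadcasting x toward a larger y is not expressible.

```python
def pdpd_broadcast_shape(x_shape, y_shape, axis):
    """Sketch of PDPD B-to-A broadcast: align y into x at `axis`
    (axis < 0 means align y to x's trailing dims) and return x's shape."""
    if axis < 0:
        axis = len(x_shape) - len(y_shape)
    for i, dim in enumerate(y_shape):
        if dim != x_shape[axis + i] and dim != 1:
            raise ValueError("y is not broadcastable to x")
    return list(x_shape)
```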
* Build using conanfile.txt
* Update .ci/azure/linux_arm64.yml
* Several improvements
* Removed conanfile.py
* Try to use activate / deactivate
* Fixed clang-format code style
* Supported TBB version from Conan
* Added more NOMINMAX
* Fixed static build
* More improvements for static build
* Add usage of static snappy in case of static build
* More fixes
* Small fixes
* Final fixes
* deserialization of dynamic batch
* updated multi stream tests
* added unit tests
* updated cache dir name
* resolved type conversion warning
* removed teardown()
* added const