* Migrate Negative operator to new API
* Remove `visit_attributes`; it is the same as in the base class
* Use `std::negate` instead of a lambda (see the sketch after this list)
---------
Co-authored-by: Michal Lukaszewski <michal.lukaszewski@intel.com>
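A minimal sketch of the `std::negate` change, assuming the usual element-wise reference loop (the function name and signature here are illustrative, not the actual operator source):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>

// Before: an ad-hoc lambda per call site
//   std::transform(arg, arg + count, out, [](T v) { return -v; });
// After: the standard functor from <functional>
template <typename T>
void negative(const T* arg, T* out, std::size_t count) {
    std::transform(arg, arg + count, out, std::negate<T>());
}
```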
* Migrate Slice to new API
* Remove `visit_attributes`; it is the same as in the base class
* Move shape checks to shape_infer
- Minor refactoring of the Slice op
* Move `get_tensors_partial_shapes` to dev API
* Correct comment
Co-authored-by: Tomasz Jankowski <tomasz1.jankowski@intel.com>
---------
Co-authored-by: Tomasz Jankowski <tomasz1.jankowski@intel.com>
* Skip excessive memory allocation requests during build
* Update the memory check function
* Fix OS-specific behavior
* Update the location of the memory size check
* Run `check_allocatable` only for the dynamic shape case (see the sketch after this list)
* Update the check condition
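A minimal sketch of the intent behind `check_allocatable` (the signatures here are assumptions; the real check lives in the plugin): only a dynamic-shape allocation, whose size is first known at runtime, goes through the limit check.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: a single allocation request is rejected when it
// exceeds the device limit.
bool check_allocatable(std::size_t requested_bytes, std::size_t max_alloc_bytes) {
    return requested_bytes <= max_alloc_bytes;
}

void request_allocation(bool shape_is_dynamic, std::size_t bytes, std::size_t max_alloc) {
    // Static shapes were already validated at build time; only the dynamic
    // case consults the check.
    if (shape_is_dynamic && !check_allocatable(bytes, max_alloc))
        throw std::runtime_error("memory allocation request exceeds the device limit");
    // ... perform the actual allocation here ...
}
```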
* `RNNSequenceTest` to API2.0
* `Result` to API2.0
* `Reshape` to API2.0
* `ReorgYolo` to API2.0
* `RegionYolo` to API2.0
* Alignment fixes
* Skip more `RNNSequenceTest` cases
* Migrate Less operator to new API
* Migrate Greater operator to new API
- Use the Less implementation inside Greater to reduce binary size (see the sketch after this list)
---------
Co-authored-by: Michal Lukaszewski <michal.lukaszewski@intel.com>
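A minimal sketch of the binary-size trick, assuming flat element-wise references without broadcasting (illustrative names, not the actual kernels): only the Less comparison is instantiated, and Greater forwards to it with swapped operands.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>

// Less is instantiated once per type...
template <typename T>
void less_ref(const T* a, const T* b, bool* out, std::size_t count) {
    std::transform(a, a + count, b, out, std::less<T>());
}

// ...and Greater reuses it, since a > b is exactly b < a.
template <typename T>
void greater_ref(const T* a, const T* b, bool* out, std::size_t count) {
    less_ref(b, a, out, count);
}
```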
* [workflows/linux] Switch to sccache and Azure Blob Storage
* Install curl
* Remove --show-config
* Add sccache to other Linux workflows
* Add sccache to Android, curl to RISC-V and CC
* Use sccache action instead of manual install
* Oops, missed sccache manual installation in two places
* Use env vars instead of hardcoded CMAKE_C(XX)_COMPILER_LAUNCHER
* Forgot one more stage in Linux CC pipeline
* Temporarily disable Blob Storage for RISC-V
For some reason sccache has no effect on build time and shows 0 hits
and 0 compilation requests despite being present in the CMake calls
* Forgot to add sccache installation to Linux CC
* Revert "Temporarily disable Blob Storage for RISC-V"
This reverts commit b528f41dad583a38b9ef93121e38044b9dccb71b.
* Missing container option for CC build
* Remove curl installation
* Remove CCACHE* variables which have no effect on sccache
* Revert sccache changes for Linux RISC-V workflow
* Add Rotation support to primitive and kernel
* Add unit tests
* Add transformation for NMSRotated
* Add single-layer tests
* Fix: the angle value of the same box could have its sign flipped several times while passing through the batch and class loop iterations (see the sketch below)
* Fix review comments
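A minimal sketch of the fix idea (illustrative, not the actual kernel): normalize the angle on a local copy instead of mutating the shared boxes buffer, so a box revisited in the batch/class loops cannot be flipped again.

```cpp
// The bug pattern: flipping the sign in the shared boxes buffer means a box
// revisited on a later batch/class iteration gets flipped once more.
struct RotatedBox {
    float x, y, w, h, angle;
};

// Taking the box by value normalizes a local copy and leaves the source
// data intact across iterations.
RotatedBox normalized(RotatedBox box, bool clockwise) {
    if (!clockwise)
        box.angle = -box.angle;
    return box;
}
```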
* Migrate Minimum op to new API
* Refactor evaluates to reduce binary size
- Add `infer_broadcast_shape` to get shapes from tensors and reduce `OV_ASSERT` usage (see the sketch after this list)
- Refactor the Evaluate structures to reduce binary size
---------
Co-authored-by: Michal Lukaszewski <michal.lukaszewski@intel.com>
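A minimal sketch of broadcast shape inference under numpy rules, as such a helper would size the output before the element-wise reference runs (the name `infer_broadcast_shape` is taken from the note above; the body is illustrative):

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

std::vector<std::size_t> infer_broadcast_shape(std::vector<std::size_t> a,
                                               std::vector<std::size_t> b) {
    if (a.size() < b.size())
        std::swap(a, b);  // let a be the longer shape
    auto out = a;
    auto it = out.rbegin();
    // Align trailing dimensions; a dimension of 1 broadcasts to the other.
    for (auto bi = b.rbegin(); bi != b.rend(); ++bi, ++it) {
        if (*it == *bi || *bi == 1)
            continue;
        if (*it == 1)
            *it = *bi;
        else
            throw std::runtime_error("shapes are not broadcastable");
    }
    return out;
}
```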
The transformation fuses a Transpose on the first or second MatMul input
and sets MatMul's transpose_a/transpose_b accordingly (see the sketch below).
TransposeMatMul is already part of SmartReshape, but it can be added
to MOCTransformations as well, so native models that don't use reshape
can benefit from it.
Ticket: CVS-118908
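A conceptual sketch of the fusion (not the actual pass source, which is matcher-based): drop an input Transpose and flip the matching transpose flag instead.

```cpp
#include <memory>

#include "openvino/core/type.hpp"
#include "openvino/op/matmul.hpp"
#include "openvino/op/transpose.hpp"

// Returns a MatMul with input Transposes folded into the transpose flags.
// The real pass also checks that each Transpose order only swaps the two
// innermost dimensions; that validation is omitted here for brevity.
std::shared_ptr<ov::op::v0::MatMul> fuse_input_transposes(
        const std::shared_ptr<ov::op::v0::MatMul>& matmul) {
    auto a = matmul->input_value(0);
    auto b = matmul->input_value(1);
    bool transpose_a = matmul->get_transpose_a();
    bool transpose_b = matmul->get_transpose_b();
    if (auto t = ov::as_type_ptr<ov::op::v1::Transpose>(a.get_node_shared_ptr())) {
        a = t->input_value(0);
        transpose_a = !transpose_a;
    }
    if (auto t = ov::as_type_ptr<ov::op::v1::Transpose>(b.get_node_shared_ptr())) {
        b = t->input_value(0);
        transpose_b = !transpose_b;
    }
    return std::make_shared<ov::op::v0::MatMul>(a, b, transpose_a, transpose_b);
}
```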
* Preserve partial values on Mod inputs
- Static values: the full range of integers
- Intervals: only if the operands are not negative (see the example below)
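A hand-written illustration (not project code) of why intervals are only propagated for non-negative operands: with C++ truncated division the sign of `a % b` follows `a`, so a mixed-sign input interval yields results on both sides of zero, and no tight output interval follows from the endpoint values alone.

```cpp
#include <cstdio>

int main() {
    // -3 % 5 .. -1 % 5 are negative, 0 % 5 .. 3 % 5 are non-negative:
    // the results straddle zero even though the input is one interval.
    for (int x = -3; x <= 3; ++x)
        std::printf("%2d %% 5 = %2d\n", x, x % 5);
    return 0;
}
```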
* Fix bounds evaluation when inputs are scalars
The current implementation tries to leverage a branchless approach, but it is not
correct if the scale is 0. In that case the zero point can become inf or NaN, and
multiplication by 0 does not change its value. That causes another issue: an
infinite or NaN zero point cannot be optimized out later (see the repro below).
Ticket: CVS-122931
Co-authored-by: Ivan Tikhonov <ivan.tikhonov@intel.com>
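A hand-written repro of the pitfall (not the project source; `scale` and `zero_point` are illustrative names): dividing by a zero scale yields inf, and masking the lane by multiplying with 0 produces NaN rather than 0 under IEEE-754.

```cpp
#include <cstdio>

int main() {
    float scale = 0.0f;
    float zero_point = 1.0f / scale;   // division by a zero scale -> inf
    float masked = zero_point * 0.0f;  // IEEE-754: inf * 0 = NaN, not 0
    std::printf("zero_point=%f masked=%f\n", zero_point, masked);
    return 0;
}
```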
* Try to fix a memory leak issue
The CPU streamer is released, but thread ids remain in `t_stream_count_map`
* Fix `thread_local` affecting all threads
Signed-off-by: HU Yuan2 <yuan2.hu@intel.com>
* Add a comment to the `local()` function to avoid mistaken modifications
in the future
Signed-off-by: HU Yuan2 <yuan2.hu@intel.com>
* Use a custom stream id (see the sketch at the end of this list)
Signed-off-by: HU Yuan2 <yuan2.hu@intel.com>
* Fix review comments
Signed-off-by: HU Yuan2 <yuan2.hu@intel.com>
* Fix formatting issue
Signed-off-by: HU Yuan2 <yuan2.hu@intel.com>
* Create the `shared_ptr` before the assert
Signed-off-by: HU Yuan2 <yuan2.hu@intel.com>
---------
Signed-off-by: HU Yuan2 <yuan2.hu@intel.com>
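A minimal sketch of the fix direction (illustrative names; the real code lives in the CPU streams executor): key the count map by a custom per-stream id whose entry is erased on release, so no stale `std::thread::id` entries outlive the streamer.

```cpp
#include <map>
#include <mutex>

class StreamsCounter {
public:
    int acquire() {
        std::lock_guard<std::mutex> lock(m_mutex);
        int id = m_next_id++;              // custom stream id, not std::thread::id
        m_stream_count_map[id] = 1;
        return id;
    }
    void release(int id) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_stream_count_map.erase(id);      // entry removed; nothing leaks
    }

private:
    std::mutex m_mutex;
    int m_next_id = 0;
    std::map<int, int> m_stream_count_map;
};
```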
* Initial implementation of primitive, kernel selector, dummy kernel for RMS Norm
Signed-off-by: Andrew Park <andrew.park@intel.com>
* RMS ref kernel implementation with a single work item (WI)
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add test cases and a reference function for ov_gpu_unit_tests
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add internal RMS norm op
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add a transformation which fuses the RMS decomposition pattern into the internal RMS op (see the reference sketch at the end of this list)
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix pattern for RMS fusion transformation
* Update the RMS ref kernel for optimization and additional planar format support
* Initial implementation of the optimized RMS kernel, excluding leftover handling and the case smaller than the vector size
* Update the initial version to handle leftovers and the case smaller than the vector size
* Additionally fuse the pre-decomp and post-comp reorders
* Enable the dynamic impl for RMS again
* Revert the additional fusing of pre-decomp and post-comp reorders
* Add a subgraph test case for ov_gpu_func_tests
* Decrease the error margin for the f32 data type
* Update the description
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Update test parameters for input shapes
* Apply comments
* Fix the failing test case for an invalid gamma element type
* Apply comments
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Update the pattern to also fuse the post reorder
* Apply comments
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
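A plain reference for RMS normalization as the fused pattern computes it, y = x / sqrt(mean(x^2) + eps) * gamma (illustrative helper, not the plugin kernel, which handles its own layouts and vectorization):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

std::vector<float> rms_norm_ref(const std::vector<float>& x,
                                const std::vector<float>& gamma,
                                float eps) {
    // Accumulate mean of squares in double for a tighter reference.
    double mean_sq = 0.0;
    for (float v : x)
        mean_sq += static_cast<double>(v) * v;
    mean_sq /= static_cast<double>(x.size());

    const float inv_rms = 1.0f / std::sqrt(static_cast<float>(mean_sq) + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = x[i] * inv_rms * gamma[i];
    return y;
}
```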