* Initial implementation of primitive, kernel selector, dummy kernel for RMS Norm
Signed-off-by: Andrew Park <andrew.park@intel.com>
* RMS ref kernel implementation with single WI
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add TC and reference func for ov_gpu_unit_tests
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add internal RMS norm op
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add transformation which fuses RMS decomposition pattern into the internal RMS op
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix pattern for RMS fusion transformation
* Update RMS ref kernel for optimization and additional planar format support
* Initial impl for optimized RMS kernel, excluding leftover handling and cases smaller than the vector size
* Update the initial version to handle leftovers and cases smaller than the vector size
* Additionally fuse pre-decomp and post-comp reorders
* Re-enable dynamic impl for RMS
* Revert additional fusion of pre-decomp and post-comp reorders
* Add subgraph TC for ov_gpu_func_tests
* Decrease error margin for f32 data type
* Update description
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Update test params for input shapes
* Apply comments
* Fix failing TC for invalid gamma element type
* Apply comments
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Update pattern to also fuse the post reorder
* Apply comments
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
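The commits above implement an RMS Norm primitive and its reference kernel. As a minimal sketch of the semantics the reference kernel computes (pure Python, not the actual OpenCL kernel; function and parameter names are illustrative):

```python
import math

def rms_norm(x, gamma, eps=1e-6):
    # RMSNorm: y_i = x_i / sqrt(mean(x^2) + eps) * gamma_i
    # gamma is the learned per-element scale fused in by the
    # RMS fusion transformation mentioned above.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * g for v, g in zip(x, gamma)]
```

The optimized kernel additionally vectorizes this loop and handles leftovers when the innermost dimension is not a multiple of the vector size.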
* [TF FE] Provide full support of TF1 Control flow and TensorArray ops
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Add missed header for TensorArrayV3 op
* Temporarily disable GRU cell fusion
* Update src/common/transformations/src/transformations/common_optimizations/moc_transformations.cpp
* Fix a case when element_shape for TensorArrayV3
* Fix translator for TensorArrayCloseV3
* Update summarize graph with TensorArrayCloseV3
* Add layer tests for TensorArrayScatterV3, Close, Size, Array
* Fix output shape for Merge node
* Remove unused variable
* Fix translator for TensorArrayConcatV3
* Fix translator for TensorArrayConcatV3
* Add layer tests for TensorArrayWriteV3, Gather, and Concat
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Add translator for GatherTree
* Fix TF FE unit-test for GatherTree
* Fix GatherTree translator
* Fix GatherTree translator to handle 1d end_token
* Fix undeclared parameter issue
* Fix GatherTree unit-test
* Add TensorArrayV3Replacer transformation
* Temporarily disable dangling transformation
* Recover RemoveMultiSubGraphOpDanglingParamsResults transformation
* Recover GRUCellFusion transformation
* Simplify check for GRUCellFusion transformation
* Use proper name for unit-tests
* Simplify translator for TensorArrayWriteV3
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix RemoveMultiSubgraphOpDanglingParamsResults transformation
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Additional fix for remove_multi_subgraph_op_dangling_params
* Make static TI run a dynamic subgraph
* Dedicated SL test
* Change condition to respect static shapes
* Adjust test to cover the code path properly
* Recover fallback for still failing case GNMT
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
Co-authored-by: Maksim Kutakov <maksim.kutakov@intel.com>
* [TF FE] Document full list of TF operations and their support by TF FE
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Update src/frontends/tensorflow/docs/supported_ops.md
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* Add M1 Mac pipelines as a matrix parameter
* Update mac.yml
Disable java_api because Java is not available on macOS arm64 runners
* Update mac.yml
Added always() condition for all tests
* Update mac.yml
* Update mac.yml
* Update mac.yml
* Update setup.py
temp commit
* Update tools/openvino_dev/setup.py
* Use matrix for variables
* Add mxnet to extras only for x86_64
* Skip failing tests
* Use xfail for Python tests; add missing filter for transformations tests
* Skip CPU func tests on x86_64 Mac; skip some CPU func tests on ARM Mac
* Update mac.yml
* Skip tests on Mac ARM
* Skip tests on Darwin; apply review comments
* Add more skips for Python and C++ tests
* Skip TF tests
* Skip more TF tests; skip more Python UT stages
* Remove always() conditions, remove triggers, add nightly trigger
---------
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
* Introduce WA to improve performance of find_port() method
* Add mutex
* Remove redundant lock
* Reduce the number of get_tensor_ptr calls
* Fixed typo
* Removed WAs from Hetero plugin
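The find_port() workaround above boils down to memoizing an expensive lookup behind a mutex. A hypothetical sketch of that pattern (names are illustrative, not the actual OpenVINO API):

```python
import threading

class PortCache:
    # Memoize expensive find_port()-style lookups. The lock makes the
    # cache safe to use from multiple inference threads, matching the
    # "Add mutex" commit above.
    def __init__(self, find_port):
        self._find_port = find_port
        self._cache = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key not in self._cache:
                self._cache[key] = self._find_port(key)
            return self._cache[key]
```

Repeated calls with the same key hit the cache instead of re-running the lookup, which is where the performance improvement comes from.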
* Update model list and install cpu torch
* Move to hub tests
* Update tests/model_hub_tests/torch_tests/requirements.txt
* Make PyTorch maintainers owners of torch tests
* TorchFX: Constant value pass optimization
* Replace op.Constant with make_constant in fx_decoder
* Using shared memory for constant value passing
Co-authored-by: Jan Iwaszkiewicz <jan.iwaszkiewicz@intel.com>
---------
Co-authored-by: Jan Iwaszkiewicz <jan.iwaszkiewicz@intel.com>
* Fix command for building with Ninja
Remove the current directory from the command.
* Update docs/dev/build_windows.md
---------
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
* Reference implementation for u4 constant compression from pytorch model based on bitwise ops pattern
* Fixed order of 4-bit halves in byte
* Switched PyTorch FE to dev mode: if a model cannot be fully converted, return a partially converted model with PTFrameworkNodes and print a warning (normally an exception would be raised in this case).
* Moved u4 compression to utils_quantize. Implemented non-interleaved version of u4 compression
* Removed debug output
* Added aten::matmul to the list of exceptions in may_produce_alias as a workaround for gptq models
* Added patching for gptq models applied automatically in convert_model
* WA for an issue with u4 with earlier convert to fp16
* U4 blocked repacking for gptq patched model layout
* Deleted obsolete u4 re-packing based on aten::cat. Fixed the resulting u4 constant shape. Removed debug output.
* Revert "Switched PyTorch FE to dev mode: if a model cannot be fully converted, return a partially converted model with PTFrameworkNodes and print a warning (normally an exception would be raised in this case)."
This reverts commit 0ef1455e70.
* Update src/frontends/pytorch/src/op/cat.cpp
* Check mask and shift values in u4 pattern. deque -> OutputVector for u4_compression_stack
* Convert to a given floating-point type instead of half in gptq patching. Better-structured code.
* Code style fix
* Removed deque include
* Code style fixes
* Trailing space removed
* Fixed patched_forward and ts_decoder after unvalidated commits.
* Swap nibbles in u4/i4
* Better exception handling around jit.trace and gptq.patch_model
* Update src/bindings/python/src/openvino/frontend/pytorch/gptq.py
Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>
* Update src/bindings/python/src/openvino/frontend/pytorch/gptq.py
Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>
* Code style
* Reverse int4 byte order
* Fixed core tests
* Fixed unguarded dynamic_cast result
Co-authored-by: Evgenya Nugmanova <eva.my.link@gmail.com>
* Fixed transformation tests
* Update src/bindings/python/src/openvino/frontend/pytorch/gptq.py
Co-authored-by: Maxim Vafin <maxim.vafin@intel.com>
* Prevent patching of non-gptq models
* Removed extra calling of quantized weights decompression patterns
* Better detection of supported AutoGPTQ models + more diagnostics
* Accurate diagnostics in case when aten::stack has multiple axes
---------
Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>
Co-authored-by: Ilya Churaev <ilyachur@gmail.com>
Co-authored-by: Evgenya Nugmanova <eva.my.link@gmail.com>
Co-authored-by: Maxim Vafin <maxim.vafin@intel.com>
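Several commits above fix the nibble order of u4 values packed into bytes. A minimal sketch of the packing convention at stake, assuming the first element of each pair lands in the low nibble (the actual layout used by the frontend may differ; this only illustrates the kind of mismatch the "swap nibbles" and "fixed order of halves" commits address):

```python
def pack_u4(values):
    # Pack pairs of 4-bit values into bytes, first element of each
    # pair in the LOW nibble. Swapping lo/hi here is exactly the
    # nibble-order bug class fixed in the commits above.
    assert len(values) % 2 == 0
    return bytes((hi << 4) | lo for lo, hi in zip(values[::2], values[1::2]))

def unpack_u4(packed):
    out = []
    for b in packed:
        out.append(b & 0x0F)        # low nibble first
        out.append((b >> 4) & 0x0F)  # then high nibble
    return out
```

Round-tripping through pack/unpack must be the identity; a consumer that assumes the opposite nibble order reads every pair swapped, which shows up as garbage weights rather than a crash.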
* Migrate VariadicSplit to new API
- refactor to reduce bin size
* Move `get_tensors_partial_shapes` to dev API
* Use get_tensors_partial_shapes in VariadicSplit
* Remove `visit_attributes`; it is the same as in the base class
* Gather needs to keep the original input/output rank
- because parameters such as indices, batch_dims, and axis depend on the rank.
- add input_rank to the gather primitive.
* Don't query in the set_preferred_formats pass
- when force_implementations is set.
- when forcing_impl is not onednn.