* Introduce a WA to improve performance of the find_port() method
* Add mutex
* Remove redundant lock
* Reduce the number of get_tensor_ptr calls
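The first four items amount to guarding a lazily built lookup with a mutex so repeated find_port() calls avoid redundant scans and pointer fetches. A minimal sketch in Python (the `PortCache` class and its members are hypothetical illustrations, not OpenVINO API):

```python
import threading

class PortCache:
    """Hypothetical sketch: cache name-to-port lookups behind a lock
    so repeated find_port() calls avoid rescanning the port list."""

    def __init__(self, ports):
        self._ports = ports          # list of (name, port) pairs
        self._index = None           # built lazily on first lookup
        self._lock = threading.Lock()

    def find_port(self, name):
        # Double-checked lazy build: the lock makes the one-time
        # index construction safe when called from multiple threads.
        if self._index is None:
            with self._lock:
                if self._index is None:
                    self._index = {n: p for n, p in self._ports}
        return self._index.get(name)

cache = PortCache([("input", 0), ("output", 1)])
print(cache.find_port("output"))  # 1
```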
* Fixed typo
* Removed WAs from Hetero plugin
* Update model list and install cpu torch
* Move to hub tests
* Update tests/model_hub_tests/torch_tests/requirements.txt
* Make PyTorch maintainers owners of torch tests
* TorchFX: Constant value pass optimization
* Replace op.Constant with make_constant in fx_decoder
* Using shared memory for constant value passing
Co-authored-by: Jan Iwaszkiewicz <jan.iwaszkiewicz@intel.com>
---------
Co-authored-by: Jan Iwaszkiewicz <jan.iwaszkiewicz@intel.com>
* Fix command for Building with Ninja
Removing current directory from the command.
* Update docs/dev/build_windows.md
---------
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
* Reference implementation for u4 constant compression from pytorch model based on bitwise ops pattern
* Fixed order of 4-bit halves in byte
* Switched PyTorch FE to dev mode: if a model cannot be fully converted, return a partially converted model containing PTFrameworkNodes and print a warning (normally this case would raise an exception).
* Moved u4 compression to utils_quantize. Implemented not-interleaved version of u4 compression
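The u4 compression above packs two 4-bit values into each byte; which value lands in the low versus the high nibble is exactly the ordering question the surrounding commits fix. A sketch of the non-interleaved packing (illustrative only, with a `low_first` flag covering both layouts; this is not the OpenVINO implementation):

```python
import numpy as np

def pack_u4(values, low_first=True):
    """Pack consecutive pairs of 4-bit values into single bytes.

    low_first=True puts the first value of each pair into the low
    nibble; low_first=False puts it into the high nibble. The choice
    of layout is an assumption for illustration.
    """
    v = np.asarray(values, dtype=np.uint8)
    assert v.size % 2 == 0 and (v < 16).all()
    a, b = v[0::2], v[1::2]          # first and second value of each pair
    return (a | (b << 4)) if low_first else ((a << 4) | b)

print([hex(x) for x in pack_u4([1, 2, 3, 4])])  # ['0x21', '0x43']
```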
* Removed debug output
* Added aten::matmul to the list of exceptions in may_produce_alias as a workaround for gptq models
* Added patching for gptq models applied automatically in convert_model
* WA for an issue with u4 combined with an earlier convert to fp16
* U4 blocked repacking for gptq patched model layout
* Deleted obsolete u4 re-packing based on aten::cat. Fixed the resulting u4 constant shape. Removed debug output.
* Revert "Switched PyTorch FE to dev mode: if a model cannot be fully converted, return a partially converted model containing PTFrameworkNodes and print a warning (normally this case would raise an exception)."
This reverts commit 0ef1455e70.
* Update src/frontends/pytorch/src/op/cat.cpp
* Check mask and shift values in u4 pattern. deque -> OutputVector for u4_compression_stack
* Convert to a given floating type instead of half in gptq patching. Better structured code.
* Code style fix
* Removed deque include
* Code style fixes
* Trailing space removed
* Fixed patched_forward and ts_decoder after unvalidated commits.
* Swap nibbles in u4/i4
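Swapping nibbles in a u4/i4 byte is a single bitwise operation; a small sketch of it (illustrative helper, not the OpenVINO code):

```python
def swap_nibbles(byte):
    """Swap the low and high 4-bit halves of a byte, i.e. the
    operation behind the u4/i4 nibble-order fixes."""
    return ((byte & 0x0F) << 4) | ((byte >> 4) & 0x0F)

print(hex(swap_nibbles(0x21)))  # 0x12
```

Applying it twice restores the original byte, which makes it easy to verify the repacking round-trips.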
* Better exception handling around jit.trace and gptq.patch_model
* Update src/bindings/python/src/openvino/frontend/pytorch/gptq.py
Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>
* Update src/bindings/python/src/openvino/frontend/pytorch/gptq.py
Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>
* Code style
* Reverse int4 byte order
* Fixed core tests
* Fixed unguarded dynamic_cast result
Co-authored-by: Evgenya Nugmanova <eva.my.link@gmail.com>
* Fixed transformation tests
* Update src/bindings/python/src/openvino/frontend/pytorch/gptq.py
Co-authored-by: Maxim Vafin <maxim.vafin@intel.com>
* Prevent patching of non-gptq models
* Removed extra calling of quantized weights decompression patterns
* Better detection of supported AutoGPTQ models + more diagnostics
* Accurate diagnostics when aten::stack has multiple axes
---------
Co-authored-by: Alexander Kozlov <alexander.kozlov@intel.com>
Co-authored-by: Ilya Churaev <ilyachur@gmail.com>
Co-authored-by: Evgenya Nugmanova <eva.my.link@gmail.com>
Co-authored-by: Maxim Vafin <maxim.vafin@intel.com>
* Migrate VariadicSplit to new API
- refactor to reduce bin size
* Move `get_tensors_partial_shapes` to dev API
* Use get_tensors_partial_shapes in VariadicSplit
* Remove `visit_attributes` as it is the same as the base implementation
* Gather needs to keep the original input/output rank
- because parameters such as indices, batch_dims and axis depend on the rank.
- add input_rank to gather primitive.
* Don't query in the set_preferred_formats pass
- when force_implementations is set.
- when forcing_impl is not onednn.
* Add Multinomial-13 to MO
* Add Multinomial tests for MO IR reader
* Move convert_type check
* Imports clean up
* Update package BOM file
* Avoid files collision in tests