* Dynamic shape memory reuse solution
* Fix Split node to work properly with dynamic memory
* Fix race condition for Memory mgrHandle
* Avoid Memory race condition between GetData and SetDataHandle
Adding a lock to resolve the race condition between ov::intel_cpu::Memory::GetData() and ov::intel_cpu::Memory::SetDataHandle() is not a good solution,
because it would hurt inference performance. It turns out to be unnecessary to get the edge DataPtr in inferRequest::SetBlob or GetBlob, which
only need the tensorDesc, so we get only the tensorDesc instead of the DataPtr to avoid this race condition.
* Resolve reviewer's comments
* Avoid performance impact due to frequently resetting MemMngrHandle
If MemMngrHandle has already been assigned an external buffer, it can be reused.
Otherwise, a new one needs to be created.
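The reuse rule above can be sketched as follows. This is a minimal Python illustration, not the plugin's C++ code: `MemoryMngrHandle`, `ExternalBufferMngr` and `set_data_handle` are hypothetical stand-ins, and the sketch assumes that "already assigned an external buffer" can be detected by the type of the current manager.

```python
class ExternalBufferMngr:
    """Hypothetical manager that only wraps a buffer owned by someone else."""

    def __init__(self, buf):
        self.buf = buf

    def set_extern_buffer(self, buf):
        # Swapping the pointer is cheap: no allocation, no new manager.
        self.buf = buf


class MemoryMngrHandle:
    """Hypothetical stand-in for MemMngrHandle."""

    def __init__(self):
        self.mngr = None
        self.created = 0  # how many managers were allocated (for illustration)


def set_data_handle(handle, buf):
    # Reuse the existing manager if it already wraps an external buffer...
    if isinstance(handle.mngr, ExternalBufferMngr):
        handle.mngr.set_extern_buffer(buf)
    else:
        # ...otherwise create a new one (the expensive path).
        handle.mngr = ExternalBufferMngr(buf)
        handle.created += 1
    return handle.mngr
```

Calling `set_data_handle` repeatedly with new buffers allocates a manager only once, which is the performance point of the change.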
* multiclass_nms opset9 spec, api, reference, paddle fe mapper, paddle fe unittest.
* multiclass_nms opset9 cpu node impl.
* multiclass_nms opset9 shape infer fix.
* multiclass_nms opset9: add transform ConvertMulticlassNms8ToMulticlassNms9.
* ConvertMulticlassNmsToMulticlassNmsIE: to MulticlassNmsIEInternal
* add test dependency package paddledet==2.1.0
* 1. fix for roisnum overflow. 2. common shape_infer private function.
Signed-off-by: jialipen <cecilia.peng@intel.com>
* 1. use common infer_shape helper. 2. fix roisnum overflow issue. 3. fix for nmsWithEta.
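For readers unfamiliar with NMS "with eta", the sketch below shows greedy NMS with an adaptive threshold, modeled on the rule used by PaddlePaddle's multiclass_nms (after each kept box, the IoU threshold is multiplied by `eta` while it is above 0.5). Whether the OpenVINO fix matches this rule in every detail is an assumption; `iou` and `nms_with_eta` are illustrative names, and boxes are assumed to be `(x1, y1, x2, y2)`.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_with_eta(boxes, scores, iou_threshold, eta=1.0):
    """Greedy NMS; returns indices of kept boxes in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    threshold = iou_threshold
    for i in order:
        # A box survives if it does not overlap any kept box too much.
        if all(iou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
            # Adaptive NMS: tighten the threshold after each kept box.
            if eta < 1.0 and threshold > 0.5:
                threshold *= eta
    return keep
```

With `eta=1.0` the threshold stays fixed; with `eta < 1.0` later boxes face a stricter threshold.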
* test suite for opset9 multiclass_nms: smoke tests pass with both static and dynamic shapes.
code cleanup for unit tests.
* decouple specification from this PR.
* op fuzzy: dynamic input/output
* reference impl refactor
* multiclass_nms_base does not need clone_inputs.
* code cleanup
* restrict ppdet import
* fix clang format error
* change ppdet import to resolve CI fail issue related to its dependency.
* fix CI
* refactor: multiclass_nms_shape_inference for opset9 and reference impl.
TODO: could be applied to opset8 and even matrix_nms.
* fix CI build failure.
* CI fix for ambiguous namespace reference issue when
building static libs.
* update nms save_model python scripts.
* dynamic inputs for NMS with CPU plugin.
* copyright header for test scripts.
* op conformance test for multiclass_nms_9.
* minor update: is_type
* python opset9 and multiclass_nms
* flake8 CI fix
* remove NmsBase. stage1.
flake8 CI fix
remove NmsBase. stage 1 fix.
* rm NmsBase. stage2.
* more multiclass_nms prop tests and fix.
* remove unchanged ops from binding opset9.
* dependency of paddle_tests.
* fix: add MulticlassNms to op mapper.
* clang format fix
* fix merge error.
* add formats for 3d conv
data formats
- bs_fs_zyx_bsv32_fsv32
- bs_fs_zyx_bsv32_fsv16
- bs_fs_zyx_bsv8_fsv4
- bs_fs_zyx_bsv8_fsv2
- bs_fs_zyx_bsv16_fsv32
- b_fs_zyx_fsv2
- b_fs_zyx_fsv4
weight formats
- os_is_zyx_osa2_isa8_osv8_isv2
- os_is_zyx_osv8_isv4
- os_is_zyx_osv8_isv2
- gs_oizyx_gsv32
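To make the blocked data format names above concrete, here is a sketch of the linear offset of an element in a `bs_fs_zyx_bsv{N}_fsv{M}` buffer. It assumes the conventional reading of the name (outer to inner: batch blocks, feature blocks, z, y, x, then batch-in-block, then feature-in-block) and that features are padded to a multiple of the feature block; `blocked_offset` is a hypothetical helper, not a clDNN function.

```python
def blocked_offset(b, f, z, y, x, dims, bsv, fsv):
    """Linear offset of (b, f, z, y, x) in a bs_fs_zyx_bsv{bsv}_fsv{fsv} buffer.

    Assumed order (outer to inner): b // bsv, f // fsv, z, y, x, b % bsv, f % fsv.
    dims = (B, F, Z, Y, X) are the logical sizes.
    """
    _, F, Z, Y, X = dims
    f_blocks = -(-F // fsv)  # ceil division: features padded to a multiple of fsv
    b_outer, b_inner = divmod(b, bsv)
    f_outer, f_inner = divmod(f, fsv)
    outer = (((b_outer * f_blocks + f_outer) * Z + z) * Y + y) * X + x
    # Each spatial position holds one bsv x fsv tile, feature index innermost.
    return (outer * bsv + b_inner) * fsv + f_inner
```

For `bs_fs_zyx_bsv32_fsv16`, neighbouring features are adjacent in memory and neighbouring batch items are 16 elements apart inside one tile.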
* add supported formats for primitives
* choose onednn convolution impl for 3d conv
* optimize layout of shallow depth convolution
* remove reorder for conv
* Don't remove reorder between bs_fs_zyx_b32_f16/f32 and bfyx.
* add formats to SetDefault() to optimize gws/lws for quantize/eltwise
* Fall back to clDNN if the oneDNN pooling layout is b_fs_zyx_fsv32 and the data type is i8.
* fixed wrong position for new weight formats
* restore imad_case()
* This function is used to choose the format for the clDNN fallback
* [GPU] add debug flag: OV_GPU_SerialCompile
0 (default): parallel compile
1: serial compile
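The semantics of the `OV_GPU_SerialCompile` flag can be illustrated with the Python sketch below. The real flag is handled inside the C++ GPU plugin; `compile_kernels` and `compile_fn` are hypothetical names, and reading the value from an environment mapping is an assumption made for the illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def compile_kernels(kernels, compile_fn, env=None):
    """Compile kernels serially when OV_GPU_SerialCompile=1, in parallel otherwise.

    Hypothetical helper; results are returned in input order in both modes.
    """
    env = os.environ if env is None else env
    serial = env.get("OV_GPU_SerialCompile", "0") == "1"
    if serial:
        # 1: one kernel at a time, which makes compile failures easier to trace.
        return [compile_fn(k) for k in kernels]
    # 0 (default): compile concurrently on a thread pool.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(compile_fn, kernels))
```

Both modes produce identical results; only scheduling differs, which is why such a switch is useful mainly for debugging.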
* add is_mixed_layout
* remove format::bs_fs_zyx_bsv8_fsv4 in needs_onednn_small_ic_to_blocked
* prevent fusing the reorder between quantize and conv
* shallow feature first conv
* Revert "[MO args][ONNX FE]fix cutting graph with input, output or both (#9698)"
This reverts commit 2b03d5fe66.
* Fix cutting the graph when inputs/outputs are passed to the MO
* Check that port exists
* Simplification of getting node port
* Reduce the amount of nesting in the search for a node by operation name
* Refactoring
- remove mutable default arg
- changes in code style
- rename variables
* Check that the user input data type is a dictionary
Co-authored-by: Michal Lukaszewski <michal.lukaszewski@intel.com>
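Two of the Python-side changes in the commit above follow well-known idioms: avoiding a mutable default argument and validating that user input is a dictionary. The sketch below shows both; `find_nodes_bad`, `find_nodes` and `check_user_input` are hypothetical names for illustration, not functions from Model Optimizer.

```python
def find_nodes_bad(name, found=[]):
    # Bug: the default list is created once and shared by every call.
    found.append(name)
    return found


def find_nodes(name, found=None):
    # Fix: use None as a sentinel and create a fresh list per call.
    if found is None:
        found = []
    found.append(name)
    return found


def check_user_input(data):
    """Reject user input that is not a dictionary."""
    if not isinstance(data, dict):
        raise TypeError(f"expected a dict, got {type(data).__name__}")
    return data
```

The bad version accumulates results across unrelated calls, which is exactly the kind of hidden state the refactoring removes.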
* [GPU] Modify Softmax single layer tests to check Softmax-8 is supported with axes in [-rank, rank) interval
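A negative Softmax-8 axis in `[-rank, rank)` maps onto `[0, rank)` by adding the rank. The sketch below shows that normalization plus a small softmax over a 2D nested list so that axis 0 and its negative alias can be compared; `normalize_axis` and `softmax_2d` are illustrative helpers, not plugin functions.

```python
import math


def normalize_axis(axis, rank):
    """Map an axis from [-rank, rank) to [0, rank)."""
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} out of range for rank {rank}")
    return axis + rank if axis < 0 else axis


def softmax_2d(rows, axis):
    """Softmax over a 2D nested list along the given (possibly negative) axis."""
    axis = normalize_axis(axis, 2)
    if axis == 0:
        # Transpose, apply along the last axis, transpose back.
        cols = [list(c) for c in zip(*rows)]
        return [list(r) for r in zip(*softmax_2d(cols, 1))]
    out = []
    for r in rows:
        m = max(r)  # subtract the max for numerical stability
        e = [math.exp(v - m) for v in r]
        s = sum(e)
        out.append([v / s for v in e])
    return out
```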
* [GPU] Fix cldnn::softmax::dimension_t documentation
* [GPU] Fix ParamsKey::EnableSoftmaxDim
Support Z dimension.
* [GPU] Add Softmax single layer test that checks 5D case
Since some Softmax kernel code contains an ifdef for the 5-dimensional case,
a test case that covers this functionality is needed.
* [GPU] Support axis 0 in Softmax
* [GPU] Modify Softmax single layer tests to check axis 0
* [GPU] Modify Softmax items class optimized kernel to handle axis 0 correctly
Modify single layer test accordingly.
* [GPU] Modify Softmax unit-test to check softmax::normalize_b
* Split SoftMaxLayerTest into opset1 and opset8 versions
Use SoftMax8LayerTest in the tests throughout repository.
SoftMaxLayerTest now defaults to SoftMax1LayerTest for compatibility.
* [GPU] Add f16 test-case for Softmax single-layer test
Co-authored-by: tgubanova-lohika <tgubanova@lohika.com>
* dft with single layer test
* idft with single layer test
* fix output param usage in dft
* update dft according to clang-format
* move output layout setup to calc_output_layout
* add support for other dimensions
* add clDNN unit test for DFT/IDFT
* remove unnecessary original rank
* use defined formats in kernel
* fix dft docs
* changes after review
* Revert "fix dft docs"
This reverts commit 45b05172dfd161d92dae6d26e0f1b74748e56fd5.
Co-authored-by: Serhii Pavlovskyi <spavlovskyi@lohika.com>
Co-authored-by: Mykhailo Hnap <mhnap@lohika.com>
With the new networkx release (2.8.1), some MO tests started to fail
with the following error:
```
    def __setstate__(self, state):
        self._graph = G = state["_graph"]
        self._adjdict = G._pred if hasattr(G, "pred") else G._adj
AttributeError: 'Graph' object has no attribute '_adj'
```
This appears to be a regression introduced in
f50fc70b8c
convolution_gpu_yxfb_yxio_b16 for fp16 has reqd_work_group_size hardcoded
to (16, 1, 1). On devices where CL_DEVICE_MAX_WORK_GROUP_SIZE is 512,
GetOptimalLocalWorkGroupSizes picks (16, 2, 1) for the LWS.
This makes clEnqueueNDRangeKernel fail (CL_INVALID_WORK_GROUP_SIZE), since the LWS
does not match the reqd_work_group_size declared in the kernel.
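The mismatch described above can be reproduced with a toy LWS chooser. In the sketch below, the power-of-two heuristic is an assumption and is not the actual GetOptimalLocalWorkGroupSizes algorithm; `choose_lws` is a hypothetical helper. The point is that a kernel declaring `reqd_work_group_size` must be enqueued with exactly that LWS, so a generic heuristic has to be bypassed for it.

```python
def choose_lws(gws, max_wg_size, reqd=None):
    """Pick a 3D local work size for a global work size `gws`.

    If the kernel declares reqd_work_group_size, OpenCL requires the LWS to
    match it exactly, so it is returned as-is (after a divisibility check).
    """
    if reqd is not None:
        if any(g % r for g, r in zip(gws, reqd)):
            raise ValueError("GWS is not divisible by reqd_work_group_size")
        return tuple(reqd)
    # Toy heuristic (assumption): double each dimension while it still divides
    # the GWS and the total work-group size stays within the device limit.
    lws = [1, 1, 1]
    for d in range(3):
        while gws[d] % (lws[d] * 2) == 0 and lws[0] * lws[1] * lws[2] * 2 <= max_wg_size:
            lws[d] *= 2
    return tuple(lws)
```

For a GWS of (16, 2, 1) and a 512 limit, the heuristic alone returns (16, 2, 1), while honoring the kernel's declared size yields (16, 1, 1).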
* Add single layer tests for GPU
* Add GPU primitive for ExperimentalDetectronGenerateProposalsSingleImage
* Add kernel for ExperimentalDetectronGenerateProposalsSingleImage
* Add unit test
* rename abbreviation edgpsi to the full name experimental_detectron_generate_proposal_single_image
* Add f16 support to operation
* Add f16 support to the unit test
* Add a note about the second output in the primitive
Co-authored-by: Oleksii Khovan <okhovan@lohika.com>
* Added shell for Eye-9
* Updated spec for Eye-9
* Added reference for Eye-9
* eye cpu
* Added op impl check for Eye-9
* Fix disallowed dynamic-to-static dim conversion in eye shape_infer
* Add template plugin tests for dynamic shapes
* Add template plugin tests for dynamic shapes batch input
* Enable batch shape input dynamic rank
* Uncomment 3D batch cpu Eye tests
* Update assertions and messages
* use ov::element type
* Remove redundant evaluate from eval map
* Style fix
* Add static_cast<T>(1) to cpu eye
* Add defaults to eye cpu class members
* Reuse out_ptr and checks
* Return if onesPerBatchNum == 0
* Add Eye CPU Dynamic shape tests with 2D batch
* Additional test cases for CPU and reference
* Disable 3D batch eye cpu tests
* Fix CPU implementation for matrices with unequal numbers of rows and columns
* Update CPU test name
* Disable CPU Eye 3D batch static shapes tests
Co-authored-by: Alexandra Sidorova <alexandra.sidorova@intel.com>
Co-authored-by: Yury Gaydaychuk <yury.gaydaychuk@intel.com>
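The Eye behaviour touched by these commits (non-square matrices, a shifted diagonal, batching, and the early return when a matrix holds no ones) can be sketched in plain Python. This assumes the usual Eye semantics, where element `(r, c)` is 1 when `c - r` equals the diagonal index; `ones_per_matrix` and `eye` are illustrative names, and the output is flattened with shape `batch_shape + [rows, cols]`.

```python
def ones_per_matrix(rows, cols, k):
    """Number of ones on diagonal k (col - row == k) of a rows x cols matrix."""
    if k >= 0:
        return max(0, min(rows, cols - k))
    return max(0, min(rows + k, cols))


def eye(rows, cols, k=0, batch_shape=()):
    """Flattened identity-like tensor of shape batch_shape + [rows, cols]."""
    count = 1
    for d in batch_shape:
        count *= d
    out = [0] * (count * rows * cols)
    if ones_per_matrix(rows, cols, k) == 0:
        return out  # the diagonal lies outside the matrix: all zeros
    for b in range(count):
        base = b * rows * cols
        for r in range(rows):
            c = r + k
            if 0 <= c < cols:  # handles rows != cols and shifted diagonals
                out[base + r * cols + c] = 1
    return out
```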
* Update oneDNN rls-v2.6
* Support weight tag for oneDNN v2.6
* Fix first conv selection issue in oneDNN
* oneDNN v2.6 requires specific tags to run jit:ir primitives.
* any_tag can find optimized primitives in oneDNN.
* Enable aBcd2b src tag for oneDNN v2.6
* Add create_memory_desc from format string.
* Make group depthwise separable conv use jit:ir in oneDNN v2.6
* Use byxf format.
* Use only the acdb format in shallow group conv
* Fix refconv selection in shallow conv with post operations.
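Plain oneDNN-style format tags such as `abcd` and `acdb` list logical dimensions from outermost to innermost in memory, so `acdb` on an NCHW-ordered shape is an NHWC layout. The sketch below derives strides from such a tag; `strides_from_tag` is a hypothetical helper, and it deliberately ignores blocked tags such as `aBcd2b`, where capital letters and block sizes would need extra handling.

```python
def strides_from_tag(dims, tag):
    """Element strides for a plain format tag like 'abcd' or 'acdb'.

    dims are given in logical order (a, b, c, d, ...); the tag lists the same
    letters from outermost to innermost in memory. Blocked tags are unsupported.
    """
    assert len(tag) == len(dims) and tag.islower()
    strides = [0] * len(dims)
    step = 1
    for letter in reversed(tag):  # the innermost letter has stride 1
        i = ord(letter) - ord("a")
        strides[i] = step
        step *= dims[i]
    return strides
```

For an N=1, C=3, H=4, W=5 tensor, `acdb` makes the channel stride 1, which is why channel-innermost layouts suit shallow (small-channel) group convolutions.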