* Robust detection of Cython version (#19537)
* Aligned protobuf version in conanfile.txt with onnx recipe (#19525)
---------
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
* Change `VPUX` occurrences to `NPU`
* Change library for `NPU` device in `api_conformance_helpers.hpp`
* Rename `MYRIAD plugin`
* Switch `HARDWARE_AWARE_IGNORED_PATTERNS` VPU to NPU
* Rename DEVICE_KEEMBAY to DEVICE_NPU
* Rename VPUX_DEVICE_NAME to NPU_DEVICE_NAME
* Rename vpu_patterns to npu_patterns
* Change VPUX occurrences to NPU after review
* Remove VPUX device comment
* Change VPUX/vpu to NPU in tests/time_tests
* Rename VPU to NPU in docs after review
* Rename VPU to NPU in tools/pot after review
* Renamed vpu.json to npu.json in tools/pot after review
* Restore CommonTestUtils::DEVICE_KEEMBAY
---------
Co-authored-by: MirceaDan99 <mircea-aurelian.dan@intel.com>
* Add upper bound
* backport flake fix
* Support of protobuf >= 21 (#18351)
* Corrected typo
* Ability to compile with newer protobuf versions
* Limit numpy (#18406)
* Revert "[PyOV] Pin version of Cython for API 1.0 (#18604)" (#18681)
* Revert "[PyOV] Pin version of Cython for API 1.0 (#18604)"
This reverts commit 787796d88f.
* Suppressed clang warning
* Restrict scipy module version for POT (#18237)
* Restrict scipy module version for POT
Latest release https://pypi.org/project/scipy/1.11.0 causes dependency conflicts
* Bump OMZ to include scipy restriction
---------
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
---------
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
Co-authored-by: Alina Kladieva <alina.kladieva@intel.com>
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
* [DOCS] Local distribution page improvements (#18049)
* add slider with os specific libs
* doc review
* local distrib doc changes
* [DOCS] Added local distribution libraries path (#18191)
* add relative path to the table
* add another column
* new table format
* fix build issue
* fix tab name
* remove old table
* format fixes
* change font
* change path windows
* change tabset name
* add arm and 86_64 tables
* remove list dots
* [DOCS] Add FrontEnd API note (#18154)
* add note
* fix typo
* add advance cases note
* tf doc note
* wording change
port: #17449
conformance table added
ARM merged with CPU
precision support and layout tables removed from the overview device article (info available in device articles)
Port from #17744
JIRA Ticket: 110042
Update of hardcoded links to switch references from latest, nightly and 2022.3 (and earlier) to 2023.0.
JIRA Ticket: 111393
Fix for the Mac (Intel CPU) link name (it should be Intel CPU instead of Intel GPU).
Port: #17484
A new PR will be created with more changes, as suggested by jane-intel and slyalin. The "deprecated" label for articles and additional content on converting models to ONNX will be covered then.
* [MO][TF FE] Document freezing as essential step for pruning SM format
* Update docs/MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
* StridedSlice improvements:
- Bound evaluation of begin/end partial values when the ignore mask is set.
- Custom constant fold implementation.
* Improve const folding when all begin or end values
are ignored
* Add transformation to convert adaptive pool to reduce
* Update src/common/transformations/src/transformations/common_optimizations/moc_transformations.cpp
* Add tests and apply feedback
* Simplify if branches
This change mimics the LinearToLinearONNXReplacer transformation in the
legacy frontend, where the linear interpolate mode is replaced with
linear_onnx for performance reasons.
Ticket: CVS-108343
* Fixed dependencies check, made unsatisfied dependencies show only in case of error.
* Small fix.
* Test correction.
* Small test correction.
* Temporarily added debug print.
* Debug output.
* Debug output.
* Debug output.
* Test fix.
* Removed debug output.
* Small fix.
* Moved tests to check_info_messages_test.py
* Remove dependency checks from MO.
* Small corrections.
* Added Torchscript backend
* Added some torchscript backend tests to ci
* Removed tests from CI as torch.compile doesn't support 3.11 currently
* Fixed linter issues
* Addressed PR comments and linter issues
porting: https://github.com/openvinotoolkit/openvino/pull/15931
Divided MO Extensibility article into separate smaller articles,
Applied the suggestion from [DOCS] Better statement about MO extensions as internal API [Recreating #14062] #15679
Recreated images in svg format
Fixing directives
* [TF FE] Provide single tensor names for inputs and outputs in SavedModel
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix build issue
* Xfail some cases due to internal problems in TF
* Xfail other layer test
* Extend documentation for function to adjust tensor names
* Use old path of tf2 layer testing for legacy frontend
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix tensor names copying in TS transformations
* added a check that sinking is available for all consumers in TS backward transformations
* codestyle
* Apply review comments, add result sorting by tensor names in graph comparator
* delete debug code
* fix RemoveConsumers method implementation
* fix snippet tests
* use reference instead of raw pointer
* add new transformation tests
* fix transformation tests
Co-authored-by: Andrei Kochin <andrei.kochin@intel.com>
Since TF 2.10, native model freezing can produce constants with an undefined value,
i.e. the tensor shape can be anything while the value is []. In this case the tensor is filled with
the default value (0 for numeric types, "" for strings)
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
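For illustration only (an assumption, not code from the change), a minimal sketch of the fallback described above:

```python
# A constant with an undefined value but a known shape is materialized with the
# type's default value.
import numpy as np

shape = (2, 3)                                       # shape reported by the frozen graph
numeric_default = np.zeros(shape, dtype=np.float32)  # 0 for numeric types
string_default = np.full(shape, "", dtype=object)    # "" for strings
```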
* Added missing import keyword (#17271)
* [DOCS] shift to rst - opsets N (#17267)
* opset to rst
* change list indentations
* fix formula
* add n operations
* add negative and nonzero
* fix link
* specs to rst
* fix matrixnms path
* change path to if
* fix list
* fix format
* DOCS remove deprecated options (#17167)
* DOCS remove deprecated options
* removed a couple more outdated questions
* remove the whole lines completely
* remove a couple of more deprecations
---------
Co-authored-by: Nikita Savelyev <nikita.savelyev@intel.com>
Co-authored-by: Pavel Esir <pavel.esir@intel.com>
* fix threading test sporadic failure
* fix reading wrong data in multi-threading
* fix read and write sync
* add lock before cpu._cpu_mapping_table[i][CPU_MAP_USED_FLAG], because CPU_MAP_USED_FLAG may be modified by set_cpu_used
* initial fix
* add corresponding unit test
* skip reorder fusing when sibling node does not support fused padding
* fix data type of axis for win build
* Revert "fix data type of axis for win build"
This reverts commit 719ea75d7826aafc7bb94c1971586c33a9842f10.
* add static casting for win build
* Replace opset with op version for TransposeSinking and SmartReshape transformations to reduce binary size
* replace opset with op version in some op_conversions transformations
* codestyle
* [AUTO] Plugin takes only Intel dGPU as 1st priority
* Update test case
* Simplify the code
* Support more test cases in GetDeviceList API
* Add notIntelGPU to _deviceBlocklist in AUTO plugin
* Restore some code formats
* Update test cases
* Add some logs to GetValidDevice API
* Simplify the code
---------
Co-authored-by: Wanglei Shen <wanglei.shen@intel.com>
* ConstFold Gather op in case of dynamic dims in data input
* Update ConstantFolding transformation to support Gather with dynamic input; add test
* always mark ShapeOf nodes as can_be_folded
* add additional checks for fused_names in the gather test
---------
Co-authored-by: Andrei Kochin <andrei.kochin@intel.com>
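A hedged Python sketch of the pattern this folding now covers (the opset version and pass registration below are illustrative, not taken from the PR):

```python
# ShapeOf -> Gather over a partially dynamic input: the gathered dimension (224) is
# static, so ConstantFolding can fold the Gather even though other dims are dynamic.
import numpy as np
from openvino.runtime import Model, PartialShape, opset10 as ops
from openvino.runtime.passes import ConstantFolding, Manager

data = ops.parameter(PartialShape([-1, 3, 224, 224]), np.float32, name="data")
shape = ops.shape_of(data)
dim = ops.gather(shape, ops.constant(np.int64(2)), ops.constant(np.int64(0)))
model = Model([dim], [data])

manager = Manager()
manager.register_pass(ConstantFolding())
manager.run_passes(model)
```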
* [TF FE] Implement optimal conversion of body graphs
Setting input shapes and types for the body graph InputModel in advance
provides a more optimal conversion of body graphs.
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix build issue
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Moved exception checks to _convert(), added a suggestion to try the legacy TF frontend in case of conversion failure.
* Added test.
* Added send_conversion_result() method.
* Small correction.
* Update tools/mo/openvino/tools/mo/convert_impl.py
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Moved test_suggest_legacy_fe() test to check_info_messages_test.py.
* Removed not needed import.
* Small correction.
---------
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
* [LPT] reused reference FQ implementation in fold_fake_quantize
* [LPT] Removed legacy parameters
* Added plugin tests with per-channel FQ for GrConv wo reshape
* Apply folding only in the case when FQ data input is constant
* EliminateFQ fix
* Separated SavedModelVariablesIndex class from Saved Model
* Renamed SavedModelVariablesIndex class
* Enabled Tensorflow MetaGraph
* Enabled Tensorflow MetaGraph
* Covered VariableV2 and Assign nodes
* Applied review comments
* Added tests
* Added names to input/output ports too
* Fixed naming for using with MO
* Applied part of review comments
* Renamed meta.cpp and saved_model.cpp
* Applied shared_ptr for memory management of PtrNode
* Fixing CI
* Prevent cycles while passing thru graph
* Released requirement for Checkpointable Object Graph
* Changed naming approach to align port order
* Changed renaming order (before reordering)
* Added a Placeholder translator which checks updated shape
* WA missing Identity name
* Fix CI and restored lost translators after rebase
* WA for output names
* Removing unused params after cutting a model
* Prevents crash in case VariableV2 appears in a frozen model
* Fixed saved model handling when no variables.index is found but
variables exist
* Changed approach for handling native formats support
* Aligned behavior with freezing .meta files
* Fixed behavior for cutting a model by input tensor
* Applied review comments
* fix: embedded export is available for embedded targets only
* [GNA] functional tests fix - embedded export should NOT be possible on non-embedded target
* [GNA] tests added/justified to process both negative and positive path
* [GPU] Fix i8 representation error for clamp due to overflow
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix to not include in ocl code
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
* fix input issue of ScatterNDUpdate conformance test
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* fix typo and optimize temporary variable
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
---------
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* add _streams_info_table in Executor config
* change useHyperThreading init value
* restore cmake
* fix comments
* add calling enableCpuPinning property
* fix judgment about number of sockets in init_stream
* fix test case compile issue
* fix ci test case fail issue
* modify GetPerformanceStreams calling position
* add affinity in get_cpu_pinning
* modify ecore judgement
* add no binding core on ADL
* fix ci issue, add get_num_numa_nodes()
* fix code style
* fix StreamsHasHigherPriority issue
* fix according to comments
* fix performance degradation
* fix code style
* code style
* fix warning
* fix ci test failed
* fix ImportNetwork issue
* fix ci test case issue
* fix smoke_CachingSupportCase_CPU issue
* add ExportOptimalNumStreamsTest test
* modify test name
* modify ExportOptimalNumStreams test
---------
Co-authored-by: Chen Peter <peter.chen@intel.com>
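A hedged usage sketch (not from the PR) of the threading controls this executor-config rework feeds; the string keys are assumed to correspond to ov::hint::enable_cpu_pinning, ov::hint::enable_hyper_threading and ov::num_streams.

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")
compiled = core.compile_model(
    model,
    "CPU",
    {"ENABLE_CPU_PINNING": True, "ENABLE_HYPER_THREADING": False, "NUM_STREAMS": 4},
)
```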
* Update MULTI doc per current implementation
Signed-off-by: Peter Chen <peter.chen@intel.com>
* Update the description of Multi-Device execution mode
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* Remove sample code and video
1. Remove the sample code for removed behaviors
2. Remove the video to avoid confusion
Signed-off-by: Peter Chen <peter.chen@intel.com>
---------
Signed-off-by: Peter Chen <peter.chen@intel.com>
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* Intermediate state
* Remove old dyn batch path in the new api
* Remove legacy dyn batch support
* Remove dyn batch support field from the config
* Revert changes to the common part
* Revert accidental change in the test file
* Minor fixes
* Fix support for dyn batch without setting current
* Typo fix
* TypeRelaxed<>::clone_with_new_inputs thread safety fix
* Style
* Make TypeRelaxed<BaseOp>::clone_with_new_inputs copy node the same way as copy ctor of ov::Node
* Removed mutex field from intel_cpu::GraphContext
* Removed everything related to the has_type_relaxed_ops field from the snippets subgraph
* Cloning test
* update auto architecture doc
* update auto architecture doc
* Apply suggestions from code review
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* update for comments
---------
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* [GPU] Fix levit-128s accuracy issue
Wrong batch dims for the eltwise fused into gemm.
-> The issue is an incorrect batch size of the fused eltwise used by gemm:
its rank differs from the src tensor because the eltwise tensor rank was reduced by mistake.
It only reproduces with batch 1 and a full tensor.
The batch size here means all non-spatial dims, but the previous implementation assumed the default batch dim role.
Signed-off-by: hyunback <hyunback.kim@intel.com>
* use oneTBB for arm64
* force THREADING=TBB
* test: remove TBB_DIR for linux arm64
* update linux and mac arm64 packages
* update SHA256
* add comment
* disable add_rpath for tbb libraries on mac arm64
---------
Co-authored-by: Chen Peter <peter.chen@intel.com>
* [GPU] Resolve failed unit-tests on dGPU
+ Modified unit-tests of asymmetric conv with per-channel (WA for oneDNN issue)
+ Modified conv unit-tests with padded input or output
+ For testing oneDNN conv, it needs to query oneDNN about format. Applied this to conv tests.
+ Modified accuracy checking logic in unit-tests which have different format on dGPU.
+ reorder from fsv16 to bfyx should not be optimized out if not aligned by 16
Signed-off-by: Min, Byungil <byungil.min@intel.com>
* Fix of class conflicts in different frameworks.
* Remove commented code.
* Moved FakeQuantWithMinMaxVars to common part.
* Fixed BOM package test.
* Removed not needed code.
* Removed not needed code.
* Path retrieval fix
* More detailed messages in the failing test
* Exe path with model name
---------
Co-authored-by: Michal Lukaszewski <michal.lukaszewski@intel.com>
* [GPU] Fix dump_graph failure issue in levit-128s model.
1. to_string() in strided_slice always accessed the begin/end/stride param id from dependencies
regardless of the maximum number of dependencies.
2. Add an exception in dump_full_node(). It helps as follows:
- Avoid a dump failure. Graph dumps are usually used during debugging,
so this reduces unnecessary debugging time caused by a failing graph dump.
- You can immediately see which node has failed, making it easy to find.
Signed-off-by: hyunback <hyunback.kim@intel.com>
* Revert "Revert "[CPU] optimize shape infer of Reshape (#16537)" (#16703)"
This reverts commit 06cacfe2a7.
* fix issue with reshape connected to nonzero
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* add test case with nonzero connected to reshape
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* add debug code
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* fix test case issue
fix shape_nonzero testcase issue
fix a bug in the original test case
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* Revert "add debug code"
This reverts commit c305464c8c.
* fix other review comments except test case
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
---------
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* Fix Interpolate-11 in MO
* Add forgotten file
* Fix output type of TopK-11
* Do not force precision on port 1 for mode scales
* Update tools/mo/openvino/tools/mo/ops/interpolate.py
---------
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
Co-authored-by: Andrei Kochin <andrei.kochin@intel.com>
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
* [Dynamic shape] Improve shape infer performance for iGPU by preventing copy from usm_device to usm_host in lock()
* Fixed is_shape_infer_dep to use pointer instead of unique_id because unique_id may not be set
* Try to return skipped test after FQ fix
* Copy FQ broadcast case from CPU to TEMPL tests
---------
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
* Remove constructors for ov Exceptions
* Fixed linux build
* Fixed ONNX Frontend
* Fixed paddle
* Fixed exceptions in tests
* Deprecate constructors for ov::Exception
* Suppress some warnings
* Merge several exceptions
* Some small changes
* Suppress more warnings
* More warnings
* More warnings
* Suppress more warnings
* More warnings
* [GNA] Fix 1D Pooling realized as part of 2D Convolution
* [GNA] Fix pooling for GNA_SW_FP32 mode when fused with Convolution2d
* [GNA] Fix ConvolutionPoolingStrideNotEqualWindowTest tests for 3_5
* shift to rst
* test snippets
* test build fixes
* change code block
* test new path
* change path
* add cancel
* change note format
* add docs
* change path to snippet
* change path to snippet
* change list format
* fix list
* fix snippets path
* fix format
* fix lists
* fix snippet
* compiled model doc fix
* change indentation
* small fixes to format
+ Resolved issues related to deconv
+ Modified test-cases for conv, fc.
+ In fc unit-tests, tiny tensors showed unexpected behavior. Modified tensor size a little
+ Bugfix in get_test_stream
Signed-off-by: Min, Byungil <byungil.min@intel.com>
* Enabled several ARM CPU tests
* Removed invalid tests
* Fixed several template plugin tests
* Removed non-working suppressions
* Disabled 2 tests on ARM CPU
* Added deprecation of nv12 legacy API
* Added new files
* Change macros
* Suppress warnings for preprocessing
* Suppress warnings in tests
* Suppress warnings for Windows
* updated to allocate memory in order of size while deserializing
* fix windows build error
* updated to check dependencies between unconnected nodes
* [MO] Remove use of mapping file and its generation
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix pylinter findings
* Remove usage of mapping file in the layer tests
* Fixing layer tests for legacy frontend
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [GPU] Fix to skip reorder optimization during post_optimize_graph phase
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Apply comment
Signed-off-by: Andrew Park <andrew.park@intel.com>
* update condition to check empty padding
Signed-off-by: Andrew Park <andrew.park@intel.com>
* add condition to check batch size
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix gather_nonzero not to be marked as constant.
Even though count_nonzero is turned into a constant, gather_nonzero still cannot infer its shape at the moment of constant propagation.
* Apply the fix only for gather_non_zero
* [TF FE] Support delayed batch setting
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Cover BOM list
* Add unit-tests for batch setting with layout
* Apply code-review: check batch size
* Apply code-review: default index for any dimension
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [LPT][CPU] Added callback for AddTransformation
* [WIP] Convolution scales fusion
* Force using weight scale to test performance.
* Update on interface.
* Use weight scale to adapt to ONEDNN 3.x API changes.
* Update the code.
* Update ONEDNN fix for gemm_x8s8s32x_conv kernel
* Fix the bug in ONEDNN and deconvFusingScale.
* Fuse FC Bias when having DQscale.
* WR to perf regression on
* Update onednn version.
* Fix bug and clean code.
* FC fusing dq scale bug fix.
* Add more comments and debug information.
* Fix CI issues.
* Merge ONEDNN changes.
* Fix CI issues and bugs.
* Apply review comments.
* Update comments.
* Apply review comments.
* Avoid using LPT BiasAttribute RTInfo.
* Applied review comments.
---------
Co-authored-by: Vladislav Golubev <vladislav.golubev@intel.com>
* Used singleton class for version check.
* Moved VersionChecker to utils/version.py, added tests.
* Minor corrections.
* Sort imports.
* Small correction.
* Small correction.
* Remove exclusive_async_requests property from AUTO plugin
* Update test case
* Add test case to test incorrect config
* Remove the test case related to exclusive_async_requests property of AUTO plugin
* Allow stable sort in TopK when sorting by indices
* Clarification of stable sorting by index and unblocked test
* XFAIL the test again
* Clarification of sorting by indices
* Revert of changes in previous versions of TopK (spec)
* [CPU] ARM architecture support
This patch extends the existing CPU plugin capabilities with optimized support for ARM CPUs
* Fixed undefined reference in unit tests
* refactoring
* Fixed Eltwise node behavior for ARM
* init commit
* tests passed
* fix skip failures
* Apply suggestions from code review
---------
Co-authored-by: dmitrygo <dmitry.gorokhov@intel.com>
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
* 1. Refine the logic for the ov::device::properties setting.
2. The config overrides will be performed if the same config setting comes from the command line.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update configuration sample file within README.md.
* Update.
* Update.
* 1. Update the configuration example file within README.md for the Python version.
2. Implement conversion of the DEVICE_PROPERTIES config value between the string type and a Python dictionary.
3. Update the configuration file loading and dumping logic.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
* Update.
* Update.
* Update.
* Update.
* 1. Enable configs to be interchangeable between C++ and Python.
2. Update perf_count showing logic.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Revert the logic of showing performance counters.
* Update help msg for loading config option.
---------
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Benchmark_app set ov::hint::allow_auto_batching through compile_model
* Remove the process about allow_auto_batching in set_property of core
* Remove allow_auto_batching and auto_batch_timeout property from AUTO plugin
* Reserve the info logs and add API to check auto_batching
* Update test case, rm AB property test from core config tests
* Update some API in AUTO plugin config
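A sketch only (assumed usage): pass allow_auto_batching at compile time rather than via core.set_property(); "ALLOW_AUTO_BATCHING" is taken to be the string form of ov::hint::allow_auto_batching.

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")
compiled = core.compile_model(model, "GPU", {"ALLOW_AUTO_BATCHING": False})
```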
* fix unexpected exceptions and a segfault issue in Paddle unit tests
* parse confine from reqfile to keep aligned with other requirements
* Apply suggestions from code review
* Apply suggestions from code review
* [TF FE] Support NonMaxSuppression with named outputs
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Simplify the test for NMS named outputs
* Share a script for test model generation
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Applied w/a to resolve softmax accuracy issue
The original impl resulted in an accuracy issue if the leftover is not aligned with the subgroup size
(e.g., for shape [1024, 306] where the lws = 32, itemsNum = 9, leftover = 18, subgroup size = 16).
In such a case, the result was wrong when subgroup block read/write was used.
As a w/a, subgroup block read/write is not used if the leftover is not aligned with the subgroup size.
However, we can come up with better itemsNum / leftover handling in the follow-up work.
* Fix build error & minor revise
* Fix condition
* [LPT][TESTS] GrConv: added test cases with per channel dq on weights and without reshape
* FoldFQ: don't transform FQ with quantization by several dimensions
* ConvolutionTransformation: supported GrConv with per channel dq on weights and without reshape
* fold_reshape: refactoring
* [TF FE] Test ResourceGather operation and fix debug caps
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix test generation script
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Remove cache_dir property from AUTO plugin
* Pass the secondary property to hardware plugin
* Update test case
* Update test case, meta plugin will pass the properties to device without checking
* Fix in create_same_type_const_scalar; accurate updating type for parameter when inlining function call body
* Added Unique to the list of operations with named output ports (another MUSE fix)
* Draft: working version of extension with named ports in TF
* Merge fixes
* Refactor and productize POC
* Clean up
* Fix build
* Fix code style
* Fix lib so extension test
* Fix namespaces
* Remove usage of Any from CreatorFunction
* Fix build
* Fix arm build
* Apply review feedback
* Fix build after merge
* Apply suggestions from code review
---------
Co-authored-by: Sergey Lyalin <sergey.lyalin@intel.com>
* [GPU] Fix sub kernel ordering issue in kernels_cache (#16746)
* [GPU] Add unit test for sub kernel idx (#16746)
* [GPU]Follow up code review (#16746)
* [GPU] Skip kernel compilation when current node is optimized out in update_impl (#16746)
* [GPU]Code refactoring (#16746)
* [PyOV] Fix getting all names in OVDict
* Add docs and adjust tests
* Fix linter issues
* Adjust typing and add test for incorrect key type
---------
Co-authored-by: Michal Lukaszewski <michal.lukaszewski@intel.com>
+ Bugfix in the bfyx_to_blocked_format kernel of the reorder prim for double-blocked format
+ The affected format is bs_fs_yx_bsv16_fsv32. Added test-cases.
+ Fixed accuracy issue from check_accuracy_issue
Signed-off-by: Min, Byungil <byungil.min@intel.com>
* Show the detailed failure message when AUTO fails to load a network
* Add test case
* Update test case to check MULTI load network failure
* Update test case based on master
* Remove hard-coded _availableDevices from AUTO
---------
Co-authored-by: Chen Peter <peter.chen@intel.com>
* Add dependency from ov_plugins.hpp only for files which use it
* Remove rebuild files depends on CI_BUILD_NUMBER changes
* Try to fix static build
* Fixed comments
* Fixed build
* Merged some change
* Try to fix build
* Try to fix nvidia build
* Take LTO value from target property
* Add Core property to switch from `mmap` to `read`
in IR FrontEnd
* Add tests on `ov::enable_mmap` property
* Add `enable_mmap` in C & Py APIs
* ClangFormat
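A minimal sketch (assumed usage, not from the PR) of toggling the new property from Python; "ENABLE_MMAP" is taken to be the string form of ov::enable_mmap.

```python
from openvino.runtime import Core

core = Core()
core.set_property({"ENABLE_MMAP": False})  # fall back to plain `read` of the .bin weights
model = core.read_model("model.xml")       # weights are no longer mmap()-ed
```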
* Added convert_model() params docs.
* Added auto-generating of most cli params.
* Added auto-generating of cli params.
* Small correction.
* Removed wrong change.
* Corrected default values.
* Fixed errors, added tests.
* Small correction.
* Corrected params descriptions, moved cli specific params to separate file.
* Moved params specifics to utils/help.py.
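A hedged sketch of the Python conversion API whose parameters the docs now cover; the model path and input_shape value are placeholders.

```python
from openvino.runtime import serialize
from openvino.tools.mo import convert_model

ov_model = convert_model("model.pb", input_shape=[1, 224, 224, 3])
serialize(ov_model, "model.xml")
```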
* Fixing run_timetest python script for input and output precision
* Update code according to the PR review
* Update run_timetest according to the last review
* Add input_precision and output_precision to test_timetest as well
* Set input/output precision per model
* improve SoftMax fusion
* style and unit-test fix
* more precise SoftMax unit-tests
* rewritten SoftMaxFusion with single matcher
* fixes for align_mixed_fp32_fp16_types_test.cpp and mark_subgraph_to_keep_in_mixed_precision_test.cpp
* add include for pass/pattern/op/or.hpp
* get rank only when necessary
* style-fix
* add comment why SoftmaxFusion is called manually
* fix copy_runtime_info
* [TF FE] Add diagnostics capabilities via Framework nodes
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Refactor normalize logic
* Applied code-review feedback: fix in get_unsupported_operations_and_failures
* Handle unknown exception type
* Store only first encountered failure
* Update src/frontends/tensorflow/tests/convert_unsupported.cpp
* Apply code-review feedback: use stringstream
* Correct Key for exception message
* Fix build
* Use helper for creation of fw node with exception message inside
* Add test for conversion with unknown exception
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [TF FE] Fix layer tests for BatchToSpace and add to the pre-commit
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Specify type for batch_shape
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [TF FE] Test the second output for TopK operation
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Switch off the unsorted case
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [AUTO] Add ENABLE_STARTUP_FALLBACK and ENABLE_RUNTIME_FALLBACK properties to Python API
* Add DEVICE_BIND_BUFFER property
* Add AUTO properties to C API
* Update test case && Update AUTO properties in PYTHON API
* Create dedicated files for auto plugin
* Update header files
* Update test case
* Modify code style
* Update variable name
* Add test case for invalid input value
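A hedged usage sketch of the newly exposed AUTO properties from Python; the string keys "ENABLE_STARTUP_FALLBACK" and "ENABLE_RUNTIME_FALLBACK" are assumed forms of the corresponding intel_auto properties.

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")
compiled = core.compile_model(
    model,
    "AUTO",
    {"ENABLE_STARTUP_FALLBACK": True, "ENABLE_RUNTIME_FALLBACK": False},
)
```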
* Move memory tests from core to template plugin tests
* Rewrite tests to use template plugin
* Don't clone model in INTExecutable
* Add reset and modify tests
* Delete old test
* Fix clang-format
* Fix VariableState::set_state
* Enable and add var modify tests
* Fix INTExecutable
* Apply comments
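A hedged sketch of the stateful-model API these reworked tests exercise (set_state/reset on VariableState); the model path is a placeholder and is assumed to contain ReadValue/Assign pairs.

```python
import numpy as np
from openvino.runtime import Core, Tensor

core = Core()
compiled = core.compile_model("stateful_model.xml", "CPU")
request = compiled.create_infer_request()

for state in request.query_state():
    zeros = np.zeros(list(state.state.shape), dtype=np.float32)
    state.state = Tensor(zeros)  # set_state with an explicit value
    state.reset()                # return the variable to its default value
```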
[MULTI] pass through to AUTO with CTPUT hint
After this change
-- MULTI doesn't support setting per-device infer request numbers via CPU(4),GPU(8).
-- MULTI doesn't support CompiledModel::set_property() and ExecutableNetwork::GetConfig().
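A sketch of the behavior described above (assumed usage): MULTI now behaves as AUTO with the CUMULATIVE_THROUGHPUT hint, and per-device request counts are rejected.

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")
compiled = core.compile_model(model, "MULTI:CPU,GPU")          # supported
# compiled = core.compile_model(model, "MULTI:CPU(4),GPU(8)")  # no longer supported
```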
* [GPU] Add clDNN shape agnostic kernels usage as an initial impls for dGPU
* [GPU] Use layout as a key of weights cache, implement logic for weights cache capacity calculation based on available memory
* Review adaptive max pool shape inference
* Review AvgPool and MaxPool
* Review convolution operator
* Review GroupConvolution shape inference
* Review ConvolutionBackpropData operator
* Review GroupConvolutionBackpropData op
* Review BinaryConvolution operator
- add common bases for convolution ops
- refactor convolution ops
* Review DeformableConvolution operator
* Use new convolution shape_infer in GPU
* Fix build and test issues
* Correct set output spatial shape
in default constructed back prop convolutions
* The convolution shape_infer uses pads as parameters;
the external padding can come from operators or other class padding properties. shape_infer should not modify the operator's padding when
called from a plugin
* Apply code formatting
* Fix padding validation and update
* Max and Avg pool don't update op properties
from plugin shape inference
- use ShapeInferWithPadding for pooling operators
* Remove not used function in shape_inference
* Fix evaluates in MaxPool
* Relax convolution shape infer inputs size check
* Remove unused entryFallbackWithPadding class
* Remove unused dilations variable
* Remove unused resize_attributes from max_pool_base
---------
Co-authored-by: mitruska <katarzyna.mitrus@intel.com>
* User can set input and output precision for timetest tool
* Update run_timetest.py with the ip and op options as well
* Use only one getType function
* Add extra line at the end of the file
* Remove unused parameters
* Update comment accordingly
---------
Co-authored-by: Vitaliy Urusovskij <vitaliy.urusovskij@intel.com>
* Remove suppression Wno-delete-non-abstract-non-virtual-dtor
* Fixed Allocator warning
* Suppress warning for GPU plugin
* Skip warning for GNA
* Fixed preprocessing
* Added virtual constructor for base plugin class
* Some fix for CPU
* Suppress for CPU
* Fixed any
* Fixed meta
* Disable warning for paddle
* Fixed Allocator tests
* Move suppress to paddle
* Fixed benchmark_app
* add reshape shapeinfer in cpu plugin
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* add squeeze and unsqueeze
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* add precision i8 i64 on test
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* fix code out of bounds risk
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* test performance of this PR
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* fix code issue
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* Revert "test performance of this PR"
This reverts commit f4f9f002de28d03bc1c55c24067f75b74824904c.
* fix reviewer comment
fix throw message
not create ov::shape instance
remove i8 test case
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* fix PyTorch layer test failure
inputShape (1,0) with output pattern (-1) is a valid input
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* fix windows compile issue
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* fix rebase mistake
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
---------
Signed-off-by: Hu Yuan2 <yuan2.hu@intel.com>
* add opextension support
* support opconversion
* fix ambiguous test constructor
* fix ci fail
* add tag to avoid compiler ambiguity
* move tests to layer_tests & remove PaddleTag
* static cast
* use create_ov_node_by_name
---------
Co-authored-by: Luo Cheng <cheng.luo@intel.com>
* Fix failed unit-tests on dGPU
+ modified fully_connected_random_test_i8_3d not to be ambiguous
+ oneDNN does NOT support i64 type for reorder. Added exception.
+ bugfix in prepare_primitive_fusing about exception of activation function
+ Add exception logic for dynamic to select ocl type in is_node_for_onednn
Signed-off-by: Min, Byungil <byungil.min@intel.com>
* Reference impl for interpolate-11 init
* ND support init
* Tests clean up
* Add evaluate method for Interpolate-11
* New version tests init
* Type parametrized tests
* Tests duplication clean up and reusage of v4 test cases
* Add clipping to the type bounds
* Style fix
* Add float type tests
* Fix default ports values
* Commented code clean up
* Add passing cube_coeff param
* Tests clean up
* Add separate namespace
* Adjust variable names
* Adjust function name
* Use vectors instead of raw ptrs
* update func to static inline
* Adjust types
* Add Interpolate-11 to template plugin evaluates map
* Revert interpolate-11 core evaluate support
* Use const ref to filter
* Use static cast
* Update link
* Enable MapAllocator in IR Frontend
* Fix `ov_infer_request_ppp` test
With `mmap()`ing of the IR, the .bin file can't be deleted until it is unmapped,
and this revealed a leak in the test
* Add comment to Win `CreateFile()` regarding
FILE_SHARE_DELETE
* Unmap .bin file before IR files deletion
Wait ov::Model deletion to trigger .bin file unmapping
before IR files deletion
* ClangFormat
* Add `use_map_allocator` switch in FE
When the FE is used directly (e.g. via MO), `mmap()` is OFF.
But when the FE is used via Core, `mmap()` is ON.
* Review adaptive max pool shape inference
* Review AvgPool and MaxPool
* Review convolution operator
* Review GroupConvolution shape inference
* Review ConvolutionBackpropData operator
* Review GroupConvolutionBackpropData op
* Review BinaryConvolution operator
- add common bases for convolution ops
- refactor convolution ops
* Review DeformableConvolution operator
* Use new convolution shape_infer in GPU
* Fix build and test issues
* Correct set output spatial shape
in default constructed back prop convolutions
* The convolution shape_infer uses pads as parameters;
the external padding can come from operators or other class padding properties. shape_infer should not modify the operator's padding when
called from a plugin
* Apply code formatting
* Fix padding validation and update
* Use shape inference with padding instead fallback
for DeformableConvolution from opset1
* Update convertPadding function to be template
* Update kernel_ids using hash value
* Change set to unordered_map for kernels_code
* replace unique_id to hash value
* Remove hash_val params
* remove redundant codes (#16262)
** Remove unique_id in program_node
** Remove gen_kernel_id
** Remove set_kernels_source
** Remove remove_kernels
** Remove kernel_idx in kernels_cache
* Use kernel_impl_params instead of kernel_id
* Divide batch when entry_point are duplicated
* rollback removing unique_id
* Fix get_kernel failure issue (#102467)
- Modify hash function of custom_gpu_primitive and generic_layer
- Add == operator of generic_layer for the _kernels map in kernels_cache
- Fix invalid kernel_impl_params related to unique_ptr life cycle issue
* Improve kernels_cache (#102467)
* Move add_kernels_source step to build_implementations
* Change replace kernels_code key to kernel_impl_params
* Return kernel vector in get_kernels
* Modify function name to get_kernels (#102467)
* Fix functions related graph serialization (#102467)
* Fix failure to run dynamic model (#102467)
* Add unit test
* Code review follow-up
- Add const to input params
- Add missing code to check kernel duplication in kernels_cache
* Add const to input params (#102467)
* [GPU] update hash and ==operator for generic_layer and custom_gpu_primitive (#102467)
* [GPU] override get_kernels_source in generic_layer and custom_gpu_primitive (#102467)
* [GPU] Fix onednn build error (#102467)
* [GPU] Fix Lin build error (#102467)
* [GPU] kernels_cache::get_kernels return vector of clone of cldnn::kernel (#102467)
* Updated serialization logics for improved kernel caches (#16262)
* primitive key kernel cache for serialization
* kernel serialization with binaries hash
* fix kernel cache init function for deserialization
* removed unnecessary codes
* [GPU] Update comment and fix test failure (#16262)
* [GPU] Fix custom_gpu_primitive unit test failures (#16262)
* [GPU] Improved kernels cache serialization (#16262)
* removed hash in serialization logic
* update not to create a new kernels_cache for serialization
* code refactoring in serialization logic
* [GPU] Follow-up code review (#16262)
* [GPU] modify lock(#16262)
* [GPU] Fix custom_gpu_primitive unit test failure (#16262)
---------
Co-authored-by: Eddy Kim <eddy.kim@intel.com>
* Review ROIPooling class
- check interval shape and label propagation
- add template shape_infer
- add shape infer into cpu plugin
- add test with StaticShape
* Use get_output_roi instead of get_output_size
* Add missing includes
* Review PSROIPooling operator
- review interval and label propagation
- add template shape_infer implementation
- add shape_infer to cpu plugin
* Add snippets dependency
* - removed dependency back
- added an INTEL_CPU condition on snippets configuring -> no dependency when configured w/o CPU
* Disable snippets_ngraph_functions conditionally if inference_engine_snippets are not configured
---------
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
Move all openvino_conversion routines into utils. Avoid using Squeeze without axis,
which can create a dynamic output rank
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [GPU] Added shape agnostic TopK kernel implementation
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Update kernel to use internal buffers for shape agnostic kernel
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add WA to compile_graph for shape agnostic arg_max_min_axis with non-const k input
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix is_dynamic parameter for FillCLKernelData for the case where the output is a static shape
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix corner case where inbuf size becomes 0 when ops_size is 1
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Implement CTPUT in AUTO code logic
* Add logic to handle device loading failure
* add some code comments
* fix warning: conversion from size_t to int
* Updated code according to comments of bell and wanglei
* the preferred device code path needs to be updated with CTPUT as well
* add fallback logic for CTPUT
* Modify the code logic according to bell suggestion
* Add prints for debugging bug
* throw exception when no device can run the pipeline task
* initialize idleWorkerRequest for CTPUT
* fix getting properties
Signed-off-by: fishbell <bell.song@intel.com>
refine
Signed-off-by: fishbell <bell.song@intel.com>
* fix warning
Signed-off-by: fishbell <bell.song@intel.com>
* fix illegal character on windows
Signed-off-by: fishbell <bell.song@intel.com>
* fix illegal character
Signed-off-by: fishbell <bell.song@intel.com>
add missing include
Signed-off-by: fishbell <bell.song@intel.com>
* more code refine
Signed-off-by: fishbell <bell.song@intel.com>
---------
Signed-off-by: fishbell <bell.song@intel.com>
Co-authored-by: fishbell <bell.song@intel.com>
* Properties improvements: part 2
* Accurate configs handling in HETERO / BATCH
* Align plugins in caching properties
* Fixed caching mock tests
* Added new TestNoCachingProperties test
* Fixed test
* Added ov::caching_properties to API 1.0 metrics as well
* Fixes for HETERO plugin
* Fixed tests
* Even more refactoring in HETERO plugin config management
* Prevent memory reset at runtime allocation for dynamic shape
* Set default alloc to reset mem
* Additional fixes :
- If there are any convolution/deconvolution users which require padded input, enqueue a buffer reset when reusing the buffer.
- Removed cl finish from gpu_buffer::fill. (It should be waited on only when needed; otherwise sync is done by an event.)
- Removed buffer reset from on_execute of nonzero count, which is not needed any more.
* Remove unused API
* Fix tensor offset to project the padding
* Added unittest
* Applied review comment
* Added Saved Model proto descriptors
* Included Google's protobuf repository
* Added wstring version of ov::util::directory_exists
* Added initial implementation of Saved Model iterator
* Added missing proto files to repository
* Implemented reading of variables index and data files
* Renamed class
* Fix for cross-platform directory_exists
* Fixed codestyle and simplified code
* CI fixes
* Separated Saved Model iterator from Proto iterator
* Moved variables index into separate class
* Added initial implementation of reading variables from a
saved model
* Added external variable mapping
* Code cleanup
* Commit is for discussion purposes!!!
Implemented RestoreV2 with a workaround for strings
Not optimized, includes mem leak
* In progress...
* Added DT_STRING coverage into decoder_proto
* m_variables_index moved into underlying class
* Updated copyrights, added space between license and code
* Moved string constant to separate class
* Added AssignVariableOp operation
* Changed behavior of RestoreV2
Updated stubs for other ops
* Second working implementation, enabled:
Program-only models
Variables reading from data files
* Extended docs
* Fixed dynamic type
* Fixed naming
* Added Snappy submodule to support compression in TF FE
* Enabled Snappy Compression for TF FE
* Make static linkage of Snappy
Changing Warning as error behavior for 3rd party
* CI fixes
* Added Snappy copyright info
* Aligned behavior of StringConstant with UnsupportedConstant
* Added correct naming and removing unused inputs/outputs
* [TF FE] Post leftovers to support the MUSE model in SavedModel format
It contains tests imitating a case with Tokenizer extension and raised problems:
setting custom type for body graph Parameter, named ports for RaggedTensorToSparse
and Unique operations.
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Update src/frontends/tensorflow/tests/convert_tricky_models.cpp
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* py/benchmark_app: fix -hint
Don't warn about values which are explicitly set in -hint.
That aligns C++ and Python implementations.
Ticket 106544
* Remove extra throw
* Fix code style
* Add GatherV7 and gatherV8 for convert_gather_0d pattern
* Add updating output_shape using reorder/reshape for scalar index instead of using ConvertGather0D pass
* Add WA for NMS-gather8 pattern
* Improve op support for detectron mask rcnn
* Initial commit
* Fix for reading processed list
* Format code
* Cleanup
* cleanup
* Cleanup
* cleanup test
* Add comment
* Add rt_info
* fix type
* More fixes for detectron
* Fix build
* Add tests for if
* Revert changes in index
* Add comment
* Fix test
* Fix get_axes_range
* Add tests and fix if type alignment
* Fix code style
---------
Co-authored-by: Mateusz <mateusz.mikolajczyk@intel.com>
* [PyOV] Fix issues with RTMap
* update year
* some clean-up and items fix
* tests and small fixes
* Update src/bindings/python/src/pyopenvino/utils/utils.cpp
* undo changes
* fix serialization on python side
* rt_info as rt_map
* undo several changes in tests
* fix mo test
* add docstrings
* add tests
* fix codestyle
* try to fix win
* fix master
* apply comments
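A hedged example of the rt_info paths this fix touches: reading and writing RTMap entries from Python. The keys used here are arbitrary placeholders.

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")

model.set_rt_info("my_value", ["custom", "key"])   # nested model-level rt_info entry
print(model.get_rt_info(["custom", "key"]))

node = model.get_ordered_ops()[0]
node.get_rt_info()["note"] = "touched"             # per-node RTMap item
```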
At the conversion stage we can't resolve Assert node because the condition
is computed only during inference time.
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Update core_impl.cpp
Add first implementation of register_compile_time_plugins (needs to depend on the actual CMake configuration as a next step).
* Update core.cpp
Check for missing plugins.xml
* Update core_impl.cpp
Avoid exception for missing plugins.xml
* Update core_impl.hpp
Add register_compile_time_plugins function definition
* Plugin loading based on CMake configuration
* Remove debug output command
* Unify static/dynamic plugin loading
* Add CMake option for plugins.xml that defaults to off
* Move GENERATE_PLUGINS_XML option to features.cmake
* Add missing brace
* Remove unnecessary #ifdef check
* Prepare to resolve conflicts
* Fix compile error
* Activate generation of plugins.xml in OpenVINODeveloperPackageConfig.cmake
* Fix CMake installation
* Plugin loading logic implemented in ie_core.cpp as well
* Fix format
* Small fixes
* Fixed code style
* Skip if xml file wasn't found
* Added function to find compiled plugins
* Generalize plugins hpp
* Use new API
* Fixed old core
* Fixed static build
---------
Co-authored-by: CSBVision <bjoern.boeken@csb.com>
* [GPU] Use 4-dim directly for onednn in gemm
We were collapsing n dims into 3d for onednn gemm, but this is not necessary since up to 4d is supported.
Signed-off-by: hyunback <hyunback.kim@intel.com>
* [GPU] Added shape agnostic kernel for fully_connected_gpu_imad
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add fully_connected_gpu_imad shape agnostic TCs for ov_gpu_unit_tests
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Apply comments
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
* 1. Add device blacklist for AUTO plugin.
2. Update the logic to parse out the device candidate list from the inputting config MULTI_DEVICE_PRIORITIES.
3. Update the corresponding mock test cases.
4. Ignore the GTEST warning for the test cases.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
* Update.
* Update.
* Add description about blacklist.
* Apply suggestions from code review
Update.
Co-authored-by: yanlan song <bell.song@intel.com>
* Update.
* Apply suggestions from code review
Updated.
Co-authored-by: yanlan song <bell.song@intel.com>
Co-authored-by: River Li <river.li@intel.com>
* Update test case.
* Update test case.
* Update test case.
* Update.
* Update.
---------
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
Co-authored-by: yanlan song <bell.song@intel.com>
Co-authored-by: River Li <river.li@intel.com>
Co-authored-by: Shen, Wanglei <wanglei.shen@intel.com>
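A hedged illustration (not from the PR) of the config whose parsing is updated here: the device candidate list comes from MULTI_DEVICE_PRIORITIES, assumed to be the string form of ov::device::priorities.

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")
compiled = core.compile_model(model, "AUTO", {"MULTI_DEVICE_PRIORITIES": "GPU,CPU"})
```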
* Review ROIPooling class
- check interval shape and label propagation
- add template shape_infer
- add shape infer into cpu plugin
- add test with StaticShape
* Use get_output_roi instead of get_output_size
* Add missing includes
* [VPUX] - Tensor data with element type f16, is not representable as pointer to i16
* [MO][TF FE] Do not print TF FE message in case of fallback
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Correct test model with Switch and Merge
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Runtime fallback to other devices
* Update properties.hpp
* Update infer callback in AUTO
* Avoid some hang cases
* Add test cases for AUTO runtime fallback
* Replace mockExecutor with ImmediateExecutor
* Update the runtime fallback logic
* Update test case and support the case that infer failed on CPU_HELP
* Update the test to detect whether to throw exception
* fix the error of CTPUT
* Add lock to AUTO executable network GetContext
* Update variable name in selectOtherDevice API
* Simplify variables and add testcase to improve test coverage
* Fix the issues when release CPU_HELP device and clean up the code
* Clean up code
* Fix inference for non-const inputs for operators:
- batch to space
- space to batch
* Evaluate of b2s, s2b supports all parameter inputs
- update template plugin test to use parameters instead of constants
* Fix documentation (md and inline) for C++ and Python speech samples
* Fix clang-format
* Minor fix
* Fix clang-format
* Fix a typo
* Fix according to Mike's review
* Fix clang-format
* Extend template plugin tests
* Fixed loaded_from_cache for new API
* Added const
* Added ov::loaded_from_cache as supported property of CompiledModel
* Remove loaded_from_cache from core
* Reverted logic for old plugins
* Fixed comments
* Fixed build
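A hedged sketch of querying the new property; "LOADED_FROM_CACHE" and "CACHE_DIR" are taken to be the string forms of ov::loaded_from_cache and ov::cache_dir.

```python
from openvino.runtime import Core

core = Core()
core.set_property({"CACHE_DIR": "model_cache"})
compiled = core.compile_model("model.xml", "CPU")
print(compiled.get_property("LOADED_FROM_CACHE"))  # False on first compile, True afterwards
```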
* fix 1
* fix 2-10
* fixed code style
* fixed win plugin
* fixed linux plugin
* fixed a part of tests
* fixed test fot linux
* fixed pooling_gpu_test fot linux
* fixed pooling_gpu_test fot linux
* fix after review and enable wd4267 in makefile
* fix after review
* errors of unit test are fixed
* Update ONNX Runtime from rel-1.8.1 to rel-1.14.0
Signed-off-by: Zhai, Xuejun <xuejun.zhai@intel.com>
* Upgrade Cmake to 3.24.0
Signed-off-by: Zhai, Xuejun <xuejun.zhai@intel.com>
* Revert "Upgrade Cmake to 3.24.0"
This reverts commit 04a00f60c0.
* Update CMake to version 3.24.0
Signed-off-by: Zhai, Xuejun <xuejun.zhai@intel.com>
* Skip CApiTest.test_custom_op_openvino_wrapper_library test for tmp, will add back with the new ONNX Runtime version
Signed-off-by: Zhai, Xuejun <xuejun.zhai@intel.com>
---------
Signed-off-by: Zhai, Xuejun <xuejun.zhai@intel.com>
* Add descriptions to the transformations, add additional checks
* fix a warning
* TransposeSinking Refactoring part2: move the transformations to a separate folder, align namespaces
* TransposeSinking refactoring: class names, namespaces
* codestyle
* resolve merge conflicts
* fix special FQ with zero range in quantized models
* fix format & comments
* Add test case
* remove dot interval test case from smoke_LPT/FakeQuantizeTransformation.CompareFunctions
* Remove dot interval gpu test case because Pooling is also folded
* handle review comment
* fix code style
* update docs
* remove fold_zero_multiply
* Mark all failed ONNX layer tests as XFail
* Add additional xfailed marks
* Add one more failed tests into XFail
* Add conditions for CPU/GPU failures
* Revert "Add conditions for CPU/GPU failures"
This reverts commit 790524c59c.
* Add failures separation for CPU/GPU
* Replace all xfail with skip
* Remove redundant clone from serialize pass
* Revert padding changes in serialize pass
* Provide a class for local copy of nodes with paddings
* Fixed comments
* IR serialization for dynamic models
* added ShapeOf1To3 transformation pass
* fixed input output type mismatch
* removed unnecessary codes
* moved ConvertShapeOf1To3 from common to GPU plugin
* updated copyright year
* fixed build errors
* Reduce the number of validate and infer types in ConvertPrecision
Currently, the ConvertPrecision pass frequently runs validate and infer types.
This is because it iterates over every precision pair, then over
the whole model, followed by validate and infer types.
The proposed solution is to iterate over the model instead: for each node, iterate
over the precisions array, update the node if required, and then run validate and
infer types (see the sketch after this change list).
Ticket: 81311
* use map
* clang format
* move enum hasher
* fix gpu
* revalidate
* reinvalidate if node has changed
* remove validate for input prec changes
* fix gpu
* review
* find
* fix pytorch case
* revalidate
---------
Co-authored-by: Michal Lukaszewski <michal.lukaszewski@intel.com>
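A schematic Python-style sketch of the reordering described above; the helpers are hypothetical and the real pass is C++.

```python
# Schematic pseudo-code, not the actual implementation: run validate-and-infer-types
# once per changed node instead of once per (precision pair, whole model) iteration.
def convert_precision(model, precision_pairs):
    for node in model.get_ordered_ops():
        changed = False
        for src_type, dst_type in precision_pairs:
            changed |= try_change_node_precision(node, src_type, dst_type)  # hypothetical helper
        if changed:
            node.validate_and_infer_types()  # revalidate only nodes that actually changed
```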
* Stabilize ascending comparison of ref impl
* Use reference to gtest param
* Create ref impl tests
* Fix descending by index sorting
* Sort by index both ways
* Make sort by index always ascending (revert)
* Add possibility to use memory alignment different than 64B
* update tests for new memory api
* Remove ineffective code
* [FIX] Fix memory alignment issues for graph compiler primitives
* Update after review
* Ability to provide several source dirs for ncc-style checks
* Fixed include headers; added NCC to TF common
* Fixed NCC for frontends
* Fixed NCC for frontends
* Extra fixes
* Fixest push --f
* Clang-format
* Apply comments
* Add an option to specify required clang-format version
* Update src/frontends/tensorflow/src/decoder_proto.cpp
* Update src/frontends/tensorflow/src/decoder_proto.cpp
* Review adaptive avg pool shape inference
* Review adaptive max pool shape inference
* Review AvgPool and MaxPool
* Minor improvement for StaticShape
* Update ShapeInferBaseWithPadding's infer
to be compatible with interface after rebase
* Fix build issues
* Set default pads before checks
* Fix include openvino headers
* [GPU] Change lws to avoid synchronization issue in nonzero_count (#16116)
* [GPU] Add unit test (#16116)
* [GPU] update count_nonzero_ref kernel(#16116)
- Support the case total data size exceed max work group size
- Add dynamic shape test case
* [GPU] Change input indexing calculation and add random input generator in unit test (#16116)
* [GPU] update random generation input function in nonzero_count (#16116)
* [GPU] update unit test (#16116)
* [GPU] cldnn unit test: update random generation function for other test failure (fusings_gpu/conv_fp32_multi_eltwise_quantization.basic/0) (#16116)
* [TF FE] Convert a model with Framework nodes
Now the conversion pipeline will convert all unsupported operations to Framework nodes.
This is done in the hope that sub-graphs with Framework nodes will be cut in later stages
such as auto-pruning.
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix build issue
* Fix dynamic element type for FusedBatchNorm
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix build issue
* Fix build issue
* Continue translation in case translator limitation
* Change undefined to dynamic type
* Have one more change to dynamic type
* Change undefined to dynamic in Const translator
* Expect MO to handle dynamic type
* Exclude TransposeSinking pass if model contains Framework nodes
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* switched public Azure Linux build to clang
* Fixed GNA compilation
* Suppressed warning in GNA tests
* switched public Azure Linux build to clang
* Fixed GNA compilation
* Suppressed warning in GNA tests
* More fixes
* Skip test on CPU
* Update template plugin main documentation pages
* Update plugin documentation
* Add more documentation for method
* Register new doxygen groups
* Updated group
* Added ie group
* Fixed comments
* Reuse new implementation inside the old one
* Try to fix titles
* Fix class fields level
* [GPU] Enabled ComparisonLayerTest in single layer tests.
It seems that before, these tests were disabled because of some failures. Now I cannot see any errors, so I just enabled all of them.
* [GPU] Run clang format for comparison single layer tests.
* [GPU] Added handling of f16 type to IsInfLayerTest.
* [GPU] Added single-layer tests for IsFinite and IsNaN operations.
* [GPU] Added single-layer test for IsInf operation.
* [GPU] Implemented IsFinite, IsInf, and IsNaN operations as activation functions.
But note that currently the activation kernel supports only the same output data type as the input data type, so an additional reorder would be needed to convert to the correct output data type for these ops. Also worth noting is that activation functions are fused into the reorder kernel; for now this doesn't work for these ops because the reorder activation call hard-converts the input data to the output data type before the activation, which breaks the fusion. So we need to fix this activation fusion or disable it for these ops.
* Revert "[GPU] Implemented IsFinite, IsInf, and IsNaN operations as activation functions."
This reverts commit 3f9ffe617ecddce6dbbcdeab9584a7ddeb6d1845.
* [GPU] Implemented IsFinite, IsInf, and IsNaN operations as eltwise op.
* [GPU] Changed CLDNN_ERROR_MESSAGE to OPENVINO_ASSERT in check_inputs_count method.
* [GPU] Minor fix for dynamic bert-base-uncased-qqp
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix to check full tensor only for static shape during creating onednn gemm
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
- Previously, PR15386 changed the allocation of memory for primitives that are used as shape-infer dependencies to host memory, for better shape-infer performance.
- However, this causes a cache coherence issue on dGPU.
- Reverting this change so that the memory is allocated on the device.
* [TF FE] Support EmptyTensorList and TensorListPushBack operations
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Rename a script to generate the test model
* Correct test model generating script
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* flush fp32 subnormals to zero in IR
* style fix in test_offline_api.py
* simplified call of FlushFP32SubnormalsToZero: it is called from offline_transformations.cpp
* reverted offline_transformations.py
* use fpclassify
* style-fix
* Update src/common/transformations/tests/common_optimizations/flush_fp32_subnormals_to_zero_test.cpp
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
---------
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
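For illustration only (the real pass is a C++ transformation over IR constants), a numpy sketch of what flushing fp32 subnormals to zero means:

```python
import numpy as np

weights = np.array([1.0, 1e-40, -3e-39, 0.5], dtype=np.float32)
tiny = np.finfo(np.float32).tiny                      # smallest normal fp32 (~1.18e-38)
subnormal = (weights != 0) & (np.abs(weights) < tiny)
weights[subnormal] = 0.0                              # flush subnormal values to zero
print(weights)                                        # [1.  0.  0.  0.5]
```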
* initial version of implementation
* styles applied
* fixed and registration
* add more unit tests
* fixed and in legacy opset
* review remarks
* refactor of version name range
* [dGPU] Enable stable diffusion
+ Prevent fusing swish into oneDNN reorder.
+ Make concat explicit if the batch size is greater than 1 and the siblings are oneDNN impls.
* Small CoreImpl refactoring
* Removed cache_dir handling from CPU plugin
* clang-format
* Fixed python tests
* Fix
* Fixed bugs in HETERO case
* Fixed clang-format and warnings in auto plugin
* Added import_export as capability for TEMPLATE plugin
* Commented throw exception from loaded_from_cache
* Fixed clang-format for template plugin
This is a corner case because body graph nodes have named output ports.
This allows supporting a custom RetinaNet model.
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* remove ov::device::thermal
ov::device::thermal was only supported on myriad
* additional cleanup
* remove myriad from AUTO and MULTI
AUTO, MULTI and HETERO
+ remove mentions of listing MYRIAD devices
* two final fixes
* Update ov_auto.py
---------
Co-authored-by: Ilya Churaev <ilya.churaev@intel.com>
* Remove warning suppression: wd4018, wd4309
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Remove linux warning suppression no-sign-compare
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* ov::intel_cpu::VectorDims base value type is size_t;
dnnl::memory::dims base value type is int64_t;
promoting all compared data to int64_t fixes the warning and avoids a potential issue.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* channelAxis may be -1, which means the channel axis no longer exists.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Fix recursive macro: "one_of", "everyone_is" sign-compare warning.
Must pass same value type.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Fix Windows sign unsign compare warning
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* There are 2 instances:
using ov::Dimension::value_type = int64_t
using ov::intel_cpu::StaticDimension::value_type = size_t
Promote all of them to int64_t.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Linux has too many sign-compare issues.
Complete the Windows sign-compare fixes first.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Fix clang issues.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Fix warning.
Because it is instantiated with T1=unsigned int, T2=int
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Fix warning for tests unit reorder_node_test.cpp
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Fix warning : ASSERT_GE(step, 1u);
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Fix tests: warning C4018
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Remove auto, using int64_t is more reasonable.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
---------
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Support setting and getting element type, shape and value in PyTorch FE InputModel
* Fix code style
* Fix code style
* Fix rsub layer test
* Fix py style
* Apply review feedback
* Fix code style
* Fix initial values of input and output flags in Place
* Enable AUTO to support execution mode hint.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Add test case.
* Set default value "PERFORMANCE" for ov::hint::execution_mode.
* Update.
* Update.
* Correct default ov::hint::execution_mode value for the default value checking test case.
* Update.
* Delete obsolete config.hpp file.
---------
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
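A minimal sketch of passing the execution mode hint to AUTO (string property keys and the model path are assumptions; the exact property helpers differ between API versions):

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")            # hypothetical IR
compiled = core.compile_model(model, "AUTO",
                              {"EXECUTION_MODE_HINT": "PERFORMANCE"})
print(compiled.get_property("EXECUTION_MODE_HINT"))
```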
* Init auto_batch plugin unit test
* Add more mock test
* Add to ci yml file
* Fix clang issue
* Resolve compilation issue
* Fix symbol multiple definition in static build
* Add test cases for AutoBatchInferRequest
* Add test cases for AutoBatchAsyncInferRequest
* Fixed build error after PR-15229
* Resolve blocked issue when call StartAsync test cases
* add more test for auto batch async inference
---------
Co-authored-by: Chen Peter <peter.chen@intel.com>
* AUTO cumulative throughput mode ignores candidate devices that fail to load
* Simplify the logic that decides whether AUTO falls back to MULTI
* Add description about _AutoSetToMulti variable
* Update variable name to _AutoCallMulti
* Refine logic of AUTO execution_devices
* Add loading error message
* Add test case
* Add filter to execution_devices of MULTI
* Add execution_devices test in load-fail situation
* Simplify the logic of execution_devices
* Update auto_executable_network.cpp
* Update src/plugins/auto/multi_executable_network.cpp
Co-authored-by: yanlan song <bell.song@intel.com>
* Update src/plugins/auto/auto_executable_network.cpp
Co-authored-by: yanlan song <bell.song@intel.com>
* Update test case
---------
Co-authored-by: Chen Peter <peter.chen@intel.com>
Co-authored-by: yanlan song <bell.song@intel.com>
* [GPU] Added shape agnostic optimized SoftMax kernel
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Update SoftmaxKernelBaseBF::Validate policy for shape agnostic kernel
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add softmax_gpu_bf shape agnostic TC for ov_gpu_unit_tests
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Fix failed TCs for ie-tests-linux-ubuntu20-gpu
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Update to use stack array instead of global buffer
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Remove global buffer usage completely
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add #undef directive
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
* enable streams info table based on CPU mapping
* add detail processor info for mix stream
* fix code style issue
* fix typo
* fix code style issue for Android build
* update description of streams info table
* move streams info related function to new file
* remove duplicated definition
* add description for parameters of get_streams_info_table()
* update test case file
* fix windows build issue
* fix windows build issue
* fix windows build issue
* fix typo
* update latency mode for hybrid platform
* update limit threads for latency
* update latency mode for 2 sockets platform
* Fixed legacy extensions passing to MO tool.
* Added tests.
* Corrected test.
* Add debug print.
* Moved tests to layer tests.
* Added comment.
* Moved legacy ext tests to separate file. Fixed tmp .pb file cleaning.
* Small correction.
* Run MO Python API tests directory in CI.
* Small fix.
* Fix for the case of split output.
* Corrected imports.
* Corrected imports.
* Added run of legacy extensions tests from subprocess.
* Check if the device is supported when AUTO retrieves the device list based on the ov::device::priorities.
* Update the logic to handle situations like -d AUTO:-CPU in the benchmark app.
* Remove MYRIAD and add NVIDIA for AUTO supported devices.
---------
Co-authored-by: Chen Peter <peter.chen@intel.com>
Co-authored-by: Shen, Wanglei <wanglei.shen@intel.com>
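A minimal sketch of steering AUTO's candidate device list (device names and model path are placeholders; availability depends on the machine):

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")            # hypothetical IR

# Explicit priority list: AUTO tries GPU first and falls back to CPU.
compiled = core.compile_model(model, "AUTO:GPU,CPU")

# The benchmark app also accepts an exclusion form, e.g.:
#   benchmark_app -m model.xml -d AUTO:-CPU
```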
* Fixed compilation with gcc-7
* Update src/core/reference/include/ngraph/runtime/reference/eye.hpp
Co-authored-by: Katarzyna Mitrus <katarzyna.mitrus@intel.com>
* returned f16 and bf16
---------
Co-authored-by: Katarzyna Mitrus <katarzyna.mitrus@intel.com>
* Move cpu streams executor to new API
* Remove legacy headers from new dev API
* Fixed build issues
* Fixed build
* Fixed typo
* Fixed typo
* Fixed build
* Fixed code style
* Add exception for template constructor of SoPtr
- by using the ENABLE_CPU_SUBSET_TESTS_PATH cmake variable
one can specify a list of relative paths to functional tests
which will be included in the target ov_cpu_func_tests_subset.
The target was renamed from cpuDebugFuncTests to
ov_cpu_func_tests_subset.
- by using ENABLE_CPU_SPECIFIC_TARGET_PER_TEST=ON one can
trigger generation of a specific target per test file, e.g.
- ov_cpu_func_slt_convolution
- ov_cpu_func_subgraph_mha
* Moved template backend to new API
* Fixed compilation
* Fixed some comments
* Fixed ov_core_unit_tests
* Fixed some tests
* Fixed ONNX Frontend tests
* Fixed transformation tests
* Fixed dynamic tests
* Fixed sporadic in CPU tests
* Added WA for plugin
* Fixed copy_to for scalar tensors
* Add support for concatenation in Loop
* Apply suggestions from code review
* Fix win build
* Fix issues with propagation shapes and types in Loop
* Fix einsum
* Set type and shape of count in frontend
* Add Weights by ops
* Upgrade conformance tools
* api_conformance
* Change prefix
* Reorg meta info
* Change base algo
* fix all other
* return summary
* Update the report
* wa
* review
* find test case for MultiDeviceInferRequest::SetBlob
* improve line coverage of infer_request
* add test cases for queryState and exception test case for perf count
* fix QueryState run failure
* add test case to memory_states.cpp
* rename name of test case
* add memory_states.cpp to CMakeLists.txt
* Use _LogTag to determine whether it is MULTI
* clang-format intel_gna/memory_states.cpp
* Modify the position of the macro ENABLE_INTEL_CPU in the test case
---------
Co-authored-by: Chen Peter <peter.chen@intel.com>
* support paddle slim
* fix scale shape issue in dequantize_linear
* fix implicit node construction failure in yolov5 and yolov7
* correct the round mode
* improve the accuracy of slim
* support paddle slim
* fix scale shape issue in dequantize_linear
* correct the round mode
* refactor some tests
* fix according to comments
* support zero_point and fallback round_mode
* Enable crop shape agnostic kernel
* Added unit test
* Added new scalar argument for crop (eltwise) for being used as runtime input offset in shape agnostic kernel
* Fix eltwise to have runtime offset only for crop
* Fix unittest error
* Applied review comment
* [GPU] Fix output format not changing at runtime
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Add remove_redundant_reorders pass TC for ov_gpu_unit_tests
Signed-off-by: Andrew Park <andrew.park@intel.com>
---------
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Review label and interval shape propagation for:
- space to batch
- space to depth
- shuffle channels
- depth to space
- batch to space
* Review template implementation of shape_infer for:
- space to batch
- space to depth
- shuffle channels
- depth to space
- batch to space
* Apply clang-format
* Update src/core/shape_inference/include/batch_to_space_shape_inference.hpp
Co-authored-by: Tomasz Jankowski <tomasz1.jankowski@intel.com>
* Update src/core/shape_inference/include/space_to_batch_shape_inference.hpp
Co-authored-by: Tomasz Jankowski <tomasz1.jankowski@intel.com>
* Shuffle channels remove label from channel dim
---------
Co-authored-by: Tomasz Jankowski <tomasz1.jankowski@intel.com>
* Moved Task, Streams, CPUStreams Executors to new API
* Fixed some build issues
* Fixed new build issues
* Try to fix tests
* Fixed inference unit tests
* Small build fix
* Added more system headers
* Try to fix naming style
* Fixed namespace
* Fixed android build
* [GPU] Apply multi-threads for async compilation context (#15683)
- Use CPUStreamExecutor in compilation context
- Use single compilation context, impl_cache and kernels_cache for multiple streams
- Move compilation context to cldnn::program
- Move impl_cache to cldnn::program
- Create thread-safe impl_cache
- Create thread independent compilation function in kernels_cache
- Use kernels_cache in program and remove it from network
* [GPU] Fix segfault issue: ocl_engine and ocl_device are released during remained compilation context task are running (#15683)
- compilation context has own CPUStreamExecutor
* [GPU] Follow-up codereview (#15683)
- LruCacheThreadSafe inherit LruCache
- FuncRemoveItem has std::pair<Key,Value> as input
- Change prepare_tools to init_program
* [GPU] Create primitive_impl::build_kernels (#15683)
* [GPU] Fix unit test build error (#15683)
* [GPU] Remove redundant code (#15683)
- Remove try catch for debug
- Call compilation_context.cancel() in destructor of network
* [GPU] combine two atomic counter in kernels_cache (#15683)
* [GPU] Follow-up code review (#15683)
* [GPU] Fix nullptr exception in unit test (#15683)
* [GPU] Follow-up code review (#15683)
- Modify mutex lock in compilation context
* [GPU] Fix windows build issue (#15683)
* Solve test case failure issue for 32bits
1. ov_core_unit_test
2. ov_cpu_unit_test
Change-Id: I5e6afda0865fedc1de7fe84dd5f132e642263303
* Solve windows build issue
Change-Id: I1e6ea4d930c41322a73a701d566f0cdee2a4e098
* Disable several 64bit test cases in case of 32bit system
Change-Id: Ib8ef784953bf15cb42048dd905f17a85e52482b1
* Update a simple solution
Change-Id: Ie2e2cd369fe98bfcd26f3416bf36d4dfb0f24c25
* update for 64bits failure
Change-Id: I6571b7842a0fecc01fff169a21fa7aae9eb9da14
* Use OPENVINO_ARCH_64_BIT replace custom macro
Change-Id: I7e72b74aed8f0226513bc0e06ce2381322b42f71
* use kernel caching for dynamic models
* replaced cl_cache with blob
* updated to serialize dims info of input and output
* updated to skip unicode tests in Windows
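A minimal sketch of enabling the cache that this change extends to dynamic models (cache directory and model path are placeholders):

```python
from openvino.runtime import Core

core = Core()
core.set_property({"CACHE_DIR": "ov_cache"})    # enable model/kernel caching
model = core.read_model("model.xml")            # hypothetical IR with dynamic shapes
compiled = core.compile_model(model, "GPU")     # a later run loads the cached blob
```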
* [TF FE] Support conversion of models with non-standard extensions in the path
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Update tools/mo/unit_tests/moc_tf_fe/conversion_basic_models.py
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* C++ exception with description "write lock_type" thrown in the test body.
Use get_output_values_to_float()
* fusings_gpu/gemm_2in_act_scale_quantize_eltwise_i8.basic/2
* fusings_gpu/gemm_2in_act_scale_eltwise.basic/2
* Remove WA test code of [GPU][DG2] Fix fusings_gpu/gemm_2in_scale.basic/7 #15353
* Now non full-tensor post-ops are broadcasted
* Added some new tensor API
* Added tests on constructors
* Small changes
* Fixed tensor tests
* Fixed tests
* Added parametrized tests
* Extend tests and delete copy_to from remote tensor
* [GNA] Create ngraph implementation for relu_torch_pot model for further tests. Create a legacy pass fusing the FC-Eltwise-Const layer pattern into a single FC layer with biases
* [GNA] Fix review comments, applied proper code style to changed code
* Add test for negative axes; preliminary solution to fix incorrect results
* Normalize axes in operation NormalizeL2
* Add test for negative axes
* Add EOF
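For illustration, the usual axis-normalization rule applied here, as a tiny Python sketch:

```python
# A negative axis is converted to its positive equivalent relative to the rank.
def normalize_axis(axis: int, rank: int) -> int:
    return axis + rank if axis < 0 else axis

assert normalize_axis(-1, 4) == 3
assert normalize_axis(2, 4) == 2
```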
* Update ov::hint::performance_hint UNDEFINED value from empty string to "UNDEFINED".
* Update benchmark Python version.
* Update.
* Update.
* Update.
* Update the description about hint setting within benchmark APP README and help message.
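A minimal sketch (string property keys and model path are assumptions) of reading the hint back after compiling without setting it, which per this change reports "UNDEFINED" rather than an empty string:

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")                  # hypothetical IR
compiled = core.compile_model(model, "CPU")           # no PERFORMANCE_HINT provided
print(compiled.get_property("PERFORMANCE_HINT"))      # expected "UNDEFINED" (assumption)
```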
* Fix remote blob creation to use original shape
* Revert "Fix remote blob creation to use original shape"
This reverts commit 35c674aa97.
* Fix cldnn tensor adjusted blob to be reinterpreted with actual input layout
* gpu model caching unit tests
* added serialization unit tests
* added save and load for quantize primitive_inst
* reduced the range of inputs for Gemm tests
* updated the copyright year
* [Common][FE] Implement reverse infer for Transpose
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Update src/common/transformations/tests/common_optimizations/reverse_shape_and_type_infer.cpp
* Update src/common/transformations/tests/common_optimizations/reverse_shape_and_type_infer.cpp
* Update src/common/transformations/src/transformations/common_optimizations/reverse_shape_and_type_infer.cpp
Co-authored-by: Maxim Vafin <maxim.vafin@intel.com>
* Add one more tests with constant order and known output
* Fix reverse infer for a case of know order and output shape
---------
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
Co-authored-by: Maxim Vafin <maxim.vafin@intel.com>
* enable --compress_to_fp16 by default in MO
* corrected docs, added a warning if the user didn't specify --compress_to_fp16 explicitly
* fix failing MO unit-tests
* do not wipe out data_type if user defined it explicitly by cli argument
* updated warning message and docs
* corrected phrasing
* corrected phrasing in FP16_Compression.md
* set compress_to_fp16=False for convert tests
* leftover: set compress_to_fp16=False for convert tests
* minor correction
* print info message in main.py, some minor changes
* typos fix
* fix losing information about whether arguments were set by the user or taken from defaults
* returned back default values instead of None
* more selective correcting of test_mo_convert_pytorch.py; added test for cases when compression is enabled/disabled or left by default
* fix test_mo_convert_pytorch.py
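A minimal sketch of the new default and how to opt out, assuming the openvino.tools.mo Python API and a placeholder ONNX file:

```python
from openvino.tools.mo import convert_model

ov_model_fp16 = convert_model("model.onnx")                          # FP16 compression on by default
ov_model_fp32 = convert_model("model.onnx", compress_to_fp16=False)  # keep original FP32 weights
```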
* optimize TensorIterator DynamicBuffer by preallocating a large chunk of the intermediate buffer.
code cleanup.
review update: always copy in transfer, as avoiding the copy is not worthwhile.
review update: use mem_holder_buffer as dnnl::memory instead of a shared_ptr to it.
review update: reuse mem_buffer_holder even if the shape changes.
review update: growth factor.
review update: bug fix.
* fix code style
* review update: rewrite the dynamic buffer using the cpu Memory class, instead of dnnl::memory
* Update src/plugins/intel_cpu/src/nodes/tensoriterator.cpp
Co-authored-by: Maksim Kutakov <maxim.kutakov@gmail.com>
* Update src/plugins/intel_cpu/src/nodes/tensoriterator.cpp
Co-authored-by: Maksim Kutakov <maxim.kutakov@gmail.com>
* review update: minor fix
---------
Co-authored-by: Maksim Kutakov <maxim.kutakov@gmail.com>
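The preallocation idea behind this change, as an illustrative Python sketch only (not the intel_cpu implementation): grow the buffer by a factor so repeated iterations rarely reallocate.

```python
class GrowableBuffer:
    GROWTH_FACTOR = 2

    def __init__(self):
        self._data = bytearray()
        self._used = 0

    def append(self, chunk: bytes) -> None:
        required = self._used + len(chunk)
        if required > len(self._data):
            # Preallocate a larger chunk up front instead of resizing every iteration.
            new_capacity = max(required, len(self._data) * self.GROWTH_FACTOR)
            grown = bytearray(new_capacity)
            grown[:self._used] = self._data[:self._used]
            self._data = grown
        self._data[self._used:required] = chunk
        self._used = required

buf = GrowableBuffer()
for _ in range(3):
    buf.append(b"\x00" * 16)
print(buf._used)  # 48
```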
* Use new evaluate method in template plugin
* Add tensor at the end of each iteration
* Remove class TemporaryOverrideOutputs
* Set shape of tensor after evaluate
* Revert "Remove class TemporaryOverrideOutputs"
This reverts commit e345ba9188.
* Update tensors when evaluate passed
* Copy data Tensor when HostTensor was initialized
* Set shape to output tensor in TemporaryOverrideOutputs
* Fix code style
* Add test
* Remove unused code
* Create reshape with scalar when shape is empty
* Reshape, special_zero = true
* Revert "Create reshape with scalar when shape is empty"
This reverts commit 0f901f419a.
* Use Shape with size zero and value max_int for dynamic tensors
* Restore Shape{0} for dynamic tensors
* Revert "Restore Shape{0} for dynamic tensors"
This reverts commit cb2d0e58eb.
* Temporary remove the test
* Use shape{0} for dynamic tensors
* Revert "Use shape{0} for dynamic tensors"
This reverts commit 08460a486b.
* Use Shape{0} for dynamic tensors
* Use new evaluate in template plugin
- Add tensor conversion between ov::Tensor <-> HostTensor
- Add shape utils to create special case shape to be dynamic shape
- Utils are in dev API to remove duplicates
* Move WA for set shape into the ov::tensor.
* Remove dynamic shape from or_tensor helper
* Mark tensor conversion utils as deprecated
- move shape util as core internal only
- update transpose test to not use deprecated functions
* Add missing deprecate suppression macro
---------
Co-authored-by: Artur Kulikowski <artur.kulikowski@intel.com>
* Add CC support for ir reader
Change-Id: I3e1c02222800be090a4307bff8c231ad28b23ff7
* Fix clang issue
Change-Id: Idaf7bc5632bd558cfb7b0ecd8891435e5ba5c6ca
It turned out that NormalizeL2 is absent from the tf.raw_ops API
and is always present in decomposed form.
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Adds base class and first test for tflite_layer tests
* adds layer tests for unary ops
* adds functionality to get tensors from ops
* 1. adds functionality to use custom funcs for input generation
2. removed UNIQUE op from testing ops
* adds functionality to use custom dtypes
* Cast operation support
* Enhanced tfl layer tests
* Cast operation support
* Transpose Sinking: fix dummy case
* Supported 3 more ops: L2_NORMALIZATION, ARG_MAX, ARG_MIN
* Support scalar shapes
* Supported 1 more op: TRANSPOSE_CONV
* Supported 2 more ops: COMPLEX_ABS, RFFT2D (in combination)
* (DE)QUANTIZE as Identity. Questionable
* Trigger tfl layer tests in .ci
* Apply suggestions from code review
* empty constant support
* Commit as-is. Debug prints inside
* Not ready yet
* Style
* Comments resolved
* Style
* Dynamic shape support
* Style
---------
Co-authored-by: rnugmano <ruslan.nugmanov@intel.com>
Co-authored-by: missjane <estepyreva@gmail.com>
* Disable set_property() to support ov::device::properties setting.
* Update benchmark APP to set device properties through compile_model() instead of through set_property().
* Update.
* Update.
* Update some test cases including ov::device::properties setting via core.set_property().
* Since core.set_property() doesn't support setting ov::device::properties, remove the test case that checked compile_model() works when ov::device::properties is set via core.set_property() first.
* Update CompileModel in test name to CompiledModel
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
* Add corresponding test case.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
* Update.
* Remove the changes of this commit as this modification has nothing to do
with this PR.
This reverts commit 4f04b9f085.
---------
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
Co-authored-by: Chen Peter <peter.chen@intel.com>
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
* add test case for device_bind_buffer
* Correct path to header file properties.hpp
* rename remote blob testcase with multi
* add test case for remote blob and device bind buffer
* add logs for debug
* disable test case RemoteBlobInitializedWithoutGPU
* add property for remote blob test case
* remove debug logs for bind_multi_schedule.cpp
* fix MultiDeviceMultipleGPU_Test fail
* add test case for oversubscription of infer requests
* get optimal number to create inferRequests
* using macro ENABLE_INTEL_CPU to make sure tests need CPU
* fix the issue that canCreateRemoteTensorThenInferWithAffinity test case fails to run
* remove ov::hint::PerformanceMode::UNDEFINED from MultiDeviceMultipleGPU_Test
* add test case for get perf_hint from GetMetric
* Increase Mock GetMetric test sleep time
* add mock test case for getMetric
* add new test case OVAutoExecutableNetworkTest
* convert ov::Any to ov::hint::Priority
* resolve conflict of get_metric.hpp
* add macro ENABLE_INTEL_CPU for gpu test case and fix cases not getting instantiated for cpu test
* fix the failure when running Mock GetMetric test cases
* add perf_hint test cases to properties_tests.cpp
* Modify the logic for judging whether it is a single device in CTPUT (cumulative throughput) mode
* Enable AUTO CompiledModel::get_property to support its own properties only.
* Update.
* Update.
* Update some releated test cases.
* Update.
* Update related test case.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
---------
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
Co-authored-by: Chen Peter <peter.chen@intel.com>
Azure Pipelines update: OV_PYTHON_VERSION was bumped from 3.10.10 to 3.11.2 across the pipeline definitions (the full Python version is required for LD_LIBRARY_PATH; see https://github.com/microsoft/azure-pipelines-tool-lib/blob/master/docs/overview.md#tool-cache), and the setup step display name was changed from "Setup Python 3.10" to "Setup Python 3.11".
steps:
- task: UsePythonVersion@0
  inputs:
    versionSpec: '$(OV_PYTHON_VERSION)' # Setting only the major & minor version will download the latest release from the GH repo, e.g. 3.10 resolves to 3.10.10.
    addToPath: true
    disableDownloadFromRegistry: false
    architecture: 'x64'
    githubToken: $(auth_token)
  displayName: Setup Python 3.11
  name: setupPython
- checkout: self
  clean: 'true'
  fetchDepth: '1'
  submodules: 'true'
  path: openvino
- checkout: openvino_contrib
  clean: 'true'
  fetchDepth: '1'
  submodules: 'true'
  path: openvino_contrib
- bash: |
    #!/bin/bash
    set -e
    sudo -E $(REPO_DIR)/install_build_dependencies.sh
    # Move jdk into contrib
    # 'clang' compiler is to check that samples can be built using it
# Coverity has too many PARSE_ERROR errors with ENABLE_FASTER_BUILD=ON. Disabling FASTER_BUILD.
We welcome community contributions to OpenVINO™. Please read the following guide to learn how to find ideas for contribution, practices for good pull requests, checking your changes with our tests and more.
## How to contribute to the OpenVINO project
OpenVINO™ is always looking for opportunities to improve and your contributions
play a big role in this process. There are several ways you can make the
product better:
## Before you start contributing you should
### Provide Feedback
- Make sure you agree to contribute your code under [OpenVINO™ (Apache 2.0)](https://github.com/openvinotoolkit/openvino/blob/master/LICENSE) license.
- Figure out what you’re going to contribute. If you don’t know what to work on, navigate to the [GitHub "Issues" tab](https://github.com/openvinotoolkit/openvino/issues). Make sure no one is already working on it; if someone is, you might provide support or suggestions in the issue or in the linked pull request.
- If you are going to fix a bug, check that it still exists in the latest release. This can be done by building the latest master branch and making sure that the error is still reproducible there. We do not fix bugs that only affect older non-LTS releases, such as 2020.2 (more details about the [branching strategy](https://github.com/openvinotoolkit/openvino/wiki/Branches)).
* **Report bugs / issues**
If you experience faulty behavior in OpenVINO or its components, you can
[create a new issue](https://github.com/openvinotoolkit/openvino/issues)
in the GitHub issue tracker.
* **Propose new features / improvements**
If you have a suggestion for improving OpenVINO or want to share your ideas, you can open a new
* **User documentation** is built from several sources and published at
[docs.openvino.ai](https://docs.openvino.ai), which is the recommended place for reading
these documents. Use the files maintained in this repository only for editing purposes.
* The easiest way to help with documentation is to review it and provide feedback on the
existing articles. Whether you notice a mistake, see the possibility of improving the text,
or think more information should be added, you can reach out to any of the documentation
contributors to discuss the potential changes.
You can also create a Pull Request directly, following the [editor's guide](./docs/CONTRIBUTING_DOCS.md).
## "Fork & Pull Request model" for code contribution
### Promote and Support OpenVINO
### The instruction in brief
* **Popularize OpenVINO**
Articles, tutorials, blog posts, demos, videos, and any other involvement
in the OpenVINO community is always a welcome contribution. If you discuss
or present OpenVINO on various social platforms, you are raising awareness
of the product among A.I. enthusiasts and enabling other people to discover
the toolkit. Feel free to reach out to OpenVINO developers if you need help
with making such community-based content.
- Register at GitHub. Create your fork of OpenVINO™ repository [https://github.com/openvinotoolkit/openvino](https://github.com/openvinotoolkit/openvino) (see [https://help.github.com/articles/fork-a-repo](https://help.github.com/articles/fork-a-repo) for details).
- Install Git.
- Set your user name and email address in a Git configuration according to GitHub account (see [https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup](https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup) for details).
- Choose a task for yourself. It could be a bugfix or some new code.
- Choose a base branch for your work. More details about branches and policies are here: [Branches](https://github.com/openvinotoolkit/openvino/wiki/Branches)
- Clone your fork to your computer.
- Create a new branch (with a meaningful name) from the base branch you chose.
- Modify / add the code following our [Coding Style Guide](./docs/dev/coding_style.md).
- If you want to add a new sample, please look at this [Guide for contributing to C++/C/Python IE samples](https://github.com/openvinotoolkit/openvino/wiki/SampleContribute)
- If you want to contribute to the documentation and add a new guide, follow the [Documentation guidelines](https://github.com/openvinotoolkit/openvino/wiki/CodingStyleGuideLinesDocumentation)
- Run the test suite locally:
- execute each test binary from the artifacts directory, e.g. `<source dir>/bin/intel64/Release/ieFuncTests`
- When you are done, make sure that your branch is up to date with the latest state of the branch you want to contribute to (e.g. `git fetch upstream && git merge upstream/master`), push your branch to your GitHub fork, then create a pull request from your branch to the base branch (see [https://help.github.com/articles/using-pull-requests](https://help.github.com/articles/using-pull-requests) for details).
## Making a good pull request
Following these guidelines will increase the likelihood of your pull request being accepted:
- One PR – one issue.
- Build perfectly on your local system.
- Choose the right base branch [Branches](https://github.com/openvinotoolkit/openvino/wiki/Branches).
- Follow the [Coding Style Guide](./docs/dev/coding_style.md) for your code.
- Update documentation using [Documentation guidelines](https://github.com/openvinotoolkit/openvino/wiki/CodingStyleGuideLinesDocumentation) if needed.
- Cover your changes with tests.
- Add license at the top of new files [C++ example](https://github.com/openvinotoolkit/openvino/blob/master/samples/cpp/classification_sample_async/main.cpp#L1-L2), [Python example](https://github.com/openvinotoolkit/openvino/blob/master/samples/python/hello_classification/hello_classification.py#L3-L4).
- Add enough information: a meaningful title, the reason why you made the commit, and a link to the issue page if one exists.
- Remove changes unrelated to the PR.
- If it is still a work in progress and you want to check CI test results early, use a _Draft_ PR.
- Submit your PR and become an OpenVINO™ contributor!
* **Help Other Community Members**
If you are an experienced OpenVINO user and want to help, you can always
share your expertise with the community. Check GitHub Discussions and
Issues to see if you can help someone.
## Testing and merging pull requests
Your pull request will be automatically tested by OpenVINO™'s precommit (the testing status is automatically reported as "green" or "red" circles in the precommit steps on the PR page). If any builders have failed, you need to fix the issue. To rerun the automatic builds, just push changes to your branch on GitHub. There is no need to close the pull request and open a new one!
## Merging PR
When the reviewer accepts the pull request and the pre-commit shows a "green" status, the review status is set to "Approved", which signals to the OpenVINO™ maintainers that they can merge your pull request.
## License
By contributing to the OpenVINO project, you agree that your contributions will be
licensed under the terms stated in the [LICENSE](./LICENSE.md) file.
OpenVINO documentation is built using Sphinx and the reStructuredText formatting.
That means the basic formatting rules need to be used:
### White Spaces
OpenVINO documentation is developed to be easily readable in both html and
reStructuredText. Here are some suggestions on how to make it render nicely
and improve document clarity.
### Headings (including the article title)
They are made by "underscoring" text with punctuation marks (at least as
many marks as letters in the underscored header). We use the following convention:
```
H1
====================
H2
####################
H3
++++++++++++++++++++
H4
--------------------
H5
....................
```
### Line length
In programming, a limit of 80 characters per line is a common best-known method (BKM). It may also apply
to reading natural languages fairly well. For this reason, we aim at lines of around
70 to 100 characters long. The limit is not a strict rule but rather a guideline to
follow in most cases. The breaks will not translate to html, and rightly so, but will
make reading and editing documents in GitHub or an editor much easier.
### Tables
Tables may be difficult to implement well in websites. For example, longer portions
of text, like descriptions, may render them difficult to read (e.g. improper cell
widths or heights). Complex tables may also be difficult to read in source files.
To prevent that, check the [table directive documentation](https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#table-directives)
and see our custom directives. Use the following guidelines for easier editing:
* For very big and complex data sets: use a list instead of a table or remove
the problematic content from the table and implement it differently.
* For very big and complex data sets that need to use tables: use an external
file (e.g. PDF) and link to it.
* For medium tables that look bad in source (e.g. due to long lines of text),
use the reStructuredText list table format.
* For medium and small tables, use the reStructuredText grid or simple table formats.
## Cross-linking
There are several directives Sphinx uses for linking, each has its purpose and format.
Follow these guidelines for consistent results:
* Avoid absolute references to internal documents as much as possible (link to source, not html).
* Note that Sphinx uses the "back-tick" character and not the "inverted comma": ` vs. '
* When a file path starting at the current directory is used, put "./" at its beginning.
* Always add a space before the opening angle bracket ("<") for target files.
Use the following formatting for different links:
* link to an external page / file
* `` `text <url>`__ ``
* use a double underscore for consistency
* link to an internal documentation page / file
* `` :doc:`a docs page <relativefilepath>` ``
* Link to an rst or md file within our documentation, so that it renders properly in html
* link to a header on the same page
* `` `a header in the same article <this-is-section-header-title>`__ ``
* anchors are created automatically for all existing headers
* such anchor looks like the header, with minor adjustments:
* all letters are lower case,
* remove all special glyphs, like brackets,
* replace spaces with hyphens
* Create an anchor in an article
* `` .. _anchor-in-the-target-article: ``
* put it before the header to which you want to link
* See the rules for naming anchors / labels at the bottom of this article
* link to an anchor on a different page in our documentation
* `` :ref:`the created anchor <anchor-in-the-target-article>` ``
* link to the anchor using just its name
* anchors / labels
Read about anchors
Sphinx uses labels to create html anchors, which can be linked to from anywhere in documentation.
Although they may be put at the top of any article to make linking to it very easy, we do not use
this approach. Every label definition starts with an underscore; the underscore is not used in links.
Most importantly, every label needs to be globally unique, so it is always good
practice to start labels with a clear identifier of the article they reside in.
Auto batch plugin performs on-the-fly automatic batching (i.e. grouping inference requests together) to improve device utilization, with no programming effort from the user.
See [How to build OpenVINO](./docs/dev/build.md) to get more information about the OpenVINO build process.
## How to contribute
Report questions, issues and suggestions, using:
* [Neural Network Compression Framework (NNCF)](https://github.com/openvinotoolkit/nncf) - a suite of advanced algorithms for model inference optimization including quantization, filter pruning, binarization and sparsity
* [OpenVINO™ Training Extensions (OTE)](https://github.com/openvinotoolkit/training_extensions) - convenient environment to train Deep Learning models and convert them using OpenVINO for optimized inference.
* [OpenVINO™ Model Server (OVMS)](https://github.com/openvinotoolkit/model_server) - a scalable, high-performance solution for serving deep learning models optimized for Intel architectures
* [DL Workbench](https://docs.openvino.ai/nightly/workbench_docs_Workbench_DG_Introduction.html) - an alternative, web-based version of OpenVINO designed to facilitate optimization and compression of pre-trained deep learning models.
* [Computer Vision Annotation Tool (CVAT)](https://github.com/opencv/cvat) - an online, interactive video and image annotation tool for computer vision purposes.
* [Dataset Management Framework (Datumaro)](https://github.com/openvinotoolkit/datumaro) - a framework and CLI tool to build, transform, and analyze datasets.
\* Other names and brands may be claimed as the property of others.
[Open Model Zoo]:https://github.com/openvinotoolkit/open_model_zoo
message(FATAL_ERROR"Internal error: apiValidator is found (${ONECORE_API_VALIDATOR}), but UniversalDDIs.xml file has not been found for ${wdk_platform} platform")
# Mute -fsanitize=function Indirect call of a function through a function pointer of the wrong type.
# Sample cases:
# call to function GetAPIVersion through pointer to incorrect function type 'void *(*)()'
# call to function get_api_version through pointer to incorrect function type 'void *(*)()'
# Mute -fsanitize=alignment Use of a misaligned pointer or creation of a misaligned reference. Also sanitizes assume_aligned-like attributes.
# Sample cases:
# VPU_FixedMaxHeapTest.DefaultConstructor test case load of misaligned address 0x62000000187f for type 'const DataType', which requires 4 byte alignment
# Running and Deploying Inference {#openvino_docs_deployment_guide_introduction}
@sphinxdirective
.. toctree::
:maxdepth: 1
:hidden:
Run and Deploy Locally <openvino_deployment_guide>
Deploy via Model Serving <ovms_what_is_openvino_model_server>
@endsphinxdirective
Once you have a model that meets both OpenVINO™ and your requirements, you can choose how to deploy it with your application.
@sphinxdirective
.. panels::
:doc:`Deploy via OpenVINO Runtime <openvino_deployment_guide>`
^^^^^^^^^^^^^^
Local deployment uses OpenVINO Runtime that is called from, and linked to, the application directly.
It utilizes resources available to the system and provides the quickest way of launching inference.
---
:doc:`Deploy via Model Server <ovms_what_is_openvino_model_server>`
^^^^^^^^^^^^^^
Deployment via OpenVINO Model Server allows the application to connect to the inference server set up remotely.
This way inference can use external resources instead of those available to the application itself.
@endsphinxdirective
Apart from the default deployment options, you may also [deploy your application for the TensorFlow framework with OpenVINO Integration](./openvino_ecosystem_ovtf.md).
OpenVINO Runtime offers multiple inference modes to allow optimum hardware utilization under different conditions. The most basic one is a single-device mode, which defines just one device responsible for the entire inference workload. It supports a range of Intel hardware by means of plugins embedded in the Runtime library, each set up to offer the best possible performance. For a complete list of supported devices and instructions on how to use them, refer to the :doc:`guide on inference devices <openvino_docs_OV_UG_Working_with_devices>`.
The remaining modes assume certain levels of automation in selecting devices for inference. Using them in the deployed solution may potentially increase its performance and portability. The automated modes are:
Every deep learning workflow begins with obtaining a model. You can choose to prepare a custom one, use a ready-made solution and adjust it to your needs, or even download and run a pre-trained network from an online database, such as `TensorFlow Hub <https://tfhub.dev/>`__, `Hugging Face <https://huggingface.co/>`__, or `Torchvision models <https://pytorch.org/hub/>`__.
Import a model using ``read_model()``
#################################################
Model files (not Python objects) from :doc:`ONNX, PaddlePaddle, TensorFlow and TensorFlow Lite <Supported_Model_Formats>` (check :doc:`TensorFlow Frontend Capabilities and Limitations <openvino_docs_MO_DG_TensorFlow_Frontend>`) do not require a separate step for model conversion, that is ``mo.convert_model``.
The ``read_model()`` method reads a model from a file and produces `openvino.runtime.Model <api/ie_python_api/_autosummary/openvino.runtime.Model.html>`__. If the file is in one of the supported original framework :doc:`file formats <Supported_Model_Formats>`, the method runs internal conversion to an OpenVINO model format. If the file is already in the :doc:`OpenVINO IR format <openvino_ir>`, it is read "as-is", without any conversion involved.
You can also convert a model from original framework to `openvino.runtime.Model <api/ie_python_api/_autosummary/openvino.runtime.Model.html>`__ using ``convert_model()`` method. More details about ``convert_model()`` are provided in :doc:`model conversion guide <openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide>` .
``ov.Model`` can be serialized to IR using the ``ov.serialize()`` method. The serialized IR can be further optimized using :doc:`Neural Network Compression Framework (NNCF) <ptq_introduction>` that applies post-training quantization methods.
.. note::
``convert_model()`` also allows you to perform input/output cut, add pre-processing or add custom Python conversion extensions.
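A minimal sketch of the ``read_model()`` flow described above, with placeholder file names:

```python
import openvino.runtime as ov

core = ov.Core()
model_from_onnx = core.read_model("model.onnx")   # converted to ov.Model on the fly
model_from_ir = core.read_model("model.xml")      # OpenVINO IR is read as-is

# Serialize the in-memory model back to IR (it can then be optimized with NNCF).
ov.serialize(model_from_onnx, "converted.xml", "converted.bin")
```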
Convert a model with Python using ``mo.convert_model()``
Model conversion API, specifically the ``mo.convert_model()`` method, converts a model from the original framework to ``ov.Model``. ``mo.convert_model()`` returns an ``ov.Model`` object in memory, so the ``read_model()`` method is not required. The resulting ``ov.Model`` can be inferred in the same training environment (Python script or Jupyter Notebook). ``mo.convert_model()`` provides a convenient way to quickly switch from framework-based code to OpenVINO-based code in your inference application.
In addition to model files, ``mo.convert_model()`` can take OpenVINO extension objects constructed directly in Python for easier conversion of operations that are not supported in OpenVINO. The ``mo.convert_model()`` method also has a set of parameters to :doc:`cut the model <openvino_docs_MO_DG_prepare_model_convert_model_Cutting_Model>`, :doc:`set input shapes or layout <openvino_docs_MO_DG_prepare_model_convert_model_Converting_Model>`, :doc:`add preprocessing <openvino_docs_MO_DG_Additional_Optimization_Use_Cases>`, etc.
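A minimal sketch of the in-memory ``mo.convert_model()`` path, with a placeholder file name and input shape:

```python
from openvino.tools.mo import convert_model
from openvino.runtime import Core

# Convert directly in the training environment; no intermediate IR files needed.
ov_model = convert_model("model.onnx", input_shape=[1, 3, 224, 224])
compiled = Core().compile_model(ov_model, "CPU")
```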
The figure below illustrates the typical workflow for deploying a trained deep learning model, where IR is a pair of files describing the model:
* ``.xml`` - Describes the network topology.
* ``.bin`` - Contains the weights and biases binary data.
Convert a model using ``mo`` command-line tool
#################################################
[OpenVINO™ supports several model formats](../MO_DG/prepare_model/convert_model/supported_model_formats.md) and allows converting them to its own format, OpenVINO IR, providing a tool dedicated to this task.
Another option to convert a model is to use the ``mo`` command-line tool. ``mo`` is a cross-platform tool that facilitates the transition between training and deployment environments, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices, in the same measure as the ``mo.convert_model()`` method.
[Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) reads the original model and creates the OpenVINO IR model (.xml and .bin files) so that inference can ultimately be performed without delays due to format conversion. Optionally, Model Optimizer can adjust the model to be more suitable for inference, for example, by [altering input shapes](../MO_DG/prepare_model/convert_model/Converting_Model.md), [embedding preprocessing](../MO_DG/prepare_model/Additional_Optimizations.md) and [cutting training parts off](../MO_DG/prepare_model/convert_model/Cutting_Model.md).
``mo`` requires the use of a pre-trained deep learning model in one of the supported formats: TensorFlow, TensorFlow Lite, PaddlePaddle, or ONNX. ``mo`` converts the model to the OpenVINO Intermediate Representation format (IR), which needs to be read with the ``ov.read_model()`` method. Then, you can compile and infer the ``ov.Model`` later with :doc:`OpenVINO™ Runtime <openvino_docs_OV_UG_OV_Runtime_User_Guide>`.
The approach to fully convert a model is considered the default choice, as it allows the full extent of OpenVINO features. The OpenVINO IR model format is used by other conversion and preparation tools, such as the Post-Training Optimization Tool, for further optimization of the converted model.
Conversion is not required for ONNX and PaddlePaddle models, as OpenVINO provides C++ and Python APIs for importing them to OpenVINO Runtime directly. It provides a convenient way to quickly switch from framework-based code to OpenVINO-based code in your inference application.
The results of both ``mo`` and ``mo.convert_model()`` conversion methods described above are the same. You can choose one of them, depending on what is most convenient for you. Keep in mind that there should not be any differences in the results of model conversion if the same set of parameters is used.
This section describes how to obtain and prepare your model for work with OpenVINO to get the best inference results:
* [See the supported formats and how to use them in your project](../MO_DG/prepare_model/convert_model/supported_model_formats.md)
* [Convert different model formats to the OpenVINO IR format](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md).
* [Automate model-related tasks with Model Downloader and additional OMZ Tools](https://docs.openvino.ai/latest/omz_tools_downloader.html).
To begin with, you may want to [browse a database of models for use in your projects](../model_zoo.md).
OpenVINO™ is not just one tool. It is an expansive ecosystem of utilities, providing a comprehensive workflow for deep learning solution development. Learn more about each of them to reach the full potential of OpenVINO™ Toolkit.
### Neural Network Compression Framework (NNCF)
A suite of advanced algorithms for Neural Network inference optimization with minimal accuracy drop. NNCF applies quantization, filter pruning, binarization and sparsity algorithms to PyTorch and TensorFlow models during training.
A solution empowering TensorFlow developers with OpenVINO's optimization capabilities. With just two lines of code in your application, you can offload inference to OpenVINO, while keeping the TensorFlow API.
A streaming media analytics framework, based on the GStreamer multimedia framework, for creating complex media analytics pipelines.
More resources:
* [documentation on GitHub](https://dlstreamer.github.io/index.html)
* [installation Guide on GitHub](https://github.com/openvinotoolkit/dlstreamer_gst/wiki/Install-Guide)
To learn which device supports the import / export functionality, see the :doc:`feature support matrix <openvino_docs_OV_UG_Working_with_devices>`.
For more details on preprocessing steps, refer to the :doc:`Optimize Preprocessing <openvino_docs_OV_UG_Preprocessing_Overview>`. To compile the model with advanced preprocessing capabilities, refer to the :doc:`Use Case - Integrate and Save Preprocessing Steps Into OpenVINO IR <openvino_docs_OV_UG_Preprocess_Usecase_save>`, which shows how to have all the preprocessing in the compiled blob.
**DL Workbench**
A web-based tool for deploying deep learning models. Built on the core of OpenVINO and equipped with a graphical user interface, DL Workbench is a great way to explore the possibilities of the OpenVINO workflow, import, analyze, optimize, and build your pre-trained models. You can do all that by visiting `Intel® Developer Cloud <https://software.intel.com/content/www/us/en/develop/tools/devcloud.html>`__ and launching DL Workbench online.
**OpenVINO™ integration with TensorFlow (OVTF)**
OpenVINO™ Integration with TensorFlow will no longer be supported as of OpenVINO release 2023.0. As part of the 2023.0 release, OpenVINO will feature a significantly enhanced TensorFlow user experience within native OpenVINO without needing offline model conversions. :doc:`Learn more <openvino_docs_MO_DG_TensorFlow_Frontend>`.
# OpenVINO™ integration with TensorFlow {#ovtf_integration}
**OpenVINO™ integration with TensorFlow** is a solution for TensorFlow developers who want to get started with OpenVINO™ in their inferencing applications. By adding just two lines of code you can now take advantage of OpenVINO™ toolkit optimizations with TensorFlow inference applications across a range of Intel® computation devices.
This is all you need:
```bash
import openvino_tensorflow
openvino_tensorflow.set_backend('<backend_name>')
```
**OpenVINO™ integration with TensorFlow** accelerates inference across many AI models on a variety of Intel® technologies, such as:
- Intel® CPUs
- Intel® integrated GPUs
> **NOTE**: For maximum performance, efficiency, tooling customization, and hardware control, we recommend developers to adopt native OpenVINO™ solutions.
To find out more about the product itself, as well as learn how to use it in your project, check its dedicated [GitHub repository](https://github.com/openvinotoolkit/openvino_tensorflow/tree/master/docs).
To see what you can do with **OpenVINO™ integration with TensorFlow**, explore the demos located in the [examples folder](https://github.com/openvinotoolkit/openvino_tensorflow/tree/master/examples) in our GitHub repository.
Sample tutorials are also hosted on [Intel® DevCloud](https://www.intel.com/content/www/us/en/developer/tools/devcloud/edge/build/ovtfoverview.html). The demo applications are implemented using Jupyter Notebooks. You can interactively execute them on Intel® DevCloud nodes, compare the results of **OpenVINO™ integration with TensorFlow**, native TensorFlow, and OpenVINO™.
## License
**OpenVINO™ integration with TensorFlow** is licensed under [Apache License Version 2.0](https://github.com/openvinotoolkit/openvino_tensorflow/blob/master/LICENSE).
By contributing to the project, you agree to the license and copyright terms therein
and release your contribution under these terms.
## Support
Submit your questions, feature requests and bug reports via [GitHub issues](https://github.com/openvinotoolkit/openvino_tensorflow/issues).
## How to Contribute
We welcome community contributions to **OpenVINO™ integration with TensorFlow**. If you have an idea for improvement:
* Share your proposal via [GitHub issues](https://github.com/openvinotoolkit/openvino_tensorflow/issues).
* Submit a [pull request](https://github.com/openvinotoolkit/openvino_tensorflow/pulls).
We will review your contribution as soon as possible. If any additional fixes or modifications are necessary, we will guide you and provide feedback. Before you make your contribution, make sure you can build **OpenVINO™ integration with TensorFlow** and run all the examples with your fix/patch. If you want to introduce a large feature, create test cases for your feature. Upon our verification of your pull request, we will merge it to the repository provided that the pull request has met the above mentioned requirements and proved acceptable.
---
\* Other names and brands may be claimed as the property of others.
# OpenVINO™ Training Extensions {#ote_documentation}
@sphinxdirective
.. meta::
:description: OpenVINO™ Training Extensions include advanced algorithms used
to create, train and convert deep learning models with OpenVINO
Toolkit for optimized inference.
OpenVINO™ Training Extensions provide a suite of advanced algorithms to train
Deep Learning models and convert them using the `OpenVINO™
toolkit <https://software.intel.com/en-us/openvino-toolkit>`__ for optimized
inference. It allows you to export and convert the models to the needed format. OpenVINO Training Extensions independently create and train the model. It is open-sourced and available on `GitHub <https://github.com/openvinotoolkit/training_extensions>`__. Read the OpenVINO Training Extensions `documentation <https://openvinotoolkit.github.io/training_extensions/stable/guide/get_started/introduction.html>`__ to learn more.
1. To start working with OpenVINO Training Extensions, prepare and annotate your dataset. For example, on CVAT.
2. OpenVINO Training Extensions train the model using the training interface and evaluate its quality on your dataset using the evaluation and inference interfaces.
.. note::
Prepare a separate dataset or split the dataset you have for more accurate quality evaluation.
3. Once the evaluation results are satisfactory, you can deploy your model or continue optimizing it with NNCF. For more information about these frameworks, go to :doc:`Optimization Guide <openvino_docs_model_optimization_guide>`.
If the results are unsatisfactory, add datasets and perform the same steps, starting with dataset annotation.
OpenVINO Training Extensions Components
#######################################
* `OpenVINO Training Extensions API <https://github.com/openvinotoolkit/training_extensions/tree/develop/otx/api>`__
* `OpenVINO Training Extensions CLI <https://github.com/openvinotoolkit/training_extensions/tree/develop/otx/cli>`__
* `OpenVINO Training Extensions Algorithms <https://github.com/openvinotoolkit/training_extensions/tree/develop/otx/algorithms>`__
| With Model Downloader and Model Optimizer guides, you will learn to download pre-trained models and convert them for use with OpenVINO™. You can use your own models or choose some from a broad selection provided in the Open Model Zoo.
| With the model conversion API guide, you will learn to convert pre-trained models for use with OpenVINO™. You can use your own models or choose some from a broad selection in online databases, such as `TensorFlow Hub <https://tfhub.dev/>`__, `Hugging Face <https://huggingface.co/>`__, `Torchvision models <https://pytorch.org/hub/>`__.
| :doc:`Model Optimization and Compression <openvino_docs_model_optimization_guide>`
| In this section you will find out how to optimize a model to achieve better inference performance. It describes multiple optimization methods for both the training and post-training stages.
| This section explains the process of deploying your own inference application using either OpenVINO Runtime or OpenVINO Model Server. It describes how to run inference, which is the most basic form of deployment and the quickest way of launching it.
# How to Implement Custom GPU Operations {#openvino_docs_Extensibility_UG_GPU}
@sphinxdirective
.. meta::
:description: Learn the details of custom kernel support for the GPU device to
enable operations not supported by OpenVINO.
To enable operations not supported by OpenVINO™ out of the box, you may need an extension for OpenVINO operation set, and a custom kernel for the device you will target. This article describes custom kernel support for the GPU device.
The GPU codepath abstracts many details about OpenCL. You need to provide the kernel code in OpenCL C and an XML configuration file that connects the kernel and its parameters to the parameters of the operation.
There are two options for using the custom operation configuration file:
* Include a section with your kernels into the automatically-loaded `<lib_path>/cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml` file.
* Call the `ov::Core::set_property()` method from your application with the `"CONFIG_FILE"` key and the configuration file name as a value before loading the network that uses custom operations to the plugin:
@sphinxtabset
* Include a section with your kernels into the automatically-loaded ``<lib_path>/cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml`` file.
* Call the ``:ref:`ov::Core::set_property()<doxid-classov_1_1_core_1aa953cb0a1601dbc9a34ef6ba82b8476e>``` method from your application with the ``"CONFIG_FILE"`` key and the configuration file name as a value before loading the network that uses custom operations to the plugin:
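For example, a minimal C++ sketch of the second option (the configuration file name and model path are placeholders):
```cpp
#include <openvino/runtime/core.hpp>

ov::Core core;
// Register the custom kernel configuration file for the GPU device
// before reading or compiling the model that uses the custom operation.
core.set_property("GPU", {{"CONFIG_FILE", "custom_kernels.xml"}});
auto model = core.read_model("model.xml");               // model path is a placeholder
auto compiled_model = core.compile_model(model, "GPU");
```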
All OpenVINO samples, except the trivial `hello_classification`, and most Open Model Zoo demos
feature a dedicated command-line option `-c` to load custom kernels. For example, to load custom operations for the classification sample, run the command below:
## Configuration File Format <a name="config-file-format"></a>
All OpenVINO samples, except the trivial ``hello_classification``, and most Open Model Zoo demos
feature a dedicated command-line option ``-c`` to load custom kernels. For example, to load custom operations for the classification sample, run the command below:
The configuration file is expected to follow the `.xml` file structure
with a node of the type `CustomLayer` for every custom operation you provide.
The ``Kernel`` node contains all kernel source code configuration.
The `Source` node points to a single OpenCL source file.
**Sub-nodes**: ``Source`` (1+), ``Define`` (0+)
| Attribute Name | \# |Description|
|-----|-----|-----|
| `filename` | (1) | Name of the file containing OpenCL source code. The path is relative to your executable. Multiple source nodes will have their sources concatenated in order. |
Source Node and Sub-Node Structure
++++++++++++++++++++++++++++++++++
The ``Source`` node points to a single OpenCL source file.
.. list-table::
:header-rows: 1
* - Attribute Name
- #
- Description
* - ``filename``
- (1)
- Name of the file containing OpenCL source code. The path is relative to your executable. Multiple source nodes will have their sources concatenated in order.
**Sub-nodes**: None
### Define Node and Sub-Node Structure
Define Node and Sub-Node Structure
++++++++++++++++++++++++++++++++++
The `Define` node configures a single `#define` instruction to be added to
The ``Define`` node configures a single ``#define`` instruction to be added to
the sources during compilation (JIT).
| Attribute Name | \# | Description |
|------|-------|------|
| `name` | (1) | The name of the defined JIT. For static constants, this can include the value as well, which is taken as a string. |
| `param` | (0/1) | This parameter value is used as the value of this JIT definition. |
| `type` | (0/1) | The parameter type. Accepted values: `int`, `float`, and `int[]`, `float[]` for arrays. |
| `default` | (0/1) | The default value to be used if the specified parameters are missing from the operation in the OpenVINO IR. |
.. list-table::
:header-rows: 1
* - Attribute Name
- #
- Description
* - ``name``
- (1)
- The name of the defined JIT. For static constants, this can include the value as well, which is taken as a string.
* - ``param``
- (0/1)
- This parameter value is used as the value of this JIT definition.
* - ``type``
- (0/1)
- The parameter type. Accepted values: ``int`` , ``float`` , and ``int[]`` , ``float[]`` for arrays.
* - ``default``
- (0/1)
- The default value to be used if the specified parameters are missing from the operation in the OpenVINO IR.
**Sub-nodes:** None
The resulting JIT has the following form:
`#define [name] [type] [value/default]`.
``#define [name] [type] [value/default]``.
### Buffers Node and Sub-Node Structure
Buffers Node and Sub-Node Structure
+++++++++++++++++++++++++++++++++++
The `Buffers` node configures all input/output buffers for the OpenCL entry
The ``Buffers`` node configures all input/output buffers for the OpenCL entry
function. No buffers node structure exists.
**Sub-nodes:** `Data` (0+), `Tensor` (1+)
**Sub-nodes:** ``Data`` (0+), ``Tensor`` (1+)
### Data Node and Sub-Node Structure
Data Node and Sub-Node Structure
++++++++++++++++++++++++++++++++
The `Data` node configures a single input with static data, for example,
The ``Data`` node configures a single input with static data, for example,
weights or biases.
| Attribute Name | \# | Description |
|----|-----|------|
| `name` | (1) | Name of a blob attached to an operation in the OpenVINO IR. |
| `arg-index` | (1) | 0-based index in the entry function arguments to be bound to. |
.. list-table::
:header-rows: 1
* - Attribute Name
- #
- Description
* - ``name``
- (1)
- Name of a blob attached to an operation in the OpenVINO IR.
* - ``arg-index``
- (1)
- 0-based index in the entry function arguments to be bound to.
**Sub-nodes**: None
### Tensor Node and Sub-Node Structure
Tensor Node and Sub-Node Structure
++++++++++++++++++++++++++++++++++
The `Tensor` node configures a single input or output tensor.
The ``Tensor`` node configures a single input or output tensor.
| Attribute Name | \# | Description |
|------|-------|-------|
| `arg-index` | (1) | 0-based index in the entry function arguments to be bound to. |
| `type` | (1) | `input` or `output` |
| `port-index` | (1) | 0-based index in the operation input/output ports in the OpenVINO IR |
| `format` | (0/1) | Data layout declaration for the tensor. Accepted values: `BFYX`, `BYXF`, `YXFB`, `FYXB` (also in lowercase). The default value: `BFYX` |
.. list-table::
:header-rows: 1
### CompilerOptions Node and Sub-Node Structure
* - Attribute Name
- #
- Description
* - ``arg-index``
- (1)
- 0-based index in the entry function arguments to be bound to.
* - ``type``
- (1)
- ``input`` or ``output``
* - ``port-index``
- (1)
- 0-based index in the operation input/output ports in the OpenVINO IR
* - ``format``
- (0/1)
- Data layout declaration for the tensor. Accepted values: ``BFYX`` , ``BYXF`` , ``YXFB`` , ``FYXB`` , and same values in all lowercase. Default value: ``BFYX``.
The `CompilerOptions` node configures the compilation flags for the OpenCL
CompilerOptions Node and Sub-Node Structure
+++++++++++++++++++++++++++++++++++++++++++
The ``CompilerOptions`` node configures the compilation flags for the OpenCL
sources.
| Attribute Name | \# | Description |
|--------|-----|------|
| `options` | (1) | Options string to be passed to the OpenCL compiler |
.. list-table::
:header-rows: 1
* - Attribute Name
- #
- Description
* - ``options``
- (1)
- Options string to be passed to the OpenCL compiler
**Sub-nodes**: None
### WorkSizes Node and Sub-Node Structure
WorkSizes Node and Sub-Node Structure
+++++++++++++++++++++++++++++++++++++
The `WorkSizes` node configures the global/local work sizes to be used when
The ``WorkSizes`` node configures the global/local work sizes to be used when
queuing an OpenCL program for execution.
| Attribute Name | \# | Description |
|-----|------|-----|
| `global`<br>`local` | (0/1)<br>(0/1) | An array of up to three integers or formulas for defining OpenCL work-sizes to be used during execution.<br> The formulas can use the values of the B,F,Y,X dimensions and contain the operators: +,-,/,\*,%. All operators are evaluated in integer arithmetic. <br>Default value: `global=”B*F*Y*X” local=””` |
| `dim` | (0/1) | A tensor to take the work-size from. Accepted values: `input N`, `output`, where `N` is an index of input tensor starting with 0. The default value: `output` |
.. list-table::
:header-rows: 1
* - Attribute Name
- #
- Description
* - ``global`` ``local``
- (0/1) (0/1)
- An array of up to three integers or formulas for defining OpenCL work-sizes to be used during execution. The formulas can use the values of the B,F,Y,X dimensions and contain the operators: +,-,/,\*,%. All operators are evaluated in integer arithmetic. Default value: ``global=”B\*F\*Y\*X” local=””``
* - ``dim``
- (0/1)
- A tensor to take the work-size from. Accepted values: ``input N`` , ``output`` , where ``N`` is an index of input tensor starting with 0. Default value: ``output``
**Sub-nodes**: None
## Example Configuration File
Example Configuration File
##########################
The following code sample provides an example configuration file in XML
format. For information on the configuration file structure, see the
[Configuration File Format](#config-file-format) section.
The following table includes definitions that are attached before
user sources.
For an example, see [Example Kernel](#example-kernel).
For an example, see `Example Kernel<#example-kernel>`__.
| Name | Value |
|---|---|
| `NUM_INPUTS` | Number of the input tensors bound to this kernel. |
| `GLOBAL_WORKSIZE` | An array of global work sizes used to execute this kernel. |
| `GLOBAL_WORKSIZE_SIZE` | The size of the `GLOBAL_WORKSIZE` array. |
| `LOCAL_WORKSIZE` | An array of local work sizes used to execute this kernel. |
| `LOCAL_WORKSIZE_SIZE` | The size of the `LOCAL_WORKSIZE` array. |
| `<TENSOR>_DIMS`| An array of the tensor dimension sizes. Always ordered as `BFYX`. |
| `<TENSOR>_DIMS_SIZE`| The size of the `<TENSOR>_DIMS` array.|
| `<TENSOR>_TYPE`| The datatype of the tensor: `float`, `half`, or `char`. |
| `<TENSOR>_FORMAT_<TENSOR_FORMAT>` | The format of the tensor: BFYX, BYXF, YXFB, FYXB, or ANY. The format is concatenated to the defined name. You can use the tensor format to define codepaths in your code with `#ifdef/#endif`. |
| `<TENSOR>_LOWER_PADDING` | An array of padding elements used for the tensor dimensions before they start. Always ordered as BFYX.|
| `<TENSOR>_LOWER_PADDING_SIZE` | The size of the `<TENSOR>_LOWER_PADDING` array. |
| `<TENSOR>_UPPER_PADDING` | An array of padding elements used for the tensor dimensions after they end. Always ordered as BFYX. |
| `<TENSOR>_UPPER_PADDING_SIZE` | The size of the `<TENSOR>_UPPER_PADDING` array. |
| `<TENSOR>_PITCHES` | The offset (in elements) between adjacent elements in each dimension. Always ordered as BFYX. |
| `<TENSOR>_PITCHES_SIZE`| The size of the `<TENSOR>_PITCHES` array. |
| `<TENSOR>_OFFSET`| The number of elements from the start of the tensor to the first valid element, bypassing the lower padding. |
.. list-table::
:header-rows: 1
All `<TENSOR>` values are automatically defined for every tensor
bound to this operation, such as `INPUT0`, `INPUT1`, and `OUTPUT0`, as shown
* - Name
- Value
* - ``NUM_INPUTS``
- Number of the input tensors bound to this kernel
* - ``GLOBAL_WORKSIZE``
- An array of global work sizes used to execute this kernel
* - ``GLOBAL_WORKSIZE_SIZE``
- The size of the ``GLOBAL_WORKSIZE`` array
* - ``LOCAL_WORKSIZE``
- An array of local work sizes used to execute this kernel
* - ``LOCAL_WORKSIZE_SIZE``
- The size of the ``LOCAL_WORKSIZE`` array
* - ``<TENSOR>_DIMS``
- An array of the tensor dimension sizes. Always ordered as ``BFYX``
* - ``<TENSOR>_DIMS_SIZE``
- The size of the ``<TENSOR>_DIMS`` array.
* - ``<TENSOR>_TYPE``
- The datatype of the tensor: ``float`` , ``half`` , or ``char``
* - ``<TENSOR>_FORMAT_<TENSOR_FORMAT>``
- The format of the tensor: BFYX, BYXF, YXFB, FYXB, or ANY. The format is concatenated to the defined name. You can use the tensor format to define codepaths in your code with ``#ifdef/#endif``.
* - ``<TENSOR>_LOWER_PADDING``
- An array of padding elements used for the tensor dimensions before they start. Always ordered as BFYX.
* - ``<TENSOR>_LOWER_PADDING_SIZE``
- The size of the ``<TENSOR>_LOWER_PADDING`` array
* - ``<TENSOR>_UPPER_PADDING``
- An array of padding elements used for the tensor dimensions after they end. Always ordered as BFYX.
* - ``<TENSOR>_UPPER_PADDING_SIZE``
- The size of the ``<TENSOR>_UPPER_PADDING`` array
* - ``<TENSOR>_PITCHES``
- The offset (in elements) between adjacent elements in each dimension. Always ordered as BFYX.
* - ``<TENSOR>_PITCHES_SIZE``
- The size of the ``<TENSOR>_PITCHES`` array
* - ``<TENSOR>_OFFSET``
- The number of elements from the start of the tensor to the first valid element, bypassing the lower padding.
All ``<TENSOR>`` values are automatically defined for every tensor
bound to this operation, such as ``INPUT0``, ``INPUT1``, and ``OUTPUT0``, as shown
in the following example:
```c
#define INPUT0_DIMS_SIZE 4
#define INPUT0_DIMS (int []){ 1,96,55,55, }
```
.. code-block:: c

   #define INPUT0_DIMS_SIZE 4
   #define INPUT0_DIMS (int []){ 1,96,55,55, }

## Example Kernel <a name="example-kernel"></a>
```c
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
__kernel void example_relu_kernel(
    const __global INPUT0_TYPE*  input0,
          __global OUTPUT0_TYPE* output)
{
    const uint idx  = get_global_id(0);
    const uint idy  = get_global_id(1);
    const uint idbf = get_global_id(2);  // batches*features, as OpenCL supports 3D nd-ranges only
    const uint feature = idbf % OUTPUT0_DIMS[1];
    const uint batch   = idbf / OUTPUT0_DIMS[1];
    // notice that pitches are in elements, not in bytes!
    const uint in_id  = batch * INPUT0_PITCHES[0] + feature * INPUT0_PITCHES[1]
                      + idy * INPUT0_PITCHES[2] + idx * INPUT0_PITCHES[3] + INPUT0_OFFSET;
    const uint out_id = batch * OUTPUT0_PITCHES[0] + feature * OUTPUT0_PITCHES[1]
                      + idy * OUTPUT0_PITCHES[2] + idx * OUTPUT0_PITCHES[3] + OUTPUT0_OFFSET;
    INPUT0_TYPE value = input0[in_id];
    // neg_slope (which is non-zero for leaky ReLU) is put automatically as #define, refer to the config xml
    output[out_id] = value < 0 ? value * neg_slope : value;
}
```
**Using `printf` in the OpenCL™ Kernels**.
To debug the specific values, use `printf` in your kernels.
.. _debugging-tips:
.. note::
As described in the previous section, all items such as the ``INPUT0_TYPE`` are actually defined as OpenCL (pre-)compiler inputs by OpenVINO for efficiency reasons. See the `Debugging Tips <#debugging-tips>`__ below for information on debugging the results.
Debugging Tips
##############
**Using ``printf`` in the OpenCL™ Kernels**.
To debug the specific values, use ``printf`` in your kernels.
However, be careful not to output excessively, which
could generate too much data. The `printf` output is typical, so
could generate too much data. The ``printf`` output is typical, so
your output can be truncated to fit the buffer. Also, because of
buffering, you actually get an entire buffer of output when the
execution ends.
For more information, refer to the [printf Function](https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/printfFunction.html).
For more information, refer to the `printf Function<https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/printfFunction.html>`__.
The Intel® Distribution of OpenVINO™ toolkit supports neural-network models trained with various frameworks, including
TensorFlow, PyTorch, ONNX, TensorFlow Lite, and PaddlePaddle (OpenVINO support for Apache MXNet, Caffe, and Kaldi is currently
being deprecated and will be removed entirely in the future). The list of supported operations is different for each of the supported frameworks.
To see the operations supported by your framework, refer to :doc:`Supported Framework Operations <openvino_resources_supported_operations_frontend>`.
Custom operations, which are not included in the list, are not recognized by OpenVINO out-of-the-box. The need for custom operations may arise in two cases:
1. A new or rarely used regular framework operation is not supported in OpenVINO yet.
2. A new user operation that was created for some specific model topology by the author of the model using framework extension capabilities.
Importing models with such operations requires additional steps. This guide illustrates the workflow for running inference on models featuring custom operations. This allows plugging in your own implementation for them. OpenVINO Extensibility API enables adding support for those custom operations and using one implementation for Model Optimizer and OpenVINO Runtime.
Defining a new custom operation basically consists of two parts:
1. Definition of operation semantics in OpenVINO, the code that describes how this operation should be inferred consuming input tensor(s) and producing output tensor(s). The implementation of execution kernels for [GPU](./GPU_Extensibility.md) is described in separate guides.
1. Definition of operation semantics in OpenVINO, the code that describes how this operation should be inferred consuming input tensor(s) and producing output tensor(s). The implementation of execution kernels for :doc:`GPU <openvino_docs_Extensibility_UG_GPU>` is described in separate guides.
2. Mapping rule that facilitates conversion of framework operation representation to OpenVINO defined operation semantics.
The first part is required for inference. The second part is required for successful import of a model containing such operations from the original framework model format. There are several options to implement each part. The following sections will describe them in detail.
## Definition of Operation Semantics
Definition of Operation Semantics
#################################
If the custom operation can be mathematically represented as a combination of existing OpenVINO operations and such decomposition gives the desired performance, then a low-level operation implementation is not required. Refer to the latest OpenVINO operation set when deciding on the feasibility of such decomposition. You can use any valid combination of existing operations. The next section of this document describes how to map a custom operation.
If such decomposition is not possible or appears too bulky with a large number of constituent operations that do not perform well, then a new class for the custom operation should be implemented, as described in the [Custom Operation Guide](add_openvino_ops.md).
If such decomposition is not possible or appears too bulky with a large number of constituent operations that do not perform well, then a new class for the custom operation should be implemented, as described in the :doc:`Custom Operation Guide <openvino_docs_Extensibility_UG_add_openvino_ops>`.
You might prefer implementing a custom operation class if you already have a generic C++ implementation of operation kernel. Otherwise, try to decompose the operation first, as described above. Then, after verifying correctness of inference and resulting performance, you may move on to optional implementation of Bare Metal C++.
## Mapping from Framework Operation
Mapping from Framework Operation
################################
Mapping of custom operation is implemented differently, depending on model format used for import. You may choose one of the following:
1. If a model is represented in the ONNX (including models exported from Pytorch in ONNX) or PaddlePaddle formats, then one of the classes from [Frontend Extension API](frontend_extensions.md) should be used. It consists of several classes available in C++ which can be used with the `--extensions` option in Model Optimizer or when a model is imported directly to OpenVINO runtime using the `read_model` method. Python API is also available for runtime model import.
1. If a model is represented in the ONNX (including models exported from Pytorch in ONNX), TensorFlow Lite, PaddlePaddle or TensorFlow formats, then one of the classes from :doc:`Frontend Extension API <openvino_docs_Extensibility_UG_Frontend_Extensions>` should be used. It consists of several classes available in C++ which can be used with the ``--extensions`` option in Model Optimizer or when a model is imported directly to OpenVINO runtime using the ``read_model`` method. Python API is also available for runtime model import.
2. If a model is represented in the TensorFlow, Caffe, Kaldi or MXNet formats, then [Model Optimizer Extensions](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) should be used. This approach is available for model conversion in Model Optimizer only.
2. If a model is represented in the Caffe, Kaldi or MXNet formats (as legacy frontends), then :doc:`[Legacy] Model Optimizer Extensions <openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer>` should be used. This approach is available for model conversion in Model Optimizer only.
The existence of two approaches is explained by the two different types of frontends used for model conversion in OpenVINO: new frontends (ONNX, PaddlePaddle) and legacy frontends (TensorFlow, Caffe, Kaldi and Apache MXNet). Model Optimizer can use both frontends, in contrast to the direct import of a model with the `read_model` method, which can use new frontends only. Follow one of the appropriate guides referenced above to implement mappings depending on the framework frontend.
The existence of two approaches is explained by the two different types of frontends used for model conversion in OpenVINO: new frontends (ONNX, PaddlePaddle, TensorFlow Lite, and TensorFlow) and legacy frontends (Caffe, Kaldi, and Apache MXNet). Model Optimizer can use both frontends, in contrast to the direct import of a model with the ``read_model`` method, which can use new frontends only. Follow one of the appropriate guides referenced above to implement mappings depending on the framework frontend.
If you are implementing extensions for new ONNX or PaddlePaddle frontends and plan to use the `--extensions` option in Model Optimizer for model conversion, then the extensions should be:
If you are implementing extensions for new ONNX, PaddlePaddle, TensorFlow Lite or TensorFlow frontends and plan to use the ``--extensions`` option in Model Optimizer for model conversion, then the extensions should be:
1. Implemented in C++ only.
Model Optimizer does not support new frontend extensions written in Python API.
Remaining part of this guide describes application of Frontend Extension API for new frontends.
## Registering Extensions
Registering Extensions
######################
A custom operation class and a new mapping frontend extension class object should be registered to be usable in OpenVINO runtime.
> **NOTE**: This documentation is derived from the [Template extension](https://github.com/openvinotoolkit/openvino/tree/master/src/core/template_extension/new), which demonstrates the details of extension development. It is based on minimalistic `Identity` operation that is a placeholder for your real custom operation. Review the complete, fully compilable code to see how it works.
.. note::
This documentation is derived from the `Template extension <https://github.com/openvinotoolkit/openvino/tree/master/src/core/template_extension/new>`__, which demonstrates the details of extension development. It is based on minimalistic ``Identity`` operation that is a placeholder for your real custom operation. Review the complete, fully compilable code to see how it works.
Use the `ov::Core::add_extension` method to load the extensions to the `ov::Core` object. This method allows loading library with extensions or extensions from the code.
Use the ``:ref:`ov::Core::add_extension<doxid-classov_1_1_core_1a68d0dea1cbcd42a67bea32780e32acea>``` method to load the extensions to the ``:ref:`ov::Core <doxid-classov_1_1_core>``` object. This method allows loading library with extensions or extensions from the code.
### Load Extensions to Core
Load Extensions to Core
+++++++++++++++++++++++
Extensions can be loaded from a code with the `ov::Core::add_extension` method:
Extensions can be loaded from a code with the ``:ref:`ov::Core::add_extension<doxid-classov_1_1_core_1a68d0dea1cbcd42a67bea32780e32acea>``` method:
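A minimal sketch, assuming the Template extension's `Identity` class is available in your code:
```cpp
#include <openvino/runtime/core.hpp>
#include "identity.hpp"  // assumed header that declares TemplateExtension::Identity

ov::Core core;
// Register the custom operation class directly from code
core.add_extension<TemplateExtension::Identity>();
```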
The `Identity` is a custom operation class defined in [Custom Operation Guide](add_openvino_ops.md). This is sufficient to enable reading OpenVINO IR which uses the `Identity` extension operation emitted by Model Optimizer. In order to load original model directly to the runtime, add a mapping extension:
The ``Identity`` is a custom operation class defined in :doc:`Custom Operation Guide <openvino_docs_Extensibility_UG_add_openvino_ops>`. This is sufficient to enable reading OpenVINO IR which uses the ``Identity`` extension operation emitted by Model Optimizer. In order to load original model directly to the runtime, add a mapping extension:
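A possible sketch of such a mapping extension (a one-to-one mapping of the framework operation type "Identity" to the custom class):
```cpp
#include <openvino/frontend/extension.hpp>

// Map the framework operation type "Identity" to TemplateExtension::Identity
core.add_extension(ov::frontend::OpExtension<TemplateExtension::Identity>("Identity"));
```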
When Python API is used, there is no way to implement a custom OpenVINO operation. Even if custom OpenVINO operation is implemented in C++ and loaded into the runtime by a shared library, there is still no way to add a frontend mapping extension that refers to this custom operation. In this case, use C++ shared library approach to implement both operations semantics and framework mapping.
Python can still be used to map and decompose operations when only operations from the standard OpenVINO operation set are used.
### Create a Library with Extensions
.. _create_a_library_with_extensions:
Create a Library with Extensions
++++++++++++++++++++++++++++++++
An extension library should be created in the following cases:
- Conversion of a model with custom operations in Model Optimizer.
- Loading a model with custom operations in a Python application. This applies to both framework model and OpenVINO IR.
- Loading models with custom operations in tools that support loading extensions from a library, for example the `benchmark_app`.
* Conversion of a model with custom operations in Model Optimizer.
* Loading a model with custom operations in a Python application. This applies to both framework model and OpenVINO IR.
* Loading models with custom operations in tools that support loading extensions from a library, for example the ``benchmark_app``.
To create an extension library, for example, to load the extensions into Model Optimizer, perform the following:
1. Create an entry point for extension library. OpenVINO provides the `OPENVINO_CREATE_EXTENSIONS()` macro, which allows to define an entry point to a library with OpenVINO Extensions.
1. Create an entry point for extension library. OpenVINO provides the ``:ref:`OPENVINO_CREATE_EXTENSIONS()<doxid-core_2include_2openvino_2core_2extension_8hpp_1acdadcfa0eff763d8b4dadb8a9cb6f6e6>``` macro, which allows to define an entry point to a library with OpenVINO Extensions.
This macro should have a vector of all OpenVINO Extensions as an argument.
Based on that, the declaration of an extension class might look like the following:
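A sketch of such an entry point, following the Template extension layout (the file and header names are assumptions):
```cpp
// ov_extension.cpp
#include <memory>
#include <vector>

#include <openvino/core/extension.hpp>
#include <openvino/core/op_extension.hpp>
#include <openvino/frontend/extension.hpp>

#include "identity.hpp"  // assumed header that declares TemplateExtension::Identity

// Entry point: exposes all extensions contained in this shared library
OPENVINO_CREATE_EXTENSIONS(
    std::vector<ov::Extension::Ptr>({
        // Makes the operation available when reading OpenVINO IR and at runtime
        std::make_shared<ov::OpExtension<TemplateExtension::Identity>>(),
        // Maps the framework operation "Identity" to the custom class
        std::make_shared<ov::frontend::OpExtension<TemplateExtension::Identity>>()
    }));
```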
OpenVINO™ Extension API allows you to register custom operations to support models with operations which OpenVINO™ does not support out-of-the-box. This capability requires writing code in C++, so if you are using Python to develop your application you need to build a separate shared library implemented in C++ first and load it in Python using `add_extension` API. Please refer to [Create library with extensions](Intro.md#create-library-with-extensions) for more details on library creation and usage. The remaining part of this document describes how to implement an operation class.
@sphinxdirective
## Operation Class
.. meta::
:description: Explore OpenVINO™ Extension API which enables registering
custom operations to support models with operations
not supported by OpenVINO.
To add your custom operation, create a new class that extends `ov::Op`, which is in turn derived from `ov::Node`, the base class for all graph operations in OpenVINO™. To add `ov::Op` please include next file:
OpenVINO™ Extension API allows you to register custom operations to support models with operations which OpenVINO™ does not support out-of-the-box. This capability requires writing code in C++, so if you are using Python to develop your application you need to build a separate shared library implemented in C++ first and load it in Python using ``add_extension`` API. Please refer to :ref:`Create library with extensions <create_library_with_extensions>` for more details on library creation and usage. The remaining part of this document describes how to implement an operation class.
To add your custom operation, create a new class that extends ``ov::Op``, which is in turn derived from ``:ref:`ov::Node <doxid-classov_1_1_node>```, the base class for all graph operations in OpenVINO™. To add ``ov::Op``, include the next file:
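In current OpenVINO releases, that header is:
```cpp
#include <openvino/op/op.hpp>
```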
1. Add the `OPENVINO_OP` macro which defines a `NodeTypeInfo` object that identifies the type of the operation to the graph users and helps with dynamic type resolution. The type info of an operation currently consists of a string operation identifier and a string for operation version.
1. Add the ``OPENVINO_OP`` macro which defines a ``NodeTypeInfo`` object that identifies the type of the operation to the graph users and helps with dynamic type resolution. The type info of an operation currently consists of a string operation identifier and a string for operation version.
2. Implement default constructor and constructors that optionally take the operation inputs and attributes as parameters.
3. Override the shape inference method `validate_and_infer_types`. This method is called multiple times during graph manipulations to determine the shapes and element types of the operations outputs. To access the input shapes and input element types, use the `get_input_partial_shape()` and `get_input_element_type()` methods of `ov::Node`. Set the inferred shape and element type of the output using `set_output_type`.
3. Override the shape inference method ``validate_and_infer_types``. This method is called multiple times during graph manipulations to determine the shapes and element types of the operations outputs. To access the input shapes and input element types, use the ``get_input_partial_shape()`` and ``get_input_element_type()`` methods of ``:ref:`ov::Node <doxid-classov_1_1_node>```. Set the inferred shape and element type of the output using ``set_output_type``.
4. Override the `clone_with_new_inputs` method, which enables graph manipulation routines to create copies of this operation and connect it to different nodes during optimization.
4. Override the ``clone_with_new_inputs`` method, which enables graph manipulation routines to create copies of this operation and connect it to different nodes during optimization.
5. Override the `visit_attributes` method, which enables serialization and deserialization of operation attributes. An `AttributeVisitor` is passed to the method, and the implementation is expected to walk over all the attributes in the op using the type-aware `on_attribute` helper. Helpers are already implemented for standard C++ types like `int64_t`, `float`, `bool`, `vector`, and for existing OpenVINO defined types.
5. Override the ``visit_attributes`` method, which enables serialization and deserialization of operation attributes. An ``AttributeVisitor`` is passed to the method, and the implementation is expected to walk over all the attributes in the op using the type-aware ``on_attribute`` helper. Helpers are already implemented for standard C++ types like ``int64_t``, ``float``, ``bool``, ``vector``, and for existing OpenVINO defined types.
6. Override `evaluate`, which is an optional method that enables fallback of some devices to this implementation and the application of constant folding if there is a custom operation on the constant branch. If your operation implements the `evaluate` method, you also need to override the `has_evaluate` method, which reports whether `evaluate` is available for the operation.
6. Override ``evaluate``, which is an optional method that enables fallback of some devices to this implementation and the application of constant folding if there is a custom operation on the constant branch. If your operation implements the ``evaluate`` method, you also need to override the ``has_evaluate`` method, which reports whether ``evaluate`` is available for the operation.
Based on that, declaration of an operation class can look as follows:
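For reference, here is a minimal sketch of such a class in the spirit of the Template extension's `Identity` operation (simplified; the exact upstream code may differ):
```cpp
// identity.hpp - a simplified sketch of a custom operation class
#pragma once

#include <cstring>
#include <openvino/op/op.hpp>

namespace TemplateExtension {

class Identity : public ov::op::Op {
public:
    OPENVINO_OP("Identity");

    Identity() = default;

    explicit Identity(const ov::Output<ov::Node>& arg) : Op({arg}) {
        constructor_validate_and_infer_types();
    }

    void validate_and_infer_types() override {
        // The output has the same element type and shape as the single input
        set_output_type(0, get_input_element_type(0), get_input_partial_shape(0));
    }

    std::shared_ptr<ov::Node> clone_with_new_inputs(const ov::OutputVector& new_args) const override {
        OPENVINO_ASSERT(new_args.size() == 1, "Incorrect number of new arguments");
        return std::make_shared<Identity>(new_args.at(0));
    }

    bool visit_attributes(ov::AttributeVisitor& visitor) override {
        // Identity has no attributes to serialize
        return true;
    }

    bool evaluate(ov::TensorVector& outputs, const ov::TensorVector& inputs) const override {
        // Plain copy of the input tensor into the output tensor
        outputs[0].set_shape(inputs[0].get_shape());
        std::memcpy(outputs[0].data(), inputs[0].data(), inputs[0].get_byte_size());
        return true;
    }

    bool has_evaluate() const override {
        return true;
    }
};

}  // namespace TemplateExtension
```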
### Operation Constructors
Operation Constructors
++++++++++++++++++++++
OpenVINO™ operation contains two constructors:
* Default constructor, which enables you to create an operation without attributes
* Constructor that creates and validates an operation with specified inputs and attributes
`ov::Node::validate_and_infer_types` method validates operation attributes and calculates output shapes using attributes of the operation.
``:ref:`ov::Node::validate_and_infer_types<doxid-classov_1_1_node_1ac5224b5be848ec670d2078d9816d12e7>``` method validates operation attributes and calculates output shapes using attributes of the operation.
`ov::Node::clone_with_new_inputs` method creates a copy of the operation with new inputs.
``:ref:`ov::Node::clone_with_new_inputs<doxid-classov_1_1_node_1a04cb103fa069c3b7944ab7c44d94f5ff>``` method creates a copy of the operation with new inputs.
The goal of this chapter is to explain how to use Frontend extension classes to facilitate mapping of custom operations from framework model representation to OpenVINO representation. Refer to [Introduction to OpenVINO Extension](Intro.md) to understand entire flow.
@sphinxdirective
This API is applicable for new frontends only, which exist for ONNX and PaddlePaddle. If a different model format is used, follow legacy [Model Optimizer Extensions](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) guide.
.. meta::
:description: Learn how to use frontend extension classes to facilitate the mapping
of custom operations from the framework model representation to the OpenVINO
representation.
> **NOTE**: This documentation is written based on the [Template extension](https://github.com/openvinotoolkit/openvino/tree/master/src/core/template_extension/new), which demonstrates extension development details based on minimalistic `Identity` operation that is a placeholder for your real custom operation. You can review the complete code, which is fully compliable, to see how it works.
## Single Operation Mapping with OpExtension
The goal of this chapter is to explain how to use Frontend extension classes to facilitate
mapping of custom operations from framework model representation to OpenVINO representation.
Refer to :doc:`Introduction to OpenVINO Extension <openvino_docs_Extensibility_UG_Intro>` to
understand the entire flow.
This section covers the case when a single operation in framework representation is mapped to a single operation in OpenVINO representation. This is called *one-to-one mapping*. There is `OpExtension` class that works well if all the following conditions are satisfied:
This API is applicable to new frontends only, which exist for ONNX, TensorFlow Lite, PaddlePaddle, and TensorFlow.
If a different model format is used, follow legacy
This documentation is written based on the `Template extension <https://github.com/openvinotoolkit/openvino/tree/master/src/core/template_extension/new>`__,
which demonstrates extension development details based on minimalistic ``Identity``
operation that is a placeholder for your real custom operation. You can review the complete code,
which is fully compilable, to see how it works.
.. note::
You can find more examples of extensions in `openvino_contrib repository <https://github.com/openvinotoolkit/openvino_contrib/tree/master/modules/custom_operations>`_.
Single Operation Mapping with OpExtension
#########################################
This section covers the case when a single operation in framework representation is mapped to a single
operation in OpenVINO representation. This is called *one-to-one mapping*. There is ``OpExtension``
class that works well if all the following conditions are satisfied:
1. Number of inputs to operation in the Framework representation is the same as in the OpenVINO representation.
2. Number of outputs is also the same in both representations.
3. Inputs can be indexed and are mapped in order correspondingly, e.g. input with index 0 in framework representation maps to input with index 0 in OpenVINO representation and so on.
3. Inputs can be indexed and are mapped in order correspondingly, e.g.
input with index 0 in framework representation maps to input with index 0 in OpenVINO representation and so on.
4. The same for outputs.
5. Each attribute in OpenVINO operation can be initialized from one of the attributes of original operation or by
some predefined constant value. Value of copied attributes cannot contain expressions, value is accepted as-is,
so type of a value should be compatible.
5. Each attribute in OpenVINO operation can be initialized from one of the attributes of original operation or by some predefined constant value. Value of copied attributes cannot contain expressions, value is accepted as-is, so type of a value should be compatible.
.. note::
> **NOTE**: `OpExtension` class is currently available for ONNX frontend only. PaddlePaddle frontend has named inputs and outputs for operation (not indexed) therefore OpExtension mapping is not applicable for this case.
``OpExtension`` class is currently available for ONNX and TensorFlow frontends.
PaddlePaddle frontend has named inputs and outputs for operation (not indexed)
therefore OpExtension mapping is not applicable for this case.
The next example maps the ONNX operation with type [Identity](https://github.com/onnx/onnx/blob/main/docs/Operators.md#Identity) to the OpenVINO template extension `Identity` class.
The following example maps ONNX operation with the type of `Identity <https://github.com/onnx/onnx/blob/main/docs/Operators.md#Identity>`__
to OpenVINO template extension ``Identity`` class.
The mapping doesn’t involve any attributes, as operation Identity doesn’t have them.
Extension objects, like the just constructed `extension`, can be added to the OpenVINO runtime just before loading a model that contains custom operations:
Extension objects, like the just constructed ``extension``, can be added to the
OpenVINO runtime just before loading a model that contains custom operations:
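A sketch of that flow (the model path is a placeholder):
```cpp
// Map the ONNX "Identity" operation to the custom TemplateExtension::Identity class
auto extension = ov::frontend::OpExtension<TemplateExtension::Identity>("Identity");

ov::Core core;
core.add_extension(extension);
// The extension must be registered before reading a model that contains the custom operation
auto model = core.read_model("model.onnx");
```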
Alternatively, extensions can be constructed in a separately compiled shared library, which can then be used in Model Optimizer or `benchmark_app`. Read about how to build and load such a library in the “Create library with extensions” chapter of [Introduction to OpenVINO Extension](Intro.md).
Alternatively, extensions can be constructed in a separately compiled shared library,
which can then be used in Model Optimizer or ``benchmark_app``.
Read about how to build and load such a library in the “Create library with extensions” chapter of
:doc:`Introduction to OpenVINO Extension <openvino_docs_Extensibility_UG_Intro>`.
If the operation has multiple inputs and/or outputs, they will be mapped in order. The type of elements in input/output tensors should match the expected types in the surrounding operations. For example, if a custom operation produces the `f32` data type, the operation that consumes this output should also support `f32`. Otherwise, model conversion fails with an error, as no automatic type conversion is performed.
If operation have multiple inputs and/or outputs they will be mapped in order.
The type of elements in input/output tensors should match expected types in the surrounding operations.
For example, if a custom operation produces the ``f32`` data type, the operation that consumes this output
should also support ``f32``. Otherwise, model conversion fails with an error, as no automatic type conversion is performed.
### Converting to Standard OpenVINO Operation
Converting to Standard OpenVINO Operation
+++++++++++++++++++++++++++++++++++++++++
`OpExtension` class can be used when mapping to one of the operations from standard OpenVINO operation set is what you need and there is no class like `TemplateExtension::Identity` implemented.
``OpExtension`` class can be used when mapping to one of the operations from standard OpenVINO
operation set is what you need and there is no class like ``TemplateExtension::Identity`` implemented.
Here is an example for a custom framework operation “MyRelu”. Suppose it is mathematically equivalent to standard `Relu` that exists in OpenVINO operation set, but for some reason has type name “MyRelu”. In this case you can directly say that “MyRelu” -> `Relu` mapping should be used:
Here is an example of a custom framework operation 'MyRelu'. Assume it is mathematically equivalent
to standard ``Relu`` that exists in the OpenVINO operation set, but for some reason has the type name of 'MyRelu'.
In this case, you can directly say that 'MyRelu' -> ``Relu`` mapping should be used:
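A sketch of such a mapping, using the type-string form available for standard operations:
```cpp
// "MyRelu" in the original model is converted to the standard OpenVINO Relu
core.add_extension(ov::frontend::OpExtension<>("Relu", "MyRelu"));
```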
In the resulting converted OpenVINO model, “MyRelu” operation will be replaced by the standard operation `Relu` from the latest available OpenVINO operation set. Notice that when standard operation is used, it can be specified using just a type string (“Relu”) instead of using a `ov::opset8::Relu` class name as a template parameter for `OpExtension`. This method is available for operations from the standard operation set only. For a user custom OpenVINO operation the corresponding class should be always specified as a template parameter as it was demonstrated with `TemplateExtension::Identity`.
As described above, `OpExtension` is useful when attributes can be mapped one by one or initialized by a constant. If the set of attributes in framework representation and OpenVINO representation completely match by their names and types, nothing should be specified in OpExtension constructor parameters. The attributes are discovered and mapped automatically based on `visit_attributes` method that should be defined for any OpenVINO operation.
In the resulting converted OpenVINO model, “MyRelu” operation will be replaced by the standard operation
``Relu`` from the latest available OpenVINO operation set. Notice that when standard operation is used,
it can be specified using just a type string (“Relu”) instead of using a ``ov::opset8::Relu`` class name as a
template parameter for ``OpExtension``. This method is available for operations from the standard operation set only.
For a user custom OpenVINO operation the corresponding class should be always specified as a template parameter
as it was demonstrated with ``TemplateExtension::Identity``.
Imagine you have CustomOperation class implementation that has two attributes with names `attr1` and `attr2`:
As described above, ``OpExtension`` is useful when attributes can be mapped one by one or initialized by a constant.
Attributes in OpenVINO operators are identified by their names, so for frameworks that also have named attributes (like TensorFlow, PaddlePaddle, ONNX),
you can specify name to name mapping. For frameworks where OpenVINO operator's attributes can be mapped to one of the framework
operator inputs (like PyTorch), there's a name to input index mapping.
And the original model in framework representation also has an operation named “CustomOperation” with the same `attr1` and `attr2` attributes. Then with the following code:
both `attr1` and `attr2` are copied from the framework representation to the OpenVINO representation automatically. If, for some reason, the attribute names differ but the values can still be copied “as-is”, you can pass an attribute name mapping in the `OpExtension` constructor:
If the set of attributes in framework representation and OpenVINO representation completely match by their names and types,
no attribute mapping has to be specified in OpExtension constructor parameters. The attributes are discovered and mapped automatically
based on ``visit_attributes`` method that should be defined for any OpenVINO operation.
If copying an attribute is not what you need, `OpExtension` can also set an attribute to a predefined constant value. For the same `CustomOperation`, imagine you want to set `attr2` to the value 5 instead of copying it from `fw_attr2`. To achieve that, do the following:
And original model in framework representation also has operation with name ``CustomOperation`` with the same
``attr1`` and ``attr2`` attributes. Then with the following code:
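The three initialization options described in this section might be sketched as follows (the `CustomOperation` class and the `fw_attr*` names come from the surrounding text; treat the exact constructor arguments as an illustration):
```cpp
// 1. Attribute names and types match: attr1 and attr2 are discovered and copied automatically
core.add_extension(ov::frontend::OpExtension<CustomOperation>("CustomOperation"));

// 2. Names differ: map OpenVINO attribute names to framework attribute names
core.add_extension(ov::frontend::OpExtension<CustomOperation>(
    "CustomOperation",
    {{"attr1", "fw_attr1"}, {"attr2", "fw_attr2"}},
    {}));

// 3. Set an attribute to a predefined constant instead of copying it
core.add_extension(ov::frontend::OpExtension<CustomOperation>(
    "CustomOperation",
    {{"attr1", "fw_attr1"}},
    {{"attr2", 5}}));
```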
In conclusion, each attribute of the target OpenVINO operation should be initialized in one of the following ways:
1. Set automatically by name matching
2. Mapped by attribute name
3. Set to a constant value
This is achieved by specifying maps as arguments for `OpExtension` constructor.
This is achieved by specifying maps as arguments for ``OpExtension`` constructor.
## Mapping to Multiple Operations with ConversionExtension
Attribute mapping with named inputs and outputs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Previous sections cover the case when a single operation is mapped to a single operation with optional adjustment in names and attribute values. That is likely enough for your own custom operation with an existing C++ kernel implementation. In this case your framework representation and OpenVINO representation for the operation are under your control and inputs/outputs/attributes can be aligned to make `OpExtension` usable.
Mappings in previous examples assume that inputs and outputs of an operator in framework model representation come
with a particular order so you can directly map framework operation input ``0`` to OpenVINO operation input ``0`` and so on.
That's not always the case, for frameworks like PaddlePaddle, operation inputs and outputs are identified by their names
and may be defined in any order. So to map it to OpenVINO operation inputs and outputs, you have to specify that order yourself.
This can be done by creating two vectors of strings, one for inputs and one for outputs, where the framework operation
input name at position ``i`` maps to OpenVINO operation input at position ``i`` (and similarly for outputs).
If one-to-one mapping is not possible, *decomposition into multiple operations* should be considered. It is achieved by using the more verbose and less automated `ConversionExtension` class. It enables writing arbitrary code to replace a single framework operation with multiple connected OpenVINO operations, constructing a dependency graph of any complexity.
`ConversionExtension` maps a single operation to a function which builds a graph using OpenVINO operation classes. Follow chapter [Build a Model in OpenVINO Runtime](@ref ov_ug_build_model) to learn how to use OpenVINO operation classes to build a fragment of model for replacement.
Let's see the following example. Like previously, we'd like to map ``CustomOperation`` in the original model,
to OpenVINO ``CustomOperation`` as is (so their name and attribute names match). This time, the framework operation
inputs and outputs are not strictly ordered and can be identified by their names ``A``, ``B``, ``C`` for inputs
and ``X``, ``Y`` for outputs. Those inputs and outputs can be mapped to OpenVINO operation, such that inputs
``A``, ``B``, ``C`` map to OpenVINO ``CustomOperation`` first, second and third input and ``X`` and ``Y``
outputs map to OpenVINO ``CustomOperation`` first and second output respectively.
The next example illustrates using `ConversionExtension` for conversion of “ThresholdedRelu” from ONNX according to the formula: `ThresholdedRelu(x, alpha) -> Multiply(x, Convert(Greater(x, alpha), type=float))`.
Given that, such custom operation can be registered by the following:
> **NOTE**: `ThresholdedRelu` is one of the standard ONNX operators which is supported by ONNX frontend natively out-of-the-box. Here we are re-implementing it to illustrate how you can add a similar support for your custom operation instead of `ThresholdedRelu`.
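A sketch of such a registration for the ONNX frontend, built from opset8 operations (an `ov::Core core` instance is assumed to exist):
```cpp
#include <openvino/frontend/extension.hpp>
#include <openvino/opsets/opset8.hpp>

core.add_extension(ov::frontend::ConversionExtension(
    "ThresholdedRelu",
    [](const ov::frontend::NodeContext& node) {
        // Greater(x, alpha) -> Convert to float -> Multiply by x
        auto alpha = ov::opset8::Constant::create(
            ov::element::f32, {}, {node.get_attribute<float>("alpha")});
        auto greater = std::make_shared<ov::opset8::Greater>(node.get_input(0), alpha);
        auto mask = std::make_shared<ov::opset8::Convert>(greater, ov::element::f32);
        auto result = std::make_shared<ov::opset8::Multiply>(node.get_input(0), mask);
        return ov::OutputVector{result};
    }));
```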
This will map ``alpha`` to the second input and map ``beta`` attribute to constant value ``1.0f``.
Such created extension can be used, e.g. in dynamic library, please refer to :ref:`Create a library with extensions <create_a_library_with_extensions>`.
Mapping custom operations to frontends with OPENVINO_FRAMEWORK_MAP macro
#########################################################################
``OPENVINO_FRAMEWORK_MAP`` is a macro that should be used inside OpenVINO operation's class definition and that lets you specify
the mapping between this operation to a frontend operation.
Let's consider the following example. Imagine you have an ONNX model with ``CustomOp`` operation (and this operation has ``mode`` attribute),
a TensorFlow model with ``CustomOpV3`` operation (this operation has ``axis`` attribute) and a PaddlePaddle model with ``CustomOp`` (with ``mode`` attribute)
that has input named "X" and output named "Out" and all of them can be implemented with a single OpenVINO operation ``CustomOp`` like follows:
To access original framework operation attribute value and connect to inputs, ``node`` object of type ``NodeContext`` is used. It has three main methods:
* ``NodeContext::get_input`` to get input with a given index,
* ``NodeContext::get_attribute`` to get attribute value with a given name,
* ``NodeContext::get_values_from_const_input`` to get an attribute with a given input index.
The conversion function should return a vector of node outputs that are mapped to
corresponding outputs of the original framework operation in the same order.
Some frameworks require output names of the operation to be provided during conversion.
For PaddlePaddle operations, it is generally necessary to provide names for all outputs using the ``NamedOutputs`` container.
Usually those names can be found in source code of the individual operation in PaddlePaddle code.
The next example shows such conversion for the ``top_k_v2`` operation.
The conversion function should return a vector of node outputs that are mapped to corresponding outputs of the original framework operation in the same order.
`ov::pass::GraphRewrite` serves for running multiple matcher passes on `ov::Model` in a single graph traversal.
@sphinxdirective
.. meta::
:description: Get to know how Graph Rewrite handles running multiple matcher passes on
ov::Model in a single graph traversal.
``:ref:`ov::pass::GraphRewrite <doxid-classov_1_1pass_1_1_graph_rewrite>``` serves for running multiple matcher passes on ``:ref:`ov::Model <doxid-classov_1_1_model>``` in a single graph traversal.
In addition, GraphRewrite handles nodes that were registered by MatcherPasses during their execution. These nodes will be added to the beginning of the sequence of nodes for pattern matching.
> **NOTE**: when using `ov::pass::Manager` temporary GraphRewrite is used to execute single MatcherPass.
.. note::
When using ``:ref:`ov::pass::Manager <doxid-classov_1_1pass_1_1_manager>``` temporary GraphRewrite is used to execute single MatcherPass.
GraphRewrite has two algorithms for executing MatcherPasses. The first algorithm is straightforward: it applies each MatcherPass in registration order to the current node.
However, it is not really efficient when many passes are registered. Therefore, GraphRewrite first checks that all MatcherPass patterns have a type-based root node (meaning that the type of this node is not hidden inside a predicate).
It then creates a map from the registered MatcherPasses, which helps to avoid the additional cost of applying each MatcherPass to each node.
To use ``:ref:`ov::pass::MatcherPass <doxid-classov_1_1pass_1_1_matcher_pass>```, you need to complete these steps:
To use `ov::pass::MatcherPass`, you need to complete these steps:
1. Create a pattern
2. Implement a callback
3. Register the pattern and Matcher
@@ -15,87 +29,135 @@ To use `ov::pass::MatcherPass`, you need to complete these steps:
So let's go through each of these steps.
## Create a pattern
Create a pattern
################
A pattern is essentially a single-root `ov::Model`. The only difference is that you do not need to create a model object; you just need to create and connect opset or special pattern operations.
A pattern is essentially a single-root ``:ref:`ov::Model <doxid-classov_1_1_model>```. The only difference is that you do not need to create a model object; you just need to create and connect opset or special pattern operations.
Then you need to take the last created operation and put it as a root of the pattern. This root node will be used as a root node in pattern matching.
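For illustration, a minimal sketch of such a pattern and its Matcher (the element type, shape, and pattern name are placeholders):
```cpp
#include <openvino/opsets/opset8.hpp>
#include <openvino/pass/pattern/matcher.hpp>

// Pattern: Parameter -> Relu
auto input = std::make_shared<ov::opset8::Parameter>(ov::element::f32, ov::Shape{1, 3, 64, 64});
auto relu = std::make_shared<ov::opset8::Relu>(input->output(0));
// The last created operation is used as the root of the pattern
auto matcher = std::make_shared<ov::pass::pattern::Matcher>(relu, "ParameterReluPattern");
```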
> **NOTE**: Any nodes in a pattern that have no consumers and are not registered as root will not be used in pattern matching.
Any nodes in a pattern that have no consumers and are not registered as root will not be used in pattern matching.
The `Parameter` operation in the example above has type and shape specified. These attributes are needed only to create Parameter operation class and will not be used in pattern matching.
For more pattern examples, refer to the [pattern matching](#pattern_matching) section.
The ``Parameter`` operation in the example above has type and shape specified. These attributes are needed only to create the Parameter operation class and will not be used in pattern matching.
## Implement callback
For more pattern examples, refer to the `pattern matching section <#pattern-matching>`__.
Implement callback
##################
Callback is an action applied to every pattern entrance. In general, callback is the lambda function that takes Matcher object with detected subgraph.
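A sketch of such a callback, assuming the matcher from the previous sketch:
```cpp
ov::matcher_pass_callback callback = [](ov::pass::pattern::Matcher& m) {
    // Root of the matched sub-graph (the Relu node in this pattern)
    auto root = m.get_match_root();
    // Map from pattern nodes to the matched graph values
    const auto& pattern_map = m.get_pattern_value_map();
    // ... analyze or replace the matched sub-graph using root / pattern_map ...
    return false;  // return true only if the root node was replaced
};
```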
The example above shows the callback structure and how Matcher can be used for accessing nodes detected by pattern.
Callback return value is `true` if root node was replaced and another pattern cannot be applied to the same root node; otherwise, it is `false`.
> **NOTE**: It is not recommended to manipulate nodes that are under the root node. This may affect GraphRewrite execution, as it is expected that all nodes that come after the root node in topological order are valid and can be used in pattern matching.
Callback return value is ``true`` if root node was replaced and another pattern cannot be applied to the same root node; otherwise, it is ``false``.
.. note::
It is not recommended to manipulate nodes that are under the root node. This may affect GraphRewrite execution, as it is expected that all nodes that come after the root node in topological order are valid and can be used in pattern matching.
MatcherPass also provides functionality that allows reporting of the newly created nodes that can be used in additional pattern matching.
If MatcherPass was registered in `ov::pass::Manager` or `ov::pass::GraphRewrite`, these registered nodes will be added for additional pattern matching.
That means that matcher passes registered in `ov::pass::GraphRewrite` will be applied to these nodes.
The example below shows how a single MatcherPass can fuse a sequence of operations using the `register_new_node` method.
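A minimal sketch of such a pass might look like this (the Relu->Relu fusion, the class name, and the opset version are illustrative assumptions, not a transformation shipped with OpenVINO):

```cpp
#include <openvino/core/rt_info.hpp>
#include <openvino/opsets/opset8.hpp>
#include <openvino/pass/graph_rewrite.hpp>
#include <openvino/pass/pattern/op/wrap_type.hpp>

// Fuses Relu(Relu(x)) into a single Relu and reports the new node.
class FuseReluSequence : public ov::pass::MatcherPass {
public:
    OPENVINO_RTTI("FuseReluSequence");
    FuseReluSequence() {
        auto inner_relu = ov::pass::pattern::wrap_type<ov::opset8::Relu>();
        auto outer_relu = ov::pass::pattern::wrap_type<ov::opset8::Relu>({inner_relu});

        ov::matcher_pass_callback callback = [=](ov::pass::pattern::Matcher& m) {
            auto root = m.get_match_root();
            // The new Relu is reported via register_new_node, so it can take part
            // in additional pattern matching inside the same GraphRewrite.
            auto new_relu = register_new_node<ov::opset8::Relu>(
                root->input_value(0).get_node_shared_ptr()->input_value(0));
            new_relu->set_friendly_name(root->get_friendly_name());
            ov::copy_runtime_info(root, new_relu);
            ov::replace_node(root, new_relu);
            return true;
        };

        register_matcher(std::make_shared<ov::pass::pattern::Matcher>(outer_relu, "FuseReluSequence"),
                         callback);
    }
};
```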
> **NOTE**: If you register multiple nodes, please add them in topological order. We do not topologically sort these nodes as it is a time-consuming operation.
## Register pattern and Matcher
The last step is to register the Matcher and the callback inside the MatcherPass. To do this, call the `register_matcher` method.
> **NOTE**: Only one matcher can be registered for a single MatcherPass class.
```cpp
// `m` is the ov::pass::pattern::Matcher created from the pattern root node.
// Register matcher and callback
register_matcher(m, callback);
```
## Execute MatcherPass
There are multiple ways to execute a MatcherPass (see the sketch after this list):
* Run on a single node - this can be useful if you want to run a MatcherPass inside another transformation.
* Run on `ov::Model` using GraphRewrite - this approach gives the ability to run a MatcherPass on a whole `ov::Model`. Moreover, multiple MatcherPass transformations can be registered in a single GraphRewrite to be executed in a single graph traversal.
* Run on `ov::Model` using `ov::pass::Manager` - this approach helps you to register a MatcherPass for execution on `ov::Model` like any other transformation type.
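Here is a minimal sketch of these three options (`MyMatcherPass` is a hypothetical pass class; `model` and `node` are assumed to be valid pointers):

```cpp
#include <openvino/pass/graph_rewrite.hpp>
#include <openvino/pass/manager.hpp>

// `MyMatcherPass` is a placeholder for your own ov::pass::MatcherPass subclass.
void run_matcher_pass(const std::shared_ptr<ov::Model>& model,
                      const std::shared_ptr<ov::Node>& node) {
    // 1. Run on a single node, for example from inside another transformation.
    MyMatcherPass().apply(node);

    // 2. Run on the whole model via GraphRewrite; several matcher passes can be
    //    added to one GraphRewrite and executed in a single graph traversal.
    ov::pass::Manager graph_rewrite_manager;
    auto rewrite = graph_rewrite_manager.register_pass<ov::pass::GraphRewrite>();
    rewrite->add_matcher<MyMatcherPass>();
    graph_rewrite_manager.run_passes(model);

    // 3. Register the MatcherPass in ov::pass::Manager like any other pass type.
    ov::pass::Manager manager;
    manager.register_pass<MyMatcherPass>();
    manager.run_passes(model);
}
```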
Sometimes patterns cannot be expressed via regular operations, or expressing them this way is too complicated.
For example, you may want to detect a **Convolution->Add** sub-graph without specifying a particular input type for the Convolution operation, or create a pattern where some operations can have different types.
For these cases, OpenVINO™ provides additional helpers to construct patterns for GraphRewrite transformations.
There are two main helpers:
1. `ov::pass::pattern::any_input` - helps to express inputs if their types are undefined.
2. `ov::pass::pattern::wrap_type<T>` - helps to express nodes of a pattern without specifying node attributes.
Let's go through an example to get a better understanding of how it works:
> **NOTE**: Node attributes do not participate in pattern matching and are needed only for operations creation. Only operation types participate in pattern matching.
The example below shows basic usage of `ov::pass::pattern::any_input`.
Here we construct a Multiply pattern with an arbitrary first input and a Constant as the second input.
Also, as Multiply is a commutative operation, it does not matter in which order we set the inputs (any_input/Constant or Constant/any_input), because both cases will be matched.
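A possible sketch of this pattern, assuming opset8 and illustrative variable names:

```cpp
#include <openvino/opsets/opset8.hpp>
#include <openvino/pass/pattern/matcher.hpp>
#include <openvino/pass/pattern/op/label.hpp>
#include <openvino/pass/pattern/op/wrap_type.hpp>

// Multiply pattern: any input as the first argument, a Constant as the second one.
auto input = ov::pass::pattern::any_input();
auto constant = ov::pass::pattern::wrap_type<ov::opset8::Constant>();
auto mul = ov::pass::pattern::wrap_type<ov::opset8::Multiply>({input, constant});
auto matcher = std::make_shared<ov::pass::pattern::Matcher>(mul, "MultiplyWithConstant");
```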
# OpenVINO Model Pass {#openvino_docs_Extensibility_UG_model_pass}
`ov::pass::ModelPass` is used for transformations that take the entire `ov::Model` as an input and process it.
To use `ov::pass::ModelPass`, you need to override the `run_on_model` method, where you will write the transformation code.
The return value is `true` if the original model has changed during the transformation (a new operation was added, an operation replacement was made, or node attributes were changed); otherwise, it is `false`.
`ov::pass::ModelPass`-based transformations can also be executed via `ov::pass::Manager`.
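A minimal sketch of a Model pass, assuming an illustrative class name and an analysis-only body:

```cpp
#include <openvino/pass/manager.hpp>
#include <openvino/pass/pass.hpp>

// Walks over the model's operations; a real pass would modify the model here.
class MyModelPass : public ov::pass::ModelPass {
public:
    OPENVINO_RTTI("MyModelPass");
    bool run_on_model(const std::shared_ptr<ov::Model>& model) override {
        size_t op_count = model->get_ops().size();
        (void)op_count;  // analysis only; insert transformation logic here
        // Return true only if the model was actually modified.
        return false;
    }
};

// Execution via ov::pass::Manager:
// ov::pass::Manager manager;
// manager.register_pass<MyModelPass>();
// manager.run_passes(model);
```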