* Update doc for AUTO and AUTO_BATCH
Signed-off-by: Chen Peter <peter.chen@intel.com>
* Update per the comments
Signed-off-by: Chen Peter <peter.chen@intel.com>
* Move default hint to THROUGHPUT section
Signed-off-by: Chen Peter <peter.chen@intel.com>
* Update docs/OV_Runtime_UG/automatic_batching.md
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* Fixed newAPI for case if core was removed
* Fixed code style
* Fixed typo
* Use new API by default
* Create core with template plugin
* Added doxygen comment
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
* fix references
* update links
* update the wording to be more clear
* add the error message about Visual studio back
* update links to static html links of 2022.2
* change memory access pattern of fsv layout for permute
* Fix permute_ref to process F first only when (bf...) => (b...f)
* Refactor
Co-authored-by: si-eun-kim <sieun.kim@intel.com>
* add auto_batch_timeout for MULTI and AUTO
* fix clang-format for ie_core.cpp
* fix coredump
* simplify the logic of inserting keys into deviceConfig; parseDeviceNameIntoConfig() checks only "AUTO" and "AUTO:"
* check config auto_batch_timeout
* add CleanUpInIECore()
* fix clang-format for ie_core.cpp
* Fix the deconv fused issue on AVX2 and AVX512 and enable deconv test
* Keep GroupDeconv BF16 test cases still disabled.
* Update to also exclude nightly
* Update onednn submodule.
* Update onednn submodule
* Update onednn submodule.
* Update the oneDNN submodule
* Update the oneDNN commit.
* Update with merged onednn commit.
* Define new ppp API for nv12
* Add new ppp API function
* Add new ppp API unit test
* Add hello nv12 input classification ov
* Define new ppp API for nv12
* Add new ppp API function
* Add new ppp API unit test
* Add hello nv12 input classification ov
* Fix the clang-format issue
* Modify the function called is_supported_image_size
* Update code as suggested
* Add hello_nv12_input_classification e2e test
* clang-format openvinotoolkit
* Fix the doc error in CI
Co-authored-by: River Li <river.li@intel.com>
Some compiler flags restrict the compiler from making arbitrary decisions while handling undefined C/C++ behavior.
Therefore they can be used to fix some issues caused by undefined behavior.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
Co-authored-by: Chen Peter <peter.chen@intel.com>
* FTZ_and_DAZ_set_for_cpu
* remove DAZ
* fix
* extract to utils
* ie core part changes to add DO (denormals optimization) as a property and enable DO in benchmark_app
* enable brgconv from Luocheng's patch
* add debug info
* enable_brgemm_on_avx512
* add python binding
* dlb test
* FTZ_and_DAZ_set_for_cpu
* remove DAZ
* fix
* extract to utils
* ie core part changes to add DO (denormals optimization) as a property and enable DO in benchmark_app
* enable brgconv from Luocheng's patch
* add debug info
* enable_brgemm_on_avx512
* add python binding
* dlb test
* revert test code
* revert test code
* Handle in-place failure cases in reshape node
* Disable inplace when non-const reshape connected to constant
* Add comment to reshape_inplace test
* move copy WA into execute() to cover more general in-place failure cases
* enable brgconv f32
* use config to enable brgconv f32
* when brg is disabled, do not init binary post-ops
* change prop name for extensive
* use more general field
* fix review comments.
* Add FORCE_TBB_TERMINATE to legacy API
* Put this config into proper place
* fix issue in property test
Co-authored-by: Shen, Wanglei <wanglei.shen@intel.com>
* [CPU] Optimize NonZero operation
# Conflicts:
# src/plugins/intel_cpu/src/nodes/non_zero.cpp
* [CPU] Rewrite NonZero implementation, so it will use generic ie_parallel API
* [CPU] NonZero operation: apply an additional optimization
* NonZero operation: add fallback code for inRank >= 6
* NonZero operation: apply review modifications
# Conflicts:
# src/plugins/intel_cpu/src/nodes/non_zero.cpp
* NonZero operation: inShape.getDims().size() -> inRank
* NonZero operation: eliminate input array index calculation by slight modification of ie_parallel API
* Adjust ie_parallel.hpp style for clang-format
* Try to unbreak the build
* Move to parallel_nt and add a cache for nd loops to optimize more
* Add minimal size threshold for threading and reduce warning count
* Try to workaround linter errors
* One more try to unbreak cpplint build
Co-authored-by: Michal Lukaszewski <michal.lukaszewski@intel.com>
* Remove vmaxps in store_vector.
This instruction is not needed for dst_prc int8.
And it may lead to wrong results when denormals optimization is on.
* Add vpmaxsd if dst_prc is u8 or u16.
* Enable hint to tput if no property is specified for both AUTO device and target device.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* 1. Update logic.
2. Add test cases.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update. Set hints to default for the target device if no hint is set for the AUTO plugin and no specific properties are set for the target device.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
This extra semicolon creates output like the example below. The extra
'::' is equivalent to adding '.' to the LD_LIBRARY_PATH. This
breaks the glibc build, and very often creates weird issues when launching
commands from a different path.
...inference_engine/external/tbb/lib::/opt/intel/openvino_2021/...
We also noticed that :${parameter:+:$parameter} is widely used in
this file. Please review the code and fix as needed.
* ExperimentalDetectronDetectionOutput: refine sorting criteria for NMS stage
This is to ensure the operation produces stable, predictable results across
the possible sorting algorithm implementations.
This property is useful for testing the operation.
* [GPU] Implement ExperimentalDetectronDetectionOutput operation
* [GPU] ExperimentalDetectronDetectionOutput: use vector types and operations in kernel
* Reformat changed files to make clang format checker happy
* [GPU] ExperimentalDetectronDetectionOutput: add another test case to the unit test
* [GPU] ExperimentalDetectronDetectionOutput: Add f16 test
* ExperimentalDetectronDetectionOutput: single-layer test: use all three outputs
* [GPU] ExperimentalDetectronDetectionOutput: increase single layer test coverage
More attribute permutations were added.
* add test case checking that plugin properties are not revised by compile_model
* rename smoke_cpuCompileModelBehaviorTests to smoke_gpuCompileModelBehaviorTests
* remove property EXCLUSIVE_ASYNC_REQUESTS in ov2.0 test
* add test case checking that plugin properties are not revised by loadNetwork
* 1. Enable the IE Core filter to promote the secondary properties to first level for hardware devices.
2. Enable the IE Core filter to pass the secondary properties to the AUTO plugin.
3. Enable the AUTO plugin to parse secondary properties to first level and pass them to the corresponding target hardware device (see the sketch below).
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
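A minimal sketch of what "secondary properties" means in practice, assuming the ov::device::properties helper and a placeholder model path; the per-device bundle is forwarded by AUTO to the target hardware device as described above.
```
// Sketch only: "model.xml" and the chosen property are placeholders.
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");
    // The properties grouped under "CPU" are second-level (device-specific)
    // properties that AUTO is expected to pass through to the CPU plugin.
    auto compiled = core.compile_model(
        model, "AUTO",
        ov::device::properties("CPU", ov::enable_profiling(true)));
    return 0;
}
```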
* 1. Enable MULTI Plugin to support secondary properties.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* 1. Enable HETERO Plugin to support secondary priorities.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Catch the EXPECT_CALL where the AVAILABLE_DEVICES argument is passed to GetMetric.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Revert the logic of handling secondary properties for MULTI and HETERO device.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Remove the secondary property flattening logic because this logic has been implemented within AUTO plugin.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* 1. update flatten logic when secondary properties are specified.
2. add the test case with secondary properties for CPU.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* add the test case with secondary properties for GPU plugin.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Add debug message to fix the test case failure issue.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Add more debug info.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
1. For IE Core, the 1st level property overrides the 2nd level property.
2. For the AUTO plugin, add the available device list to check if the secondary properties are valid.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Add CUDA and ARM.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update device name for ARM Plugin and add device name for HPU plugin.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
Co-authored-by: Chen Peter <peter.chen@intel.com>
* 1. Enable OPTIMIZATION_CAPABILITIES for AUTO plugin.
2. Add corresponding test case.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Remove EXPORT_IMPORT as Export is not implemented in the AUTO/MULTI.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* DOCS-structure_workflow
workflow diagram files and formatting
added overview articles on models and deployment
added the ecosystem page and changed the header from addons
* DOCS-structure_dlworkbench
* DOCS-structure_ovtf
* fixed FakeOutputResolver to avoid renaming correctly named nodes
* fixed failed mo_args test: process reverse_input_channels through eltwise with constant with shape=[]
* changed fix to be more accurate to avoid possible issues
* Remove unnecessary iterating over producer outputs
Co-authored-by: sadolini <svetlana.a.dolinina@intel.com>
* Property to force terminate tbb threads
After inference is done, TBB threads are not closed by themselves, which causes memory leaks and lingering threads after unload.
Sometimes the TBB threads need to be terminated to release resource (memory, thread) consumption.
This PR contains (see the sketch after this commit message):
1. Add a new property to control whether to force termination of TBB threads.
2. The property key is "FORCE_TBB_TERMINATE", the default value is false.
3. Explicitly terminate the TBB task scheduler while unloading the OpenVINO DLL if this property is set to true.
e.g.: core.set_property(device, ov::force_tbb_terminate(true));
4. If FORCE_TBB_TERMINATE is not set, there are no additional TBB operations.
Change-Id: I32dc0ba122bb19a9dbf3ba12fdd596aad9ac54b4
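A minimal sketch of using the property introduced above, assuming it is exposed as ov::force_tbb_terminate as stated in the commit message (the global set_property form is used here for brevity).
```
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Ask the runtime to explicitly terminate the TBB task scheduler when the
    // library is unloaded instead of leaving worker threads lingering.
    // The default remains false, i.e. no additional TBB operations.
    core.set_property(ov::force_tbb_terminate(true));
    return 0;
}
```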
* Fix executorManager test case
Change executorManager from static to dynamic; the test case should fit this change.
* Change frontendManager to be a non-static instance
Make frontendManager a non-static instance.
We should guarantee it is not released before Model, because Model uses the memory allocated by frontendManager.
So keep a frontendManager reference in ov::Model to make it work.
* Fix race condition between executor and executorManger
* Add test case for tbb property
1. Add basic test case for ov::force_tbb_terminate property
2. set ov::force_tbb_terminate to be false
* Avoid terminating TBB in case no TBB thread was created
* Fix Constant ops segmentfault issue
There is a segfault during Constant destruction, caused by some shared memory being freed twice.
Test case is:
from openvino.inference_engine import IECore
import ngraph as ng
ie = IECore()
net = ie.read_network(model=test_net_xml, weights=test_net_bin)
query_res = ie.query_network(net, device)
func_net = ng.function_from_cnn(net)
ops_net = func_net.get_ordered_ops()
ie and net are released before ops_net is destroyed, so Constant frees shared memory that has already been freed
* Make sure constant::m_data is released before frontendmanager
* tiny format change
* change tbb blocking_terminate to terminate
Calling TBB blocking_terminate causes segfaults when running some specific models;
the reason may be that blocking_terminate blocks the current thread to wait for TBB to exit,
but cannot handle some resource dependencies.
After adopting terminate(), the dependencies can be resolved and there are no more segfaults.
Change-Id: I0b920630a25cd3fd2747c57ec71ca749ba35573b
* Remove unnecessary dependencies
* Disable dynamic lib test case in static library compilation version
As described in CVS-68982, we should disable the test case which loads a
dynamic library when OpenVINO is compiled as a static library.
* Fix nested-namespace-definition issue
* Address reviewer's comments
* Refine ov_partial_shape for OV 2.0 C interface
To avoid potential string security problem, remove string pointer from ov_partial_shape structure.
* Remove redundant code
* fix typo issue
* fix shape test issue
* fix some minor issues
* Address reviewing comments
Use Dimension to represent the rank of a partial shape.
* Apply safer method to parse partialShape string
1. adopt ov::Dimension::value_type to construct ov::Dimension
2. safer method to convert string to dimension value
3. apply std::vector<std::string> to replace std::vector<char *> during parsing of partialShape string
Change-Id: I0e0b70a915fc5c5fefad51de51f167798854f55e
* Convolution concat sum inplace conflict fix
* Minor refactoring.
* Rebase to OV2.0, build pass.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Remove old file.
Rebase introduced this file by mistake.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Move functional test for subgraph.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Disable some crash test for continue to test others.
* Rename ConcatConvSumInPlaceTest to ReLuConcatConvSumInPlaceTest
fix ci crash issue.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* Revert "Disable some crash test for continue to test others."
This reverts commit f7a8677c002747b45e84f74672f76e2fdfc7ab22.
* Add const for inPlace.
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* fix build issue, missing braces;
Co-authored-by: Maksim Kutakov <maksim.kutakov@intel.com>
* Add signal stack management for AMX in linux python API
* fix wording
* fix empty line
* add AT_MINSIGSTKSZ definition
* Fix misspelling and conditional compiling on __linux__
* Change read_image() into generate_image()
* Move test utils from testdata repo to local files
* Minor changes
* Remove unnecessary code
* Minor changes
* Fix compatibility tests
* Fix imports for Azure pipeline
* Move model generation into test_utils
* Minor changes
* Minor changes
* Update linux.yml CI
* Remove testdata repo from .ci/linux.yml
* Remove testdata repo from pipelines
* Fix Azure compatibility tests
* Reset linux.yml
* Remove testdata repo from linux CI
* Try eliminating one of configs
* Attempt at fixing Azure tests
* Add separate utils for compatibility
* xfail comp if op tests
* Minor changes
* Revert changes to .ci files
* minor changes
* Remove xfails
* Remove unecessary import
* Skip if op tests
Co-authored-by: Michal Lukaszewski <michal.lukaszewski@intel.com>
* add paddle op top_k_v2
* rebase
* fix variable support issue for paddle top_k_v2
* Update src/frontends/paddle/src/op/top_k_v2.cpp
Co-authored-by: Bo Liu <bo4.liu@intel.com>
* Update src/frontends/paddle/src/op/top_k_v2.cpp
Co-authored-by: Bo Liu <bo4.liu@intel.com>
* Update src/frontends/paddle/src/op/top_k_v2.cpp
Co-authored-by: Bo Liu <bo4.liu@intel.com>
* format the top_k_v2.cpp
Co-authored-by: meiyang-intel <yang.mei@intel.com>
Co-authored-by: Bo Liu <bo4.liu@intel.com>
They sporadically impact CI... a possible reason is that the ordering between paddle and openvino is not guaranteed when more than
one bbox has equal scores.
Actually there is no need for these random tests as the remaining cases already cover them.
* draft pr for planar and fsv16
* draft pr for general test
* update fusion test (failing)
* update fusing test (pass)
* update fusing test (include exception)
* clean gpu unit test
* review comment applied
* unit test cases added & cpplint applied
* cpplint error fixed
* change gpu test cases for fp16
* fusing test fix generate_unique_indices
* fix typo
* revise the cl kernel for cases when the updates shape is altered
* Initial files & cmakefiles for ov 2.0 c api development
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Add all ov 2.0 C APIs define
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Fix review comments
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Disable test of OV 2.0 C APIs test for tmp
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Add related property key for ov 2.0 C-API
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Add description for ov_property_key_e
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Add EXCEPTION handling
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* compiledModel add interface
* add inferrequest interface
* solve cpplint problem
* Finished OV 2.0 C-APIs PPP related development
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Fix code review issues
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Add ov::tensor API
* add compiled model func
* Finished C-API funs about core, model, node development
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* [OV 2.0 C-API] add const to ov_output_node
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* [OV 2.0 C-API] Using define GET_OV_ELEMENT_TYPE & GET_CAPI_ELEMENT_TYPE in tensor APIs
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* [OV 2.0 C-API] add string initialize
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* add inferrequest func
* add move construction to runtime_model
* supplement two infer request interface functions
* [OV 2.0 C-API] Add the common framework of unit tests
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* modify ov_infer_request_get_profiling_info
* add tests dir
* restore CMakeLists.txt
* Fix the bug of COPY in Tensor
* [OV 2.0 C API] Finished core related function unit test
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Add ov:Tensor API test
* [OV 2.0 C API] fix some review issues
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* add some infer request test
* add compiled model test
* [OV 2.0 C API] Finished preprocess related function unit test
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* [OV 2.0 C API] Fix review issues
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* [OV 2.0 C API] Modify to use default model
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* transfer device_name from fixed value to parameter
* add some infer request test
* remove compiled model get_property test
* add infer request tests
* Add ov::model Test and modify Tensor Test name
* Determine whether partial shape meets the standard
* Add get tensor name function and Modify reshape test case
* modify fixed tensor name, remove unnecessary comparison
* add ov_model_get_nodes_info, modify according to comments
* Update reshape test
* extract common function, modify interface for getting tensor name, shape and type
* modify according to comments
* [OV 2.0 C API] Finished hello classification with ov 2.0 c-api development
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* [OV 2.0 C API] Fixed hello classification with ov 2.0 c-api review issues
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* [OV 2.0 C API] delete inactive code hello classification with ov 2.0 c-api
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Fix clang format issue
* [OV 2.0 C API] rename
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Fix windows build error
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Apply qsort for sorting data
Fix issues of "potentially uninitialized local pointer variable"
* Not use deprecated INSTANTIATE_TEST_CASE_P for c api gtest
INSTANTIATE_TEST_CASE_P is deprecated, should use INSTANTIATE_TEST_SUITE_P.
* Fix some review issues
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* [Ov 2.0 C API] Add error info
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Fix some review issues
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* Fix review issues
Signed-off-by: xuejun <Xuejun.Zhai@intel.com>
* polish error message for ov c api
* Redefined ov_shape_t, ov_partial_shape_t and ov_layout_t. Modified functions and test cases involving these variables
* Added the conversion between char* and partial_shape
* Add partial_shape_to_shape
* prune code
* modify split
* Use regex to split and search pattern
* Modify str_to_char_array delete
* Add the judgment of rank
* Fix compiling error
Fix issue: address of array 'shape.dims' will always evaluate to 'true' if -Wpointer-bool-conversion
Co-authored-by: xuejun <Xuejun.Zhai@intel.com>
Co-authored-by: sunxiaoxia2022 <xiaoxia.sun@intel.com>
Co-authored-by: ruiqi <ruiqi.yang@intel.com>
* gather blocked format
* enable double blocked
* 5d test
* support cross dimension
* Add some disabled test for later use
* Support non-default planar formats
* Using the oneDNN reduction kernel instead of the pooling kernel gives better performance for the reduction layer.
* Stop using global pooling instead of reduce primitive
* Use oneDNN reduction if its mode is supported by an optimized oneDNN kernel
* activation pow is supported
* Use clDNN reduce if 3d or redundant reduce, tensor size mismatch
* Updated thirdparty onednn_gpu
Signed-off-by: Min, Byungil <byungil.min@intel.com>
Co-authored-by: Wei Tang <wei1.tang@intel.com>
Co-authored-by: Chen Kurt <kurt.chen@intel.com>
* [GPU] Implement Roll kernel
* [GPU] Add Roll kernel selector
* [GPU] Add Roll primitive
* [GPU] Add Roll helpers
* [GPU] Implement unit tests for the Roll operation
* [GPU] Add Roll operation to GPU plugin
* [GPU] Add single layer tests for the Roll operation
* [GPU] Add changes after review
* [GPU] Improve cldnn unit test
* Dynamic shape memory reuse solution
* Fix Split node to properly work with dyn mem
* Fix race condition for Memory mgrHandle
* Avoid Memory race condition between GetData and SetDataHandle
Adding a lock for the race condition between ov::intel_cpu::Memory::GetData() and ov::intel_cpu::Memory::SetDataHandle() is not a good solution,
as it would impact inference performance. We found that it is unnecessary to get the edge DataPtr in inferRequest::SetBlob or GetBlob, which
only need the tensorDesc, so we can get only the tensorDesc instead of the dataPtr to avoid this race condition.
* Resolve reviewer's comments
* Avoid performance impact due to frequent resets of MemMngrHandle
If MemMngrHandle has already been assigned an external buffer, it can be reused.
Otherwise a new one needs to be created.
* multiclass_nms opset9 spec, api, reference, paddle fe mapper, paddle fe unittest.
* multiclass_nms opset9 cpu node impl.
* multiclass_nms opset9 shape infer fix.
* multiclass_nms opset9: add transform ConvertMulticlassNms8ToMulticlassNms9.
* ConvertMulticlassNmsToMulticlassNmsIE: to MulticlassNmsIEInternal
* add test dependency package paddledet==2.1.0
* 1. fix for roisnum overflow. 2. common shape_infer private function.
Signed-off-by: jialipen <cecilia.peng@intel.com>
* 1. use common infer_shape helper. 2. fix roisnum overflow issue. 3. fix for nmsWithEta.
* test suite for opset9 multiclass_nms smoke tests pass, with both static and dynamic shapes.
code clean for unit test.
* decouple specification from this PR.
* op fuzzy: dynamic input/output
* reference impl refactor
* multiclass_nms_base no need clone_inputs.
* code clean
* restrict ppdet import
* fix clang format error
* change ppdet import to resolve CI fail issue related to its dependency.
* fix CI
* refactor: multiclass_nms_shape_inference for opset9 and reference impl.
TODO: could be applied to opset8 and even matrix_nms.
* fix CI build failure.
* CI fix for ambiguous namespace reference issue when
building static libs.
* update nms save_model python scripts.
* dynamic inputs for NMS with CPU plugin.
* copyright header for test scripts.
* op comformance test for multiclass_nms_9.
* minor update: is_type
* python opset9 and multiclass_nms
* flake8 CI fix
flake8 CI fix
flake8 CI fix
* remove NmsBase. stage1.
flake8 CI fix
remove NmsBase. stage 1 fix.
* rm NmsBase. stage2.
* more multiclass_nms prop tests and fix.
* remove unchanged ops from binding opset9.
* dependency of paddle_tests.
* fix: add MulticlassNms to op mapper.
* clang format fix
* fix merge error.
* add formats for 3d conv
data formats
-bs_fs_zyx_bsv32_fsv32
-bs_fs_zyx_bsv32_fsv16
-bs_fs_zyx_bsv8_fsv4
-bs_fs_zyx_bsv8_fsv2
-bs_fs_zyx_bsv16_fsv32
-b_fs_zyx_fsv2, b_fs_zyx_fsv4
weight formats
-os_is_zyx_osa2_isa8_osv8_isv2
-os_is_zyx_osv8_isv4
-os_is_zyx_osv8_isv2
-gs_oizyx_gsv32
* add supported formats for primitives
* choose onednn convolution impl for 3d conv
* optimize layout of shallow depth convolution
* remove reorder for conv
* Don't remove reorder between bs_fs_zyx_b32_f16/f32 and bfyx.
* add formats to SetDefault() to optimize gws/lws for quantize/eltwise
* fall back to cldnn if the onednn pooling's layout is b_fs_zyx_fsv32 and i8.
* fixed wrong position for new weight formats
* restore imad_case()
* This func is used to choose the format for the fallback cldnn path
* [GPU] add debug flag: OV_GPU_SerialCompile
0(default): parallel compile
1: serial compile
* add is_mixed_layout
* remove format::bs_fs_zyx_bsv8_fsv4 in needs_onednn_small_ic_to_blocked
* prevent to fuse the reorder which is between quantize and conv
* shallow feature first conv
* Revert "[MO args][ONNX FE]fix cutting graph with input, output or both (#9698)"
This reverts commit 2b03d5fe66.
* Fix cutting the graph when inputs/outputs are passed to the MO
* Check that port exists
* Simplification of getting node port
* Reduce the amount of nesting when searching for a node by operation name
* Refactoring
- remove mutable default arg
- changes in code style
- change variables name
* Check that user input data type is dictionary
Co-authored-by: Michal Lukaszewski <michal.lukaszewski@intel.com>
* [GPU] Modify Softmax single layer tests to check Softmax-8 is supported with axes in [-rank, rank) interval
* [GPU] Fix cldnn::softmax::dimension_t documentation
* [GPU] Fix ParamsKey::EnableSoftmaxDim
Support Z dimension.
* [GPU] Add Softmax single layer test that checks 5D case
Since some Softmax kernel code contains ifdef on 5-dimensional case,
a test case is needed that covers this functionality.
* [GPU] Support axis 0 in Softmax
* [GPU] Modify Softmax single layer tests to check axis 0
* [GPU] Modify Softmax items class optimized kernel to handle axis 0 correctly
Modify single layer test accordingly.
* [GPU] Modify Softmax unit-test to check softmax::normalize_b
* Split SoftMaxLayerTest into opset1 and opset8 versions
Use SoftMax8LayerTest in the tests throughout repository.
SoftMaxLayerTest now defaults to SoftMax1LayerTest for compatibility.
* [GPU] Add f16 test-case for Softmax single-layer test
Co-authored-by: tgubanova-lohika <tgubanova@lohika.com>
* dft with single layer test
* idft with single layer test
* fix output param usage in dft
* update dft according to the clang-format
* move output layout setup to calc_output_layout
* add support for other dimensions
* add clDNN unit test for DFT/IDFT
* remove unnecessary original rank
* use defined formats in kernel
* fix dft docs
* changes after review
* Revert "fix dft docs"
This reverts commit 45b05172dfd161d92dae6d26e0f1b74748e56fd5.
Co-authored-by: Serhii Pavlovskyi <spavlovskyi@lohika.com>
Co-authored-by: Mykhailo Hnap <mhnap@lohika.com>
With the new networkx release (2.8.1) some MO tests started to fail
with the following error:
```
def __setstate__(self, state):
self._graph = G = state["_graph"]
self._adjdict = G._pred if hasattr(G, "pred") else G._adj
AttributeError: 'Graph' object has no attribute '_adj'
```
Seems like a regression that was introduced in
f50fc70b8c
convolution_gpu_yxfb_yxio_b16 for fp16 has hardcoded reqd_work_group_size
to (16, 1, 1). On devices where CL_DEVICE_MAX_WORK_GROUP_SIZE is 512
GetOptimalLocalWorkGroupSizes picks (16, 2, 1) for LWS.
That causes issues during clEnqueueNDRangeKernel since the LWS doesn't match
the reqd_work_group_size in the kernel.
* Add single layer tests for GPU
* Add GPU primitive for ExperimentalDetectronGenerateProposalsSingleImage
* Add kernel for ExperimentalDetectronGenerateProposalsSingleImage
* Add unit test
* rename abbreviation edgpsi to the full name experimental_detectron_generate_proposal_single_image
* Add f16 support to operation
* Add f16 support to the unit test
* Add notification about the second output in primitive
Co-authored-by: Oleksii Khovan <okhovan@lohika.com>
* Added shell for Eye-9
* Updated spec for Eye-9
* Added reference for Eye-9
* eye cpu
* Added op impl check for Eye-9
* Fix unallowed dynamic to static dim conversion in eye shape_infer
* Add template plugin tests for dynamic shapes
* Add template plugin tests for dynamic shapes batch input
* Enable batch shape input dynamic rank
* Uncomment 3D batch cpu Eye tests
* Update assertions and messages
* use ov::element type
* Remove redundant evaluate from eval map
* Style fix
* Add static_cast<T>(1) to cpu eye
* Add defaults to eye cpu class members
* Reuse out_ptr and checks
* Return if onesPerBatchNum == 0
* Add Eye CPU Dynamic shape tests with 2D batch
* Additional test cases for CPU and reference
* Disable 3D batch eye cpu tests
* Fix CPU implementation for matrix with not equal cols and rows
* Update CPU test name
* Disable CPU Eye 3D batch static shapes tests
Co-authored-by: Alexandra Sidorova <alexandra.sidorova@intel.com>
Co-authored-by: Yury Gaydaychuk <yury.gaydaychuk@intel.com>
* Update oneDNN rls-v2.6
* Support weight tag for oneDNN v2.6
* Fix first conv selection issue in oneDNN
* oneDNN v2.6 required specific tags to run jit:ir primitives.
* any_tag can find optimized primitives in oneDNN.
* Enable aBcd2b src tag for oneDNN v2.6
* Add create_memory_desc from format string.
* Make grouped depthwise separable conv use jit:ir in oneDNN v2.6
* Use byxf format.
* Update only use acdb format in shallow group conv
* Fix refconv selection in shallow conv with post operations.
* Enable reshape int8
* Fixed quantize fusing through reorder+reshape : Fixed the condition to check per_tensor_input_shift only when need_input_shift is true
* minor change
* Allow FP quant to be fused to FC/gemm
* Disable reshape transform for onednn until onednn FC is optimized
* [GPU] Support implicit crop in input transposition.
+ Make the crop in front of quantize implicit by changing output format to bfyx.
+ Use implicit concat after quantize nodes.
* Add unit test for implicit crop and concat.
+ remove unnecessary code.
+ Modified jitter Load for planar input of fused eltwise
+ Bugfix in jitter if planar input has LT_ALIGNED_READ
Signed-off-by: Min, Byungil <byungil.min@intel.com>
Update the branch to be used for 2022.1 and remove reference to
-staticdev package which isn't generated anymore.
Signed-off-by: Anuj Mittal <anuj.mittal@intel.com>
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* roi_align_9: ov_core, transformations, template_plugin
* roi_align_9: CPU Plugin
* keep only constructor with enums which is aligned with spec
* remove evaluate function for ROIAlign_9
* Add op check test for operation ROIAlign-9
* Apply suggestions from code review
* fix version name from 'v0' to 'v3' in transform part
* use common shape_infer function for v3 and v9
* remove 'tf_' prefix for ROIAlign::AlignedMode to avoid confusion for models from different platforms
* Update Convert_Model_From_TensorFlow.md (#11425)
* Apply suggestions by Yuan
The changes are made in the port PR, so will be published with the 22.2 version.
Co-authored-by: Evan <evan.juras@gmail.com>
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* Docs: Add links to specific examples (#11618)
* Update docs/OV_Runtime_UG/integrate_with_your_application.md
* Add links to specific examples
This edit adds links to more example applications, making it easier for users to discover how to build an OpenVINO application around their specific model.
* Add links to MO installation and ONNX examples (#11617)
These edits help make it easier for a new user to find more information on how to convert ONNX models.
* Apply suggestions by Yuan
The changes are made in the port PR, so will be published with the 22.2 version.
Co-authored-by: Evan <evan.juras@gmail.com>
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* selectdevice returns MULTI:device in cumulative_throughput
* load multi with throughput and disable cpu helper in cumulative
* disable cpu helper in cumulative_throughput
* add cumulative to benchmark_app help message
* modify benchmark_app.hpp clang-format
- Add TC for decrease_label_id=true to cover MXNet-style NMS models
- Fix segfault issue that occurs when data precision is fp16
Signed-off-by: Andrew Kwangwoong Park <andrew.kwangwoong.park@intel.com>
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Einsum test helper
* Einsum single layer tests
* Add Einsum decomposition with repeated labels and ellipsis support
to GPU transformations pipeline
Co-authored-by: Oleksii Khovan <okhovan@lohika.com>
Check first whether the path specified by --input_dirs is a directory.
Otherwise the argument is always treated as a .lst file,
and in case it is a directory it silently fails,
which causes the test runner to not execute any of the intended tests.
porting from 22.1 as per Andrey's request from 04.08
* sphinx google search
* fixes
* fixes
* fix version tabs
Co-authored-by: Nikolay Tyukaev <nikolay.tyukaev@intel.com>
* DOCS-benchmarktool_python_correction
add info on tool installation
* Update docs/OV_Runtime_UG/Samples_Overview.md
Co-authored-by: Helena Kloosterman <helena.kloosterman@intel.com>
Co-authored-by: Helena Kloosterman <helena.kloosterman@intel.com>
* Try to improve gflags
* Try to improve gflags: part 2
* Tried to use dependencies on system
* Use nlohmann_jsonConfig from system
* Enabled nlohmann_json from system
* Improvements
* handle system gflags in developer package
* Simplifications
* Simplify dependency management
* Corrected package names
* Fixed subgraphsDumper configure stage
* Try to fix rhel8
* Try to fix macosx
* Fixed VPUX build
* Fixed aliasing issues
* Suppress some warnings
* export gflags when build it
* Fixed some LTO
* Try to fix Mac
* revert
* use gflags as private dependency
* Aligned targets in developer package
* Fixed frontends tests build on U20 with LTO
* Passed
* Don't use pkg_search_module(zlib ..) during cross-compilation
* Removed unused variables
* Fixed finding of zlib during cross-compilation
* CVS-83529
* Use nothreads_static
* Fixed python
* Moving PWL to ngraph
* improving the running time of php_search; refactoring the pwl operation
* fixed errors & refactored code
* moved PWL op to GNA
* Update src/plugins/intel_gna/ops/pwl.hpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* Update src/plugins/intel_gna/ops/reference/pwl.hpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* Update src/plugins/intel_gna/ops/pwl.cpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* Update src/plugins/intel_gna/transformations/transpose_to_pwl.hpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* Update src/plugins/intel_gna/transformations/transpose_to_pwl.cpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* fixed compilation error
* Update inference-engine/tests/unit/gna/ngraph/transformations/gna_pwl.cpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* added some tests; changed algorithm of checking accuracy of pwl; refactoring
* added first and last segments; added fq and fixed errors
* fixed after review & rewrote some tests on ngraph
* removed debug logs & fixed code style check error
* s/ngraph_helper/ngraph_util
* removed TRANSFORMATIONS_API in PWLApproximation class declaration
* removed OPENVINO_API in Pwl class declaration
* replaced the deprecated version of evaluate() with a new one
* fixed some problems after reviewing
* fixed a problem when the function value at the left point of a segment is less than the minimum of the function
* corrected the value of the right point of the last segments
* [GNA] Moved pwl func tests
* Deleted deprecated test
* s/OPENVINO_RTTI/OPENVINO_OP
* Deleted conflicted test file
* fixed after review
Co-authored-by: Dmitrii Khurtin <dmitrii.khurtin@intel.com>
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* [IE Samples] Activating new parameter compact mode (memory_reuse) in speech sample
* changed format
* renamed the option to memory_reuse
* renamed the option
* DynamicShapeResolver is able to save information about dynamic output in order to pass it in INFER_DYNAMIC_SHAPE mode. Previously, it propagated a fully dynamic output shape (although the ranks were equal), and dynamic Convolutions and Poolings were performed incorrectly. Now, in the case of dynamic batch, DSR propagates only the dynamic batch, and Convolutions and Poolings are performed properly as a Loop of single-batch operations.
* Fixed dynamicToStaticShapeTranspose transformation. There was a bug: the transposition indices could not be applied with Scatter because the formula is not applicable here. Replaced with Gather (see the worked sketch after this commit's notes).
i.e. The output shape of a Transpose with transposition indices [0,3,1,2] (NHWC [1, 224, 224, 3] -> NCHW [1, 3, 224, 224]) was calculated by ScatterElementsUpdate. So output_shape[transposition[i]] = input_shape[i] and the result was output_shape=[1, 224, 3, 224], which was wrong. Vice versa, Gather does output_shape[i] = input_shape[transposition[i]] and the result is [1, 3, 224, 224], which is right.
* MaxPool and AvgPool can be sliced for loop in case of dynamic batch
* Convert stage for inputs is not inserted in the VPU model in the case of OV API 2.0. It did not cause a problem with non-dynamic functions because Graph Transformer has a pass to eliminate redundant converts (u8->f16, ~f16->f16~). In the case of dynamic inputs, yet another inserted Convert breaks data<->shape relations.
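A worked sketch of the shape formula fix described above, using the NHWC example from the commit message; it is standalone illustration code, not the transformation itself.
```
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    const std::vector<std::size_t> input_shape{1, 224, 224, 3};  // NHWC
    const std::vector<std::size_t> perm{0, 3, 1, 2};

    std::vector<std::size_t> scatter_result(4), gather_result(4);
    for (std::size_t i = 0; i < perm.size(); ++i) {
        // ScatterElementsUpdate-style formula (the old, wrong one): {1, 224, 3, 224}
        scatter_result[perm[i]] = input_shape[i];
        // Gather-style formula (the fixed one): {1, 3, 224, 224}
        gather_result[i] = input_shape[perm[i]];
    }

    for (auto d : scatter_result) std::cout << d << ' ';
    std::cout << '\n';
    for (auto d : gather_result) std::cout << d << ' ';
    std::cout << '\n';
    return 0;
}
```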
* Try to improve gflags
* Try to improve gflags: part 2
* Tried to use dependencies on system
* Use nlohmann_jsonConfig from system
* Enabled nlohmann_json from system
* Improvements
* handle system gflags in developer package
* Simplifications
* Simplify dependency management
* Corrected package names
* Fixed subgraphsDumper configure stage
* Try to fix rhel8
* Try to fix macosx
* Fixed VPUX build
* Fixed aliasing issues
* Suppress some warnings
* export gflags when build it
* Fixed some LTO
* Try to fix Mac
* revert
* use gflags as private dependency
* Aligned targets in developer package
* Fixed frontends tests build on U20 with LTO
* Passed
* Don't use pkg_search_module(zlib ..) during cross-compilation
* Removed unused variables
* Fixed finding of zlib during cross-compilation
* added recursive run for transformation to fix fp16 IR with Interpolate inside If
* added test for interpolate inside If
* remove useless variable
* fixed transformation for divide
* fix code style
* commit auto change
* review fix
* add test for recursive call of divide marks
* removed empty line
* [MO] Support TensorFlow Grouped Conv2DBackpropInput
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Correct computation of group number for ConvBackpropInput operation
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Fix get_conv_backprop_groups function
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Add unit-tests for Deconvolution shape inference
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
Compilation with ENABLE_CPU_DEBUG_CAPS was fixed.
Previous to this change it failed due to undefined dnnl::impl::md2dim_str
(since DNNL_VERBOSE was disabled in the scope of PR #11244).
* Removed a redundant image
* Fixed ops specifications and other issues
* converted html links to anchor links
* converted html links to anchor links
* Fixed a link
* Fixed a link
* Changed anchor links according to dev review
# Conflicts:
# docs/OV_Runtime_UG/Operations_specifications.md
* Right fill in the values of the inputs
* Using create_and_fill_tensor_unique_sequence() instead of create_and_fill_tensor()
* Fixing a problem with a missing parameter when calling the create_and_fill_tensor method
* Fix Bucketize Conformance tests inputs generation for Template plugin
* Correct filling of the first port (data)
* Correct the order of passing arguments to the InputGenerateData constructor
* Full range correction for random numbers
* Refactoring the argument sequence of the InputGenerateData class constructor
* A small imperfection
* Rollback changes that are related to range
PR for 22.1 made, now porting to release...
some discrepancy between this version and the 22.1 branch seems to exist, so I adjusted the conflicting link to avoid build check errors...
the overview has been merged, the remaining articles are reviewed here
* Paddle FasterRCNN Ops Conversion: roi_align, strided_slice, where
* add check for 'aligned' feature of 'roi_align' op; use common function for idx_node in 'strided_slice' op
* Apply suggestions from code review
* use common function for strided_slice and slice, OP_CHECK for 'where' op conversion
* Apply suggestions from code review
* Fix batchability check of MAX_BATCH_SIZE
* Applied review comment
* clonenetwork in auto
Signed-off-by: fishbell <bell.song@intel.com>
* clone in correct way
Signed-off-by: fishbell <bell.song@intel.com>
Co-authored-by: Taylor Yeonbok Lee <taylor.lee@intel.com>
* Frontend exception safety
Every call to a frontend's API (except Places) can throw an exception. If, during exception handling, FrontEndManager is destroyed and calls 'dlclose' for the plugin, the call stack will be corrupted and a crash will occur.
The solution is to wrap 'plugin' calls with try/catch and throw a new exception in the 'openvino' context (see the sketch after this commit's notes).
TODO: currently "Place" objects don't have 'actual' wrappers, so an exception in 'place' objects can potentially cause such a crash (if the exception handler destroys FrontEndManager). A workaround for the user would be to try/catch any calls of the Place API on their side.
We're not expecting users to use the Place API directly, so this workaround looks acceptable
* Add check for exception message
* Keep type of frontend exception during rethrow
* IR FE tests: don't expect InferenceEngine::exception as it is not propagated as-is by FrontEndManager
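A minimal illustration of the wrapping idea, with illustrative names (call_plugin_guarded and the message text are not the actual OpenVINO frontend macros): the call into the dynamically loaded plugin is guarded, and a new exception owned by the core library is thrown, so unwinding never runs through plugin code that may already have been unloaded.
```
#include <stdexcept>
#include <string>

// Guard a call into a frontend plugin and rethrow from the 'openvino' context.
template <typename Callable>
auto call_plugin_guarded(Callable&& fn) -> decltype(fn()) {
    try {
        return fn();  // may throw from inside the plugin shared library
    } catch (const std::exception& e) {
        throw std::runtime_error(std::string("FrontEnd API failed: ") + e.what());
    } catch (...) {
        throw std::runtime_error("FrontEnd API failed with an unknown error");
    }
}
```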
* [Python API] Remove old api class from the new api
* start working on refactoring of OVAny
* fix tests
* fix code-style
* remove tuple test
* fix test
* fix omz hash
* one more overload
* fix pyfloat
* move from_ov_any to utils
* code-style
* move function from common to utils
* Build with system TBB
* Fixes
* Check whether system TBB is available
* Try to fix ONNX Runtime build with system TBB
* Test
* Fixed compilation of threading.cpp
* Fixed unset of cache dirs
* Limit search paths of TBB
* Try to enable pip packages with custom TBB
* Fix for TBB 2021.2
* Install only needed TBB libraries
* Install TBB from system to pip package
* Reverted usage of TBBROOT
* Fixed oneTBB case
* Try to fix Android
* Escape some paths
* Added samples path
* Fixed TBBBind usage for case of system TBB
* Added specification for EyeLike-9
* Update docs/ops/generation/EyeLike_9.md
* removed batch from TF
* minor fix
* Applied comment by Anton
* Added new example with dynamic output, added corner case
* Fixed corner case description
* Rename matrix
* applied comments by Yuan
* Added diag_idx as input, minor fixes, renaming
* added support of batch_shape from TF
Co-authored-by: Andrei Kochin <andrei.kochin@intel.com>
* [GNA] Fuse all FakeQuantize layers with their previous layers
* [GNA] Fuse FQ with previous layer if it's not required for precision change
* [GNA] Fixed MatMulOverloadCorrectionTest
* New command line parameters format for speech sample
* fixed notes
* changed format for scale factor
* changed format for scale factor in tests
* added more variants, when the name is directly specified for i/o/r like it is done for sf
* removed nthreads flag
* fixed notes
* changed output params
* updated tests with new format
Co-authored-by: Alexander Zhogov <alexander.zhogov@intel.com>
* Fix for str_to_container if string value has whitespaces
* Add test
* Add trim for leading and trailing whitespaces
* Apply comments
* Apply comments 2
* Apply comments 3
* Enable explicit TBlob declaration in all compilers
This fixes problems when linking a gcc-compiled IE with clang-compiled
applications.
Prior to this change, only clang compilers would consider TBlob<T>
templated types as declared externally. When *declared* explicitly (with
the `extern template` syntax), the C++ spec says
that any inline methods of the templated class (such as TBlob<T>
constructors) should be ignored in favor of the externally instantiated
version of that templated type:
"An explicit instantiation declaration (an extern template) skips
implicit instantiation step: the code that would otherwise cause an
implicit instantiation instead uses the explicit instantiation
definition provided elsewhere (resulting in link errors if no such
instantiation exists)."
However, when IE is compiled with gcc, it does not see the explicit
`extern template` declarations of TBlob<T> (due to the `#ifdef
__clang__` guards in `ie_blob.h`). As an end result, presumably due to
link-time optimizations during IE library compilation(?), none of the
TBlob<T> implementations are actually included in the IE dynamic
libraries. A simplified sketch of the mechanism follows below.
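A simplified, self-contained sketch of the mechanism (SimpleBlob stands in for TBlob; this is not the real ie_blob.h): the `extern template` declaration tells every translation unit, regardless of compiler, to skip implicit instantiation and rely on the explicit instantiation definition provided by the library.
```
#include <cstddef>
#include <vector>

template <typename T>
class SimpleBlob {
public:
    explicit SimpleBlob(std::size_t size) : data_(size) {}
    T* data() { return data_.data(); }

private:
    std::vector<T> data_;
};

// Header side: declaration only. Previously this was guarded by `#ifdef __clang__`,
// so gcc-compiled code never saw it.
extern template class SimpleBlob<float>;
extern template class SimpleBlob<int>;

// Library side (exactly one .cpp file provides the definitions):
// template class SimpleBlob<float>;
// template class SimpleBlob<int>;
```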
* Fix warnings for windows
* Fix typo
* revert previous version of convert_seq_to_ti transformation
* try to check that outputs of TI are connected to Result nodes
* add unit tests
* fix codestyle
* fix Memory tests
* revert local change
* revert local change
* replace duplicated code with lambda
* Written nGraph reference for the operation RDFT.
* Used std::reverse() algorithm to simplify the function reverse_shape() from fft_common.cpp.
* Added assert into the function offset_from_coords_and_strides().
* Deleted redundant variable.
* Deleted redundant functions from the reference implementation of (I)DFT.
* Renamed the method reverse_shape() in fft_common.hpp.
* Code style fix.
* Paddle FasterRCNN Ops Conversion: greater_than, less_than, gather, floor
* Apply suggestions from code review
* fix 'gather' testcase failure issue on CI
* implement 'axis' input for 'Gather' Op conversion with testcase comment;use common function for all elementwise Ops
* Fix setupvars.bat patching
setupvars.bat should not be patched for regular Debug and Release
configurations.
* Use STREQUAL for cmake string comparison
* Improve performance for 'ov::Model::add_output'
On the first call of `add_output(tensor_name)` all available tensor names are cached.
Subsequent calls take nodes from the cache, which significantly reduces complexity (see the usage sketch after the Tests notes).
The cache is invalidated if the topological cache is not valid or the cache points to an incorrect output (no tensor name of this node anymore).
The same caching is done for 'add_output(op_name, output_index)'.
Tests:
- Verifies that adding outputs to all nodes has linear complexity O(N), not O(N^2)
- Verifies cache invalidation scenarios
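A usage sketch of the call whose cost this change improves; the model path and tensor names are placeholders.
```
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");
    model->add_output("intermediate_tensor_name");  // first call builds the tensor-name cache
    model->add_output("another_tensor_name");       // later calls reuse the cache, roughly O(1)
    return 0;
}
```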
* Fix python tests
* Update topological cache after add_output(Output<Node>) by adding result to the end of cached ops
* Add 'm_shared_rt_info' to the 'result' node just for consistency (there is actually no scenario which may fail due to the absence of this info for Result)
* Added test cases to verify that names cache should be cleared on refresh of 'get_ordered_ops'
* wip remote tests2, fixed smoke_canInferOnUserContext
* completed the OV 1.0 tests for remote blobs
* updated OV 2.0 tests for remote blobs with auto-batching (using the ngraph func that is reshape-able by the batch)
* re-using the DetectionOutput-based ngraph func that is 100% batch-reshapeble
* Add test case for the loadNetwork with Auto Batching.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Enable logic test case for GPU.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Enable property for config key 'AUTO_BATCH_DEVICE_CONFIG'.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Omit {}.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Add common test for the property ALLOW_AUTO_BATCHING.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Add common test for the AUTO Batching plugin.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Moving PWL to ngraph
* improving the running time of php_search; refactoring the pwl operation
* fixed errors & refactored code
* moved PWL op to GNA
* Update src/plugins/intel_gna/ops/pwl.hpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* Update src/plugins/intel_gna/ops/reference/pwl.hpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* Update src/plugins/intel_gna/ops/pwl.cpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* Update src/plugins/intel_gna/transformations/transpose_to_pwl.hpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* Update src/plugins/intel_gna/transformations/transpose_to_pwl.cpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* fixed compilation error
* Update inference-engine/tests/unit/gna/ngraph/transformations/gna_pwl.cpp
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
* added some tests; changed algorithm of checking accuracy of pwl; refactoring
* added first and last segments; added fq and fixed errors
* fixed after review & rewrote some tests on ngraph
* removed debug logs & fixed code style check error
* s/ngraph_helper/ngraph_util
* removed TRANSFORMATIONS_API in PWLApproximation class declaration
* removed OPENVINO_API in Pwl class declaration
* replaced the deprecated version of evaluate() with a new one
* fixed some problems after reviewing
* fixed a problem when the function value at the left point of a segment is less than the minimum of the function
* corrected the value of the right point of the last segments
* s/OPENVINO_RTTI/OPENVINO_OP
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
+ Fix colorization-sig accuracy issue using oneDNN
Memory crash in case of reuse_eltwise_sum_post in oneDNN and memory_pool.
Also print node in/out gpu_usm_mem addresses at OV_GPU_Verbose >= 1
+ Check the size of z spatial axis for checking fulltensor.
+ Remove program_helpers's functions.
Co-authored-by: hyunback <hyunback.kim@intel.com>
Scenario:
- Node "Split" with multiple outputs (e.g. 3). All outputs are connected to "Result"s
- Add a post-processing step (e.g. convert element type; it can also be implicit)
Issue: after post-processing, 3 new results will be created, each with the "Split" friendly name - an inconsistency with IRv10 rules
Fix (see the sketch after the Tests notes):
- For nodes with multiple outputs, add a '.<idx>' suffix to the new output's friendly name
- If no post-processing is applied, return immediately, keeping the original results as is
Tests:
- Split with 3 outputs where 2 outputs have post-processing.
- Split with 3 outputs, post-processing doesn't create any nodes
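An illustrative sketch of the naming rule described in the fix above (not the actual OpenVINO post-processing code; make_result_name is a hypothetical helper).
```
#include <cstddef>
#include <string>

// For a node with several outputs, each newly created Result gets the source
// friendly name plus a ".<idx>" suffix; single-output nodes keep the name as is.
std::string make_result_name(const std::string& node_friendly_name,
                             std::size_t output_index,
                             std::size_t num_outputs) {
    if (num_outputs <= 1)
        return node_friendly_name;
    return node_friendly_name + "." + std::to_string(output_index);
}
```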
* [XLink] - tests to smoke scope
* [XLink] - small change in XLink related file to trigger ie-tests-windows-myriadx
* [XLink] - azure windows and linux
* [XLink] - azure windows and linux
* [XLink] - azure windows and linux - change dir?
* [XLink] - azure windows and linux - change dir?
* [XLink] - azure windows and linux - install?
* [XLink] - azure windows and linux - xlink cmake
* [XLink] - azure windows and linux - XLinkTests because another target with the same name already exists
* [XLink] - azure windows and linux - XLinkTests because another target with the same name already exists
* [XLink] - azure windows and linux - install TARGETS given target XLinkTests which does not exist
* [XLink] - azure windows and linux - remove smoke
Inserting padding into a oneDNN primitive has an issue with the implicit concat behavior.
Deconv oneDNN initialized the output buffer to 0 including the padding area. The padding area should be reserved.
Use the oneDNN offset from program_node in/out lower_padding instead of the oneDNN memory desc.
Signed-off-by: hyunback <hyunback.kim@intel.com>
* add 3D shape to test and rename crop4d to strided_slice
* remove ConvertStridedSliceToCropNegative2 since 3D is now supported
* add myriad functional tests to skip-list
* update Auto docs
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* update python snippets
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* remove vpu, fix a mistake in python code
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* update MYRIAD device full name
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* update API name
the old API uses the name Inference Engine API
the new API uses the name OpenVINO Runtime API 2.0
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* update tab name, and code format
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* fix AUTO4 format issue
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* update set_property code
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* auto draft
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* mv code into .cpp and .py
modify the devicelist part according to the review
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* remove priority list in code and document
modify the beginning of the document
remove performance data
remove old API
use compile_model instead of set_property
add an image about cpu acceleration
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* fix misprint and code that does not match the document
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* try to fix doc build issue
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* fix snippets code compile issue
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* Added migration for deployment (#10800)
* Added migration for deployment
* Addressed comments
* more info after the What's new Sessions' questions (#10803)
* more info after the What's new Sessions' questions
* generalizing the optimal_batch_size vs explicit value message
* Update docs/OV_Runtime_UG/automatic_batching.md
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* Update docs/OV_Runtime_UG/automatic_batching.md
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* Update docs/OV_Runtime_UG/automatic_batching.md
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* Update docs/OV_Runtime_UG/automatic_batching.md
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* Update docs/OV_Runtime_UG/automatic_batching.md
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* Update docs/OV_Runtime_UG/automatic_batching.md
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* Perf Hints docs and General Opt Guide refactoring (#10815)
* Brushed the general optimization page
* Opt GUIDE, WIP
* perf hints doc placeholder
* WIP
* WIP2
* WIP 3
* added streams and few other details
* fixed titles, misprints etc
* Perf hints
* movin the runtime optimizations intro
* fixed link
* Apply suggestions from code review
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* some details on the FIL and other means when pure inference time is not the only factor
* shuffled according to general->use-case->device-specifics flow, minor brushing
* next iter
* section on optimizing for tput and latency
* couple of links to the features support matrix
* Links, brushing, dedicated subsections for Latency/FIL/Tput
* had to make the link less specific (otherwise docs compilation fails)
* removing the Temp/Should be moved to the Opt Guide
* shuffled the tput/latency/etc info into separated documents. also the following docs moved from the temp into specific feature, general product desc or corresponding plugins
- openvino_docs_IE_DG_Model_caching_overview
- openvino_docs_IE_DG_Int8Inference
- openvino_docs_IE_DG_Bfloat16Inference
- openvino_docs_OV_UG_NoDynamicShapes
* fixed toc for ov_dynamic_shapes.md
* referring the openvino_docs_IE_DG_Bfloat16Inference to avoid docs compilation errors
* fixed main product TOC, removed ref from the second-level items
* reviewers remarks
* reverted the openvino_docs_OV_UG_NoDynamicShapes
* reverting openvino_docs_IE_DG_Bfloat16Inference and openvino_docs_IE_DG_Int8Inference
* "No dynamic shapes" to the "Dynamic shapes" as TOC
* removed duplication
* minor brushing
* Caching to the next level in TOC
* brushing
* more on the perf counters ( for latency and dynamic cases)
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* Updated common IE pipeline infer-request section (#10844)
* Updated common IE pipeline infer-request section
* Update ov_infer_request.md
* Apply suggestions from code review
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
Co-authored-by: Maxim Shevtsov <maxim.y.shevtsov@intel.com>
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* DOCS: Removed useless 4 spaces in snippets (#10870)
* Updated snippets
* Added link to encryption
* [DOCS] ARM CPU plugin docs (#10885)
* initial commit
ARM_CPU.md added
ARM CPU is added to the list of supported devices
* Update the list of supported properties
* Update Device_Plugins.md
* Update CODEOWNERS
* Removed quotes in limitations section
* NVIDIA and Android are added to the list of supported devices
* Added See Also section and reg sign to arm
* Added Preprocessing acceleration section
* Update the list of supported layers
* updated list of supported layers
* fix typos
* Added support disclaimer
* update trade and reg symbols
* fixed typos
* fix typos
* reg fix
* add reg symbol back
Co-authored-by: Vitaly Tuzov <vitaly.tuzov@intel.com>
* Try to fix visualization (#10896)
* Try to fix visualization
* New try
* Update Install&Deployment for migration guide to 22/1 (#10933)
* updates
* update
* Getting started improvements (#10948)
* Onnx updates (#10962)
* onnx changes
* onnx updates
* onnx updates
* fix broken anchors api reference (#10976)
* add ote repo (#10979)
* DOCS: Increase content width (#10995)
* fixes
* fix
* Fixed compilation
Co-authored-by: Maxim Shevtsov <maxim.y.shevtsov@intel.com>
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
Co-authored-by: Aleksandr Voron <aleksandr.voron@intel.com>
Co-authored-by: Vitaly Tuzov <vitaly.tuzov@intel.com>
Co-authored-by: Ilya Churaev <ilya.churaev@intel.com>
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
Co-authored-by: Victoria Yashina <victoria.yashina@intel.com>
Co-authored-by: Nikolay Tyukaev <nikolay.tyukaev@intel.com>
* CPU device documentation refresh
* Bfloat16 inference page aligned with the new API
* Bfloat16 inference section moved to CPU main
* First review comments applied
* Second review step comments applied
* OneDNN reference changed to the GitHub page
* AvgPool added to the oneDNN ops list
* Add readvalue, assign to templte plugin test
* Fix clang error
* Fix clang error
* Remove unnecessary comment
* Fix type-casting error
* Fix ci issue regarding const value
* Change Function to Model
* Fix op scope
* Change way to get variable
* Fix type-casting error
* Set variable id to const
* Fix side-effect in ieFuncTests
* Implement Assign-3, ReadValue-3 in evaluates_map
* Correct setting attribute
* Correct setting attribute
* Remove unnecessarily added method
* Roll back v6
* Use member variable for variable_id in assign-3, read_value-3
* Get data pointer from host tensor
* Remove visitor API test for ReadValue-6, Assign-6
* Implement visitor api test for read_value-6, assign-6
* Fix clang error
* Split read_value and assign into each file for visitor test
Co-authored-by: Ilya Churaev <ilya.churaev@intel.com>
This behavior is already used by default because ONNX is enabled by default and thirdparty/onnx/onnx/CMakeLists.txt forces CMAKE_BUILD_TYPE to Release if it is not set
It fixes the following issues:
- When the ONNX frontend is disabled, the source is built for Debug, which is very unexpected compared to Release with the ONNX frontend enabled
- When the ONNX frontend is disabled, even libopenvino.so could not be built due to some generated-makefiles issues
It is set to 'Release' (not 'Debug') to comply with the default behavior when ONNX is enabled (that is the default option that works for most users)
* Performance improvement for constant creation
The issue is that 'are_all_data_elements_bitwise_identical()' is called every time in the Constant constructor, and it potentially checks the whole buffer, which is O(N) complexity,
while it is needed only if the client uses 'get_all_data_elements_bitwise_identical'
Solution:
- Defer the calculation until the first call of 'get_all_data_elements_bitwise_identical'
- Store the calculated value in a mutable class member to reuse it on subsequent calls of 'get_all_data_elements_bitwise_identical'
The test verifies both cases:
a) that constant creation with shared memory data (now O(1)) is significantly faster than creation + bitwise check, which is O(N)
b) that once calculated, the value is taken from the cache, which is significantly faster than re-calculation
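Below is a minimal sketch of the deferred, cached check described above; the class and member names are hypothetical and do not reproduce the actual Constant implementation.

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch only: lazy, cached "all elements identical" check.
class ConstantLike {
public:
    explicit ConstantLike(std::vector<char> buffer) : m_buffer(std::move(buffer)) {}

    bool get_all_data_elements_bitwise_identical() const {
        if (!m_identical_computed) {          // the O(N) scan runs only on the first call
            m_identical = compute_identical();
            m_identical_computed = true;      // cached for subsequent calls
        }
        return m_identical;
    }

private:
    bool compute_identical() const {
        for (std::size_t i = 1; i < m_buffer.size(); ++i)
            if (m_buffer[i] != m_buffer[0])
                return false;
        return true;
    }

    std::vector<char> m_buffer;
    mutable bool m_identical = false;          // mutable: the cache is filled from a const getter
    mutable bool m_identical_computed = false;
};
```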
* fix clang-format
* Stash - Linux implementation
* Windows mmap implementation + unicode
* Clang for windows
* removed debug print
* Add handling of empty bin file
* fix windows includes
* Fix python test
* Unit tests
Fix for Constant with size > 4GB
* Fix review comments
* refactoring: get bias shape in bc and fbc algorithms
* use scipy to take most frequent shape
* pylint
* update reference
* pylint
* Update test_sanity.py
* update test_sanity.py
* Update test_sanity.py
* [GNA] Added SW_FP32 mode w/o SF for BasicLSTM
* deleted additional test
added sw_fp32 mode for existing test
changed reference output for new mode
* [GNA] Fixed according to review
* [GNA] Parametrized weights range
* fixed after review
Co-authored-by: Mikhail Ryzhov <mikhail.ryzhov@intel.com>
* Written header files for the nGraph operations RDFT and IRDFT.
* Written nGraph shell for the operation RDFT.
* Added missed include.
* Added RDFT to opset9 table.
* Code style fixes.
* Written the nGraph shell of the operation IRDFT.
* Added IRDFT to opset9 table.
* Started to write shape infer tests for RDFT.
* Refactoring: shape infer functions of RDFT and IRDFT moved into separate files.
* Written shape infer tests for RDFT.
* Written shape infer tests for IRDFT operation.
* Fixed code style.
* Fixes in the shape infer function of RDFT.
* Fixes in the shape infer function of RDFT.
* Fixes in the shape infer function of IRDFT.
* Deleted redundant includes in include/ngraph/op/irdft.hpp and include/ngraph/op/rdft.hpp
* Deleted redundant includes in include/openvino/op/rdft.hpp and include/openvino/op/irdft.hpp.
* Deleted redundant includes in cpp-files of nGraph shells of operations IRDFT and RDFT.
* Code style fixes.
* Shape inference functions of operations RDFT and IRDFT moved to the namespace ov::op::util.
* Deleted RDFT and IRDFT from docs/template_plugin/backend/opset_int_tbl.hpp.
* Deleted 'using namespace ngraph' from cpp-files of nGraph shells of operations RDFT and IRDFT.
* Fixed typos.
* Merged some loops in shape inference functions of RDFT and IRDFT.
* Written visitor tests for RDFT and IRDFT.
* Small change.
* Common part of RDFT and IRDFT shape validation moved into the separate file.
Co-authored-by: Ilya Churaev <ilya.churaev@intel.com>
* don't check dynamic shape when there is only one device
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* remove redundant if
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* mod docs/_static/images/dataset.png and docs/_static/images/inputs.png
* add new hint cumulative_throughput
* clang format properties.hpp
* add set properties and get properties test case for CUMULATIVE_THROUGHPUT
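A hedged sketch of how the new cumulative_throughput hint mentioned above could be requested through the properties API; the model path is a placeholder and the actual test wiring from these commits is not reproduced here.

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path
    // Request the new hint on the AUTO device: it asks AUTO to run the model on
    // all selected devices simultaneously to maximize aggregate throughput.
    auto compiled = core.compile_model(
        model, "AUTO",
        ov::hint::performance_mode(ov::hint::PerformanceMode::CUMULATIVE_THROUGHPUT));
    return 0;
}
```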
* reset docs/_static/images/dataset.png and docs/_static/images/inputs.png
* reset docs/_static/images/dataset.png and docs/_static/images/inputs.png
* reset dataset.png and inputs.png
* reset dataset.png and inputs.png
* remove test value cumulative_throughput from gpuplugin and cpuplugin testcase
* rollback dataset.png and inputs.png to 41818a377
* add fps log
add the '%lf' format for the log
add INFO_RUN and DEBUG_RUN; the code only runs when the log level is greater than the specified level
add an fps log for each device
print the device config info with DEBUG_RUN
add a mock test for DEBUG_RUN and INFO_RUN
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* use n / (end - start) instead of (n-1) / ((nth start) - (1st start))
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
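A hypothetical illustration of the corrected FPS formula; the function and parameter names are made up for this example.

```cpp
#include <chrono>
#include <cstddef>

// Divide the total number of processed frames by the full elapsed time (end - start),
// instead of (n - 1) divided by the span between the first and the n-th start time.
double fps(std::size_t frame_count,
           std::chrono::steady_clock::time_point start,
           std::chrono::steady_clock::time_point end) {
    return static_cast<double>(frame_count) /
           std::chrono::duration<double>(end - start).count();
}
```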
* Mark `get_type_info_static()` as hidden
Each plugin linked with the openvino library contains `type_info_static` symbols. In the case when one of the libraries is unloaded and the app tries to get the opset, it leads to a segfault. So `get_type_info_static()` is marked as hidden in order to use only one implementation, exactly the one from the openvino lib
* Fix "'visibility' attribute ignored" issue by moving `TestPass` out of test scope
* Fix clang format
* Small update of `If` op
* Revert "fix 79520 (#10449)" to correctly compare DiscreteTypeInfo via `==`
This reverts commit 29883a152a.
The change fixes FQ fusions for subgraphs like 'Const weights'->FQ->Transpose->Multiply.
After the PullTransposeThroughFQUp transformation, we end up with the following:
'Const weights'->Transpose->FQ->Multiply. Because of the Transpose on the first
FakeQuantize input, the Multiply could not be fused, since FakeQuantizeMulFusion
expects the weights to be a Constant node.
Ticket: 77785
* Performance improvement for constant creation
The issue is that 'are_all_data_elements_bitwise_identical()' is called every time in the Constant constructor, and it potentially checks the whole buffer, which is O(N) complexity,
while it is needed only if the client uses 'get_all_data_elements_bitwise_identical'
Solution:
- Defer the calculation until the first call of 'get_all_data_elements_bitwise_identical'
- Store the calculated value in a mutable class member to reuse it on subsequent calls of 'get_all_data_elements_bitwise_identical'
The test verifies both cases:
a) that constant creation with shared memory data (now O(1)) is significantly faster than creation + bitwise check, which is O(N)
b) that once calculated, the value is taken from the cache, which is significantly faster than re-calculation
* fix clang-format
Co-authored-by: Ilya Churaev <ilya.churaev@intel.com>
* InputTensorInfo::from implementation
If the user's application already has an `ov::runtime::Tensor` object created,
it is possible to reuse its basic characteristics (shape, precision) for the input via the InputTensorInfo::from method
* Rename 'from' to 'set_from', as in Python the 'from' keyword is used for importing modules
Python bindings: from an ov.Tensor and from a numpy array
* Style fix (quotes)
* Apply suggestions from code review
Co-authored-by: Ilya Churaev <ilyachur@gmail.com>
* Fix code style
* Use set_from in hello_classification CPP sample
Co-authored-by: Ilya Churaev <ilyachur@gmail.com>
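A possible usage sketch of the API described above, assuming a single-input model; the model path and tensor shape are placeholders, not taken from the actual sample.

```cpp
#include <openvino/core/preprocess/pre_post_process.hpp>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path

    // An already existing tensor, e.g. wrapping a captured image buffer.
    ov::Tensor input_image(ov::element::u8, {1, 480, 640, 3});

    ov::preprocess::PrePostProcessor ppp(model);
    // Reuse the tensor's element type and shape for the input description
    // instead of spelling them out manually.
    ppp.input().tensor().set_from(input_image);
    ppp.input().preprocess().convert_element_type(ov::element::f32);
    model = ppp.build();
    return 0;
}
```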
* add placeholder for python version of first snippet
* fix problem with placeholder
* fix wrong file name
* fix fragment name
* update python snippets
* move imports to the top of the code fragments
* [GNA] Single lstm-cell test added
* Added additional config for test
* one more input and hidden shape
* Added cell with ReLU
Deleted deprecated test
* test added as lstm_cell_basic
* Enabled gna_compact_mode
Co-authored-by: Mikhail Ryzhov <mikhail.ryzhov@intel.com>
* enabled compact_mode in all tests
Co-authored-by: Mikhail Ryzhov <mikhail.ryzhov@intel.com>
* fix .ncc_style target names
it was breaking configure on systems with libclang-12-dev, clang-12,
ninja and cmake 3.17+ (ninja complains about a duplicate
target). With a lower cmake version configure succeeds, but the build exits
immediately with an error. By replacing ninja with make, the error becomes a
warning (it is still significant: make just skips duplicate rules, i.e.
doesn't check the style of some source files; the rule duplication is a genuine
bug). Without libclang-12-dev and clang-12, ENABLE_NCC_STYLE is OFF and
the bug is not triggered
* silence uninitialized warning in core_integration
probably it was always initialized before use, but compiler wasn't made
aware of it
* fix function spelling to unbreak code style checks in benchmark_app
* include <thread> for std::this_thread
existing code was relying on namespace pollution by old libstdc++
* replace is_pod with is_standard_layout && is_trivial
is_pod is deprecated; it breaks the build on current gcc
Co-authored-by: Serhii Pavlovskyi <spavlovskyi@lohika.com>
Co-authored-by: Ilya Churaev <ilya.churaev@intel.com>
When a partial build is called for a dry run, do constant propagation too.
In the normal case, a partial build does not do constant propagation, to save build time for the internal program.
However, if a partial build is called with a dry run, it will fail at transfer_constants due to generic nodes which do not have an impl.
* Update IEEngine with the Dynamic models support
* Update with the batch
* Method naming fix
* Update image_loader & tests with dynamic models
* Update test_sanity.py
* Replace custom_mo_config from the model
* Modified the workflow diagram
* Moved supported topology lists to separate topics
* Additional changes
* Removed Supported Topologies list and Deprecated pages
* Created the Model Conversion Tutorials section for instructions for specific models
* Topic names alignment, removed Default_Model_Optimizer_Optimizations.md
* Additional structural changes
* Fixed links
* heading fixes
* [MO] Remove IR frontend from available frontend list in MO
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Fix issue - forget to pass FEM
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Fix issue for TF with new FE and default legacy
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Fix result saving when batch size is not 1
* Remove useless if statement
* improved processing of scores for models with more than one output
* added a check on the number of model outputs
* improved if statements
* moved the fix for models with several outputs to a separate PR
Co-authored-by: Maxim Gordeev <maxim.gordeev@intel.com>
* [GPU] update the condition for minimize_local_reorders
* Update to check needs reorder condition in quantize.
Signed-off-by: hyunback <hyunback.kim@intel.com>
* Add a common test of the key PERFORMANCE_HINT for the AUTO plugin API 2.0.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Add common test case for config check.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Update.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Use the implemented property test case.
Signed-off-by: Wang, Yang <yang4.wang@intel.com>
* Written the draft of the specification of the operation RFFT.
* Started to write the specification of the operation IRFFT.
* Small fix.
* Renamed RFFT operation as RDFT.
* Fix in Operations_specifications.md.
* Written the specification of the operation IRDFT.
* Fixes in examples.
* Fixes in opset9.md and Operations_specifications.md.
* Small fix.
* Replaced opset8 by opset9 in opset9.md.
* Deleted redundant sentences.
* Small fix.
* Replaced input_shape by data_shape.
* Fixed mistypes.
* Fixes of mistypes.
* Fixed typo.
* Fixed the RDFT specification so that the signal_size input behaves as in TF and PyTorch.
* Fixes in examples for RDFT.
* Fixes in the output shape calculation of IRDFT. Now this calculation is as in TF and PyTorch.
* auto-batching- bare min of the info
* renaming BATCH.MD to the automatic_batching.md, also aligned the link to the new naming convention
* more info and brushed
* added openvino_docs_OV_UG_Automatic_Batching to the main TOC
* Apply suggestions from code review
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* close on the comments, added the code examples
* Apply suggestions from code review
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* Update example
* Update format
* Update docs format
* added couple of more perf considerations
* more code examples
* Apply suggestions from code review
* Apply the rest from code review
* Update header
Co-authored-by: Tatiana Savina <tatiana.savina@intel.com>
* Remove fp16 of Convert layer test from skip_tests.config.cpp as it works now
* update repo
* add initial op impl check tests
* add op imple check tests
* add op impl check tests
* add rnn cell based ops
* modify lstmsequence
* update rnn cell base op test
* add priorbox, priorboxclustered, proposal
* add ROIAlign to ReverseSequence
* add Roll to ScatterElementsUpdate
* add select to swish tests
* add tensoriterator to variadicsplit test
* temporary block of LSTMCell v1 due to crash in mkldnn
* use ov namespace instead of ngraph as possible
* update indexing of vector array
* update multiple parameter vector
* add loop test
* fix cpplint errors
* fix build error
* Fix in Preprocessing python bindings - add correct default arguments for:
- PreProcessSteps::convert_element_type
- PostProcessSteps::convert_element_type
- InputTensorInfo::set_color_format
Otherwise, Python users must always specify the optional params.
E.g. instead of writing `tensor().set_color_format(ColorFormat.RGB)`, Python users would have to write `tensor().set_color_format(ColorFormat.RGB, [])`
* Corrected 'help' output
* Exposing 'openvino.runtime.Type.undefined' and use it in 'convert_element_type' documentation
* Fixed Apple install
* Update path to libs in setupvars.sh
* Fix IE_CPACK_RUNTIME_PATH for Apple
* Fix wheels packaging
Co-authored-by: Alexey Suhov <alexey.suhov@intel.com>
* [DOCS] hddl update
include info on HDDL and MYRIAD working at the same time
* Update docs/OV_Runtime_UG/supported_plugins/MYRIAD.md
Co-authored-by: Andrey Zaytsev <andrey.zaytsev@intel.com>
* Update HDDL.md
* Update MYRIAD.md
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
Co-authored-by: Andrey Zaytsev <andrey.zaytsev@intel.com>
* Added inputs argument to all compare() function overloads
* Rewritten compare() function for NMS
* Implemented sorting by name of expected outputs
* Implemented sorting by name of actual outputs
* Added accounting for simultaneous dynamism and the need to convert outputs in Template plugin
* Added a separate case to the GetBlob function for correct dimensions
* Rewritten Expected outputs sorting to work correctly on cpuFuncTests
* Fixing code style problems
* Implemented sorting by name of actual outputs for functional tests
* Debug prints removed
* Replacing a raw pointer with a vector
* Fixing code style problems
* Shifting the sorting place Expected outputs
* Added sorting of Expected exits in one more place
* Quality transition to SLT2.0
* Removing unnecessary code after SLT2.0
* Fix soft_nms_sigma argument
* Removing unnecessary parts after SLT2.0
* Remove unnecessary outputs sorting
* Removing parts from the code for debugging
* Fix for NMS
* Trying to make CI green
* Checking test passage without adding convert precision
* Checking CI
* There is an algorithm that adds Convert only if there is f16/fp16 in the inputs
* Add the Convert op in cases where inputs are not already f32
* Check whether the CI failures go away if everything is put back
* Revert changes, validate the f32 change on CI
* Adding Convert f16->f32 only if there is a function parameter of type f16
* The presence of f16/bf16 as a parameter type is now required for adding Convert
* Added prints for params, inputs, outputs
* Logic checking the absence of Convert
* Cosmetic fixes
* Setting the correct value for selected_scores_type NMS-5
* Fix bf
* Increased readability
* Missing parts added
* Removed the static for the vector
* [MO] Clean-up MO cmd-line options
Remove the following deprecated Model Optimizer options that have not been used for several releases: disable_fusing, disable_gfusing, generate_deprecated_IR_V7,
legacy_ir_generation, keep_shape_ops, move_to_preprocess.
Deprecate through the CLI the following options whose functionality is triggered from POT or automatically: disable_weights_compression, disable_nhwc_to_nchw,
disable_resnet_optimization, finegrain_fusing.
Correct and extend the description of each MO option printed during model conversion.
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Correct documentation about input shapes
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Perform final corrections in documentation
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Remove legacy_ir_generation overall
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Clean-up tests from deprecated options
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Recover disable_fusing option as deprecated
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Fix keys for static_shape and extensions
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Remove extension key that does not work
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Apply feedback: remove disable_gfusing, correct docs
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Recover disable_fusing option for unit-tests
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Apply feedback for documentation
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Apply feedback about parameters use_legacy_frontend and use_new_frontend
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Do minor fixes for the indentation of MO logs
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Revert log.error for fallback message
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Revert disable_weights_compression parameter for tests
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Fixed some comments about transformations
* Changed transformation guide
* Fixed typo
* Moved transformation doc to extensibility
* Moved images to Extensibility_UG
* Added separate document for each pass
* Added see also section
* Fixed comments
* Checking compatibility between 'pyopenvino' and 'libopenvino' on 'import phase'
This fix is to prevent undefined behavior when the user loads OpenVINO from Python, but pyopenvino loads a different version of 'libopenvino'.
This may happen if the user has several releases installed and has played around with the PATH/PYTHONPATH environment variables.
In such a case the behavior is undefined: the application may crash in the middle of the usage or use an incorrect release.
The fix checks the build versions of pyopenvino and ov::get_openvino_version; if a mismatch occurs, an exception is thrown.
This logic is disabled if the user has built OpenVINO locally (experienced developers probably know what they're doing), so if the version has the 'custom_' prefix, the check is skipped
* Removed custom logic for CI_BUILD_NUMBER, it is reused from already included version.cmake
* Use addVersionDefines macro
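A rough sketch of the kind of compatibility check described above; this is not the actual pyopenvino code, and `pyopenvino_build` stands for the build string compiled into the Python module.

```cpp
#include <stdexcept>
#include <string>

#include <openvino/core/version.hpp>

// Illustrative only: compare the runtime build number with the one the
// Python module was built against, skipping locally built ("custom_") versions.
void check_runtime_compatibility(const std::string& pyopenvino_build) {
    const std::string runtime_build = ov::get_openvino_version().buildNumber;
    if (runtime_build.rfind("custom_", 0) == 0 || pyopenvino_build.rfind("custom_", 0) == 0)
        return;  // locally built versions skip the check entirely
    if (runtime_build != pyopenvino_build)
        throw std::runtime_error("OpenVINO Python package was built for " + pyopenvino_build +
                                 " but libopenvino " + runtime_build + " was loaded");
}
```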
* Update samples and samplers with the new DataLoader format
* Update with utils
* Pylint updates
* Update metric with the exception
* Pylint
* Update with the exception
* Pylint
* Revert index sampler changes
* Update ImageLoader & SimplifiedEngine
* Update with the different solution
* Remove utils
* Pylint
* Remove list wrapping
* Remove list from meta_data
* Improve `-o` and `-oname` flags
* Apply clang-format tool
* fix saving output files
* Apply clang-format
* Fix error when `-oname` not specified
* apply clang format
* Fix error `-oname`
* Use output name with port to find model output
* fix comment line breaking
* fix comparison with reference for multiple outputs
* Fix output name printing error
* try to fix clang format
* fix problem with bs > 1
* minimal change to rerun test pipeline
* clang format
* Revert "Fix error `-oname`"
This reverts commit c33d5f16e8.
* Used new config for streams and threads
* Fixed review comments in ba
* format fix
* fixed hello_query_device
* Added STL string io
* fixed tests
* Fixed test
* Fixed build
* fixed format
* Fixed build
* try fix win
* other any io specialization
* Fixed after merge
* renamed streams
* build fixed
* fixed build
* fixed format
* fix for old mac build
* Fixed type of exception
* test fix
* Added ov configuration test
* Added common OV properties tests
* fix mklnn
* fixed format
* merge conflicts
* Removed compile_model tests
* removed duplicated test
* [GPU] Enable deconv with oneDNN
remove post-op data_type into oneDNN.
Signed-off-by: hyunback <hyunback.kim@intel.com>
* Update to use data_type in conv sum post-op.
Signed-off-by: hyunback <hyunback.kim@intel.com>
* checking the network batchability (internal helper func on top of batch tracking) before doing hetero
* more general logic with respect to batch-ability of the network
* a dynamism check that I've owed from the PR-10560
* using the DO-detached mechanism for early hetero exit, also fixed this flag in the Batching plugin (although minor, as the DO is removed by HETERO)
* adding the dimension tracking logic depending on whether the auto-batching is enabled implicitly/explicitly
* changed the DetectionOutput affinity markup to go over results, also accommodate Convert, so only 2 subgraphs are made by the HETERO
* installing-openvino-yocto: fix documentation links
Point to the new Yocto docs website.
Signed-off-by: Anuj Mittal <anuj.mittal@intel.com>
* Update installing-openvino-yocto.md
* installing-openvino-yocto: add step to checkout specific branch
Request users to checkout specific branch of meta-intel where this
version of OpenVINO is available.
Signed-off-by: Anuj Mittal <anuj.mittal@intel.com>
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
Co-authored-by: Anuj Mittal <anuj.mittal@intel.com>
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* Revised the unique ID setting scheme. Previously it was using the program id to distinguish the loop body networks' ids.
However, that results in a cl cache miss for the same network loaded multiple times, because the program ids are different.
Now revised it to use the parent primitive id instead of program_id for the unique id of nodes in body networks.
* Revised adding unique_id to entry points to have a temporary number as the unique id
* Revert the canceled change
* Added a test to check whether two networks loaded from the same function create the same cl cache
The transformation inserts a Transpose for MatMul's weights and
sets its transpose_b attribute to true.
If executed by MO, it helps to reduce LoadNetwork time on the CPU plugin,
since ConvertMatMulToFC doesn't have to insert the Transpose by itself.
Ticket: 78635
* support config key device priority
for example:
with AUTO:CPU,GPU
the priority of CPU will be higher than that of GPU
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
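A small sketch of how the device priority order might be passed when compiling on AUTO; the model path is a placeholder.

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path
    // Listing the devices explicitly: CPU is named first, so it gets the
    // higher priority when AUTO selects where to load the model.
    auto compiled = core.compile_model(model, "AUTO",
                                       ov::device::priorities("CPU", "GPU"));
    return 0;
}
```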
* add test and fix compile and test error
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* add an info for device priority and add lost [AUTOPLUGIN] on log
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* parseMetaDevice returns all GPU devices when AUTO:GPU is used
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* fix compile issue
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* modify test and add test case, fix code issue
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* fix a bug and a MULTI with HETERO test failure
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* fix mock test failure issue
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* fix misprint
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* Disable AUTO:MYRIAD case
MYRIAD/CoreThreadingTests.smoke_QueryNetwork/targetDevice=MULTI_config=MULTI_DEVICE_PRIORITIES:MYRIAD_
failed on Windows
the error is:
myriadFuncTests-0 INFO: [E:] [BSL] found 0 ioexpander device
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* use ov::device::priorities key in this PR
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* fix a logic bug in key_network_priority after enabling device priority
add a test case to cover it
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* [MO] Upgrade TensorFlow version dependency due to SNYK hits
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Still use 2.5.0 TensorFlow for Python 3.6 and older
Signed-off-by: Roman Kazantsev <roman.kazantsev@intel.com>
* Upgrade protobuf to 3.19.4
* Update precompiled protoc version
* Update protobuf to v3.18.2
Updating further pending this fix to be released
https://github.com/protocolbuffers/protobuf/pull/9437
* Disable warnings for protobuf
* Fixing SetUp for SLT tests of ShapeOF
* Attempting to pass outputPrecision into the test
* Correcting deficiencies, taking the output precision into account in the test name
* remove formatTimeMilli from time_utils.cpp
* add traceCallStacks test case
* add traceCallStacks test case in format_test.cpp
* add param:"test" to function TraceCallStacks()
* rollback file in master branch
* add traceCallStacks test case in format_test.cpp
* remove formatTimeMilli from time_utils.cpp and add traceCallStacks test case in format_test.cpp
[GPU] Enable shuffle and fsv32 in implicit concat
* Support shuffle fsv32
* Check feature depths in the first input dependency.
* Add selecting the oneDNN convolution in case of block format in the get_preferred_impl_type func.
Signed-off-by: hyunback <hyunback.kim@intel.com>
* Initial commit. Need to remove debug code
* Remove extra flags. Fix comparison in the matchers
* Fix small issue with the default args
* Update eltwise.hpp
* Update ov_subgraph.cpp
* Optimized any compilation time
* Fixed Any compilation time
* any::addressof
* reverted
* Fixed read write
* format fix
* Fixed build
* format fix
* Moved any tests back
* removed inline
* fix format
* used static inline
* format fix
* removed inline static
* fixed merge conflicts
After enabling deconv b32 with oneDNN, colorization-siggraph f16 b32 has a regression.
Fix it: add a check for sum post-ops in the oneDNN deconv case.
Signed-off-by: hyunback <hyunback.kim@intel.com>
* Added info on DockerHub CI Framework
* Feature/azaytsev/change layout (#3295)
* Changes according to feedback comments
* Replaced @ref's with html links
* Fixed links, added a title page for installing from repos and images, fixed formatting issues
* Added links
* minor fix
* Added DL Streamer to the list of components installed by default
* Link fixes
* Link fixes
* ovms doc fix (#2988)
* added OpenVINO Model Server
* ovms doc fixes
Co-authored-by: Trawinski, Dariusz <dariusz.trawinski@intel.com>
* Updated openvino_docs.xml
* Updated the link to software license agreements
* Revert "Updated the link to software license agreements"
This reverts commit 706dac500e.
* Removed the Intel logo
Co-authored-by: Trawinski, Dariusz <dariusz.trawinski@intel.com>
- Replace find with compare func to avoid dumping all layers that contain layer name
Signed-off-by: Andrew Kwangwoong Park <andrew.kwangwoong.park@intel.com>
When the post-op chain has a pattern like the one below, binary_mul was previously ignored.
1. binary_add
2. eltwise_linear
3. binary_mul
4. binary_add
It happens when prev_post_op_idx == 2, cur_post_op_idx == 4.
prev_post_op_idx was supposed to proceed to idx 3, but it did not.
* Use tensor names instead of friendly names, handle one output tensor to several Result ops case
* fix python tests
* fix python test
* fix incorrect merge
* remove redundant files
* fix variable names generation, fix python test
* Apply review comments
* fix python test
Even though it is not possible to hit this situation using existing plugins, there is a theoretical possibility that some plugin may return 'nullptr', as that is allowed.
So this check shall remain in the generic part, which should not rely on plugin-specific behavior
We welcome community contributions to OpenVINO™. Please read the following guide to learn how to find ideas for contribution, practices for good pull requests, checking your changes with our tests and more.
## Before you start contributing you should
- Make sure you agree to contribute your code under [OpenVINO™ (Apache 2.0)](https://github.com/openvinotoolkit/openvino/blob/master/LICENSE) license.
- Figure out what you’re going to contribute. If you don’t know what you are going to work on, navigate to the [Github "Issues" tab](https://github.com/openvinotoolkit/openvino/issues). Make sure that no one is already working on it; if someone is, you might provide support or suggestions in the issue or in the linked pull request.
- If you are going to fix a bug, check that it still exists in the latest release. This can be done by building the latest master branch and making sure that the error is still reproducible there. We do not fix bugs that only affect older non-LTS releases, like 2020.2 for example (more details about the [branching strategy](https://github.com/openvinotoolkit/openvino/wiki/Branches)).
## "Fork & Pull Request model" for code contribution
### The instruction in brief
- Register at GitHub. Create your fork of OpenVINO™ repository [https://github.com/openvinotoolkit/openvino](https://github.com/openvinotoolkit/openvino) (see [https://help.github.com/articles/fork-a-repo](https://help.github.com/articles/fork-a-repo) for details).
- Install Git.
- Set your user name and email address in a Git configuration according to GitHub account (see [https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup](https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup) for details).
- Choose a task for yourself. It could be a bugfix or some new code.
- Choose a base branch for your work. More details about branches and policies are here: [Branches](https://github.com/openvinotoolkit/openvino/wiki/Branches)
- Clone your fork to your computer.
- Create a new branch (with a meaningful name) from the base branch you chose.
- Modify / add the code following our [Coding Style Guide](https://github.com/openvinotoolkit/openvino/wiki/CodingStyleGuideLines).
- If you want to add a new sample, please look at this [Guide for contributing to C++/C/Python IE samples](https://github.com/openvinotoolkit/openvino/wiki/SampleContribute)
- If you want to contribute to the documentation and add a new guide, follow the [Documentation guidelines](https://github.com/openvinotoolkit/openvino/wiki/CodingStyleGuideLinesDocumentation).
- Run testsuite locally:
- execute each test binary from the artifacts directory, e.g. `<source dir>/bin/intel64/Release/ieFuncTests`
- When you are done, make sure that your branch is up to date with the latest state of the branch you want to contribute to (e.g. `git fetch upstream && git merge upstream/master`), push your branch to your GitHub fork, and then create a pull request from your branch to the base branch (see [https://help.github.com/articles/using-pull-requests](https://help.github.com/articles/using-pull-requests) for details).
## Making a good pull request
Following these guidelines will increase the likelihood of your pull request being accepted:
- One PR – one issue.
- Make sure your change builds cleanly on your local system.
- Choose the right base branch [Branches](https://github.com/openvinotoolkit/openvino/wiki/Branches).
- Follow the [Coding Style Guide](https://github.com/openvinotoolkit/openvino/wiki/CodingStyleGuideLines) for your code.
- Update documentation using [Documentation guidelines](https://github.com/openvinotoolkit/openvino/wiki/CodingStyleGuideLinesDocumentation) if needed.
- Cover your changes with tests.
- Add license at the top of new files [C++ example](https://github.com/openvinotoolkit/openvino/blob/master/samples/cpp/classification_sample_async/main.cpp#L1-L2), [Python example](https://github.com/openvinotoolkit/openvino/blob/master/samples/python/hello_classification/hello_classification.py#L3-L4).
- Add enough information: a meaningful title, the reason why you made the commit and a link to the issue page if exists.
- Remove changes unrelated to the PR.
- If it is still WIP and you want to check CI test results early then use _Draft_ PR.
- Submit your PR and become an OpenVINO™ contributor!
## Testing and merging pull requests
Your pull request will be automatically tested by OpenVINO™'s precommit (testing statuses are automatically reported as "green" or "red" circles in the precommit steps on the PR's page). If any builders have failed, you need to fix the issue. To rerun the automatic builds, just push changes to your branch on GitHub. There is no need to close the pull request and open a new one!
## Merging PR
As soon as the reviewer is fine with the pull request and precommit shows a "green" status, the "Approved" review status is set, which signals the OpenVINO™ maintainers that they can merge your pull request.
This toolkit allows developers to deploy pre-trained deep learning models
through the high-level OpenVINO™ Runtime C++ and Python APIs integrated with application logic.
## Contents:
- [Products which use OpenVINO](#products-which-use-openvino)
- [System requirements](#system-requirements)
- [How to build](#how-to-build)
- [How to contribute](#how-to-contribute)
- [Support](#support)
- [See also](#see-also)
## What is OpenVINO toolkit?
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference.
- Boost deep learning performance in computer vision, automatic speech recognition, natural language processing and other common tasks
- Use models trained with popular frameworks like TensorFlow, PyTorch and more
- Reduce resource demands and efficiently deploy on a range of Intel® platforms from edge to cloud
This open-source version includes several components: namely [Model Optimizer], [OpenVINO™ Runtime], [Post-Training Optimization Tool], as well as CPU, GPU, MYRIAD, multi device and heterogeneous plugins to accelerate deep learning inferencing on Intel® CPUs and Intel® Processor Graphics.
It supports pre-trained models from the [Open Model Zoo], along with 100+ open
source and public models in popular formats such as TensorFlow, ONNX, PaddlePaddle, MXNet, Caffe, Kaldi.
## Repository components
* [OpenVINO™ Runtime] - is a set of C++ libraries with C and Python bindings providing a common API to deliver inference solutions on the platform of your choice.
* [core](https://github.com/openvinotoolkit/openvino/tree/master/src/core) - provides the base API for model representation and modification.
* [inference](https://github.com/openvinotoolkit/openvino/tree/master/src/inference) - provides an API to infer models on device.
* [transformations](https://github.com/openvinotoolkit/openvino/tree/master/src/common/transformations) - contains the set of common transformations which are used in OpenVINO plugins.
* [low precision transformations](https://github.com/openvinotoolkit/openvino/tree/master/src/common/low_precision_transformations) - contains the set of transformations which are used in low precision models
    * [bindings](https://github.com/openvinotoolkit/openvino/tree/master/src/bindings) - contains all available OpenVINO bindings which are maintained by the OpenVINO team.
* [c](https://github.com/openvinotoolkit/openvino/tree/master/src/bindings/c) - provides C API for OpenVINO™ Runtime
* [python](https://github.com/openvinotoolkit/openvino/tree/master/src/bindings/python) - Python API for OpenVINO™ Runtime
* [Plugins](https://github.com/openvinotoolkit/openvino/tree/master/src/plugins) - contains OpenVINO plugins which are maintained in open source by the OpenVINO team. For more information, please take a look at the [list of supported devices](#supported-hardware-matrix).
* [Frontends](https://github.com/openvinotoolkit/openvino/tree/master/src/frontends) - contains available OpenVINO frontends which allow reading models from the native framework formats.
* [Model Optimizer] - is a cross-platform command-line tool that facilitates the transition between training and deployment environments, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices.
* [Post-Training Optimization Tool] - is designed to accelerate the inference of deep learning models by applying special methods without model retraining or fine-tuning, for example, post-training 8-bit quantization.
* [Samples] - applications in C, C++ and Python which show basic OpenVINO use cases.
## Supported Hardware matrix
The OpenVINO™ Runtime can infer models on different hardware devices. This section provides the list of supported devices.
The Auto Batch plugin performs on-the-fly automatic batching (i.e. grouping inference requests together) to improve device utilization, with no programming effort from the user.
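For illustration, a minimal sketch of enabling the Auto Batch plugin explicitly through the virtual "BATCH" device; the batch size and model path are placeholders.

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path
    // Explicitly wrap the GPU with the Auto Batch plugin: requests sent to this
    // compiled model are transparently grouped into batches of up to 4.
    auto compiled = core.compile_model(model, "BATCH:GPU(4)");
    return 0;
}
```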
## Documentation
### User documentation
The latest documentation for the OpenVINO™ Toolkit is available [here](https://docs.openvino.ai/). This documentation contains detailed information about all OpenVINO components and provides all the important information you may need to create an application based on a binary OpenVINO distribution or on your own OpenVINO version without source code modification.
### Developer documentation
[Developer documentation](#todo-add) contains information about the architectural decisions which are applied inside the OpenVINO components. This documentation has all the information needed in order to contribute to OpenVINO.
## How to build
Please take a look at the [OpenVINO Wiki](https://github.com/openvinotoolkit/openvino/wiki#how-to-build) to get more information about the OpenVINO build process.
## How to contribute
See [CONTRIBUTING](./CONTRIBUTING.md) for details. Thank you!
## Support
Please report questions, issues and suggestions using:
* The [`openvino`](https://stackoverflow.com/questions/tagged/openvino) tag on StackOverflow\*
* [Intel® Distribution of OpenVINO™ toolkit Product Page](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html)
* [Intel® Distribution of OpenVINO™ toolkit Release Notes](https://software.intel.com/en-us/articles/OpenVINO-RelNotes)
* [Neural Network Compression Framework (NNCF)](https://github.com/openvinotoolkit/nncf) - a suite of advanced algorithms for model inference optimization including quantization, filter pruning, binarization and sparsity
* [OpenVINO™ Training Extensions (OTE)](https://github.com/openvinotoolkit/training_extensions) - convenient environment to train Deep Learning models and convert them using OpenVINO for optimized inference.
* [OpenVINO™ Model Server (OVMS)](https://github.com/openvinotoolkit/model_server) - a scalable, high-performance solution for serving deep learning models optimized for Intel architectures
* [DL Workbench](https://docs.openvino.ai/nightly/workbench_docs_Workbench_DG_Introduction.html) - An alternative, web-based version of OpenVINO designed to make production of pretrained deep learning models significantly easier.
* [Computer Vision Annotation Tool (CVAT)](https://github.com/openvinotoolkit/cvat) - an online, interactive video and image annotation tool for computer vision purposes.
* [Dataset Management Framework (Datumaro)](https://github.com/openvinotoolkit/datumaro) - a framework and CLI tool to build, transform, and analyze datasets.
---
\* Other names and brands may be claimed as the property of others.
[Open Model Zoo]:https://github.com/openvinotoolkit/open_model_zoo
Deep Learning Workbench (DL Workbench) is an official OpenVINO™ graphical interface designed to make the production of pretrained deep learning Computer Vision and Natural Language Processing models significantly easier.
Minimize the inference-to-deployment workflow timing for neural models right in your browser: import a model, analyze its performance and accuracy, visualize the outputs, optimize and make the final model deployment-ready in a matter of minutes. DL Workbench takes you through the full OpenVINO™ workflow, providing the opportunity to learn about various toolkit components.
DL Workbench enables you to get a detailed performance assessment, explore inference configurations, and obtain an optimized model ready to be deployed on various Intel® configurations, such as client and server CPU, Intel® Processor Graphics (GPU), Intel® Movidius™ Neural Compute Stick 2 (NCS 2), and Intel® Vision Accelerator Design with Intel® Movidius™ VPUs.
DL Workbench also provides the [JupyterLab environment](https://docs.openvino.ai/latest/workbench_docs_Workbench_DG_Jupyter_Notebooks.html#doxid-workbench-docs-workbench-d-g-jupyter-notebooks) that helps you quickly get started with the OpenVINO™ API and command-line interface (CLI). Follow the full OpenVINO workflow created for your model and learn about the different toolkit components.
DL Workbench helps achieve your goals depending on the stage of your deep learning journey.
If you are a beginner in the deep learning field, the DL Workbench provides you with
learning opportunities:
* Learn what neural networks are, how they work, and how to examine their architectures.
* Learn the basics of neural network analysis and optimization before production.
* Get familiar with the OpenVINO™ ecosystem and its main components without installing it on your system.
If you have enough experience with neural networks, DL Workbench provides you with a
convenient web interface to optimize your model and prepare it for production:
* Measure and interpret model performance.
* Tune the model for enhanced performance.
* Analyze the quality of your model and visualize output.
## General Workflow
The diagram below illustrates the typical DL Workbench workflow. Click to see the full-size image:

Get a quick overview of the workflow in the DL Workbench User Interface:

## OpenVINO™ Toolkit Components
The intuitive web-based interface of the DL Workbench enables you to easily use various
OpenVINO™ toolkit components:
| Component | Description |
|-----------|-------------|
| [Open Model Zoo](https://docs.openvinotoolkit.org/latest/omz_tools_downloader.html)| Get access to the collection of high-quality pre-trained deep learning [public](https://docs.openvinotoolkit.org/latest/omz_models_group_public.html) and [Intel-trained](https://docs.openvinotoolkit.org/latest/omz_models_group_intel.html) models trained to resolve a variety of different tasks.
| [Model Optimizer](https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) |Optimize and transform models trained in supported frameworks to the IR format. <br>Supported frameworks include TensorFlow\*, Caffe\*, Kaldi\*, MXNet\*, and ONNX\* format.
| [Benchmark Tool](https://docs.openvinotoolkit.org/latest/openvino_inference_engine_tools_benchmark_tool_README.html)| Estimate deep learning model inference performance on supported devices.
| [Accuracy Checker](https://docs.openvinotoolkit.org/latest/omz_tools_accuracy_checker.html)| Evaluate the accuracy of a model by collecting one or several metric values.
| [Post-Training Optimization Tool](https://docs.openvinotoolkit.org/latest/pot_README.html)| Optimize pretrained models by lowering the precision of a model from floating-point precision (FP32 or FP16) to integer precision (INT8), without the need to retrain or fine-tune models. |
# Introduction to Model Processing {#openvino_docs_model_processing_introduction}
Every deep learning workflow begins with obtaining a model. You can choose to prepare a custom one, use a ready-made solution and adjust it to your needs, or even download and run a pre-trained network from an online database, such as OpenVINO's [Open Model Zoo](../model_zoo.md).
This section describes how to obtain and prepare your model for work with OpenVINO to get the best inference results:
* [Browse a database of models for use in your projects](../model_zoo.md).
* [Convert different model formats to the OpenVINO IR format](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md).
* [Automate model-related tasks with Model Downloader and additional OMZ Tools](https://docs.openvino.ai/latest/omz_tools_downloader.html).
OpenVINO™ is not just one tool. It is an expansive ecosystem of utilities, providing a comprehensive workflow for deep learning solution development. Learn more about each of them to reach the full potential of OpenVINO™ Toolkit.
### OpenVINO™ Model Server (OVMS)
OpenVINO Model Server is a scalable, high-performance solution for serving deep learning models optimized for Intel® architectures. The server uses Inference Engine libraries as a backend and exposes gRPC and HTTP/REST interfaces for inference that are fully compatible with TensorFlow Serving.
* [Red Hat Ecosystem Catalog](https://catalog.redhat.com/software/container-stacks/detail/60649e41ccfb383fe395a167)
### Neural Network Compression Framework (NNCF)
A suite of advanced algorithms for Neural Network inference optimization with minimal accuracy drop. NNCF applies quantization, filter pruning, binarization and sparsity algorithms to PyTorch and TensorFlow models during training.
### OpenVINO™ integration with TensorFlow
A solution empowering TensorFlow developers with OpenVINO's optimization capabilities. With just two lines of code in your application, you can offload inference to OpenVINO, while keeping the TensorFlow API.
### DL Streamer
A streaming media analytics framework, based on the GStreamer multimedia framework, for creating complex media analytics pipelines.
More resources:
* [documentation on GitHub](https://openvinotoolkit.github.io/dlstreamer_gst/)
* [installation Guide on GitHub](https://github.com/openvinotoolkit/dlstreamer_gst/wiki/Install-Guide)
### DL Workbench
A web-based tool for deploying deep learning models. Built on the core of OpenVINO and equipped with a graphics user interface, DL Workbench is a great way to explore the possibilities of the OpenVINO workflow, import, analyze, optimize, and build your pre-trained models. You can do all that by visiting [Intel® DevCloud for the Edge](https://software.intel.com/content/www/us/en/develop/tools/devcloud.html) and launching DL Workbench on-line.
# OpenVINO™ integration with TensorFlow {#ovtf_integration}
**OpenVINO™ integration with TensorFlow** is a solution for TensorFlow developers who want to get started with OpenVINO™ in their inferencing applications. By adding just two lines of code you can now take advantage of OpenVINO™ toolkit optimizations with TensorFlow inference applications across a range of Intel® computation devices.
This is all you need:
```python
import openvino_tensorflow
openvino_tensorflow.set_backend('<backend_name>')
```
**OpenVINO™ integration with TensorFlow** accelerates inference across many AI models on a variety of Intel® technologies, such as:
- Intel® CPUs
- Intel® integrated GPUs
- Intel® Movidius™ Vision Processing Units - referred to as VPU
- Intel® Vision Accelerator Design with 8 Intel Movidius™ MyriadX VPUs - referred to as VAD-M or HDDL
> **NOTE**: For maximum performance, efficiency, tooling customization, and hardware control, we recommend developers to adopt native OpenVINO™ solutions.
To find out more about the product itself, as well as learn how to use it in your project, check its dedicated [GitHub repository](https://github.com/openvinotoolkit/openvino_tensorflow/tree/master/docs).
To see what you can do with **OpenVINO™ integration with TensorFlow**, explore the demos located in the [examples folder](https://github.com/openvinotoolkit/openvino_tensorflow/tree/master/examples) in our GitHub repository.
Sample tutorials are also hosted on [Intel® DevCloud](https://www.intel.com/content/www/us/en/developer/tools/devcloud/edge/build/ovtfoverview.html). The demo applications are implemented using Jupyter Notebooks. You can interactively execute them on Intel® DevCloud nodes, compare the results of **OpenVINO™ integration with TensorFlow**, native TensorFlow, and OpenVINO™.
## License
**OpenVINO™ integration with TensorFlow** is licensed under [Apache License Version 2.0](https://github.com/openvinotoolkit/openvino_tensorflow/blob/master/LICENSE).
By contributing to the project, you agree to the license and copyright terms therein
and release your contribution under these terms.
## Support
Submit your questions, feature requests and bug reports via [GitHub issues](https://github.com/openvinotoolkit/openvino_tensorflow/issues).
## How to Contribute
We welcome community contributions to **OpenVINO™ integration with TensorFlow**. If you have an idea for improvement:
* Share your proposal via [GitHub issues](https://github.com/openvinotoolkit/openvino_tensorflow/issues).
* Submit a [pull request](https://github.com/openvinotoolkit/openvino_tensorflow/pulls).
We will review your contribution as soon as possible. If any additional fixes or modifications are necessary, we will guide you and provide feedback. Before you make your contribution, make sure you can build **OpenVINO™ integration with TensorFlow** and run all the examples with your fix/patch. If you want to introduce a large feature, create test cases for your feature. Upon our verification of your pull request, we will merge it into the repository, provided that the pull request has met the above-mentioned requirements and proved acceptable.
---
\* Other names and brands may be claimed as the property of others.
# How to Implement Custom GPU Operations {#openvino_docs_Extensibility_UG_GPU}
To enable operations not supported by OpenVINO out of the box, you may need an extension for an OpenVINO operation set, and a custom kernel for the device you will target. This page describes custom kernel support for the GPU device.
The GPU codepath abstracts many details about OpenCL. You need to provide the kernel code in OpenCL C and an XML configuration file that connects the kernel and its parameters to the parameters of the operation.
There are two options for using the custom operation configuration file:
* Include a section with your kernels into the automatically-loaded `<lib_path>/cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml` file.
* Call the `ov::Core::set_property()` method from your application with the `"CONFIG_FILE"` key and the configuration file name as a value before loading the network that uses custom operations to the plugin:
@snippet snippets/GPU_Kernel.cpp part0
All OpenVINO samples, except the trivial `hello_classification`, and most Open Model Zoo demos
feature a dedicated command-line option `-c` to load custom kernels. For example, to load custom operations for the classification sample, run the command below:
## Configuration File Format <a name="config-file-format"></a>
The configuration file is expected to follow the `.xml` file structure
with a node of the `CustomLayer` type for every custom operation you provide.
The definitions described in the sections below use the following notations:
Notation | Description
### Kernel Node and Sub-Node Structure
`Kernel` node contains all kernel source code configuration.
**Sub-nodes**: `Source` (1+), `Define` (0+)
## Example Configuration File
The following code sample provides an example configuration file in XML
format. For information on the configuration file structure, see
[Configuration File Format](#config-file-format).
## Built-In Definitions for Custom Layers
The following table includes definitions that are attached before
user sources.
For an example, see [Example Kernel](#example-kernel).
| `<TENSOR>_DIMS`| An array of the tensor dimension sizes. Always ordered as `BFYX` |
| `<TENSOR>_DIMS_SIZE`| The size of the `<TENSOR>_DIMS` array.|
| `<TENSOR>_TYPE`| The datatype of the tensor: `float`, `half`, or `char`|
| `<TENSOR>_FORMAT_<TENSOR_FORMAT>` | The format of the tensor, BFYX, BYXF, YXFB , FYXB, or ANY. The format is concatenated to the defined name. You can use the tensor format to define codepaths in your code with `#‍ifdef/#‍endif`. |
| `<TENSOR>_LOWER_PADDING` | An array of padding elements used for the tensor dimensions before they start. Always ordered as BFYX.|
| `<TENSOR>_LOWER_PADDING_SIZE` | The size of the `<TENSOR>_LOWER_PADDING` array |
| `<TENSOR>_UPPER_PADDING` | An array of padding elements used for the tensor dimensions after they end. Always ordered as BFYX. |
| `<TENSOR>_UPPER_PADDING_SIZE` | The size of the `<TENSOR>_UPPER_PADDING` array |
| `<TENSOR>_PITCHES` | The offset (in elements) between adjacent elements in each dimension. Always ordered as BFYX.|
| `<TENSOR>_PITCHES_SIZE`| The size of the `<TENSOR>_PITCHES` array |
| `<TENSOR>_OFFSET`| The number of elements from the start of the tensor to the first valid element, bypassing the lower padding. |
All `<TENSOR>` values are automatically defined for every tensor
bound to this operation, such as `INPUT0`, `INPUT1`, and `OUTPUT0`.
Custom operations, that is, operations not included in the list, are not recognized by OpenVINO™ out-of-the-box. The need for a custom operation may appear in two main cases:
1. A regular framework operation that is new or rarely used, which is why it hasn’t been implemented in OpenVINO yet.
2. A new user operation that was created for some specific model topology by a model author using framework extension capabilities.
Importing models with such operations requires additional steps. This guide illustrates the workflow for running inference on models featuring custom operations, allowing you to plug in your own implementation for them. OpenVINO™ Extensibility API lets you add support for those custom operations and use one implementation for Model Optimizer and OpenVINO™ Runtime.
Defining a new custom operation consists of two parts:
1. Definition of the operation semantics in OpenVINO, that is, the code that describes how this operation should be inferred, consuming input tensor(s) and producing output tensor(s). How to implement execution kernels for [GPU](./GPU_Extensibility.md) and [VPU](./VPU_Extensibility.md) is described in separate guides.
2. Mapping rule that facilitates conversion of the framework operation representation to the OpenVINO defined operation semantics.
The first part is required for inference; the second part is required for a successful import of a model containing such operations from the original framework model format. There are several options to implement each part; the next sections describe them in detail.
## Definition of Operation Semantics
If the custom operation can be mathematically represented as a combination of existing OpenVINO operations and such a decomposition gives the desired performance, then a low-level operation implementation is not required. When deciding on the feasibility of such a decomposition, refer to the latest OpenVINO operation set. You can use any valid combination of existing operations. How to map a custom operation is described in the next section of this document.
If such a decomposition is not possible, or appears too bulky with many constituent operations that do not perform well, then a new class for the custom operation should be implemented, as described in the [Custom Operation Guide](add_openvino_ops.md).
Prefer implementing a custom operation class if you already have a generic C++ implementation of the operation kernel. Otherwise, try to decompose the operation first, as described above, and only then, after verifying correctness of inference and the resulting performance, optionally invest in a bare-metal C++ implementation.
## Mapping from Framework Operation
Depending on the model format used for import, the mapping of a custom operation is implemented differently. Choose one of the following:
1. If the model is represented in the ONNX (including models exported from PyTorch to ONNX) or PaddlePaddle format, use one of the classes from the [Frontend Extension API](frontend_extensions.md). It consists of several C++ classes that can be used with the Model Optimizer `--extensions` option or when the model is imported directly into the OpenVINO Runtime using the `read_model` method. A Python API is also available for runtime model import.
2. If the model is represented in the TensorFlow, Caffe, Kaldi or MXNet format, use [Model Optimizer Extensions](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md). This approach is available for model conversion in Model Optimizer only.
The existence of the two approaches is explained by the two different types of frontends used for model conversion in OpenVINO: new frontends (ONNX, PaddlePaddle) and legacy frontends (TensorFlow, Caffe, Kaldi and Apache MXNet). Model Optimizer can use both types of frontends, in contrast to the direct import of a model with the `read_model` method, which can use the new frontends only. Follow one of the appropriate guides referenced above to implement mappings, depending on the framework frontend.
If you are implementing extensions for the new ONNX or PaddlePaddle frontends and plan to use the Model Optimizer `--extensions` option for model conversion, then the extensions should be:
1. Implemented in C++ only.
2. Compiled as a separate shared library (see details on how to do that later in this guide).
You cannot write new frontend extensions using the Python API if you plan to use them with Model Optimizer.
The remaining part of this guide uses the Frontend Extension API, applicable to the new frontends.
## Registering Extensions
A custom operation class and a new mapping frontend extension class object should be registered to be usable in OpenVINO runtime.
> **NOTE**: This documentation is written based on the [Template extension](https://github.com/openvinotoolkit/openvino/tree/master/docs/template_extension/new), which demonstrates extension development details based on the minimalistic `Identity` operation that is a placeholder for your real custom operation. You can review the complete, fully compilable code to see how it works.
To load the extensions to the `ov::Core` object, use the `ov::Core::add_extension` method. It can load either a library with extensions or extensions created in code.
### Load extensions to core
Extensions can be loaded from code with the `ov::Core::add_extension` method.
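A minimal sketch of loading an extension from code is shown below. It assumes the `TemplateExtension::Identity` class from the template extension is declared in a hypothetical `identity.hpp` header; this is an illustration, not the exact template extension code.

```cpp
#include <openvino/core/op_extension.hpp>
#include <openvino/runtime/core.hpp>

#include "identity.hpp"  // hypothetical header declaring TemplateExtension::Identity

int main() {
    ov::Core core;
    // Register the custom operation class so IRs that contain it can be read
    core.add_extension<TemplateExtension::Identity>();
    return 0;
}
```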
`Identity` is a custom operation class defined in the [Custom Operation Guide](add_openvino_ops.md). This is enough to enable reading an IR which uses the `Identity` extension operation emitted by Model Optimizer. To load the original model directly into the runtime, you also need to add a mapping extension.
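A sketch of adding such a mapping extension is given below; it assumes the same hypothetical `identity.hpp` header and a new frontend (ONNX or PaddlePaddle) for the original model format.

```cpp
#include <openvino/frontend/extension.hpp>
#include <openvino/runtime/core.hpp>

#include "identity.hpp"  // hypothetical header declaring TemplateExtension::Identity

int main() {
    ov::Core core;
    core.add_extension<TemplateExtension::Identity>();
    // Map the framework operation type "Identity" to the custom OpenVINO operation
    core.add_extension(ov::frontend::OpExtension<TemplateExtension::Identity>());
    return 0;
}
```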
When the Python API is used, there is no way to implement a custom OpenVINO operation. Moreover, even if a custom OpenVINO operation is implemented in C++ and loaded into the runtime through a shared library, there is still no way to add a frontend mapping extension that refers to this custom operation. Use the C++ shared library approach to implement both the operation semantics and the framework mapping in this case.
You can still use Python for operation mapping and decomposition if only operations from the standard OpenVINO operation set are used.
### Create library with extensions
You need to create an extension library in the following cases:
- Converting a model with custom operations in Model Optimizer.
- Loading a model with custom operations in a Python application. This applies to both framework models and IRs.
- Loading models with custom operations in tools that support loading extensions from a library, for example, `benchmark_app`.
If you want to create an extension library, for example, in order to load these extensions to Model Optimizer, you need to complete the following steps:
Create an entry point for the extension library. OpenVINO™ provides the `OPENVINO_CREATE_EXTENSIONS()` macro, which allows you to define an entry point to a library with OpenVINO™ Extensions.
This macro takes a vector of all OpenVINO™ Extensions as an argument.
Based on that, the declaration of an extension class can look as follows:
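A minimal sketch of such an entry point is shown below. It assumes the `TemplateExtension::Identity` operation from a hypothetical `identity.hpp` header and registers both the operation itself and its frontend mapping.

```cpp
#include <openvino/core/extension.hpp>
#include <openvino/core/op_extension.hpp>
#include <openvino/frontend/extension.hpp>

#include "identity.hpp"  // hypothetical header declaring TemplateExtension::Identity

OPENVINO_CREATE_EXTENSIONS(
    std::vector<ov::Extension::Ptr>({
        // operation semantics used by the runtime
        std::make_shared<ov::OpExtension<TemplateExtension::Identity>>(),
        // framework mapping used by the new frontends
        std::make_shared<ov::frontend::OpExtension<TemplateExtension::Identity>>()
    }));
```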
# How to Implement Custom Layers for VPU (Intel® Neural Compute Stick 2) {#openvino_docs_Extensibility_UG_VPU_Kernel}
To enable operations not supported by OpenVINO™ out of the box, you need a custom extension for Model Optimizer, a custom nGraph operation set, and a custom kernel for the device you will target. This page describes custom kernel support for one VPU, the Intel® Neural Compute Stick 2 device, which uses the MYRIAD device plugin.
> **NOTES:**
> * OpenCL\* custom layer support is available in the preview mode.
> * This section assumes you are familiar with developing kernels using OpenCL.
To customize your topology with an OpenCL layer, carry out the tasks described on this page:
1. Write and compile your OpenCL code with the standalone offline OpenCL compiler (`clc`).
2. Write a configuration file to bind the OpenCL kernel to the topology file (`.xml`) of the model IR.
3. Pass the configuration file to the OpenVINO™ Runtime with the model IR.
> **NOTE**: The OpenCL compiler, targeting the Intel® Neural Compute Stick 2 for the SHAVE* processor only, is redistributed with OpenVINO.
OpenCL support is provided by ComputeAorta* and is distributed under a license agreement between Intel® and Codeplay* Software Ltd.
The OpenCL toolchain for the Intel® Neural Compute Stick 2 supports offline compilation only, so first compile OpenCL C code using the standalone `clc` compiler. You can find the compiler binary at `<INSTALL_DIR>/tools/cl_compiler`.
> **NOTE**: By design, custom OpenCL layers support any OpenCL kernels written with the assumption of OpenCL version 1.2. They also support the half float extension and are optimized for this type, because it is a native type for Intel® Movidius™ VPUs.
1. Prior to running a compilation, make sure that the following variables are set:
2. Run the compilation with the command below. You should use `--strip-binary-header` to make an OpenCL runtime-agnostic binary runnable with the OpenVINO™ Runtime.
## Write a Configuration File
To bind the kernel to the topology IR for the layer you customize, prepare a configuration file so that the OpenVINO™ Runtime can find the parameters for your kernel and the description of the execution work grid.
For example, consider the following OpenCL kernel signature:
```cpp
__kernel void reorg_nhwc(__global const half *src, __global half *out, int w, int h, int c, int stride);
```
A configuration file for this kernel might be the following:
Each custom layer is described with the `CustomLayer` node. It has the following nodes and attributes:
- Root node `CustomLayer` contains the following attributes:
- `name` – (Required) The name of the OpenVINO™ Runtime layer to bind the kernel with.
- `type` and `version` – (Required) Reserved for future use. Set them to `MVCL` and `1` respectively.
- `max-shaves` – (Optional) The maximum number of SHAVE cores that should be dedicated to the layer. It is useful for debugging concurrency issues, or for saving resources when a memory-bound kernel does not scale well with the number of cores, so more resources can be left for the rest of the topology.
- Sub-node `Kernel` must contain the following attributes:
## Pass Configuration File to OpenVINO™ Runtime
> **NOTE**: If both native and custom layer implementations are present, the custom kernel has a priority over the native one.
Before loading the network that features the custom layers, provide a separate configuration file and load it using the `ov::Core::set_property()` method with the `"CONFIG_FILE"` key and the configuration file name as a value:
This kernel can be rewritten to use the special `__dma_preload` and `__dma_postwrite` intrinsics. This means that instead of one kernel, a group of three kernels should be implemented: `kernelName`, `__dma_preload_kernelName`, and `__dma_postwrite_kernelName`. `__dma_preload_kernelName` for a particular work group `n` is guaranteed to be executed before the `n`-th work group itself, while `__dma_postwrite_kernelName` is guaranteed to be executed after the corresponding work group. You can define either of those functions, which are intended to copy data between `__global` and `__local` memory. The syntax requires an exact function signature match. The example below illustrates how to prepare your kernel for manual DMA.
OpenVINO™ Extension API allows you to register custom operations to support models with operations which OpenVINO™ does not support out-of-the-box.
## Operation Class
To add your custom operation, create a new class that extends `ov::Op`, which is in turn derived from `ov::Node`, the base class for all graph operations in OpenVINO™. To use `ov::Op`, include the following file:
1. Add the `OPENVINO_OP` macro which defines a `NodeTypeInfo` object that identifies the type of the operation to the graph users and helps with dynamic type resolution. The type info of an operation currently consists of a string operation identifier and a string for operation version.
2. Implement default constructor and constructors that optionally take the operation inputs and attributes as parameters.
3. Override the shape inference method `validate_and_infer_types`. This method is called multiple times during graph manipulations to determine the shapes and element types of the operation's outputs. To access the input shapes and input element types, use the `get_input_partial_shape()` and `get_input_element_type()` methods of `ov::Node`. Set the inferred shape and element type of the output using `set_output_type`.
4. Override the `clone_with_new_inputs` method, which enables graph manipulation routines to create copies of this operation and connect it to different nodes during optimization.
5. Override the `visit_attributes` method, which enables serialization and deserialization of operation attributes. An `AttributeVisitor` is passed to the method, and the implementation is expected to walk over all the attributes in the op using the type-aware `on_attribute` helper. Helpers are already implemented for standard C++ types like `int64_t`, `float`, `bool`, `vector`, and for existing OpenVINO defined types.
6. Override `evaluate`, which is an optional method that enables fallback of some devices to this implementation and the application of constant folding if there is a custom operation on the constant branch. If your operation provides the `evaluate` method, you also need to override the `has_evaluate` method, which reports whether `evaluate` is available for the operation.
Based on that, the declaration of an operation class can look as follows:
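A minimal sketch of such a declaration, using a placeholder `Identity` operation, is shown below; the real template extension declares a similar class, and the exact set of overrides may differ.

```cpp
#include <openvino/op/op.hpp>

namespace TemplateExtension {

class Identity : public ov::op::Op {
public:
    OPENVINO_OP("Identity");

    Identity() = default;
    Identity(const ov::Output<ov::Node>& arg);

    void validate_and_infer_types() override;
    std::shared_ptr<ov::Node> clone_with_new_inputs(const ov::OutputVector& new_args) const override;
    bool visit_attributes(ov::AttributeVisitor& visitor) override;
    // optionally override evaluate() and has_evaluate() as described in step 6
};

}  // namespace TemplateExtension
```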
### Operation Constructors
An OpenVINO™ operation has two constructors:
* Default constructor, which enables you to create an operation without attributes
* Constructor that creates and validates an operation with specified inputs and attributes
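For illustration, the parameterized constructor of the placeholder `Identity` operation from the sketch above could be defined like this:

```cpp
TemplateExtension::Identity::Identity(const ov::Output<ov::Node>& arg) : Op({arg}) {
    // validate inputs and infer the output shape/type right after construction
    constructor_validate_and_infer_types();
}
```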
The goal of this chapter is to explain how to use the Frontend extension classes to facilitate mapping of custom operations from the framework model representation to the OpenVINO representation. Refer to [Introduction to OpenVINO Extension](Intro.md) to understand the entire flow.
This API is applicable to the new frontends only, which exist for ONNX and PaddlePaddle. If a different model format is used, follow the legacy [Model Optimizer Extensions](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) guide.
> **NOTE**: This documentation is written based on the [Template extension](https://github.com/openvinotoolkit/openvino/tree/master/docs/template_extension/new), which demonstrates extension development details based on the minimalistic `Identity` operation that is a placeholder for your real custom operation. You can review the complete, fully compilable code to see how it works.
## Single Operation Mapping with OpExtension
This section covers the case when a single operation in the framework representation is mapped to a single operation in the OpenVINO representation. This is called *one-to-one mapping*. The `OpExtension` class works well if all the following conditions are satisfied:
1. The number of inputs to the operation in the framework representation is the same as in the OpenVINO representation.
2. The number of outputs is also the same in both representations.
3. Inputs can be indexed and are mapped in order correspondingly, e.g. the input with index 0 in the framework representation maps to the input with index 0 in the OpenVINO representation, and so on.
4. The same holds for outputs.
5. Each attribute in the OpenVINO operation can be initialized from one of the attributes of the original operation or by some predefined constant value. Values of copied attributes cannot contain expressions; a value is accepted as-is, so the types must be compatible.
> **NOTE**: The `OpExtension` class is currently available for the ONNX frontend only. The PaddlePaddle frontend has named (not indexed) inputs and outputs for operations; therefore, the `OpExtension` mapping is not applicable in this case.
The next example maps the ONNX operation with type [“Identity”](https://github.com/onnx/onnx/blob/main/docs/Operators.md#Identity) to the OpenVINO template extension `Identity` class.
The mapping doesn’t involve any attributes, as the Identity operation has none.
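A one-line sketch of such a mapping is shown below; `core` is an `ov::Core` instance, and the `TemplateExtension::Identity` class is assumed to be available.

```cpp
// "Identity" is the ONNX operation type that is mapped to TemplateExtension::Identity
core.add_extension(ov::frontend::OpExtension<TemplateExtension::Identity>("Identity"));
```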
Extension objects, like the `extension` object just constructed, can be added to the OpenVINO runtime just before loading a model that contains custom operations.
Alternatively, extensions can be built into a separately compiled shared library, which can be used in Model Optimizer or `benchmark_app`. Read how to build and load such a library in the “Create library with extensions” chapter of [Introduction to OpenVINO Extension](Intro.md).
If an operation has multiple inputs and/or outputs, they are mapped in order. The element types of input/output tensors should match the types expected by the surrounding operations. For example, if a custom operation produces the `f32` data type, then the operation that consumes this output should also support `f32`; otherwise, model conversion fails with an error, as no automatic type conversion happens.
### Converting to Standard OpenVINO Operation
The `OpExtension` class can also be used when mapping to one of the operations from the standard OpenVINO operation set is what you need and there is no class like `TemplateExtension::Identity` implemented.
Here is an example for a custom framework operation “MyRelu”. Suppose it is mathematically equivalent to the standard `Relu` that exists in the OpenVINO operation set, but for some reason has the type name “MyRelu”. In this case you can directly say that the “MyRelu” -> `Relu` mapping should be used:
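A sketch of this mapping could look as follows (`core` is an `ov::Core` instance):

```cpp
// Map the framework type "MyRelu" to the standard "Relu" operation
// from the latest available OpenVINO operation set
core.add_extension(ov::frontend::OpExtension<>("Relu", "MyRelu"));
```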
In the resulting converted OpenVINO model, the “MyRelu” operation will be replaced by the standard operation `Relu` from the latest available OpenVINO operation set. Notice that when a standard operation is used, it can be specified using just a type string (“Relu”) instead of an `ov::opset8::Relu` class name as a template parameter for `OpExtension`. This method is available for operations from the standard operation set only. For a user's custom OpenVINO operation, the corresponding class should always be specified as a template parameter, as was demonstrated with `TemplateExtension::Identity`.
### Attributes Mapping
As described above, `OpExtension` is useful when attributes can be mapped one by one or initialized by a constant. If the set of attributes in the framework representation and the OpenVINO representation completely match by names and types, nothing needs to be specified in the `OpExtension` constructor parameters. The attributes are discovered and mapped automatically based on the `visit_attributes` method, which should be defined for any OpenVINO operation.
Imagine you have a `CustomOperation` class implementation that has two attributes, named `attr1` and `attr2`:
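A sketch of such a class is shown below; only the attribute-related part is spelled out, the rest of the operation boilerplate is omitted.

```cpp
class CustomOperation : public ov::op::Op {
    std::string attr1;
    int64_t attr2 = 0;

public:
    OPENVINO_OP("CustomOperation");

    bool visit_attributes(ov::AttributeVisitor& visitor) override {
        visitor.on_attribute("attr1", attr1);
        visitor.on_attribute("attr2", attr2);
        return true;
    }
    // constructors, validate_and_infer_types, clone_with_new_inputs omitted
};
```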
And the original model in the framework representation also has an operation named “CustomOperation” with the same `attr1` and `attr2` attributes. Then, with the following code:
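A sketch of that code (`core` is an `ov::Core` instance; "CustomOperation" is the framework operation type):

```cpp
// Attribute names and types match, so no extra mapping parameters are needed
core.add_extension(ov::frontend::OpExtension<CustomOperation>("CustomOperation"));
```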
both `attr1` and `attr2` are copied from the framework representation to the OpenVINO representation automatically. If for some reason the attribute names are different but the values can still be copied as-is, you can pass an attribute name mapping in the `OpExtension` constructor:
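A sketch of such a constructor call, under the same assumptions, might look like this:

```cpp
// Map OpenVINO attribute names to differently named framework attributes
core.add_extension(ov::frontend::OpExtension<CustomOperation>("CustomOperation",
    { {"attr1", "fw_attr1"}, {"attr2", "fw_attr2"} },
    {}));
```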
Here, `fw_attr1` and `fw_attr2` are the names of the corresponding attributes in the framework operation representation.
If copying an attribute is not what you need, `OpExtension` can also set an attribute to a predefined constant value. For the same `CustomOperation`, imagine you want to set `attr2` to the value 5 instead of copying it from `fw_attr2`. To achieve that, do the following:
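A sketch of that variant:

```cpp
// Copy attr1 from fw_attr1 and force attr2 to the constant value 5
core.add_extension(ov::frontend::OpExtension<CustomOperation>("CustomOperation",
    { {"attr1", "fw_attr1"} },
    { {"attr2", 5} }));
```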
In conclusion, each attribute of the target OpenVINO operation should be initialized in one of the following ways:
1. Set automatically due to name matching
2. Mapped by attribute name
3. Set to a constant value
This is achieved by specifying maps as arguments for the `OpExtension` constructor.
## Mapping to Multiple Operations with ConversionExtension
Previous sections cover the case when a single operation is mapped to a single operation with optional adjustment in names and attribute values. That is likely enough for your own custom operation with an existing C++ kernel implementation. In this case, your framework representation and OpenVINO representation of the operation are under your control, and inputs/outputs/attributes can be aligned to make `OpExtension` usable.
If one-to-one mapping is not possible, *decomposition to multiple operations* should be considered. It is achieved by using the more verbose and less automated `ConversionExtension` class. It enables writing arbitrary code to replace a single framework operation with multiple connected OpenVINO operations, constructing a dependency graph of any complexity.
`ConversionExtension` maps a single operation to a function which builds a graph using OpenVINO operation classes. Follow chapter [Build a Model in OpenVINO Runtime](@ref ov_ug_build_model) to learn how to use OpenVINO operation classes to build a fragment of model for replacement.
The next example illustrates using `ConversionExtension` for conversion of “ThresholdedRelu” from ONNX according to the formula: `ThresholdedRelu(x, alpha) -> Multiply(x, Convert(Greater(x, alpha), type=float))`.
> **NOTE**: `ThresholdedRelu` is one of the standard ONNX operators which is supported by ONNX frontend natively out-of-the-box. Here we are re-implementing it to illustrate how you can add a similar support for your custom operation instead of `ThresholdedRelu`.
To access the original framework operation attribute values and connect to its inputs, the `node` object of type `NodeContext` is used. It has two main methods:
* `NodeContext::get_input` to get an input with a given index,
* `NodeContext::get_attribute` to get an attribute value with a given name.
The conversion function should return a vector of node outputs that are mapped to corresponding outputs of the original framework operation in the same order.
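A sketch of such a conversion extension is shown below; it assumes `opset8`, an `ov::Core` instance named `core`, and that the decomposition formula above is what you want to express.

```cpp
core.add_extension(ov::frontend::ConversionExtension(
    "ThresholdedRelu",
    [](const ov::frontend::NodeContext& node) -> ov::OutputVector {
        // alpha is read from the original framework operation attributes
        auto alpha = ov::opset8::Constant::create(
            ov::element::f32, ov::Shape{}, {node.get_attribute<float>("alpha")});
        auto greater = std::make_shared<ov::opset8::Greater>(node.get_input(0), alpha);
        auto mask = std::make_shared<ov::opset8::Convert>(greater, ov::element::f32);
        auto result = std::make_shared<ov::opset8::Multiply>(node.get_input(0), mask);
        // outputs are returned in the same order as in the original operation
        return {result};
    }));
```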
In addition, GraphRewrite handles nodes that were registered by MatcherPasses during their execution. These nodes will be added to the beginning of the sequence of nodes for pattern matching.
> **NOTE**: when using `ov::pass::Manager`, a temporary GraphRewrite is used to execute a single MatcherPass.
GraphRewrite has two algorithms for MatcherPasses execution. The first algorithm is straightforward: it applies each MatcherPass in registration order to the current node.
![graph_rewrite_execution]
However, this is not really efficient when there are many registered passes. So, first of all, GraphRewrite checks whether all MatcherPass patterns have a type-based root node (meaning that the type of this node is not hidden inside a predicate).
It then creates a map from the registered MatcherPasses, which helps avoid the additional cost of applying each MatcherPass to each node.
![graph_rewrite_efficient_search]
> **NOTE**: GraphRewrite execution algorithm cannot be set manually and depends only on root nodes registered inside MatcherPasses.
To use `ov::pass::MatcherPass`, you need to complete these steps:
1. Create a pattern
2. Implement a callback
3. Register the pattern and Matcher
4. Execute MatcherPass
So let's go through each of these steps.
## Create a pattern
A pattern is a single-root `ov::Model`. The only difference is that you do not need to create a model object; you just need to create and connect opset or special pattern operations.
Then take the last created operation and use it as the root of the pattern. This root node will be used as the root node in pattern matching.
> **NOTE**: Any nodes in a pattern that have no consumers and are not registered as root will not be used in pattern matching.
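A minimal sketch of building a pattern (assuming `opset8`) could look like this:

```cpp
// Pattern: Negative consuming a single input; the Parameter exists only to build the pattern
auto input = std::make_shared<ov::opset8::Parameter>(ov::element::f32, ov::Shape{1});
auto neg = std::make_shared<ov::opset8::Negative>(input);
// The last created operation becomes the pattern root
auto matcher = std::make_shared<ov::pass::pattern::Matcher>(neg, "NegativePattern");
```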
The `Parameter` operation in the example above has its type and shape specified. These attributes are needed only to create the `Parameter` operation and will not be used in pattern matching.
For more pattern examples, refer to the [pattern matching](#pattern_matching) section.
## Implement callback
A callback is an action applied to every pattern occurrence. In general, a callback is a lambda function that takes a Matcher object with the detected subgraph.
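A sketch of a callback for the Negative pattern from the previous section, replacing `Negative(x)` with `Multiply(x, -1)`, could look like this (assuming `opset8`):

```cpp
ov::matcher_pass_callback callback = [](ov::pass::pattern::Matcher& m) {
    auto root = m.get_match_root();  // node matched to the pattern root (Negative)
    auto minus_one = ov::opset8::Constant::create(
        root->get_output_element_type(0), ov::Shape{1}, {-1});
    auto mul = std::make_shared<ov::opset8::Multiply>(root->input_value(0), minus_one);
    mul->set_friendly_name(root->get_friendly_name());
    ov::copy_runtime_info(root, mul);
    ov::replace_node(root, mul);
    return true;  // the root node was replaced
};
```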
The example above shows the callback structure and how the Matcher can be used for accessing nodes detected by the pattern.
The callback return value is `true` if the root node was replaced and another pattern cannot be applied to the same root node; otherwise, it is `false`.
> **NOTE**: It is not recommended to manipulate nodes that are below the root node. This may affect GraphRewrite execution, as it is expected that all nodes that come after the root node in topological order are valid and can be used in pattern matching.
MatcherPass also provides functionality that allows reporting of the newly created nodes that can be used in additional pattern matching.
If MatcherPass was registered in `ov::pass::Manager` or `ov::pass::GraphRewrite`, these registered nodes will be added for additional pattern matching.
That means that matcher passes registered in `ov::pass::GraphRewrite` will be applied to these nodes.
The example below shows how a single MatcherPass can fuse a sequence of operations using the `register_new_node` method.
> **NOTE**: If you register multiple nodes, please add them in topological order. We do not topologically sort these nodes as it is a time-consuming operation.
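A sketch of such a fusing MatcherPass is shown below. It is an illustration under the assumption that `Relu(Relu(x))` can be collapsed into a single `Relu(x)`; the class and matcher names are made up for the example.

```cpp
#include <openvino/core/rt_info.hpp>
#include <openvino/opsets/opset8.hpp>
#include <openvino/pass/graph_rewrite.hpp>
#include <openvino/pass/pattern/op/wrap_type.hpp>

class FuseSequentialRelu : public ov::pass::MatcherPass {
public:
    OPENVINO_RTTI("FuseSequentialRelu", "0");
    FuseSequentialRelu() {
        auto first = ov::pass::pattern::wrap_type<ov::opset8::Relu>();
        auto second = ov::pass::pattern::wrap_type<ov::opset8::Relu>({first});

        ov::matcher_pass_callback callback = [=](ov::pass::pattern::Matcher& m) {
            const auto& pattern_map = m.get_pattern_value_map();
            auto first_relu = pattern_map.at(first).get_node_shared_ptr();
            auto second_relu = pattern_map.at(second).get_node_shared_ptr();
            // Relu(Relu(x)) == Relu(x): create the replacement and report it
            auto new_relu = register_new_node<ov::opset8::Relu>(first_relu->input_value(0));
            new_relu->set_friendly_name(second_relu->get_friendly_name());
            ov::copy_runtime_info({first_relu, second_relu}, new_relu);
            ov::replace_node(second_relu, new_relu);
            return true;
        };

        register_matcher(std::make_shared<ov::pass::pattern::Matcher>(second, "FuseSequentialRelu"),
                         callback);
    }
};
```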
## Register pattern and Matcher
The last step is to register Matcher and callback inside the MatcherPass pass. To do this, call the `register_matcher` method.
> **NOTE**: Only one matcher can be registered for a single MatcherPass class.
```cpp
// Register matcher and callback
register_matcher(m,callback);
```
## Execute MatcherPass
MatcherPass has multiple ways to be executed:
* Run on a single node - it can be useful if you want to run MatcherPass inside another transformation.
* Run on `ov::Model` using GraphRewrite - this approach gives the ability to run MatcherPass on a whole `ov::Model`. Moreover, multiple MatcherPass transformations can be registered in a single GraphRewrite to be executed in a single graph traversal.
* Run on `ov::Model` using `ov::pass::Manager` - this approach helps you register MatcherPass for execution on `ov::Model` like other transformation types.
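A sketch of these three options, reusing the hypothetical `FuseSequentialRelu` pass from above (`model` is an `ov::Model`, `node` is a single node shared pointer), could look like this:

```cpp
FuseSequentialRelu pass;
pass.apply(node);                              // 1. run on a single node

ov::pass::GraphRewrite rewrite;
rewrite.add_matcher<FuseSequentialRelu>();     // 2. run on the whole model via GraphRewrite
rewrite.run_on_model(model);

ov::pass::Manager manager;
manager.register_pass<FuseSequentialRelu>();   // 3. run on the whole model via the pass manager
manager.run_passes(model);
```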
Sometimes patterns cannot be expressed via regular operations, or doing so is too complicated.
For example, you may want to detect a **Convolution->Add** sub-graph without specifying a particular input type for the Convolution operation, or create a pattern where some operations can have different types.
For these cases, OpenVINO™ provides additional helpers to construct patterns for GraphRewrite transformations.
There are two main helpers:
1. `ov::pass::pattern::any_input` - helps to express inputs if their types are undefined.
2. `ov::pass::pattern::wrap_type<T>` - helps to express pattern nodes without specifying node attributes.
Let's go through the example to have better understanding of how it works:
> **NOTE**: Node attributes do not participate in pattern matching and are needed only for operations creation. Only operation types participate in pattern matching.
The example below shows basic usage of `ov::pass::pattern::any_input`.
Here we construct a Multiply pattern with an arbitrary first input and a Constant as the second input.
Also, since Multiply is a commutative operation, it does not matter in which order we set the inputs (any_input/Constant or Constant/any_input), because both cases will be matched.
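A sketch of that pattern (assuming `opset8`):

```cpp
auto any = ov::pass::pattern::any_input();                               // arbitrary first input
auto constant = ov::pass::pattern::wrap_type<ov::opset8::Constant>();    // second input is a Constant
auto mul = ov::pass::pattern::wrap_type<ov::opset8::Multiply>({any, constant});
auto matcher = std::make_shared<ov::pass::pattern::Matcher>(mul, "MultiplyByConstant");
```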
To use `ov::pass::ModelPass`, you need to override the `run_on_model` method, where you write the transformation code.
The return value is `true` if the original model has changed during the transformation (a new operation was added, operations were replaced, or node attributes were changed); otherwise, it is `false`.
`ov::pass::ModelPass`-based transformations can also be executed via `ov::pass::Manager`.
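A minimal sketch of a ModelPass and its execution via `ov::pass::Manager` is shown below; the class name is illustrative.

```cpp
#include <openvino/pass/manager.hpp>
#include <openvino/pass/pass.hpp>

class MyModelPass : public ov::pass::ModelPass {
public:
    OPENVINO_RTTI("MyModelPass", "0");
    bool run_on_model(const std::shared_ptr<ov::Model>& model) override {
        bool changed = false;
        for (const auto& node : model->get_ordered_ops()) {
            // inspect or modify nodes here; set changed = true if the model is updated
        }
        return changed;
    }
};

void run_my_pass(const std::shared_ptr<ov::Model>& model) {
    ov::pass::Manager manager;
    manager.register_pass<MyModelPass>();
    manager.run_passes(model);
}
```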
# Overview of Transformations API {#openvino_docs_transformations}
@sphinxdirective
.. toctree::
:maxdepth: 1
:hidden:
openvino_docs_Extensibility_UG_model_pass
openvino_docs_Extensibility_UG_matcher_pass
openvino_docs_Extensibility_UG_graph_rewrite_pass
@endsphinxdirective
The OpenVINO transformation mechanism allows you to develop transformation passes to modify `ov::Model`. You can use this mechanism to apply additional optimizations to the original model or to transform unsupported subgraphs and operations into new operations supported by the plugin.
This guide contains all necessary information that you need to start implementing OpenVINO™ transformations.
## Working with Model
Before moving to the transformation part, a few words are needed about the functions that allow you to modify `ov::Model`.
This chapter extends the [model representation guide](../OV_Runtime_UG/model_representation.md) and shows an API that allows us to manipulate `ov::Model`.
### Working with node input and output ports
First of all, let's talk about `ov::Node` input/output ports. Each OpenVINO™ operation has input and output ports, except for operations of the `Parameter` or `Constant` type.
Every port belongs to its node, so using a port we can access the parent node, get the shape and type of a particular input/output, get all consumers in the case of an output port, and get the producer node in the case of an input port.
With an output port, we can set inputs for newly created operations.
Let's look at the code example.
@snippet ov_model_snippets.cpp ov:ports_example
### Node replacement
OpenVINO™ provides two ways for node replacement: via OpenVINO™ helper function and directly via port methods. We are going to review both of them.
Let's start with OpenVINO™ helper functions. The most popular function is `ov::replace_node(old_node, new_node)`.
We will review a real replacement case where a Negative operation is replaced with Multiply.
![ngraph_replace_node]
@snippet ov_model_snippets.cpp ov:replace_node
`ov::replace_node` has a constraint that the number of output ports of both operations must be the same; otherwise, it raises an exception.
The alternative way to do the same replacement is the following:
@snippet ov_model_snippets.cpp ov:manual_replace
Another transformation example is insertion.
![ngraph_insert_node]
@snippet ov_model_snippets.cpp ov:insert_node
The alternative way to the insert operation is to make a node copy and use `ov::replace_node()`:
* [Graph rewrite pass](./graph_rewrite_pass.md) - container for matcher passes needed for efficient execution
![transformations_structure]
## Transformation conditional compilation
The transformation library has two internal macros to support the conditional compilation feature.
* `MATCHER_SCOPE(region)` - allows disabling the MatcherPass if the matcher isn't used. The region name should be unique. This macro creates a local variable `matcher_name`, which you should use as the matcher name.
* `RUN_ON_MODEL_SCOPE(region)` - allows disabling the run_on_model pass if it isn't used. The region name should be unique.
When developing a transformation, you need to follow these transformation rules:
### 1. Friendly Names
Each `ov::Node` has a unique name and a friendly name. In transformations, we care only about the friendly name because it represents the name from the model.
To avoid losing the friendly name when replacing a node with another node or a subgraph, set the original friendly name on the last node of the replacing subgraph, as shown in the example below.
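A sketch of preserving the friendly name (here `div` is a placeholder for the node being replaced by a `Power` + `Multiply` sub-graph, assuming `opset8`):

```cpp
auto pow = std::make_shared<ov::opset8::Power>(
    div->input_value(1),
    ov::opset8::Constant::create(div->get_output_element_type(0), ov::Shape{1}, {-1}));
auto mul = std::make_shared<ov::opset8::Multiply>(div->input_value(0), pow);
// keep the original name on the last node of the replacing sub-graph
mul->set_friendly_name(div->get_friendly_name());
ov::replace_node(div, mul);
```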
In more advanced cases, when the replaced operation has several outputs and we add additional consumers to its outputs, the decision on how to set the friendly name is made by arrangement.
### 2. Runtime Info
Runtime info is a map `std::map<std::string, ov::Any>` located inside `ov::Node` class. It represents additional attributes in `ov::Node`.
These attributes can be set by users or by plugins, and when executing a transformation that changes `ov::Model`, we need to preserve these attributes, as they will not be propagated automatically.
In most cases, transformations have the following types: 1:1 (replace node with another node), 1:N (replace node with a sub-graph), N:1 (fuse sub-graph into a single node), N:M (any other transformation).
Currently, there is no mechanism that automatically detects transformation types, so we need to propagate this runtime information manually. See the examples below.
When transformation has multiple fusions or decompositions, `ov::copy_runtime_info` must be called multiple times for each case.
> **NOTE**: `copy_runtime_info` removes `rt_info` from the destination nodes. If you want to keep it, you need to specify the destination nodes among the source nodes, like this: `copy_runtime_info({a, b, c}, {a, b})`.
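A sketch of the typical cases (the node variables are placeholders):

```cpp
ov::copy_runtime_info(old_node, new_node);                   // 1:1 replacement
ov::copy_runtime_info(old_node, {new_node_a, new_node_b});   // 1:N decomposition
ov::copy_runtime_info({conv, add, relu}, fused_node);        // N:1 fusion
```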
### 3. Constant Folding
If your transformation inserts constant sub-graphs that need to be folded, do not forget to use `ov::pass::ConstantFolding()` after your transformation, or call constant folding directly for the operation.
The example below shows how a constant subgraph can be constructed.
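For example, a constant sub-graph like the following will be collapsed into a single `Constant` by `ov::pass::ConstantFolding` (a sketch assuming `opset8`):

```cpp
auto c1 = ov::opset8::Constant::create(ov::element::f32, ov::Shape{1}, {2.0f});
auto c2 = ov::opset8::Constant::create(ov::element::f32, ov::Shape{1}, {3.0f});
auto sum = std::make_shared<ov::opset8::Add>(c1, c2);  // folds into a Constant holding 5.0
// later, e.g. in a pass manager: manager.register_pass<ov::pass::ConstantFolding>();
```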
## Common mistakes in transformations <a name="common_mistakes"></a>
In transformation development process:
* Do not use deprecated OpenVINO™ API. Deprecated methods have the `OPENVINO_DEPRECATED` macro in their definition.
* Do not pass `shared_ptr<Node>` as an input to another node if the node type is unknown or if it has multiple outputs. Use an explicit output port.
* If you replace a node with another node that produces a different shape, remember that the new shape will not be propagated until the first `validate_nodes_and_infer_types` call for `ov::Model`. If you are using `ov::pass::Manager`, it will automatically call this method after each transformation execution.
* Do not forget to call the `ov::pass::ConstantFolding` pass if your transformation creates constant subgraphs.
* Use latest OpSet if you are not developing downgrade transformation pass.
* When developing a callback for `ov::pass::MatcherPass`, do not change nodes that come after the root node in topological order.
## Using pass manager <a name="using_pass_manager"></a>
`ov::pass::Manager` is a container class that can store a list of transformations and execute them. The main idea of this class is to provide a high-level representation of a grouped list of transformations.
It can register and apply any [transformation pass](#transformations_types) on model.
In addition, `ov::pass::Manager` has extended debug capabilities (find more information in the [how to debug transformations](#how_to_debug_transformations) section).
The example below shows basic usage of `ov::pass::Manager`.
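A sketch of such usage (`model` is an `ov::Model`; the custom pass is the hypothetical one sketched earlier in this guide):

```cpp
ov::pass::Manager manager;
manager.register_pass<ov::pass::ConstantFolding>();
manager.register_pass<FuseSequentialRelu>();  // hypothetical MatcherPass from the MatcherPass guide
manager.run_passes(model);
```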
## How to debug transformations <a name="how_to_debug_transformations"></a>
If you are using `ov::pass::Manager` to run a sequence of transformations, you can get additional debug capabilities by using the following environment variables:
```
OV_PROFILE_PASS_ENABLE=1 - enables performance measurement for each transformation and prints execution status
OV_ENABLE_VISUALIZE_TRACING=1 - enables visualization after each transformation. By default, it saves dot and svg files.
```
> **Note**: Make sure that you have `dot` installed on your machine; otherwise, it will silently save only the `.dot` file without the `.svg` file.
## See Also
* [OpenVINO™ Model Representation](../OV_Runtime_UG/model_representation.md)
Custom operations, that is, operations not included in the list, are not recognized by Model Optimizer out-of-the-box. Therefore, creating an Intermediate Representation (IR) for a model using them requires additional steps. This guide illustrates the workflow for running inference on topologies featuring custom operations, allowing you to plug in your own implementation for existing or completely new operations.
> **NOTE**: *Layer* is a legacy term for *operation* that came from the Caffe\* framework. It is not used anymore.
> Refer to the [Deep Learning Network Intermediate Representation and Operation Sets in OpenVINO™](../MO_DG/IR_and_opsets.md)
> for more information on the topic.
## Terms Used in This Guide
- *Intermediate Representation (IR)* — OpenVINO's Neural Network format used by the Inference Engine. It abstracts different frameworks and describes the model topology, operation parameters, and weights.
- *Operation* — an abstract concept of a math function selected for a specific purpose. Operations supported by
OpenVINO™ are listed in the supported operation set provided in the [Available Operations Sets](../ops/opset.md).
Examples of the operations are: [ReLU](../ops/activation/ReLU_1.md), [Convolution](../ops/convolution/Convolution_1.md),
[Add](../ops/arithmetic/Add_1.md), etc.
- *Kernel* — The implementation of an operation function in the OpenVINO™ plugin, in this case, the math programmed (in
C++ and OpenCL) to perform the operation for a target hardware (CPU or GPU).
- *Inference Engine Extension* — Device-specific module implementing custom operations (a set of kernels).
## Custom Operation Support Overview
There are three steps to support inference of a model with custom operation(s):
1. Add support for a custom operation in the [Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) so
the Model Optimizer can generate the IR with the operation.
2. Create an operation set and implement a custom nGraph operation in it as described in the [Custom nGraph Operation](../OV_Runtime_UG/Extensibility_DG/AddingNGraphOps.md) guide.
> **NOTE**: If a device doesn't support a particular operation, an alternative to creating a new operation is to target
> an additional device using the HETERO plugin. The [Heterogeneous Plugin](../OV_Runtime_UG/supported_plugins/HETERO.md) may be
> used to run an inference model on multiple devices allowing the unsupported operations on one device to "fallback" to
> run on another device (e.g., CPU) that does support those operations.
### Custom Operation Support for the Model Optimizer
Model Optimizer model conversion pipeline is described in detail in "Model Conversion Pipeline" section of [Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md). It is best to read that article first for a better understanding of the following material.
Model Optimizer provides an extension mechanism to support new operations and implement custom model transformations to generate an optimized IR. This mechanism is described in the "Model Optimizer Extensions" section of [Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md).
Two types of Model Optimizer extensions should be implemented to support custom operations, at a minimum:
1. Operation class for a new operation. This class stores information about the operation, its attributes, shape inference function, attributes to be saved to an IR and some others internally used attributes. Refer to the "Model Optimizer Operation" section of [Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) for detailed instructions on how to implement it.
2. Operation attributes extractor. The extractor is responsible for parsing framework-specific representation of the
operation and uses corresponding operation class to update graph node attributes with necessary attributes of the
operation. Refer to the "Operation Extractor" section of
[Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) for detailed instructions on how to implement it.
> **NOTE**: In some cases you may need to implement some transformation to support the operation. This topic is covered in the "Graph Transformation Extensions" section of [Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md).
## Custom Operations Extensions for the Inference Engine
Inference Engine provides an extension mechanism to support new operations. This mechanism is described in [Inference Engine Extensibility Mechanism](../OV_Runtime_UG/Extensibility_DG/Intro.md).
Each device plugin includes a library of optimized implementations to execute known operations which must be extended to execute a custom operation. The custom operation extension is implemented according to the target device:
- Custom Operation CPU Extension
- A compiled shared library (`.so` or `.dll`) needed by the CPU Plugin for executing the custom operation
on a CPU. Refer to the [How to Implement Custom CPU Operations](../OV_Runtime_UG/Extensibility_DG/CPU_Kernel.md) for more
details.
- Custom Operation GPU Extension
- OpenCL source code (.cl) for the custom operation kernel that will be compiled to execute on the GPU along with an operation description file (.xml) needed by the GPU Plugin for the custom operation kernel. Refer to the [How to Implement Custom GPU Operations](../OV_Runtime_UG/Extensibility_DG/GPU_Kernel.md) for more details.
- Custom Operation VPU Extension
- OpenCL source code (.cl) for the custom operation kernel that will be compiled to execute on the VPU along with an operation description file (.xml) needed by the VPU Plugin for the custom operation kernel. Refer to [How to Implement Custom Operations for VPU](../OV_Runtime_UG/Extensibility_DG/VPU_Kernel.md) for more details.
Also, it is necessary to implement nGraph custom operation according to [Custom nGraph Operation](../OV_Runtime_UG/Extensibility_DG/AddingNGraphOps.md) so the Inference Engine can read an IR with this
operation and correctly infer output tensor shape and type.
## Enabling Magnetic Resonance Image Reconstruction Model
This chapter provides step-by-step instructions on how to enable the magnetic resonance image reconstruction model implemented in the [repository](https://github.com/rmsouza01/Hybrid-CS-Model-MRI/) using a custom operation on CPU. The example is prepared for a model generated from the repository with hash `2ede2f96161ce70dcdc922371fe6b6b254aafcc8`.
### Download and Convert the Model to a Frozen TensorFlow\* Model Format
The original pre-trained model is provided in the HDF5 format, which is not supported by OpenVINO directly and needs to be converted to the TensorFlow\* frozen model format first.
As a result, the TensorFlow\* frozen model file `wnet_20.pb` is generated.
### Convert the Frozen TensorFlow\* Model to Intermediate Representation
First, open the model in TensorBoard or another TensorFlow* model visualization tool. The model supports a dynamic
batch dimension because the value for the batch dimension is not hardcoded in the model. Model Optimizer needs to set all
dynamic dimensions to some specific value to create the IR; therefore, specify the command-line parameter `-b 1` to set
the batch dimension equal to 1. The actual batch size can be changed at runtime using the Inference Engine API
described in [Using Shape Inference](../OV_Runtime_UG/ShapeInference.md). Also refer to the General Conversion Parameters section in [Converting a Model to Intermediate Representation (IR)](../MO_DG/prepare_model/convert_model/Converting_Model.md) and [Convert Your TensorFlow* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md)
for more details and command-line parameters used for the model conversion.
```sh
mo --input_model <PATH_TO_MODEL>/wnet_20.pb -b 1
```
> **NOTE**: This conversion guide is applicable for the 2021.3 release of OpenVINO. Starting from the 2021.4 release,
> OpenVINO supports this model out of the box.
Model Optimizer produces the following error:
```bash
[ ERROR ] List of operations that cannot be converted to Inference Engine IR:
[ ERROR ] Complex (1)
[ ERROR ] lambda_2/Complex
[ ERROR ] IFFT2D (1)
[ ERROR ] lambda_2/IFFT2D
[ ERROR ] ComplexAbs (1)
[ ERROR ] lambda_2/Abs
[ ERROR ] Part of the nodes was not converted to IR. Stopped.
```
The error means that Model Optimizer doesn't know how to handle 3 types of TensorFlow\* operations: "Complex",
"IFFT2D" and "ComplexAbs". To see more details about the conversion process, run the model conversion with the
additional parameter `--log_level DEBUG`. It is worth mentioning the following lines from the detailed output:
```bash
[ INFO ]  Called "tf_native_tf_node_infer" for node "lambda_2/Complex"
[ <TIMESTAMP> ][ DEBUG ][ tf:228 ] Added placeholder with name 'lambda_2/lambda_3/strided_slice_port_0_ie_placeholder'
[ <TIMESTAMP> ][ DEBUG ][ tf:228 ] Added placeholder with name 'lambda_2/lambda_4/strided_slice_port_0_ie_placeholder'
```
This model uses complex numbers during inference, but the Inference Engine does not support tensors of this data type. So
it is necessary to find a way to avoid using tensors of such a type in the model. Fortunately, the complex tensor
appears only as a result of the "Complex" operation, is used as an input to the "IFFT2D" operation, and is then passed to
"ComplexAbs", which produces a real value tensor as output. So there are just 3 operations consuming/producing complex
tensors in the model.
Let's design an OpenVINO operation "FFT" which gets a single real number tensor describing the complex numbers and
produces a single real number tensor describing the output complex tensor. This way, the fact that the model uses complex
numbers is hidden inside the "FFT" operation implementation. The operation gets a tensor of shape `[N, H, W, 2]` and
produces an output tensor of the same shape, where the innermost dimension contains pairs of real numbers describing
a complex number (its real and imaginary parts). As we will see further, this operation allows us to support the
model. The implementation of the Model Optimizer operation should be saved to the `mo_extensions/ops/FFT.py` file:
@snippet FFT.py fft:operation
The attribute `inverse` is a flag specifying type of the FFT to apply: forward or inverse.
See the "Model Optimizer Operation" section of [Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) for detailed instructions on how to implement the operation.
Now it is necessary to implement an extractor for the "IFFT2D" operation according to the
"Operation Extractor" section of [Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md). The
following snippet provides two extractors: one for "IFFT2D" and another for "FFT2D"; however, only one of them is used in this example. The implementation should be saved to the file `mo_extensions/front/tf/FFT_ext.py`.
@snippet FFT_ext.py fft_ext:extractor
> **NOTE**: The graph is in an inconsistent state after extracting the node attributes because, according to the original
> "IFFT2D" operation semantics, it should have an input consuming a tensor of complex numbers, but the extractor instantiated
> an "FFT" operation which expects a real tensor with a specific layout. The inconsistency will be resolved when
> applying the front phase transformations discussed below.
The output shape of the "AddV2" operation from the picture above is `[N, H, W, 2]`, where the innermost dimension
contains pairs of real numbers describing a complex number (its real and imaginary parts). The following "StridedSlice"
operations split the input tensor into 2 parts to get a tensor of real and a tensor of imaginary parts which are then
consumed with the "Complex" operation to produce a tensor of complex numbers. These "StridedSlice" and "Complex"
operations can be removed so the "FFT" operation will get a real value tensor encoding complex numbers. To achieve this
we implement the front phase transformation which searches for a pattern of two "StridedSlice" operations with specific
attributes producing data to "Complex" operation and removes it from the graph. Refer to the
"Pattern-Defined Front Phase Transformations" section of [Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) for more
information on how this type of transformation works. The code snippet should be saved to the file
`mo_extensions/front/tf/Complex.py`.
@snippet Complex.py complex:transformation
> **NOTE**: The graph is in inconsistent state because the "ComplexAbs" operation consumes complex value tensor but
> "FFT" produces real value tensor.
Now let's implement a transformation which replaces the "ComplexAbs" operation with a sub-graph of primitive operations
which calculate the result using the following formula: \f$module(z) = \sqrt{real(z) \cdot real(z) + imag(z) \cdot imag(z)}\f$.
Original "IFFT2D" operation produces tensor of complex values, but the "FFT" operation produces a real value tensor with
the same format and shape as the input for the operation. So the input shape for the "ComplexAbs" will be `[N, H, W, 2]`
with the innermost dimension containing tuple with real and imaginary part of a complex number. In order to calculate
absolute values for the complex tensor we do the following:
1. Raise all elements in the power of 2.
2. Calculate a reduced sum over the innermost dimension.
3. Calculate a square root.
The implementation should be saved to the file `mo_extensions/front/tf/ComplexAbs.py` and is provided below:
@snippet ComplexAbs.py complex_abs:transformation
Now it is possible to convert the model using the following command line:
```sh
mo --input_model <PATH_TO_MODEL>/wnet_20.pb -b 1 --extensions mo_extensions/
```
The sub-graph corresponding to the originally non-supported one is depicted in the image below:
- Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit)
Asynchronous Inference Request runs an inference pipeline asynchronously in one or several task executors depending on a device pipeline structure.
OpenVINO Runtime Plugin API provides the base InferenceEngine::AsyncInferRequestThreadSafeDefault class:
- The class has the `_pipeline` field of `std::vector<std::pair<ITaskExecutor::Ptr, Task> >`, which contains pairs of an executor and executed task.
- All executors are passed as arguments to a class constructor and they are in the running state and ready to run tasks.
`AsyncInferRequest` Class
------------------------
OpenVINO Runtime Plugin API provides the base InferenceEngine::AsyncInferRequestThreadSafeDefault class for a custom asynchronous inference request implementation:
The implementation of `CompileNetwork` is fully device-specific.
The function accepts a const shared pointer to `ngraph::Function` object and performs the following steps:
1. Applies nGraph passes using the `TransformNetwork` function, which defines a plugin-specific conversion pipeline. To support low precision inference, the pipeline can include Low Precision Transformations. These transformations are usually hardware specific. You can find how to use and configure Low Precision Transformations in the [Low Precision Transformations](@ref openvino_docs_OV_UG_lpt) guide.
2. Maps the transformed graph to a backend specific graph representation (for example, to CPU plugin internal graph representation).
3. Allocates and fills memory for graph weights, backend specific memory handles and so on.
Inference Engine Plugin usually represents a wrapper around a backend. Backends can be:
- OpenCL-like backend (e.g. clDNN library) for GPU devices.
- oneDNN backend for Intel CPU devices.
- NVIDIA cuDNN for NVIDIA GPUs.
The responsibility of Inference Engine Plugin:
The provided plugin class also has several fields:
* `_backend` - a backend engine that is used to perform actual computations for network inference. For the `Template` plugin, `ngraph::runtime::Backend` is used, which performs computations using OpenVINO™ reference implementations.
* `_waitExecutor` - a task executor that waits for a response from a device about device task completion.
* `_cfg` of type `Configuration`:
Before creating an `ExecutableNetwork` instance via a constructor, a plugin may check whether the provided
InferenceEngine::ICNNNetwork object is supported by the device. In the example above, the plugin checks precision information.
A very important step before creating an `ExecutableNetwork` instance is to call the `TransformNetwork` method, which applies OpenVINO™ transformation passes.
Actual graph compilation is done in the `ExecutableNetwork` constructor. Refer to the [ExecutableNetwork Implementation Guide](@ref openvino_docs_ie_plugin_dg_executable_network) for details.
### `TransformNetwork()`
The function accepts a const shared pointer to `ov::Model` object and performs the following steps:
1. Deep copies a const object to a local object, which can later be modified.
2. Applies common and plugin-specific transformations on a copied graph to make the graph more friendly to hardware operations. For details how to write custom plugin-specific transformation, please, refer to [Writing OpenVINO™ transformations](@ref openvino_docs_transformations) guide. See detailed topics about network representation:
* [Intermediate Representation and Operation Sets](../_docs_MO_DG_IR_and_opsets.html)
> **NOTE**: After all these transformations, a `ov::Model` object contains operations which can be perfectly mapped to backend kernels. E.g. if backend has kernel computing `A + B` operations at once, the `TransformNetwork` function should contain a pass which fuses operations `A` and `B` into a single custom operation `A + B` which fits backend kernels set.
### `QueryNetwork()`
Use the method with the `HETERO` mode, which allows distributing network execution between different
devices based on the `ov::Node::get_rt_info()` map, which can contain the `"affinity"` key.
The `QueryNetwork` method analyzes operations of the provided `network` and returns a list of supported operations via the InferenceEngine::QueryNetworkResult structure. `QueryNetwork` first applies the `TransformNetwork` passes to the input `ov::Model` argument. After this, the transformed network in the ideal case contains only operations that are 1:1 mapped to kernels in the computational backend. In this case, it is easy to analyze which operations are supported (`_backend` has a kernel for the operation, or an extension for the operation is provided) and which are not supported (the kernel is missing in `_backend`):
1. Store original names of all operations in the input `ov::Model`.
2. Apply `TransformNetwork` passes. Note that the names of operations in the transformed network can differ, so the mapping needs to be restored in the steps below.
3. Construct `supported` and `unsupported` maps which contain the names of original operations. Note that since the inference is performed using the OpenVINO™ reference backend, whether an operation is supported depends on whether the latest OpenVINO opset contains such an operation.
4. `QueryNetworkResult.supportedLayersMap` contains only operations which are fully supported by `_backend`.
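A self-contained sketch of the supported/unsupported split from the steps above is shown below. The `transform` and `has_kernel` callables stand in for the plugin's `TransformNetwork` passes and backend kernel lookup, and the sketch skips the name-mapping restoration that a real `QueryNetwork` implementation performs via runtime info:

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <unordered_set>

#include <openvino/core/model.hpp>

// Sketch of the QueryNetwork analysis described in steps 1-4 above.
// `transform` stands in for the plugin's TransformNetwork passes and
// `has_kernel` for the backend kernel lookup; both are assumptions here.
std::map<std::string, std::string> query_supported_ops(
        const std::shared_ptr<const ov::Model>& model,
        const std::function<std::shared_ptr<ov::Model>(const std::shared_ptr<const ov::Model>&)>& transform,
        const std::function<bool(const std::shared_ptr<ov::Node>&)>& has_kernel,
        const std::string& device_name) {
    // 1. Remember the original operation names.
    std::unordered_set<std::string> original_names;
    for (const auto& op : model->get_ops())
        original_names.insert(op->get_friendly_name());

    // 2. Apply the TransformNetwork passes on a copy of the model.
    auto transformed = transform(model);

    // 3./4. Report only operations that still map to an original name and have a backend kernel.
    std::map<std::string, std::string> supported;
    for (const auto& op : transformed->get_ops()) {
        if (original_names.count(op->get_friendly_name()) && has_kernel(op))
            supported.emplace(op->get_friendly_name(), device_name);
    }
    return supported;
}
```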
3. **Sub-graph tests** (`subgraph_tests` sub-folder). This group of tests is designed to test small patterns or combinations of layers. E.g., when a particular topology is being enabled in a plugin, e.g. TF ResNet-50, there is no need to add the whole topology to the tests. Instead, a particular repetitive subgraph or pattern can be extracted from `ResNet-50` and added to the tests. The instantiation of the sub-graph tests is done in the same way as for single layer tests.
> **Note**: such sub-graphs or patterns for sub-graph tests should be added to the `IE::ngraphFunctions` library first (this library is a pre-defined set of small `ov::Model` instances) and re-used in sub-graph tests afterwards.
4. **HETERO tests** (`subgraph_tests` sub-folder) contain tests for the `HETERO` scenario (manual or automatic affinity settings, tests for `QueryNetwork`).
@@ -41,18 +41,14 @@ To use these tests for your own plugin development, link the `IE::funcSharedTest
To build test binaries together with other build artifacts, use the `make all` command. For details, see
[Build Plugin Using CMake*](@ref openvino_docs_ie_plugin_dg_plugin_build).
### Tests for plugin-specific ngraph transformations
Please refer to the [Transformation testing](@ref ngraph_transformation) guide.
### How to Extend Inference Engine Plugin Tests
Inference Engine Plugin tests are open for contribution.
Add common test case definitions applicable for all plugins to the `IE::funcSharedTests` target within the DLDT repository. Then, any other plugin supporting corresponding functionality can instantiate the new test.
All Inference Engine per-layer tests check the functionality of the tested layers. They are developed using `ov::Model` instances as input graphs used by tests. In this case, to test a new layer with layer tests, extend the `IE::ngraphFunctions` library, which is also included in the Inference Engine Developer package, with a new model including the corresponding operation.
> **NOTE**: When implementing a new subgraph test, add new single-layer tests for each operation of the subgraph if such test does not exist.
@@ -9,7 +9,7 @@ For more details about low-precision model representation please refer to this [
During model load, each plugin can interpret the quantization rules expressed in *FakeQuantize* operations:
- Independently, based on the definition of the *FakeQuantize* operation.
- Using a special library of low-precision transformations (LPT) which applies common rules for generic operations, such as Convolution, Fully-Connected, Eltwise, etc., and translates "fake-quantized" models into models with low-precision operations.
Here we provide only a high-level overview of the interpretation rules of FakeQuantize.
At runtime each FakeQuantize can be split into two independent operations: **Quantize** and **Dequantize**.
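To make the split concrete, the per-element arithmetic of FakeQuantize can be written as the two stages below. This is a plain C++ sketch of the standard formula; `levels`, the input range and the output range correspond to the operation's attributes and inputs:

```cpp
#include <algorithm>
#include <cmath>

// Quantize stage: map x from [in_low, in_high] onto one of `levels` integer steps.
double quantize(double x, double in_low, double in_high, int levels) {
    // Clamp to the input range first, as FakeQuantize does.
    x = std::min(std::max(x, in_low), in_high);
    return std::round((x - in_low) / (in_high - in_low) * (levels - 1));
}

// Dequantize stage: map the integer step back onto the output range [out_low, out_high].
double dequantize(double q, double out_low, double out_high, int levels) {
    return q / (levels - 1) * (out_high - out_low) + out_low;
}

// FakeQuantize(x) == dequantize(quantize(x)); at runtime the two stages can be kept separate,
// so the integer result of the quantize stage can feed low-precision kernels directly.
```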
@@ -72,11 +72,7 @@ For example, if you would like to infer a model with `Convolution` operation in
> There are several supported quantization approaches on activations and on weights. All supported approaches are described in the [Quantization approaches](#quantization-approaches) section below. In the demonstrated model, the [FakeQuantize operation quantization](#fakequantize-operation) approach is used.
Additionally, low precision transformations can handle ONNX quantized models.
For more details on how to get a quantized model, refer to the [Model Optimization](@ref openvino_docs_model_optimization_guide) document.
## Quantization approaches
LPT transformations support two quantization approaches:
@@ -115,63 +111,63 @@ Inside each step LPT transformations handle input model operation by operation,
As a result, usually all operations are inferred by the plugin in low precision. If the plugin does not support inference of an operation in low precision, the corresponding LPT transformation can be disabled, and the input tensor precisions for the operation will not be changed. In this case, the operation is inferred in the original precision.
The low precision transformations pipeline includes four steps:
The model is changed in this step. There are more details in the developer guide [Prerequisites transformations](@ref openvino_docs_OV_UG_lpt_step1_prerequisites).
### Step 2. Markup
This step creates runtime attributes for operations. These attributes will be used in the next step. Transformations:
The model is changed in this step: only new attributes are added to some operations. There are more details in the developer guide [Markup transformations](@ref openvino_docs_OV_UG_lpt_step2_markup).
### Step 3. Main transformations, FakeQuantize decomposition and dequantization operations handling
This step has the most transformations. These transformations can be separated into two groups: decomposition transformations and dequantization operations handling. There are more details in the developer guide [Main transformations](@ref openvino_docs_OV_UG_lpt_step3_main). Transformations:
Decomposition transformations decompose the `FakeQuantize` operation into a quantize operation (`FakeQuantize` with low precision output) and dequantization operations (opposite to quantize, with low precision input and the original precision output). For dequantization operations, LPT uses three operations: `Convert`, `Subtract` and `Multiply`. The element-wise operations `Subtract` and `Multiply` have constants on their second branches. If dequantization operations are not handled at the end of the LPT pipeline, they will be fused back into the `FakeQuantize`.
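As an illustration, a dequantization sub-graph of this shape can be built directly with the OpenVINO C++ API. This is a minimal sketch; the `u8` input shape, the scalar zero point of 128 and the scale of 0.01 are arbitrary example values:

```cpp
#include <memory>

#include <openvino/core/model.hpp>
#include <openvino/opsets/opset8.hpp>

// Sketch: the dequantization pattern produced after FakeQuantize decomposition,
// low-precision tensor -> Convert -> Subtract(zero point) -> Multiply(scale).
std::shared_ptr<ov::Model> make_dequantization_example() {
    using namespace ov::opset8;
    // Quantized activations in u8 (the output of the quantize part of FakeQuantize).
    auto input = std::make_shared<Parameter>(ov::element::u8, ov::Shape{1, 3, 224, 224});
    // Convert back to the original precision.
    auto convert = std::make_shared<Convert>(input, ov::element::f32);
    // Subtract the zero point; the constant sits on the second branch.
    auto zero_point = Constant::create(ov::element::f32, ov::Shape{}, {128.f});
    auto subtract = std::make_shared<Subtract>(convert, zero_point);
    // Multiply by the scale; again a constant on the second branch.
    auto scale = Constant::create(ov::element::f32, ov::Shape{}, {0.01f});
    auto multiply = std::make_shared<Multiply>(subtract, scale);
    return std::make_shared<ov::Model>(ov::OutputVector{multiply}, ov::ParameterVector{input});
}
```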
@@ -197,14 +193,14 @@ Original `Convolution` operation in FP32 with dequantization operations before:
### Step 4: Cleanup of the result model
LPT cleanup transformations are the final stage in the LPT pipeline. In this step, LPT transformations clean up the result model to avoid unhandled dequantization operations: they fuse dequantization operations into other model operations where possible (at least the `Convert` operations if not). Transformations:
There are more details in the developer guide [Cleanup transformations](@ref openvino_docs_OV_UG_lpt_step4_cleanup).
`FakeQuantize` operation with unhandled dequantization operations:

@@ -220,27 +216,27 @@ Typical transformation pipeline described below.
### Step 1. Common optimizations
This step is optional for LPT but is typically present in OpenVINO™ plugins. The step does not use any LPT transformations. First, it disables constant folding of dequantization operations on the constant subgraph on weights to prevent the loss of dequantization information during subsequent plugin transformations. After that, it optimizes the nGraph function and converts operations to operation set 1. Typically, using this step is the simplest way to meet LPT requirements for the input quantized model. If the plugin can guarantee that LPT input requirements are met, this step can be skipped.
Let's explore the quantized [TensorFlow* implementation of ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model. Use the [Model Downloader](@ref omz_tools_downloader) tool to download the `fp16` model from the [OpenVINO™ Toolkit - Open Model Zoo repository](https://github.com/openvinotoolkit/open_model_zoo):
@@ -259,7 +255,7 @@ Result model depends on different factors:
Information about layer precision is stored in the performance counters that are available from the OpenVINO Runtime API. For example, the part of the performance counters table for the quantized [TensorFlow* implementation of ResNet-50](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf) model inference on the CPU Plugin looks as follows:
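The counters shown in such a table can also be retrieved programmatically. A minimal sketch, assuming the OpenVINO 2022 C++ API and a placeholder IR path, where `exec_type` typically encodes the runtime precision of each layer:

```cpp
#include <iostream>

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // "resnet-50-tf.xml" is a placeholder path to the quantized IR.
    auto model = core.read_model("resnet-50-tf.xml");
    // Enable per-layer performance counters for this compiled model.
    auto compiled = core.compile_model(model, "CPU", ov::enable_profiling(true));
    auto request = compiled.create_infer_request();
    request.infer();

    // Each entry reports the executed node, the kernel type and timings;
    // exec_type typically encodes the runtime precision (for example "..._I8").
    for (const auto& info : request.get_profiling_info()) {
        std::cout << info.node_name << "  " << info.exec_type << "  "
                  << info.real_time.count() << " us\n";
    }
    return 0;
}
```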
@@ -298,14 +294,14 @@ Low Precision Transformations can be customizable. Build-in customization option
### Operation precision restrictions
This option defines the precisions which are allowed for the operation input ports. The option value is passed as an input argument to the `LowPrecision` constructor. For example:
In the provided example, the `Convolution` operation inputs in the result model must have specific precisions: `u8` (unsigned int8) precision on input 0 (on activations) and `i8` (signed int8) precision on input 1 (on weights).
### Operation per tensor quantization restrictions
This option defines whether an operation supports per-tensor quantization only. The option value is passed as an input argument to the `LowPrecision` constructor. For example:
In the provided example, the `Convolution` operations in the result model must have per-tensor quantization on input 0 (on activations).
@@ -316,4 +312,4 @@ This option defines if each LPT transformation updates precision or not. The opt
Plugin-specific customization can be implemented via nGraph transformation callbacks. For example, asymmetric quantization support can be customized by using the `LayerTransformation::isAsymmetricQuantization` and `WeightableLayerTransformation::isAsymmetricOnWeights` methods in callbacks. For example:
Prerequisites transformations are optional. They prepare a model before running other low precision transformations, and they neither operate on dequantization operations nor update precisions. Prerequisites transformations include: