// openvino/samples/cpp/benchmark_app/benchmark_app.hpp
// Copyright (C) 2018-2022 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#if defined(HAVE_GPU_DEVICE_MEM_SUPPORT)
# define HAVE_DEVICE_MEM_SUPPORT
#endif
#include <iostream>
#include <string>
#include <vector>
#include "gflags/gflags.h"
/// @brief message for help argument
static const char help_message[] = "Print a usage message";
/// @brief message for images argument
static const char input_message[] =
"Optional. Path to a folder with images and/or binaries, or to a specific image or binary file.\n"
" For networks with dynamic shapes and several inputs, provide the same number"
" of files for each input (except cases with a single file for any input):"
" \"input1:1.jpg input2:1.bin\", \"input1:1.bin,2.bin input2:3.bin input3:4.bin,5.bin\"."
" You can also pass special keys for inputs: \"random\" - to fill the input with random data,"
" \"image_info\" - to fill the input with the image size.";
/// @brief message for model argument
static const char model_message[] =
"Required. Path to an .xml/.onnx file with a trained model or to a .blob file with "
"a trained compiled model.";
/// @brief message for performance hint
static const char hint_message[] =
"Optional. Performance hint allows the OpenVINO device to select the right network-specific settings.\n"
" 'throughput' or 'tput': device performance mode will be set to THROUGHPUT.\n"
" 'latency': device performance mode will be set to LATENCY.\n"
" 'none': no device performance mode will be set.\n"
" To use explicit 'nstreams' or other device-specific options, set the hint to "
"'none'.";
/// @brief message for execution mode
static const char api_message[] = "Optional (deprecated). Enable Sync/Async API. Default value is \"async\".";
/// @brief message for assigning cnn calculation to device
static const char target_device_message[] =
"Optional. Specify a target device to infer on (the list of available devices is shown below). "
"Default value is CPU. Use \"-d HETERO:<comma-separated_devices_list>\" format to specify "
"HETERO plugin. "
"Use \"-d MULTI:<comma-separated_devices_list>\" format to specify MULTI plugin. "
"The application looks for a suitable plugin for the specified device.";
/// @brief message for iterations count
static const char iterations_count_message[] =
"Optional. Number of iterations. "
"If not specified, the number of iterations is calculated depending on the device.";
/// @brief message for requests count
static const char infer_requests_count_message[] =
"Optional. Number of infer requests. Default value is determined automatically for the device.";
/// @brief message for execution time
static const char execution_time_message[] = "Optional. Time in seconds to execute topology.";
/// @brief message for #threads for CPU inference
static const char infer_num_threads_message[] = "Optional. Number of threads to use for inference on the CPU "
"(including HETERO and MULTI cases).";
/// @brief message for #streams for CPU inference
static const char infer_num_streams_message[] =
"Optional. Number of streams to use for inference on the CPU, GPU or MYRIAD devices "
"(for HETERO and MULTI device cases use format <dev1>:<nstreams1>,<dev2>:<nstreams2> or just "
"<nstreams>). "
"Default value is determined automatically for a device. Please note that although "
"automatic selection usually provides reasonable performance, it may still be "
"non-optimal for some cases, especially for very small networks. "
"See the sample's README for more details. "
"Also, using nstreams>1 is an inherently throughput-oriented option, "
"while for best-latency estimations the number of streams should be set to 1.";
/// @brief message for latency percentile settings
static const char infer_latency_percentile_message[] =
"Optional. Defines the percentile to be reported in the latency metric. The valid range is [1, 100]. The default value "
"is 50 (median).";
/// @brief message for enforcing of BF16 execution where it is possible
static const char enforce_bf16_message[] =
"Optional. By default, floating-point operations are executed in bfloat16 precision "
"on platforms that support it.\n"
" 'true' - enable bfloat16 regardless of platform support\n"
" 'false' - disable bfloat16 regardless of platform support";
/// @brief message for user library argument
static const char custom_cpu_library_message[] =
"Required for CPU custom layers. Absolute path to a shared library with the kernels "
"implementations.";
/// @brief message for clDNN custom kernels desc
static const char custom_cldnn_message[] =
"Required for GPU custom kernels. Absolute path to an .xml file with the kernels description.";
static const char batch_size_message[] =
"Optional. Batch size value. If not specified, the batch size value is determined from "
"Intermediate Representation.";
// @brief message for CPU threads pinning option
static const char infer_threads_pinning_message[] =
"Optional. Explicit inference threads binding options (leave empty to let OpenVINO make a choice):\n"
"\t\t\t\tenabling threads->cores pinning (\"YES\", which is already the default for any conventional CPU), \n"
"\t\t\t\tletting the runtime decide on the threads->different core types (\"HYBRID_AWARE\", which is the default on "
"hybrid CPUs), \n"
"\t\t\t\tthreads->(NUMA)nodes (\"NUMA\"), or \n"
"\t\t\t\tcompletely disabling (\"NO\") CPU inference threads pinning";
/// @brief message for stream_output option
static const char stream_output_message[] =
"Optional. Print progress as plain text. When specified, an interactive progress bar is "
"replaced with a "
"multiline output.";
/// @brief message for report_type option
static const char report_type_message[] =
"Optional. Enable collecting statistics report. \"no_counters\" report contains "
"configuration options specified, resulting FPS and latency. \"average_counters\" "
"report extends \"no_counters\" report and additionally includes average PM "
"counters values for each layer from the network. \"detailed_counters\" report "
"extends \"average_counters\" report and additionally includes per-layer PM "
"counters and latency for each executed infer request.";
/// @brief message for report_folder option
static const char report_folder_message[] = "Optional. Path to a folder where statistics report is stored.";
/// @brief message for json_stats option
static const char json_stats_message[] = "Optional. Enables JSON-based statistics output (by default reporting system "
"will use CSV format). Should be used together with -report_folder option.";
/// @brief message for exec_graph_path option
static const char exec_graph_path_message[] =
"Optional. Path to a file where to store executable graph information serialized.";
2019-04-12 18:25:53 +03:00
/// @brief message for progress bar option
static const char progress_message[] =
"Optional. Show progress bar (can affect performance measurement). Default value is "
"\"false\".";
/// @brief message for performance counters option
static const char pc_message[] = "Optional. Report performance counters.";
/// @brief message for performance counters for sequence option
static const char pcseq_message[] = "Optional. Report latencies for each shape in -data_shape sequence.";
#ifdef HAVE_DEVICE_MEM_SUPPORT
/// @brief message for switching memory allocation type option
static const char use_device_mem_message[] =
"Optional. Switch between host and device memory allocation for input and output buffers.";
#endif
/// @brief message for load config option
static const char load_config_message[] =
"Optional. Path to JSON file to load custom IE parameters."
" Please note, command line parameters have higher priority than parameters from the configuration "
"file.";
/// @brief message for dump config option
static const char dump_config_message[] =
"Optional. Path to JSON file to dump IE parameters, which were set by application.";
static const char shape_message[] =
"Optional. Set shape for network input. For example, \"input1[1,3,224,224],input2[1,4]\" or \"[1,3,224,224]\""
" in case of one input size. This parameter affects the model input shape and can be dynamic."
" For dynamic dimensions use symbol `?` or '-1'. Ex. [?,3,?,?]."
" For bounded dimensions specify range 'min..max'. Ex. [1..10,3,?,?].";
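The `-shape` dimension syntax above (`?` or `-1` for fully dynamic, `min..max` for bounded, a plain number for static) can be sketched as a small parser. This is a hypothetical illustration only, not the benchmark_app implementation; `parse_dim` and the `(0, INT64_MAX)` encoding of a fully dynamic dimension are assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical sketch: parse one -shape dimension token into a
// (lower bound, upper bound) pair.
//   "?" / "-1"  -> fully dynamic, here encoded as (0, INT64_MAX)
//   "min..max"  -> bounded range, e.g. "1..10" -> (1, 10)
//   "N"         -> static dimension, (N, N)
static std::pair<int64_t, int64_t> parse_dim(const std::string& token) {
    if (token == "?" || token == "-1")
        return {0, INT64_MAX};  // fully dynamic dimension
    auto dots = token.find("..");
    if (dots != std::string::npos) {
        int64_t lo = std::stoll(token.substr(0, dots));
        int64_t hi = std::stoll(token.substr(dots + 2));
        return {lo, hi};  // bounded dimension
    }
    int64_t v = std::stoll(token);
    return {v, v};  // static dimension
}
```

So `[1..10,3,?,?]` would yield bounds (1,10), (3,3), and two fully dynamic dimensions.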
static const char data_shape_message[] =
"Required for networks with dynamic shapes. Set shape for input blobs."
" In case of one input size: \"[1,3,224,224]\" or \"input1[1,3,224,224],input2[1,4]\"."
" In case of several input sizes provide the same number for each input (except cases with single shape for any "
"input):"
" \"[1,3,128,128][3,3,128,128][1,3,320,320]\", \"input1[1,1,128,128][1,1,256,256],input2[80,1]\""
" or \"input1[1,192][1,384],input2[1,192][1,384],input3[1,192][1,384],input4[1,192][1,384]\"."
" If the network shapes are all static, specifying this option will cause an exception.";
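Splitting a `-data_shape` argument such as `"input1[1,1,128,128][1,1,256,256],input2[80,1]"` into per-input pieces requires ignoring the commas inside the `[...]` shape groups. A minimal sketch of such a splitter (hypothetical; `split_inputs` is not part of benchmark_app):

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: split a -data_shape (or -shape) argument on the
// top-level commas that separate inputs, skipping commas that appear
// inside [...] shape groups.
static std::vector<std::string> split_inputs(const std::string& arg) {
    std::vector<std::string> parts;
    std::string cur;
    int depth = 0;  // bracket nesting depth
    for (char c : arg) {
        if (c == '[')
            ++depth;
        else if (c == ']')
            --depth;
        if (c == ',' && depth == 0) {
            parts.push_back(cur);  // top-level comma: finish this input
            cur.clear();
        } else {
            cur += c;
        }
    }
    if (!cur.empty())
        parts.push_back(cur);
    return parts;
}
```

For the example above this yields `"input1[1,1,128,128][1,1,256,256]"` and `"input2[80,1]"`; each part can then be split further into a name and its shape groups.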
static const char layout_message[] =
"Optional. Specifies how network layouts should be treated by the application. "
"For example, \"input1[NCHW],input2[NC]\" or \"[NCHW]\" in case of one input size.";
/// @brief message for enabling caching
static const char cache_dir_message[] = "Optional. Enables caching of loaded models to specified directory. "
"List of devices which support caching is shown at the end of this message.";
/// @brief message for single load network
static const char load_from_file_message[] = "Optional. Loads model from file directly without ReadNetwork."
" All CNNNetwork options (like re-shape) will be ignored.";
/// @brief message for inference_precision
static const char inference_precision_message[] = "Optional. Inference precision";
static constexpr char inputs_precision_message[] = "Optional. Specifies precision for all input layers of the network.";
static constexpr char outputs_precision_message[] =
"Optional. Specifies precision for all output layers of the network.";
static constexpr char iop_message[] =
"Optional. Specifies precision for input and output layers by name.\n"
" Example: -iop \"input:FP16, output:FP16\".\n"
" Notice that quotes are required.\n"
" Overwrites precision from ip and op options for "
"specified layers.";
static constexpr char input_image_scale_message[] =
"Optional. Scale values to be used for the input image per channel.\n"
"Values to be provided in the [R, G, B] format. Can be defined for desired input of the model.\n"
"Example: -iscale data[255,255,255],info[255,255,255]\n";
static constexpr char input_image_mean_message[] =
"Optional. Mean values to be used for the input image per channel.\n"
"Values to be provided in the [R, G, B] format. Can be defined for desired input of the model,\n"
"Example: -imean data[255,255,255],info[255,255,255]\n";
static constexpr char inference_only_message[] =
"Optional. Measure only the inference stage. Default option for static models. Dynamic models"
" are measured in full mode, which includes the inputs setup stage;"
" inference-only mode is available for them only with a single input data shape."
" To enable full mode for static models, pass \"false\" to this argument:"
" e.g. \"-inference_only=false\".\n";
/// @brief Define flag for showing help message <br>
DEFINE_bool(h, false, help_message);
/// @brief Declare flag for showing help message <br>
DECLARE_bool(help);
/// @brief Define parameter for set image file <br>
/// i or mif is a required parameter
DEFINE_string(i, "", input_message);
/// @brief Define parameter for set model file <br>
/// It is a required parameter
DEFINE_string(m, "", model_message);
/// @brief Define flag for performance hint
DEFINE_string(hint, "", hint_message);
/// @brief Define execution mode
DEFINE_string(api, "async", api_message);
/// @brief device the target device to infer on <br>
DEFINE_string(d, "CPU", target_device_message);
/// @brief Absolute path to CPU library with user layers <br>
/// It is a required parameter
DEFINE_string(l, "", custom_cpu_library_message);
/// @brief Define parameter for clDNN custom kernels path <br>
/// Default is ./lib
DEFINE_string(c, "", custom_cldnn_message);
/// @brief Iterations count (default 0)
/// Sync mode: iterations count
/// Async mode: StartAsync counts
DEFINE_uint32(niter, 0, iterations_count_message);
/// @brief Time to execute topology in seconds
DEFINE_uint32(t, 0, execution_time_message);
/// @brief Number of infer requests in parallel
DEFINE_uint32(nireq, 0, infer_requests_count_message);
/// @brief Number of threads to use for inference on the CPU in throughput mode (also affects Hetero
/// cases)
DEFINE_uint32(nthreads, 0, infer_num_threads_message);
/// @brief Number of streams to use for inference on the CPU (also affects Hetero cases)
DEFINE_string(nstreams, "", infer_num_streams_message);
/// @brief The percentile which will be reported in latency metric
DEFINE_uint32(latency_percentile, 50, infer_latency_percentile_message);
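The `-latency_percentile` value (default 50, i.e. the median) selects which percentile of the collected per-request latencies is reported. A minimal sketch of such a computation, using a nearest-rank style index; this is a hypothetical illustration (`latency_percentile` is not the benchmark_app function, and the tool's exact interpolation may differ):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: return the p-th percentile (0..100) of a set of
// latencies, e.g. p == 50 gives the median-like value.
static double latency_percentile(std::vector<double> latencies, unsigned p) {
    assert(!latencies.empty() && p <= 100);
    std::sort(latencies.begin(), latencies.end());
    // Nearest-rank style index into the sorted samples, clamped to the end.
    std::size_t idx = static_cast<std::size_t>(p) * latencies.size() / 100;
    if (idx >= latencies.size())
        idx = latencies.size() - 1;
    return latencies[idx];
}
```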
/// @brief Define parameter for batch size <br>
/// Default is 0 (that means don't specify)
DEFINE_uint32(b, 0, batch_size_message);
/// @brief Enable plugin messages
2021-04-28 17:42:58 +03:00
DEFINE_string(pin, "", infer_threads_pinning_message);
/// @brief Enables multiline text output instead of progress bar
DEFINE_bool(stream_output, false, stream_output_message);
/// @brief Enables statistics report collecting
DEFINE_string(report_type, "", report_type_message);
/// @brief Path to a folder where statistics report is stored
DEFINE_string(report_folder, "", report_folder_message);
/// @brief Enables JSON-based statistics reporting
DEFINE_bool(json_stats, false, json_stats_message);
/// @brief Path to a file for storing serialized executable graph information
DEFINE_string(exec_graph_path, "", exec_graph_path_message);
/// @brief Define flag for showing progress bar <br>
DEFINE_bool(progress, false, progress_message);
/// @brief Define flag for showing performance counters <br>
DEFINE_bool(pc, false, pc_message);
/// @brief Define flag for showing performance sequence counters <br>
DEFINE_bool(pcseq, false, pcseq_message);
#ifdef HAVE_DEVICE_MEM_SUPPORT
/// @brief Define flag for switching between host and device memory allocation for input and output buffers
DEFINE_bool(use_device_mem, false, use_device_mem_message);
#endif
/// @brief Define flag for loading configuration file <br>
DEFINE_string(load_config, "", load_config_message);
/// @brief Define flag for dumping configuration file <br>
DEFINE_string(dump_config, "", dump_config_message);
/// @brief Define flag for input shape <br>
DEFINE_string(shape, "", shape_message);
/// @brief Define flag for input blob shape <br>
DEFINE_string(data_shape, "", data_shape_message);
/// @brief Define flag for input layout <br>
DEFINE_string(layout, "", layout_message);
/// @brief Define flag for inference precision
DEFINE_string(infer_precision, "f32", inference_precision_message);
/// @brief Specify precision for all input layers of the network
DEFINE_string(ip, "", inputs_precision_message);
/// @brief Specify precision for all output layers of the network
DEFINE_string(op, "", outputs_precision_message);
/// @brief Specify precision for input and output layers by name.
/// Example: -iop "input:FP16, output:FP16".
/// Notice that quotes are required.
/// Overwrites precision from ip and op options for specified layers.
DEFINE_string(iop, "", iop_message);
/// @brief Define parameter for model cache directory <br>
DEFINE_string(cache_dir, "", cache_dir_message);
/// @brief Define flag for loading network from model file by name, without a separate ReadNetwork step <br>
DEFINE_bool(load_from_file, false, load_from_file_message);
/// @brief Define flag for using input image scale <br>
DEFINE_string(iscale, "", input_image_scale_message);
/// @brief Define flag for using input image mean <br>
DEFINE_string(imean, "", input_image_mean_message);
/// @brief Define flag for inference only mode <br>
DEFINE_bool(inference_only, true, inference_only_message);
/**
 * @brief This function shows a help message
*/
static void show_usage() {
std::cout << std::endl;
std::cout << "benchmark_app [OPTION]" << std::endl;
std::cout << "Options:" << std::endl;
std::cout << std::endl;
std::cout << " -h, --help " << help_message << std::endl;
std::cout << " -m \"<path>\" " << model_message << std::endl;
std::cout << " -i \"<path>\" " << input_message << std::endl;
std::cout << " -d \"<device>\" " << target_device_message << std::endl;
std::cout << " -l \"<absolute_path>\" " << custom_cpu_library_message << std::endl;
std::cout << " Or" << std::endl;
std::cout << " -c \"<absolute_path>\" " << custom_cldnn_message << std::endl;
std::cout << " -hint \"performance hint (latency or throughput or none)\" " << hint_message << std::endl;
std::cout << " -api \"<sync/async>\" " << api_message << std::endl;
std::cout << " -niter \"<integer>\" " << iterations_count_message << std::endl;
std::cout << " -nireq \"<integer>\" " << infer_requests_count_message << std::endl;
std::cout << " -b \"<integer>\" " << batch_size_message << std::endl;
std::cout << " -stream_output " << stream_output_message << std::endl;
std::cout << " -t " << execution_time_message << std::endl;
std::cout << " -progress " << progress_message << std::endl;
std::cout << " -shape " << shape_message << std::endl;
std::cout << " -data_shape " << data_shape_message << std::endl;
std::cout << " -layout " << layout_message << std::endl;
std::cout << " -cache_dir \"<path>\" " << cache_dir_message << std::endl;
std::cout << " -load_from_file " << load_from_file_message << std::endl;
std::cout << " -latency_percentile " << infer_latency_percentile_message << std::endl;
std::cout << std::endl << " device-specific performance options:" << std::endl;
std::cout << " -nstreams \"<integer>\" " << infer_num_streams_message << std::endl;
std::cout << " -nthreads \"<integer>\" " << infer_num_threads_message << std::endl;
std::cout << " -pin (\"YES\"|\"CORE\")/\"HYBRID_AWARE\"/(\"NO\"|\"NONE\")/\"NUMA\" "
<< infer_threads_pinning_message << std::endl;
#ifdef HAVE_DEVICE_MEM_SUPPORT
std::cout << " -use_device_mem " << use_device_mem_message << std::endl;
#endif
std::cout << std::endl << " Statistics dumping options:" << std::endl;
std::cout << " -report_type \"<type>\" " << report_type_message << std::endl;
std::cout << " -report_folder " << report_folder_message << std::endl;
    std::cout << "      -json_stats " << json_stats_message << std::endl;
std::cout << " -exec_graph_path " << exec_graph_path_message << std::endl;
std::cout << " -pc " << pc_message << std::endl;
std::cout << " -pcseq " << pcseq_message << std::endl;
std::cout << " -dump_config " << dump_config_message << std::endl;
std::cout << " -load_config " << load_config_message << std::endl;
std::cout << " -infer_precision \"<element type>\"" << inference_precision_message << std::endl;
std::cout << " -ip <value> " << inputs_precision_message << std::endl;
std::cout << " -op <value> " << outputs_precision_message << std::endl;
std::cout << " -iop \"<value>\" " << iop_message << std::endl;
std::cout << " -iscale " << input_image_scale_message << std::endl;
std::cout << " -imean " << input_image_mean_message << std::endl;
std::cout << " -inference_only " << inference_only_message << std::endl;
}