[BENCHMARK_APP/PYTHON/CPP] Align benchmark_app output across languages (#12814)
* [PYTHON] Pipeline transfer
* [PYTHON] Align python benchmark
* [PYTHON] Align last step
* [PYTHON] Fix inaccuracies of the last step - median
* [PYTHON/CPP] Add Core::get_version method to python API, refactor benchmark to print version with this func
* [PYTHON] Remove get_version_string from API
* [PYTHON/CPP] Align output for model input/output info
* [PYTHON/CPP] Step 4,6 alignment of outputs, step 8 dumps all info stored in config parameters
* [CPP] Fix a bug causing nstreams parameter to never be set to AUTO in CPP benchmark_app
* [CPP] Fix clang format errors
* [CPP] Modify print order and data output for 8th step
* [PYTHON] Add verification checks from C++, modify set_thoughtput_streams to match documentation
* [CPP] Revert changes to C++ benchmark_app
* [CPP] Remove additional space
* Update submodules versions on remote
* Update module from master branch
* Redownload submodules from master and override changes from commit
* [PYTHON] Remove unnecessary parse_status from validation function
* [PYTHON] Check for HINT in map, fix circular import
* [PYTHON] Remove artifacts from commit, fix args.perf_hint set to '' instead of 'none'
* [PYTHON] Reverse changes to perf hint, add key-in-map check, fix validation function throwing error on set hint
* [PYTHON] Fix linter
* [PYTHON] Remove linter space
* [CPP] Fix wait_all exception throw
* [CPP/PYTHON] Clean artifacts and unwanted changes from work process
* [PYTHON] Fix artifacts from merge, clean submodule update
* [C++ CPU] Fix device name string by removing padding NULL characters from the back
* [CPP] Fix ba infer_request_wrap in other throw-catch clauses
* [PYTHON/CPP] Fix missing latencies in final step for shape group, fix minor misaligned messages, add missing report parameter: create infer requests time
* [CPP] Clang fix formatting
* [CPP] Reverse clang fix format on plugin.cpp
* [PYTHON/CPP] Fix C++ progressbar printing endl when disabled, fix rounding in python creating infer request message
* [CPP] Fix formatting error
* [PYTHON/C++] Refactor network to model based on naming conventions, provide fresh README output example
* [PYTHON/C++] Add example output to C++ README, remove unnecessary device loop
* [BENCHMARK_APP/C++] Fix artifact from refactoring, remove try-catch clause
* Update samples/cpp/benchmark_app/benchmark_app.hpp Co-authored-by: Nadezhda Ageeva <nkogteva@gmail.com>
* Update samples/cpp/benchmark_app/main.cpp Co-authored-by: Nadezhda Ageeva <nkogteva@gmail.com>
* Update tools/benchmark_tool/openvino/tools/benchmark/main.py Co-authored-by: Nadezhda Ageeva <nkogteva@gmail.com>
* Update samples/cpp/benchmark_app/main.cpp Co-authored-by: Nadezhda Ageeva <nkogteva@gmail.com>
* [CPP] Fix clang errors
* [CPP/PLUGIN] Reverse modification to extract to separate task
* Update tools/benchmark_tool/openvino/tools/benchmark/main.py Co-authored-by: Nadezhda Ageeva <nkogteva@gmail.com>
* Update tools/benchmark_tool/openvino/tools/benchmark/parameters.py Co-authored-by: Zlobin Vladimir <vladimir.zlobin@intel.com>
* Update tools/benchmark_tool/openvino/tools/benchmark/utils/utils.py Co-authored-by: Zlobin Vladimir <vladimir.zlobin@intel.com>
* Update tools/benchmark_tool/openvino/tools/benchmark/main.py Co-authored-by: Zlobin Vladimir <vladimir.zlobin@intel.com>
* [PYTHON/C++/BENCHMARK_APP] Fix language inconsistencies, remove unnecessary checks
* Update pyopenvino.cpp
* [CPP/BENCHMARK_APP] Remove unnecessary try-catch, fix linter errors
* [PYTHON/CPP/BENCHMARK_APP] Revert changes to Core, align version prints using only provided methods
* [DOCS/BENCHMARK_APP] Update README with proper model examples
* Update README.md

Co-authored-by: Nadezhda Ageeva <nkogteva@gmail.com>
Co-authored-by: Michal Lukaszewski <michal.lukaszewski@intel.com>
Co-authored-by: Zlobin Vladimir <vladimir.zlobin@intel.com>
@@ -3,7 +3,7 @@
This page demonstrates how to use the Benchmark C++ Tool to estimate deep learning inference performance on supported devices.

> **NOTE**: This page describes usage of the C++ implementation of the Benchmark Tool. For the Python implementation, refer to the [Benchmark Python Tool](../../../tools/benchmark_tool/README.md) page. The Python version is recommended for benchmarking models that will be used in Python applications, and the C++ version is recommended for benchmarking models that will be used in C++ applications. Both tools have a similar command interface and backend.

## Basic Usage

To use the C++ benchmark_app, you must first build it following the [Build the Sample Applications](../../../docs/OV_Runtime_UG/Samples_Overview.md) instructions and then set up paths and environment variables by following the [Get Ready for Running the Sample Applications](../../../docs/OV_Runtime_UG/Samples_Overview.md) instructions. Navigate to the directory where the benchmark_app C++ sample binary was built.

@@ -98,7 +98,7 @@ The application also collects per-layer Performance Measurement (PM) counters fo
Depending on the type, the report is stored to benchmark_no_counters_report.csv, benchmark_average_counters_report.csv, or benchmark_detailed_counters_report.csv file located in the path specified in -report_folder. The application also saves executable graph information serialized to an XML file if you specify a path to it with the -exec_graph_path parameter.

-### All configuration options
+### <a name="all-configuration-options"></a> All configuration options

Running the application with the `-h` or `--help` option yields the following usage message:

@@ -188,7 +188,7 @@ Running the application with the empty list of options yields the usage message
The benchmark tool supports topologies with one or more inputs. If a topology is not data-sensitive, you can skip the input parameter, and the inputs will be filled with random values. If a model has only image input(s), provide a folder with images or a path to an image as input. If a model has some specific input(s) (besides images), please prepare a binary file(s) filled with data of the appropriate precision and provide a path to it as input. If a model has mixed input types, the input folder should contain all required files. Image inputs are filled with image files one by one. Binary inputs are filled with binary files one by one.

## Examples of Running the Tool

-This section provides step-by-step instructions on how to run the Benchmark Tool with the `asl-recognition` Intel model on CPU or GPU devices. It uses random data as the input.
+This section provides step-by-step instructions on how to run the Benchmark Tool with the `asl-recognition` model from the Open Model Zoo on CPU or GPU devices. It uses random data as the input.

> **NOTE**: Internet access is required to execute the following steps successfully. If you have access to the Internet through a proxy server only, please make sure that it is configured in your OS environment.
@@ -206,66 +206,126 @@ This section provides step-by-step instructions on how to run the Benchmark Tool
* On CPU (latency mode):
```sh
-./benchmark_app -m omz_models/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d CPU -hint latency -progress
+./benchmark_app -m omz_models/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d CPU -hint latency
```

* On GPU (throughput mode):
```sh
-./benchmark_app -m omz_models/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d GPU -hint throughput -progress
+./benchmark_app -m omz_models/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d GPU -hint throughput
```

The application outputs the number of executed iterations, total duration of execution, latency, and throughput.
-Additionally, if you set the `-report_type` parameter, the application outputs statistics report. If you set the `-pc` parameter, the application outputs performance counters. If you set `-exec_graph_path`, the application reports executable graph information serialized. All measurements including per-layer PM counters are reported in milliseconds.
+Additionally, if you set the `-report_type` parameter, the application outputs a statistics report. If you set the `-pc` parameter, the application outputs performance counters. If you set `-exec_graph_path`, the application reports serialized executable graph information. All measurements including per-layer PM counters are reported in milliseconds.

-Below are fragments of sample output static and dynamic networks:
+An example of the information output when running benchmark_app on CPU in latency mode is shown below:

-* For static network:
-```sh
-./benchmark_app -m omz_models/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d CPU -hint latency
-```
-[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
-[ INFO ] BENCHMARK IS IN INFERENCE ONLY MODE.
-[ INFO ] Input blobs will be filled once before performance measurements.
-[ INFO ] First inference took 26.26 ms
-Progress: [................... ] 99% done
```sh
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Input command: /home/openvino/bin/intel64/DEBUG/benchmark_app -m omz_models/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d CPU -hint latency
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-7750-c1109a7317e-feature/py_cpp_align
[ INFO ]
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-7750-c1109a7317e-feature/py_cpp_align
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Device(CPU) performance hint is set to LATENCY
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 141.11 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Network inputs:
[ INFO ] input (node: input) : f32 / [N,C,D,H,W] / {1,3,16,224,224}
[ INFO ] Network outputs:
[ INFO ] output (node: output) : f32 / [...] / {1,100}
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 0
[Step 6/11] Configuring input of the model
[ INFO ] Model batch size: 1
[ INFO ] Network inputs:
[ INFO ] input (node: input) : f32 / [N,C,D,H,W] / {1,3,16,224,224}
[ INFO ] Network outputs:
[ INFO ] output (node: output) : f32 / [...] / {1,100}
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 989.62 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: torch-jit-export
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 2
[ INFO ] NUM_STREAMS: 2
[ INFO ] AFFINITY: CORE
[ INFO ] INFERENCE_NUM_THREADS: 0
[ INFO ] PERF_COUNT: NO
[ INFO ] INFERENCE_PRECISION_HINT: f32
[ INFO ] PERFORMANCE_HINT: LATENCY
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Test Config 0
[ INFO ] input ([N,C,D,H,W], f32, {1, 3, 16, 224, 224}, static): random (binary data is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 2 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 37.27 ms
[Step 11/11] Dumping statistics report
-[ INFO ] Count: 6640 iterations
-[ INFO ] Duration: 60039.70 ms
+[ INFO ] Count: 5470 iterations
+[ INFO ] Duration: 60028.56 ms
[ INFO ] Latency:
-[ INFO ] Median: 35.36 ms
-[ INFO ] Avg: 36.12 ms
-[ INFO ] Min: 18.55 ms
-[ INFO ] Max: 88.96 ms
-[ INFO ] Throughput: 110.59 FPS
+[ INFO ] Median: 21.79 ms
+[ INFO ] Average: 21.92 ms
+[ INFO ] Min: 20.60 ms
+[ INFO ] Max: 37.19 ms
+[ INFO ] Throughput: 91.12 FPS
```
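The Step 11 summary above is computed from the per-iteration latency samples collected during Step 10. A minimal sketch of that aggregation (median, average, min, max, and FPS derived from the iteration count and total duration), assuming a plain vector of millisecond samples rather than benchmark_app's actual statistics classes:

```cpp
#include <algorithm>
#include <vector>

struct LatencySummary {
    double median_ms, avg_ms, min_ms, max_ms, fps;
};

// Aggregates per-iteration latencies the way the final report summarizes
// them: median/average/min/max over all samples, and throughput computed
// from the iteration count over the total wall-clock duration.
inline LatencySummary summarize(std::vector<double> samples_ms, double total_duration_ms) {
    std::sort(samples_ms.begin(), samples_ms.end());
    const size_t n = samples_ms.size();
    const double median = (n % 2 == 1) ? samples_ms[n / 2]
                                       : (samples_ms[n / 2 - 1] + samples_ms[n / 2]) / 2.0;
    double sum = 0.0;
    for (double s : samples_ms)
        sum += s;
    return {median,
            sum / static_cast<double>(n),
            samples_ms.front(),
            samples_ms.back(),
            // FPS = iterations / seconds (batch size 1 assumed here)
            1000.0 * static_cast<double>(n) / total_duration_ms};
}
```

Note this is a simplification: with batched inputs the reported throughput also multiplies by the batch size, and the asynchronous pipeline overlaps requests, which is why throughput is derived from duration rather than from the latency samples.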

The Benchmark Tool can also be used with dynamically shaped networks to measure expected inference time for various input data shapes. See the `-shape` and `-data_shape` argument descriptions in the <a href="#all-configuration-options">All configuration options</a> section to learn more about using dynamic shapes. Here is a command example for using benchmark_app with dynamic networks and a portion of the resulting output:

```sh
./benchmark_app -m omz_models/intel/asl-recognition-0004/FP16/asl-recognition-0004.xml -d CPU -shape [-1,3,16,224,224] -data_shape [1,3,16,224,224][2,3,16,224,224][4,3,16,224,224] -pcseq
```
-* For dynamic network:
-```
-[Step 10/11] Measuring performance (Start inference asynchronously, 4 inference requests using 4 streams for CPU, limits: 60000 ms duration)
-[ INFO ] BENCHMARK IS IN FULL MODE.
-[ INFO ] Inputs setup stage will be included in performance measurements.
-[ INFO ] First inference took 26.80 ms
-Progress: [................... ] 99% done
```sh
[Step 9/11] Creating infer requests and preparing input tensors
[ INFO ] Test Config 0
[ INFO ] input ([N,C,D,H,W], f32, {1, 3, 16, 224, 224}, dyn:{?,3,16,224,224}): random (binary data is expected)
[ INFO ] Test Config 1
[ INFO ] input ([N,C,D,H,W], f32, {2, 3, 16, 224, 224}, dyn:{?,3,16,224,224}): random (binary data is expected)
[ INFO ] Test Config 2
[ INFO ] input ([N,C,D,H,W], f32, {4, 3, 16, 224, 224}, dyn:{?,3,16,224,224}): random (binary data is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 11 inference requests, limits: 60000 ms duration)
[ INFO ] Benchmarking in full mode (inputs filling are included in measurement loop).
[ INFO ] First inference took 204.40 ms
[Step 11/11] Dumping statistics report
-[ INFO ] Count: 5199 iterations
-[ INFO ] Duration: 60043.34 ms
+[ INFO ] Count: 2783 iterations
+[ INFO ] Duration: 60326.29 ms
[ INFO ] Latency:
-[ INFO ] Median: 41.58 ms
-[ INFO ] Avg: 46.07 ms
-[ INFO ] Min: 8.44 ms
-[ INFO ] Max: 115.65 ms
+[ INFO ] Median: 208.20 ms
+[ INFO ] Average: 237.47 ms
+[ INFO ] Min: 85.06 ms
+[ INFO ] Max: 743.46 ms
[ INFO ] Latency for each data shape group:
-[ INFO ] 1. data : [1, 3, 224, 224]
-[ INFO ] Median: 38.37 ms
-[ INFO ] Avg: 30.29 ms
-[ INFO ] Min: 8.44 ms
-[ INFO ] Max: 61.30 ms
-[ INFO ] 2. data : [1, 3, 448, 448]
-[ INFO ] Median: 68.21 ms
-[ INFO ] Avg: 61.85 ms
-[ INFO ] Min: 29.58 ms
-[ INFO ] Max: 115.65 ms
-[ INFO ] Throughput: 86.59 FPS
+[ INFO ] 1. input: {1, 3, 16, 224, 224}
+[ INFO ] Median: 120.36 ms
+[ INFO ] Average: 117.19 ms
+[ INFO ] Min: 85.06 ms
+[ INFO ] Max: 348.66 ms
+[ INFO ] 2. input: {2, 3, 16, 224, 224}
+[ INFO ] Median: 207.81 ms
+[ INFO ] Average: 206.39 ms
+[ INFO ] Min: 167.19 ms
+[ INFO ] Max: 578.33 ms
+[ INFO ] 3. input: {4, 3, 16, 224, 224}
+[ INFO ] Median: 387.40 ms
+[ INFO ] Average: 388.99 ms
+[ INFO ] Min: 327.50 ms
+[ INFO ] Max: 743.46 ms
+[ INFO ] Throughput: 107.61 FPS
```
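With `-pcseq`, latencies are additionally bucketed per data shape group, as in the "Latency for each data shape group" section above. A sketch of that bucketing, assuming each latency sample is tagged with the index of the data shape it was run with (the `Sample` struct is illustrative, not benchmark_app's internal representation):

```cpp
#include <algorithm>
#include <map>
#include <vector>

// One latency sample tagged with its data-shape group
// (0 -> {1,3,16,224,224}, 1 -> {2,3,16,224,224}, ... in the run above).
struct Sample {
    int group;
    double latency_ms;
};

// Returns the median latency per shape group, mirroring the per-group
// "Median:" lines of the report.
inline std::map<int, double> median_per_group(const std::vector<Sample>& samples) {
    std::map<int, std::vector<double>> buckets;
    for (const auto& s : samples)
        buckets[s.group].push_back(s.latency_ms);
    std::map<int, double> medians;
    for (auto& [group, v] : buckets) {
        std::sort(v.begin(), v.end());
        const size_t n = v.size();
        medians[group] = (n % 2 == 1) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
    }
    return medians;
}
```

The same buckets feed the per-group average, min, and max; the overall throughput is still computed across all groups together.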

## See Also
@@ -29,7 +29,7 @@ static const char help_message[] = "Print a usage message";
/// @brief message for images argument
static const char input_message[] =
    "Optional. Path to a folder with images and/or binaries or to specific image or binary file.\n"
-    " In case of dynamic shapes networks with several inputs provide the same number"
+    " In case of dynamic shapes models with several inputs provide the same number"
    " of files for each input (except cases with single file for any input):"
    "\"input1:1.jpg input2:1.bin\", \"input1:1.bin,2.bin input2:3.bin input3:4.bin,5.bin \"."
    " Also you can pass specific keys for inputs: \"random\" - for filling input with random data,"

@@ -45,7 +45,7 @@ static const char model_message[] =
/// @brief message for performance hint
static const char hint_message[] =
-    "Optional. Performance hint allows the OpenVINO device to select the right network-specific settings.\n"
+    "Optional. Performance hint allows the OpenVINO device to select the right model-specific settings.\n"
    " 'throughput' or 'tput': device performance mode will be set to THROUGHPUT.\n"
    " 'cumulative_throughput' or 'ctput': device performance mode will be set to "
    "CUMULATIVE_THROUGHPUT.\n"

@@ -90,7 +90,7 @@ static const char infer_num_streams_message[] =
    "automatic selection "
    "usually provides a reasonable performance, it still may be non-optimal for some cases, "
    "especially for "
-    "very small networks. See sample's README for more details. "
+    "very small models. See sample's README for more details. "
    "Also, using nstreams>1 is inherently throughput-oriented option, "
    "while for the best-latency estimations the number of streams should be set to 1.";

@@ -138,7 +138,7 @@ static const char report_type_message[] =
    "Optional. Enable collecting statistics report. \"no_counters\" report contains "
    "configuration options specified, resulting FPS and latency. \"average_counters\" "
    "report extends \"no_counters\" report and additionally includes average PM "
-    "counters values for each layer from the network. \"detailed_counters\" report "
+    "counters values for each layer from the model. \"detailed_counters\" report "
    "extends \"average_counters\" report and additionally includes per-layer PM "
    "counters and latency for each executed infer request.";

@@ -188,22 +188,22 @@ static const char dump_config_message[] =
    "Optional. Path to JSON file to dump IE parameters, which were set by application.";

static const char shape_message[] =
-    "Optional. Set shape for network input. For example, \"input1[1,3,224,224],input2[1,4]\" or \"[1,3,224,224]\""
+    "Optional. Set shape for model input. For example, \"input1[1,3,224,224],input2[1,4]\" or \"[1,3,224,224]\""
    " in case of one input size. This parameter affect model input shape and can be dynamic."
    " For dynamic dimensions use symbol `?` or '-1'. Ex. [?,3,?,?]."
    " For bounded dimensions specify range 'min..max'. Ex. [1..10,3,?,?].";

static const char data_shape_message[] =
-    "Required for networks with dynamic shapes. Set shape for input blobs."
+    "Required for models with dynamic shapes. Set shape for input blobs."
    " In case of one input size: \"[1,3,224,224]\" or \"input1[1,3,224,224],input2[1,4]\"."
    " In case of several input sizes provide the same number for each input (except cases with single shape for any "
    "input):"
    " \"[1,3,128,128][3,3,128,128][1,3,320,320]\", \"input1[1,1,128,128][1,1,256,256],input2[80,1]\""
    " or \"input1[1,192][1,384],input2[1,192][1,384],input3[1,192][1,384],input4[1,192][1,384]\"."
-    " If network shapes are all static specifying the option will cause an exception.";
+    " If model shapes are all static specifying the option will cause an exception.";

static const char layout_message[] =
-    "Optional. Prompts how network layouts should be treated by application. "
+    "Optional. Prompts how model layouts should be treated by application. "
    "For example, \"input1[NCHW],input2[NC]\" or \"[NCHW]\" in case of one input size.";

// @brief message for enabling caching
@@ -211,16 +211,15 @@ static const char cache_dir_message[] = "Optional. Enables caching of loaded mod
    "List of devices which support caching is shown at the end of this message.";

// @brief message for single load network
-static const char load_from_file_message[] = "Optional. Loads model from file directly without ReadNetwork."
+static const char load_from_file_message[] = "Optional. Loads model from file directly without read_model."
    " All CNNNetwork options (like re-shape) will be ignored";

// @brief message for inference_precision
static const char inference_precision_message[] = "Optional. Inference precision";

-static constexpr char inputs_precision_message[] = "Optional. Specifies precision for all input layers of the network.";
+static constexpr char inputs_precision_message[] = "Optional. Specifies precision for all input layers of the model.";

-static constexpr char outputs_precision_message[] =
-    "Optional. Specifies precision for all output layers of the network.";
+static constexpr char outputs_precision_message[] = "Optional. Specifies precision for all output layers of the model.";

static constexpr char iop_message[] =
    "Optional. Specifies precision for input and output layers by name.\n"
@@ -169,11 +169,7 @@ public:
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [this] {
            if (inferenceException) {
-                try {
-                    std::rethrow_exception(inferenceException);
-                } catch (const std::exception& ex) {
-                    throw ex;
-                }
+                std::rethrow_exception(inferenceException);
            }
            return _idleIds.size() > 0;
        });
@@ -187,11 +183,7 @@ public:
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [this] {
            if (inferenceException) {
-                try {
-                    std::rethrow_exception(inferenceException);
-                } catch (const std::exception& ex) {
-                    throw ex;
-                }
+                std::rethrow_exception(inferenceException);
            }
            return _idleIds.size() == requests.size();
        });
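The change above replaces a `try { rethrow } catch (const std::exception& ex) { throw ex; }` pattern with a direct `std::rethrow_exception`. The difference matters: `throw ex;` throws a copy of the caught object sliced down to `std::exception`, while rethrowing the stored `std::exception_ptr` preserves the original dynamic type. A minimal sketch of the pattern (the class and function names below are illustrative, not benchmark_app's):

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Stores an exception captured on one thread; rethrow() re-raises it with
// its original dynamic type, the way the fixed wait predicate does.
class ExceptionSlot {
public:
    void capture() noexcept { stored_ = std::current_exception(); }
    void rethrow() const {
        if (stored_)
            std::rethrow_exception(stored_);  // preserves std::runtime_error, no slicing
    }
private:
    std::exception_ptr stored_;
};

// Reports the dynamic type a catch site sees after the rethrow.
inline std::string observed_type(const ExceptionSlot& slot) {
    try {
        slot.rethrow();
    } catch (const std::runtime_error&) {
        return "runtime_error";  // reachable only because the original type survived
    } catch (const std::exception&) {
        return "exception";      // this is what the old throw ex; would have produced
    }
    return "none";
}
```

In the real code the capture happens in the infer-request completion callback and the rethrow happens inside the `_cv.wait` predicate, so an inference failure surfaces in the waiting thread with its original type intact.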
@@ -166,7 +166,7 @@ ov::Tensor create_tensor_from_binary(const std::vector<std::string>& files,
                files[inputIndex],
                " contains ",
                fileSize,
-                " bytes, but the network expects ",
+                " bytes, but the model expects ",
                inputSize);

    if (inputInfo.layout != "CN") {
@@ -380,11 +380,11 @@ std::map<std::string, ov::TensorVector> get_tensors(std::map<std::string, std::v
    std::ios::fmtflags fmt(std::cout.flags());
    std::map<std::string, ov::TensorVector> tensors;
    if (app_inputs_info.empty()) {
-        throw std::logic_error("Inputs Info for network is empty!");
+        throw std::logic_error("Inputs Info for model is empty!");
    }

    if (!inputFiles.empty() && inputFiles.size() != app_inputs_info[0].size()) {
-        throw std::logic_error("Number of inputs specified in -i must be equal to number of network inputs!");
+        throw std::logic_error("Number of inputs specified in -i must be equal to number of model inputs!");
    }

    // count image type inputs of network
@@ -400,7 +400,7 @@ std::map<std::string, ov::TensorVector> get_tensors(std::map<std::string, std::v
    for (auto& files : inputFiles) {
        if (!files.first.empty() && app_inputs_info[0].find(files.first) == app_inputs_info[0].end()) {
            throw std::logic_error("Input name \"" + files.first +
-                                   "\" used in -i parameter doesn't match any network's input");
+                                   "\" used in -i parameter doesn't match any model's input");
        }

        std::string input_name = files.first.empty() ? app_inputs_info[0].begin()->first : files.first;
samples/cpp/benchmark_app/main.cpp (Executable file → Normal file)
@@ -43,7 +43,7 @@ std::string get_console_command(int argc, char* argv[]) {
    std::string relative_path(argv[0]);
    std::vector<char> buffer;

-    uint32_t len = 1024;
+    uint32_t len = 1;
    do {
        buffer.resize(len);
        len = GetFullPathNameA(relative_path.data(), len, buffer.data(), nullptr);
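The loop above works because `GetFullPathNameA` returns the required buffer size when the buffer passed in is too small, so any starting length converges after at most one retry. A sketch of the same resize-and-retry pattern against a stand-in function (`fake_full_path_name` below is hypothetical, standing in for the Win32 call, which is unavailable off Windows):

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Stand-in for GetFullPathNameA: copies `src` into `buf` if it fits and
// returns the copied length; otherwise returns the size required
// (including the terminator), matching the Win32 contract.
inline uint32_t fake_full_path_name(const char* src, uint32_t buf_len, char* buf) {
    const uint32_t needed = static_cast<uint32_t>(std::strlen(src)) + 1;
    if (buf_len < needed)
        return needed;   // tells the caller how much to allocate
    std::memcpy(buf, src, needed);
    return needed - 1;   // on success: length without the terminator
}

inline std::string full_path(const std::string& relative) {
    std::vector<char> buffer;
    uint32_t len = 1;    // any starting size works; the loop grows it as needed
    do {
        buffer.resize(len);
        len = fake_full_path_name(relative.c_str(), static_cast<uint32_t>(buffer.size()), buffer.data());
    } while (len >= buffer.size());  // a return >= size means "buffer too small, retry"
    return std::string(buffer.data());
}
```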
@@ -117,8 +117,8 @@ bool parse_and_check_command_line(int argc, char* argv[]) {
    bool isNetworkCompiled = fileExt(FLAGS_m) == "blob";
    bool isPrecisionSet = !(FLAGS_ip.empty() && FLAGS_op.empty() && FLAGS_iop.empty());
    if (isNetworkCompiled && isPrecisionSet) {
-        std::string err = std::string("Cannot set precision for a compiled network. ") +
-                          std::string("Please re-compile your network with required precision "
+        std::string err = std::string("Cannot set precision for a compiled model. ") +
+                          std::string("Please re-compile your model with required precision "
                                      "using compile_tool");

        throw std::logic_error(err);
@@ -128,18 +128,17 @@ bool parse_and_check_command_line(int argc, char* argv[]) {

static void next_step(const std::string additional_info = "") {
    static size_t step_id = 0;
-    static const std::map<size_t, std::string> step_names = {
-        {1, "Parsing and validating input arguments"},
-        {2, "Loading OpenVINO Runtime"},
-        {3, "Setting device configuration"},
-        {4, "Reading network files"},
-        {5, "Resizing network to match image sizes and given batch"},
-        {6, "Configuring input of the model"},
-        {7, "Loading the model to the device"},
-        {8, "Setting optimal runtime parameters"},
-        {9, "Creating infer requests and preparing input blobs with data"},
-        {10, "Measuring performance"},
-        {11, "Dumping statistics report"}};
+    static const std::map<size_t, std::string> step_names = {{1, "Parsing and validating input arguments"},
+                                                             {2, "Loading OpenVINO Runtime"},
+                                                             {3, "Setting device configuration"},
+                                                             {4, "Reading model files"},
+                                                             {5, "Resizing model to match image sizes and given batch"},
+                                                             {6, "Configuring input of the model"},
+                                                             {7, "Loading the model to the device"},
+                                                             {8, "Querying optimal runtime parameters"},
+                                                             {9, "Creating infer requests and preparing input tensors"},
+                                                             {10, "Measuring performance"},
+                                                             {11, "Dumping statistics report"}};

    step_id++;
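The step-name table above drives a helper that prints a `[Step k/11]` banner on each call, which is what produces the step headers in the README output. A reduced sketch of the mechanism (static counter plus name lookup), assuming a trimmed three-step table instead of the full eleven:

```cpp
#include <map>
#include <string>

// Returns the "[Step k/N] Name" banner for each successive call, using a
// static counter the way benchmark_app's next_step() does. Calling it more
// times than there are entries throws std::out_of_range via map::at.
inline std::string next_step() {
    static const std::map<size_t, std::string> step_names = {
        {1, "Parsing and validating input arguments"},
        {2, "Loading OpenVINO Runtime"},
        {3, "Setting device configuration"}};
    static size_t step_id = 0;
    ++step_id;
    return "[Step " + std::to_string(step_id) + "/" + std::to_string(step_names.size()) + "] " +
           step_names.at(step_id);
}
```

Because the counter is static, the commit's rename of steps 4, 5, 8, and 9 only had to touch the table; every call site stays a bare `next_step()`.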
@@ -174,7 +173,7 @@ ov::hint::PerformanceMode get_performance_hint(const std::string& device, const
        }
    } else {
        ov_perf_hint =
-            FLAGS_api == "sync" ? ov::hint::PerformanceMode::LATENCY : ov::hint::PerformanceMode::THROUGHPUT;
+            FLAGS_api == "async" ? ov::hint::PerformanceMode::THROUGHPUT : ov::hint::PerformanceMode::LATENCY;

        slog::warn << "Performance hint was not explicitly specified in command line. "
                      "Device("
@@ -213,7 +212,7 @@ int main(int argc, char* argv[]) {

    bool isNetworkCompiled = fileExt(FLAGS_m) == "blob";
    if (isNetworkCompiled) {
-        slog::info << "Network is compiled" << slog::endl;
+        slog::info << "Model is compiled" << slog::endl;
    }

    std::vector<gflags::CommandLineFlagInfo> flags;
@@ -284,8 +283,9 @@ int main(int argc, char* argv[]) {
        slog::info << "GPU extensions are loaded: " << ext << slog::endl;
    }

-    slog::info << "OpenVINO: " << ov::get_openvino_version() << slog::endl;
-    slog::info << "Device info: " << slog::endl;
+    slog::info << "OpenVINO:" << slog::endl;
+    slog::info << ov::get_openvino_version() << slog::endl;
+    slog::info << "Device info:" << slog::endl;
    slog::info << core.get_versions(device_name) << slog::endl;

    // ----------------- 3. Setting device configuration
@@ -419,7 +419,7 @@ int main(int argc, char* argv[]) {
                      "but it still may be non-optimal for some cases, for more "
                      "information look at README."
                   << slog::endl;
-        if (std::string::npos == device.find("MYRIAD")) {  // MYRIAD sets the default number of
+        if (device.find("MYRIAD") == std::string::npos) {  // MYRIAD sets the default number of
                                                            // streams implicitly (without _AUTO)
            if (supported(key)) {
                device_config[key] = std::string(getDeviceTypeFromName(device) + "_THROUGHPUT_AUTO");
@@ -546,22 +546,22 @@ int main(int argc, char* argv[]) {

    if (FLAGS_load_from_file && !isNetworkCompiled) {
        next_step();
-        slog::info << "Skipping the step for loading network from file" << slog::endl;
+        slog::info << "Skipping the step for loading model from file" << slog::endl;
        next_step();
-        slog::info << "Skipping the step for loading network from file" << slog::endl;
+        slog::info << "Skipping the step for loading model from file" << slog::endl;
        next_step();
-        slog::info << "Skipping the step for loading network from file" << slog::endl;
+        slog::info << "Skipping the step for loading model from file" << slog::endl;
        auto startTime = Time::now();
        compiledModel = core.compile_model(FLAGS_m, device_name);
        auto duration_ms = get_duration_ms_till_now(startTime);
-        slog::info << "Load network took " << double_to_string(duration_ms) << " ms" << slog::endl;
-        slog::info << "Original network I/O parameters:" << slog::endl;
+        slog::info << "Compile model took " << double_to_string(duration_ms) << " ms" << slog::endl;
+        slog::info << "Original model I/O parameters:" << slog::endl;
        printInputAndOutputsInfoShort(compiledModel);

        if (statistics)
            statistics->add_parameters(
                StatisticsReport::Category::EXECUTION_RESULTS,
-                {StatisticsVariant("load network time (ms)", "load_network_time", duration_ms)});
+                {StatisticsVariant("compile model time (ms)", "load_model_time", duration_ms)});

        convert_io_names_in_map(inputFiles, compiledModel.inputs());
        app_inputs_info = get_inputs_info(FLAGS_shape,
@@ -581,19 +581,18 @@ int main(int argc, char* argv[]) {
        // ----------------------------------------
        next_step();

-        slog::info << "Loading network files" << slog::endl;
+        slog::info << "Loading model files" << slog::endl;

        auto startTime = Time::now();
        auto model = core.read_model(FLAGS_m);
        auto duration_ms = get_duration_ms_till_now(startTime);
-        slog::info << "Read network took " << double_to_string(duration_ms) << " ms" << slog::endl;
-        slog::info << "Original network I/O parameters:" << slog::endl;
+        slog::info << "Read model took " << double_to_string(duration_ms) << " ms" << slog::endl;
+        slog::info << "Original model I/O parameters:" << slog::endl;
        printInputAndOutputsInfoShort(*model);

        if (statistics)
-            statistics->add_parameters(
-                StatisticsReport::Category::EXECUTION_RESULTS,
-                {StatisticsVariant("read network time (ms)", "read_network_time", duration_ms)});
+            statistics->add_parameters(StatisticsReport::Category::EXECUTION_RESULTS,
+                                       {StatisticsVariant("read model time (ms)", "read_model_time", duration_ms)});

        const auto& inputInfo = std::const_pointer_cast<const ov::Model>(model)->inputs();
        if (inputInfo.empty()) {
@@ -625,15 +624,15 @@ int main(int argc, char* argv[]) {
            benchmark_app::PartialShapes shapes = {};
            for (auto& item : app_inputs_info[0])
                shapes[item.first] = item.second.partialShape;
-            slog::info << "Reshaping network: " << get_shapes_string(shapes) << slog::endl;
+            slog::info << "Reshaping model: " << get_shapes_string(shapes) << slog::endl;
            startTime = Time::now();
            model->reshape(shapes);
            duration_ms = get_duration_ms_till_now(startTime);
-            slog::info << "Reshape network took " << double_to_string(duration_ms) << " ms" << slog::endl;
+            slog::info << "Reshape model took " << double_to_string(duration_ms) << " ms" << slog::endl;
            if (statistics)
                statistics->add_parameters(
                    StatisticsReport::Category::EXECUTION_RESULTS,
-                    {StatisticsVariant("reshape network time (ms)", "reshape_network_time", duration_ms)});
+                    {StatisticsVariant("reshape model time (ms)", "reshape_model_time", duration_ms)});
        }

        // ----------------- 6. Configuring inputs and outputs
@@ -725,7 +724,7 @@ int main(int argc, char* argv[]) {
         if (!isDynamicNetwork && app_inputs_info.size()) {
             batchSize = get_batch_size(app_inputs_info.front());

-            slog::info << "Network batch size: " << batchSize << slog::endl;
+            slog::info << "Model batch size: " << batchSize << slog::endl;
         } else if (batchSize == 0) {
             batchSize = 1;
         }
@@ -737,18 +736,18 @@ int main(int argc, char* argv[]) {
             startTime = Time::now();
             compiledModel = core.compile_model(model, device_name);
             duration_ms = get_duration_ms_till_now(startTime);
-            slog::info << "Load network took " << double_to_string(duration_ms) << " ms" << slog::endl;
+            slog::info << "Compile model took " << double_to_string(duration_ms) << " ms" << slog::endl;
             if (statistics)
                 statistics->add_parameters(
                     StatisticsReport::Category::EXECUTION_RESULTS,
-                    {StatisticsVariant("load network time (ms)", "load_network_time", duration_ms)});
+                    {StatisticsVariant("compile model time (ms)", "load_model_time", duration_ms)});
         } else {
             next_step();
-            slog::info << "Skipping the step for compiled network" << slog::endl;
+            slog::info << "Skipping the step for compiled model" << slog::endl;
             next_step();
-            slog::info << "Skipping the step for compiled network" << slog::endl;
+            slog::info << "Skipping the step for compiled model" << slog::endl;
             next_step();
-            slog::info << "Skipping the step for compiled network" << slog::endl;
+            slog::info << "Skipping the step for compiled model" << slog::endl;
             // ----------------- 7. Loading the model to the device
             // --------------------------------------------------------
             next_step();
@@ -762,14 +761,14 @@ int main(int argc, char* argv[]) {
             modelStream.close();

             auto duration_ms = get_duration_ms_till_now(startTime);
-            slog::info << "Import network took " << double_to_string(duration_ms) << " ms" << slog::endl;
-            slog::info << "Original network I/O paramteters:" << slog::endl;
+            slog::info << "Import model took " << double_to_string(duration_ms) << " ms" << slog::endl;
+            slog::info << "Original model I/O parameters:" << slog::endl;
             printInputAndOutputsInfoShort(compiledModel);

             if (statistics)
                 statistics->add_parameters(
                     StatisticsReport::Category::EXECUTION_RESULTS,
-                    {StatisticsVariant("import network time (ms)", "import_network_time", duration_ms)});
+                    {StatisticsVariant("import model time (ms)", "import_model_time", duration_ms)});

             convert_io_names_in_map(inputFiles, compiledModel.inputs());
             app_inputs_info = get_inputs_info(FLAGS_shape,
@@ -786,7 +785,7 @@ int main(int argc, char* argv[]) {
         }

         if (isDynamicNetwork && FLAGS_api == "sync") {
-            throw std::logic_error("Benchmarking of the model with dynamic shapes is available for async API only."
+            throw std::logic_error("Benchmarking of the model with dynamic shapes is available for async API only. "
                                    "Please use -api async -nstreams 1 -nireq 1 to emulate sync behavior");
         }

@@ -804,20 +803,15 @@ int main(int argc, char* argv[]) {
         // ----------------- 8. Querying optimal runtime parameters
         // -----------------------------------------------------
         next_step();
-        // output of the actual settings that the device selected
-        for (const auto& device : devices) {
-            auto supported_properties = compiledModel.get_property(ov::supported_properties);
-            slog::info << "Device: " << device << slog::endl;
-            for (const auto& cfg : supported_properties) {
-                try {
-                    if (cfg == ov::supported_properties)
-                        continue;
-
-                    auto prop = compiledModel.get_property(cfg);
-                    slog::info << "  { " << cfg << " , " << prop.as<std::string>() << " }" << slog::endl;
-                } catch (const ov::Exception&) {
-                }
-            }
+        // output of the actual settings that the device selected
+        auto supported_properties = compiledModel.get_property(ov::supported_properties);
+        slog::info << "Model:" << slog::endl;
+        for (const auto& cfg : supported_properties) {
+            if (cfg == ov::supported_properties)
+                continue;
+            auto prop = compiledModel.get_property(cfg);
+            slog::info << "  " << cfg << ": " << prop.as<std::string>() << slog::endl;
+        }

         // Update number of streams
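The rewritten step 8 prints every supported property except the meta-key that lists them. The filtering logic can be sketched without OpenVINO by standing in a plain string map for the compiled model's properties (the map contents and the `dump_properties` helper are made up for illustration; in benchmark_app the values come from `compiledModel.get_property(cfg).as<std::string>()`):

```cpp
#include <map>
#include <sstream>
#include <string>

// Dump all properties except the meta-key, in the "  key: value" format
// that step 8 now uses.
std::string dump_properties(const std::map<std::string, std::string>& props) {
    std::ostringstream out;
    out << "Model:\n";
    for (const auto& kv : props) {
        if (kv.first == "SUPPORTED_PROPERTIES")  // skip the key that only lists the others
            continue;
        out << "  " << kv.first << ": " << kv.second << "\n";
    }
    return out.str();
}
```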
@@ -994,11 +988,10 @@ int main(int argc, char* argv[]) {
         next_step(ss.str());

         if (inferenceOnly) {
-            slog::info << "BENCHMARK IS IN INFERENCE ONLY MODE." << slog::endl;
-            slog::info << "Input blobs will be filled once before performance measurements." << slog::endl;
+            slog::info << "Benchmarking in inference only mode (inputs filling are not included in measurement loop)."
+                       << slog::endl;
         } else {
-            slog::info << "BENCHMARK IS IN FULL MODE." << slog::endl;
-            slog::info << "Inputs setup stage will be included in performance measurements." << slog::endl;
+            slog::info << "Benchmarking in full mode (inputs filling are included in measurement loop)." << slog::endl;
         }

         // copy prepared data straight into inferRequest->getTensor()
@@ -1255,8 +1248,10 @@ int main(int argc, char* argv[]) {
     // Performance metrics report
+    if (device_name.find("AUTO") != std::string::npos)
+        slog::info << "ExecutionDevice: " << compiledModel.get_property(ov::execution_devices) << slog::endl;
     slog::info << "Count: " << iteration << " iterations" << slog::endl;
     slog::info << "Duration: " << double_to_string(totalDuration) << " ms" << slog::endl;

     if (device_name.find("MULTI") == std::string::npos) {
         slog::info << "Latency:" << slog::endl;
         generalLatency.write_to_slog();
@@ -1270,7 +1265,7 @@ int main(int argc, char* argv[]) {
                 auto shape = item.second.dataShape;
                 std::copy(shape.begin(), shape.end() - 1, std::ostream_iterator<size_t>(input_shape, ","));
                 input_shape << shape.back();
-                slog::info << " " << item.first << " : " << get_shape_string(item.second.dataShape);
+                slog::info << " " << item.first << ": " << get_shape_string(item.second.dataShape);
             }
             slog::info << slog::endl;

@@ -1278,7 +1273,8 @@ int main(int argc, char* argv[]) {
             }
         }
     }
+
     slog::info << "Throughput: " << double_to_string(fps) << " FPS" << slog::endl;

 } catch (const std::exception& ex) {
     slog::err << ex.what() << slog::endl;

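For reference, the FPS value printed above follows the usual benchmark_app definition of throughput; the formula itself is not shown in this diff, so the sketch below is an assumption:

```cpp
#include <cstddef>

// Throughput over the whole run: processed frames (batch * iterations)
// divided by the total duration, scaled from milliseconds to seconds.
double throughput_fps(std::size_t batch_size, std::size_t iterations, double total_duration_ms) {
    return 1000.0 * static_cast<double>(batch_size * iterations) / total_duration_ms;
}
```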
@@ -30,8 +30,8 @@ public:
             add_progress(num);
         }
         _isFinished = true;
-        _bar->finish();
+        if (_progressEnabled) {
+            _bar->finish();
+            std::cout << std::endl;
+        }
     }

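The hunk above fixes the progress bar emitting a trailing newline even when progress output is disabled. The guard reduces to this sketch (the class name and members are simplified stand-ins for the real progress bar, introduced only for illustration):

```cpp
#include <iostream>

class ProgressBarSketch {
public:
    explicit ProgressBarSketch(bool enabled) : _progressEnabled(enabled) {}

    // finish() only touches stdout when progress output was requested,
    // so runs with progress disabled stay clean.
    void finish() {
        _isFinished = true;
        if (_progressEnabled) {
            std::cout << "[====================] 100%" << std::endl;
        }
    }

    bool is_finished() const { return _isFinished; }

private:
    bool _progressEnabled;
    bool _isFinished = false;
};
```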
@@ -345,15 +345,13 @@ void LatencyMetrics::write_to_stream(std::ostream& stream) const {

 void LatencyMetrics::write_to_slog() const {
     std::string percentileStr = (percentile_boundary == 50)
-                                    ? "\tMedian: "
-                                    : "\t" + std::to_string(percentile_boundary) + " percentile: ";
-    if (!data_shape.empty()) {
-        slog::info << "\tData shape: " << data_shape << slog::endl;
-    }
+                                    ? "   Median: "
+                                    : "   " + std::to_string(percentile_boundary) + " percentile: ";

     slog::info << percentileStr << double_to_string(median_or_percentile) << " ms" << slog::endl;
-    slog::info << "\tAverage: " << double_to_string(avg) << " ms" << slog::endl;
-    slog::info << "\tMin: " << double_to_string(min) << " ms" << slog::endl;
-    slog::info << "\tMax: " << double_to_string(max) << " ms" << slog::endl;
+    slog::info << "   Average: " << double_to_string(avg) << " ms" << slog::endl;
+    slog::info << "   Min: " << double_to_string(min) << " ms" << slog::endl;
+    slog::info << "   Max: " << double_to_string(max) << " ms" << slog::endl;
 }

 const nlohmann::json LatencyMetrics::to_json() const {

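The tab-to-space change in `write_to_slog` is what lines the C++ latency report up with the Python one. The label selection can be isolated like this (a sketch; the exact column padding in the real output may differ from the three-space indent assumed here):

```cpp
#include <string>

// Percentile label as printed by the latency report: the 50th percentile is
// reported as "Median", any other boundary by its number, indented with
// spaces rather than '\t'.
std::string percentile_label(int percentile_boundary) {
    return (percentile_boundary == 50)
               ? std::string("   Median: ")
               : "   " + std::to_string(percentile_boundary) + " percentile: ";
}
```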
@@ -480,6 +480,8 @@ std::vector<benchmark_app::InputsInfo> get_inputs_info(const std::string& shape_
         }
     }

+    slog::info << "Model batch size: " << batch_size << slog::endl;
+
     reshape_required = false;

     std::map<std::string, int> currentFileCounters;
@@ -624,15 +626,15 @@ std::vector<benchmark_app::InputsInfo> get_inputs_info(const std::string& shape_
                 info.dataShape = info.partialShape.get_shape();
                 if (data_shapes_map.find(name) != data_shapes_map.end()) {
                     throw std::logic_error(
-                        "Network's input \"" + name +
+                        "Model's input \"" + name +
                         "\" is static. Use -shape argument for static inputs instead of -data_shape.");
                 }
             } else if (!data_shapes_map.empty()) {
-                throw std::logic_error("Can't find network input name \"" + name + "\" in \"-data_shape " +
+                throw std::logic_error("Can't find model input name \"" + name + "\" in \"-data_shape " +
                                        data_shapes_string + "\" command line parameter");
             } else {
                 throw std::logic_error("-i or -data_shape command line parameter should be set for all inputs in case "
-                                       "of network with dynamic shapes.");
+                                       "of model with dynamic shapes.");
             }

             // Update shape with batch if needed (only in static shape case)
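The three error branches above encode one decision: static inputs must be reshaped with -shape, while dynamic inputs need -i or -data_shape. Extracted as a sketch (the `check_data_shape_usage` helper and its boolean parameters are hypothetical, introduced only to illustrate the branching):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical extraction of the validation above: decide which error, if any,
// applies to one input given what the user passed on the command line.
void check_data_shape_usage(const std::string& name,
                            bool input_is_static,
                            bool has_data_shape_for_input,
                            bool any_data_shapes_given) {
    if (input_is_static) {
        if (has_data_shape_for_input)
            throw std::logic_error("Model's input \"" + name +
                                   "\" is static. Use -shape argument for static inputs instead of -data_shape.");
    } else if (!has_data_shape_for_input) {
        if (any_data_shapes_given)
            throw std::logic_error("Can't find model input name \"" + name +
                                   "\" in \"-data_shape\" command line parameter");
        else
            throw std::logic_error("-i or -data_shape command line parameter should be set for all inputs in case "
                                   "of model with dynamic shapes.");
    }
}
```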
@@ -892,4 +894,4 @@ std::string parameter_name_to_tensor_name(const std::string& name,
     }
     throw std::runtime_error("Provided I/O name \"" + name +
                              "\" is not found neither in tensor names nor in nodes names.");
 }
 }

@@ -151,4 +151,4 @@ void convert_io_names_in_map(
             std::move(item.second);
     }
     map = new_map;
 }
 }