diff --git a/docs/OV_Runtime_UG/Samples_Overview.md b/docs/OV_Runtime_UG/Samples_Overview.md
index f9c9efd8009..e896d8cae5a 100644
--- a/docs/OV_Runtime_UG/Samples_Overview.md
+++ b/docs/OV_Runtime_UG/Samples_Overview.md
@@ -24,6 +24,11 @@ openvino_inference_engine_ie_bridges_python_sample_model_creation_sample_README
    openvino_inference_engine_samples_speech_sample_README
    openvino_inference_engine_ie_bridges_python_sample_speech_sample_README
+   openvino_inference_engine_samples_sync_benchmark_README
+   openvino_inference_engine_ie_bridges_python_sample_sync_benchmark_README
+   openvino_inference_engine_samples_throughput_benchmark_README
+   openvino_inference_engine_ie_bridges_python_sample_throughput_benchmark_README
+   openvino_inference_engine_ie_bridges_python_sample_bert_benchmark_README
    openvino_inference_engine_samples_benchmark_app_README
    openvino_inference_engine_tools_benchmark_tool_README
@@ -60,6 +65,12 @@ The applications include:
 - **OpenVINO Model Creation Sample** – Construction of the LeNet model using the OpenVINO model creation sample.
   - [OpenVINO Model Creation C++ Sample](../../samples/cpp/model_creation_sample/README.md)
   - [OpenVINO Model Creation Python Sample](../../samples/python/model_creation_sample/README.md)
+- **Benchmark Samples** – Simple estimation of model inference performance
+  - [Sync Benchmark C++ Sample](../../samples/cpp/benchmark/sync_benchmark/README.md)
+  - [Sync Benchmark Python* Sample](../../samples/python/benchmark/sync_benchmark/README.md)
+  - [Throughput Benchmark C++ Sample](../../samples/cpp/benchmark/throughput_benchmark/README.md)
+  - [Throughput Benchmark Python* Sample](../../samples/python/benchmark/throughput_benchmark/README.md)
+  - [Bert Benchmark Python* Sample](../../samples/python/benchmark/bert_benchmark/README.md)
 - **Benchmark Application** – Estimates deep learning inference performance on supported devices for synchronous and asynchronous modes.
diff --git a/samples/cpp/benchmark/CMakeLists.txt b/samples/cpp/benchmark/CMakeLists.txt
new file mode 100644
index 00000000000..61c52a6b878
--- /dev/null
+++ b/samples/cpp/benchmark/CMakeLists.txt
@@ -0,0 +1,6 @@
+# Copyright (C) 2022 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+#
+
+add_subdirectory(sync_benchmark)
+add_subdirectory(throughput_benchmark)
diff --git a/samples/cpp/benchmark/sync_benchmark/CMakeLists.txt b/samples/cpp/benchmark/sync_benchmark/CMakeLists.txt
new file mode 100644
index 00000000000..39a1b86f3f0
--- /dev/null
+++ b/samples/cpp/benchmark/sync_benchmark/CMakeLists.txt
@@ -0,0 +1,7 @@
+# Copyright (C) 2022 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+#
+
+ie_add_sample(NAME sync_benchmark
+              SOURCES "${CMAKE_CURRENT_SOURCE_DIR}/main.cpp"
+              DEPENDENCIES ie_samples_utils)
diff --git a/samples/cpp/benchmark/sync_benchmark/README.md b/samples/cpp/benchmark/sync_benchmark/README.md
new file mode 100644
index 00000000000..f3132d8675c
--- /dev/null
+++ b/samples/cpp/benchmark/sync_benchmark/README.md
@@ -0,0 +1,97 @@
+# Sync Benchmark C++ Sample {#openvino_inference_engine_samples_sync_benchmark_README}
+
+This sample demonstrates how to estimate the performance of a model using the Synchronous Inference Request API. It makes sense to use synchronous inference only in latency-oriented scenarios. Models with static input shapes are supported. Unlike [demos](@ref omz_demos), this sample doesn't have other configurable command-line arguments. Feel free to modify the sample's source code to try out different options.
+
+The following C++ API is used in the application:
+
+| Feature | API | Description |
+| :--- | :--- | :--- |
+| OpenVINO Runtime Version | `ov::get_openvino_version` | Get OpenVINO API version |
+| Basic Infer Flow | `ov::Core`, `ov::Core::compile_model`, `ov::CompiledModel::create_infer_request`, `ov::InferRequest::get_tensor` | Common API to do inference: compile a model, create an infer request, configure input tensors |
+| Synchronous Infer | `ov::InferRequest::infer` | Do synchronous inference |
+| Model Operations | `ov::CompiledModel::inputs` | Get inputs of a model |
+| Tensor Operations | `ov::Tensor::get_shape`, `ov::Tensor::data` | Get a tensor shape and its data |
+
+| Options | Values |
+| :--- | :--- |
+| Validated Models | [alexnet](@ref omz_models_model_alexnet), [googlenet-v1](@ref omz_models_model_googlenet_v1), [yolo-v3-tf](@ref omz_models_model_yolo_v3_tf), [face-detection-0200](@ref omz_models_model_face_detection_0200) |
+| Model Format | OpenVINO™ toolkit Intermediate Representation (\*.xml + \*.bin), ONNX (\*.onnx) |
+| Supported devices | [All](../../../../docs/OV_Runtime_UG/supported_plugins/Supported_Devices.md) |
+| Other language realization | [Python](../../../python/benchmark/sync_benchmark/README.md) |
+
+## How It Works
+
+The sample compiles a model for a given device, randomly generates input data, and performs synchronous inference multiple times for a given number of seconds. It then processes and reports performance results.
+
+You can find a detailed description of each sample step in the [Integration Steps](../../../../docs/OV_Runtime_UG/integrate_with_your_application.md) section of the "Integrate OpenVINO™ Runtime with Your Application" guide.
+
+## Building
+
+To build the sample, please use the instructions available at the [Build the Sample Applications](../../../../docs/OV_Runtime_UG/Samples_Overview.md) section in the OpenVINO™ Toolkit Samples guide.
+
+## Running
+
+```
+sync_benchmark <path_to_model>
+```
+
+To run the sample, you need to specify a model:
+- You can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader).
+
+> **NOTES**:
+>
+> - Before running the sample with a trained model, make sure the model is converted to the intermediate representation (IR) format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md).
+>
+> - The sample accepts models in ONNX format (.onnx) that do not require preprocessing.
+
+### Example
+
+1. Install the `openvino-dev` Python package to use Open Model Zoo Tools:
+
+```
+python -m pip install openvino-dev[caffe]
+```
+
+2. Download a pre-trained model using:
+
+```
+omz_downloader --name googlenet-v1
+```
+
+3. If a model is not in the IR or ONNX format, it must be converted. You can do this using the model converter:
+
+```
+omz_converter --name googlenet-v1
+```
+
+4. Perform benchmarking using the `googlenet-v1` model on a `CPU`:
+
+```
+sync_benchmark googlenet-v1.xml
+```
+
+## Sample Output
+
+The application outputs performance results.
+
+```
+[ INFO ] OpenVINO:
+[ INFO ] Build .................................
+[ INFO ] Count: 992 iterations
+[ INFO ] Duration: 15009.8 ms
+[ INFO ] Latency:
+[ INFO ] Median: 14.00 ms
+[ INFO ] Average: 15.13 ms
+[ INFO ] Min: 9.33 ms
+[ INFO ] Max: 53.60 ms
+[ INFO ] Throughput: 66.09 FPS
+```
+
+## See Also
+
+- [Integrate the OpenVINO™ Runtime with Your Application](../../../../docs/OV_Runtime_UG/integrate_with_your_application.md)
+- [Using OpenVINO™ Toolkit Samples](../../../../docs/OV_Runtime_UG/Samples_Overview.md)
+- [Model Downloader](@ref omz_tools_downloader)
+- [Model Optimizer](../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md)
diff --git a/samples/cpp/benchmark/sync_benchmark/main.cpp b/samples/cpp/benchmark/sync_benchmark/main.cpp
new file mode 100644
index 00000000000..9fb58c1f8ad
--- /dev/null
+++ b/samples/cpp/benchmark/sync_benchmark/main.cpp
@@ -0,0 +1,72 @@
+// Copyright (C) 2022 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#include <chrono>
+#include <vector>
+
+// clang-format off
+#include "openvino/openvino.hpp"
+
+#include "samples/args_helper.hpp"
+#include "samples/common.hpp"
+#include "samples/latency_metrics.hpp"
+#include "samples/slog.hpp"
+// clang-format on
+
+using Ms = std::chrono::duration<double, std::ratio<1, 1000>>;
+
+int main(int argc, char* argv[]) {
+    try {
+        slog::info << "OpenVINO:" << slog::endl;
+        slog::info << ov::get_openvino_version();
+        if (argc != 2) {
+            slog::info << "Usage : " << argv[0] << " <path_to_model>" << slog::endl;
+            return EXIT_FAILURE;
+        }
+        // Optimize for latency. Most of the devices are configured for latency by default,
+        // but there are exceptions like MYRIAD
+        ov::AnyMap latency{{ov::hint::performance_mode.name(), ov::hint::PerformanceMode::LATENCY}};
+
+        // Create ov::Core and use it to compile a model.
+        // Pick a device by replacing CPU, for example AUTO:GPU,CPU.
+        // Using MULTI device is pointless in a sync scenario
+        // because only one instance of ov::InferRequest is used
+        ov::Core core;
+        ov::CompiledModel compiled_model = core.compile_model(argv[1], "CPU", latency);
+        ov::InferRequest ireq = compiled_model.create_infer_request();
+        // Fill input data for the ireq
+        for (const ov::Output<const ov::Node>& model_input : compiled_model.inputs()) {
+            fill_tensor_random(ireq.get_tensor(model_input));
+        }
+        // Warm up
+        ireq.infer();
+        // Benchmark for seconds_to_run seconds and at least niter iterations
+        std::chrono::seconds seconds_to_run{10};
+        size_t niter = 10;
+        std::vector<double> latencies;
+        latencies.reserve(niter);
+        auto start = std::chrono::steady_clock::now();
+        auto time_point = start;
+        auto time_point_to_finish = start + seconds_to_run;
+        while (time_point < time_point_to_finish || latencies.size() < niter) {
+            ireq.infer();
+            auto iter_end = std::chrono::steady_clock::now();
+            latencies.push_back(std::chrono::duration_cast<Ms>(iter_end - time_point).count());
+            time_point = iter_end;
+        }
+        auto end = time_point;
+        double duration = std::chrono::duration_cast<Ms>(end - start).count();
+        // Report results
+        slog::info << "Count: " << latencies.size() << " iterations" << slog::endl
+                   << "Duration: " << duration << " ms" << slog::endl
+                   << "Latency:" << slog::endl;
+        size_t percent = 50;
+        LatencyMetrics{latencies, "", percent}.write_to_slog();
+        slog::info << "Throughput: " << double_to_string(latencies.size() * 1000 / duration) << " FPS" << slog::endl;
+    } catch (const std::exception& ex) {
+        slog::err << ex.what() << slog::endl;
+        return EXIT_FAILURE;
+    }
+    return EXIT_SUCCESS;
+}
diff --git a/samples/cpp/benchmark/throughput_benchmark/CMakeLists.txt b/samples/cpp/benchmark/throughput_benchmark/CMakeLists.txt
new file mode 100644
index 00000000000..682feee8cef
--- /dev/null
+++ b/samples/cpp/benchmark/throughput_benchmark/CMakeLists.txt
@@ -0,0 +1,7 @@
+# Copyright (C) 2022 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+#
+
+ie_add_sample(NAME throughput_benchmark
+              SOURCES "${CMAKE_CURRENT_SOURCE_DIR}/main.cpp"
+              DEPENDENCIES ie_samples_utils)
diff --git a/samples/cpp/benchmark/throughput_benchmark/README.md b/samples/cpp/benchmark/throughput_benchmark/README.md
new file mode 100644
index 00000000000..b8caccea4c1
--- /dev/null
+++ b/samples/cpp/benchmark/throughput_benchmark/README.md
@@ -0,0 +1,98 @@
+# Throughput Benchmark C++ Sample {#openvino_inference_engine_samples_throughput_benchmark_README}
+
+This sample demonstrates how to estimate the performance of a model using the Asynchronous Inference Request API in throughput mode. Unlike [demos](@ref omz_demos), this sample doesn't have other configurable command-line arguments. Feel free to modify the sample's source code to try out different options.
+
+The reported results may deviate from what [benchmark_app](../../benchmark_app/README.md) reports. One example is model input precision for computer vision tasks. benchmark_app sets uint8, while the sample uses the default model precision, which is usually float32.
+
+The following C++ API is used in the application:
+
+| Feature | API | Description |
+| :--- | :--- | :--- |
+| OpenVINO Runtime Version | `ov::get_openvino_version` | Get OpenVINO API version |
+| Basic Infer Flow | `ov::Core`, `ov::Core::compile_model`, `ov::CompiledModel::create_infer_request`, `ov::InferRequest::get_tensor` | Common API to do inference: compile a model, create an infer request, configure input tensors |
+| Asynchronous Infer | `ov::InferRequest::start_async`, `ov::InferRequest::set_callback` | Do asynchronous inference with callback |
+| Model Operations | `ov::CompiledModel::inputs` | Get inputs of a model |
+| Tensor Operations | `ov::Tensor::get_shape`, `ov::Tensor::data` | Get a tensor shape and its data |
+
+| Options | Values |
+| :--- | :--- |
+| Validated Models | [alexnet](@ref omz_models_model_alexnet), [googlenet-v1](@ref omz_models_model_googlenet_v1), [yolo-v3-tf](@ref omz_models_model_yolo_v3_tf), [face-detection-0200](@ref omz_models_model_face_detection_0200) |
+| Model Format | OpenVINO™ toolkit Intermediate Representation (\*.xml + \*.bin), ONNX (\*.onnx) |
+| Supported devices | [All](../../../../docs/OV_Runtime_UG/supported_plugins/Supported_Devices.md) |
+| Other language realization | [Python](../../../python/benchmark/throughput_benchmark/README.md) |
+
+## How It Works
+
+The sample compiles a model for a given device, randomly generates input data, and performs asynchronous inference multiple times for a given number of seconds. It then processes and reports performance results.
+
+You can find a detailed description of each sample step in the [Integration Steps](../../../../docs/OV_Runtime_UG/integrate_with_your_application.md) section of the "Integrate OpenVINO™ Runtime with Your Application" guide.
+
+## Building
+
+To build the sample, please use the instructions available at the [Build the Sample Applications](../../../../docs/OV_Runtime_UG/Samples_Overview.md) section in the OpenVINO™ Toolkit Samples guide.
+
+## Running
+
+```
+throughput_benchmark <path_to_model>
+```
+
+To run the sample, you need to specify a model:
+- You can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader).
+
+> **NOTES**:
+>
+> - Before running the sample with a trained model, make sure the model is converted to the intermediate representation (IR) format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md).
+>
+> - The sample accepts models in ONNX format (.onnx) that do not require preprocessing.
+
+### Example
+
+1. Install the `openvino-dev` Python package to use Open Model Zoo Tools:
+
+```
+python -m pip install openvino-dev[caffe]
+```
+
+2. Download a pre-trained model using:
+
+```
+omz_downloader --name googlenet-v1
+```
+
+3. If a model is not in the IR or ONNX format, it must be converted. You can do this using the model converter:
+
+```
+omz_converter --name googlenet-v1
+```
+
+4. Perform benchmarking using the `googlenet-v1` model on a `CPU`:
+
+```
+throughput_benchmark googlenet-v1.xml
+```
+
+## Sample Output
+
+The application outputs performance results.
+
+```
+[ INFO ] OpenVINO:
+[ INFO ] Build .................................
+[ INFO ] Count: 1577 iterations
+[ INFO ] Duration: 15024.2 ms
+[ INFO ] Latency:
+[ INFO ] Median: 38.02 ms
+[ INFO ] Average: 38.08 ms
+[ INFO ] Min: 25.23 ms
+[ INFO ] Max: 49.16 ms
+[ INFO ] Throughput: 104.96 FPS
+```
+
+## See Also
+
+- [Integrate the OpenVINO™ Runtime with Your Application](../../../../docs/OV_Runtime_UG/integrate_with_your_application.md)
+- [Using OpenVINO™ Toolkit Samples](../../../../docs/OV_Runtime_UG/Samples_Overview.md)
+- [Model Downloader](@ref omz_tools_downloader)
+- [Model Optimizer](../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md)
diff --git a/samples/cpp/benchmark/throughput_benchmark/main.cpp b/samples/cpp/benchmark/throughput_benchmark/main.cpp
new file mode 100644
index 00000000000..885bd27713b
--- /dev/null
+++ b/samples/cpp/benchmark/throughput_benchmark/main.cpp
@@ -0,0 +1,134 @@
+// Copyright (C) 2022 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#include <condition_variable>
+#include <deque>
+#include <mutex>
+#include <vector>
+
+// clang-format off
+#include "openvino/openvino.hpp"
+
+#include "samples/args_helper.hpp"
+#include "samples/common.hpp"
+#include "samples/latency_metrics.hpp"
+#include "samples/slog.hpp"
+// clang-format on
+
+using Ms = std::chrono::duration<double, std::ratio<1, 1000>>;
+
+int main(int argc, char* argv[]) {
+    try {
+        slog::info << "OpenVINO:" << slog::endl;
+        slog::info << ov::get_openvino_version();
+        if (argc != 2) {
+            slog::info << "Usage : " << argv[0] << " <path_to_model>" << slog::endl;
+            return EXIT_FAILURE;
+        }
+        // Optimize for throughput. Best throughput can be reached by
+        // running multiple ov::InferRequest instances asynchronously
+        ov::AnyMap tput{{ov::hint::performance_mode.name(), ov::hint::PerformanceMode::THROUGHPUT}};
+
+        // Create ov::Core and use it to compile a model.
+        // Pick a device by replacing CPU, for example MULTI:CPU(4),GPU(8).
+        // It is possible to set CUMULATIVE_THROUGHPUT as ov::hint::PerformanceMode for AUTO device
+        ov::Core core;
+        ov::CompiledModel compiled_model = core.compile_model(argv[1], "CPU", tput);
+        // Create optimal number of ov::InferRequest instances
+        uint32_t nireq = compiled_model.get_property(ov::optimal_number_of_infer_requests);
+        std::vector<ov::InferRequest> ireqs(nireq);
+        std::generate(ireqs.begin(), ireqs.end(), [&] {
+            return compiled_model.create_infer_request();
+        });
+        // Fill input data for ireqs
+        for (ov::InferRequest& ireq : ireqs) {
+            for (const ov::Output<const ov::Node>& model_input : compiled_model.inputs()) {
+                fill_tensor_random(ireq.get_tensor(model_input));
+            }
+        }
+        // Warm up
+        for (ov::InferRequest& ireq : ireqs) {
+            ireq.start_async();
+        }
+        for (ov::InferRequest& ireq : ireqs) {
+            ireq.wait();
+        }
+        // Benchmark for seconds_to_run seconds and at least niter iterations
+        std::chrono::seconds seconds_to_run{10};
+        size_t niter = 10;
+        std::vector<double> latencies;
+        std::mutex mutex;
+        std::condition_variable cv;
+        std::exception_ptr callback_exception;
+        struct TimedIreq {
+            ov::InferRequest& ireq;  // ref
+            std::chrono::steady_clock::time_point start;
+            bool has_start_time;
+        };
+        std::deque<TimedIreq> finished_ireqs;
+        for (ov::InferRequest& ireq : ireqs) {
+            finished_ireqs.push_back({ireq, std::chrono::steady_clock::time_point{}, false});
+        }
+        auto start = std::chrono::steady_clock::now();
+        auto time_point_to_finish = start + seconds_to_run;
+        // Once there's a finished ireq, wake up the main thread.
+        // Compute and save latency for that ireq and prepare for the next inference by setting up the callback.
+        // The callback pushes that ireq back to finished_ireqs when inference is completed.
+        // Start asynchronous infer with the updated callback
+        for (;;) {
+            std::unique_lock<std::mutex> lock(mutex);
+            while (!callback_exception && finished_ireqs.empty()) {
+                cv.wait(lock);
+            }
+            if (callback_exception) {
+                std::rethrow_exception(callback_exception);
+            }
+            if (!finished_ireqs.empty()) {
+                auto time_point = std::chrono::steady_clock::now();
+                if (time_point > time_point_to_finish && latencies.size() > niter) {
+                    break;
+                }
+                TimedIreq timedIreq = finished_ireqs.front();
+                finished_ireqs.pop_front();
+                lock.unlock();
+                ov::InferRequest& ireq = timedIreq.ireq;
+                if (timedIreq.has_start_time) {
+                    latencies.push_back(std::chrono::duration_cast<Ms>(time_point - timedIreq.start).count());
+                }
+                ireq.set_callback(
+                    [&ireq, time_point, &mutex, &finished_ireqs, &callback_exception, &cv](std::exception_ptr ex) {
+                        // Keep callback small. This improves performance for fast (tens of thousands FPS) models
+                        std::unique_lock<std::mutex> lock(mutex);
+                        {
+                            try {
+                                if (ex) {
+                                    std::rethrow_exception(ex);
+                                }
+                                finished_ireqs.push_back({ireq, time_point, true});
+                            } catch (const std::exception&) {
+                                if (!callback_exception) {
+                                    callback_exception = std::current_exception();
+                                }
+                            }
+                        }
+                        cv.notify_one();
+                    });
+                ireq.start_async();
+            }
+        }
+        auto end = std::chrono::steady_clock::now();
+        double duration = std::chrono::duration_cast<Ms>(end - start).count();
+        // Report results
+        slog::info << "Count: " << latencies.size() << " iterations" << slog::endl
+                   << "Duration: " << duration << " ms" << slog::endl
+                   << "Latency:" << slog::endl;
+        size_t percent = 50;
+        LatencyMetrics{latencies, "", percent}.write_to_slog();
+        slog::info << "Throughput: " << double_to_string(1000 * latencies.size() / duration) << " FPS" << slog::endl;
+    } catch (const std::exception& ex) {
+        slog::err << ex.what() << slog::endl;
+        return EXIT_FAILURE;
+    }
+    return EXIT_SUCCESS;
+}
diff --git a/samples/cpp/benchmark_app/statistics_report.cpp b/samples/cpp/benchmark_app/statistics_report.cpp
index 3c03e570a9a..590008e6ab1 100644
--- a/samples/cpp/benchmark_app/statistics_report.cpp
+++ b/samples/cpp/benchmark_app/statistics_report.cpp
@@ -336,45 +336,16 @@ const nlohmann::json StatisticsReportJSON::sort_perf_counters_to_json(
     return js;
 }
 
-void LatencyMetrics::write_to_stream(std::ostream& stream) const {
-    std::ios::fmtflags fmt(std::cout.flags());
-    stream << data_shape << ";" << std::fixed << std::setprecision(2) << median_or_percentile << ";" << avg << ";"
-           << min << ";" << max;
-    std::cout.flags(fmt);
-}
-
-void LatencyMetrics::write_to_slog() const {
-    std::string percentileStr = (percentile_boundary == 50)
-                                    ? " Median: "
-                                    : " " + std::to_string(percentile_boundary) + " percentile: ";
-
-    slog::info << percentileStr << double_to_string(median_or_percentile) << " ms" << slog::endl;
-    slog::info << " Average: " << double_to_string(avg) << " ms" << slog::endl;
-    slog::info << " Min: " << double_to_string(min) << " ms" << slog::endl;
-    slog::info << " Max: " << double_to_string(max) << " ms" << slog::endl;
-}
-
-const nlohmann::json LatencyMetrics::to_json() const {
+static nlohmann::json to_json(const LatencyMetrics& latency_metrics) {
     nlohmann::json stat;
-    stat["data_shape"] = data_shape;
-    stat["latency_median"] = median_or_percentile;
-    stat["latency_average"] = avg;
-    stat["latency_min"] = min;
-    stat["latency_max"] = max;
+    stat["data_shape"] = latency_metrics.data_shape;
+    stat["latency_median"] = latency_metrics.median_or_percentile;
+    stat["latency_average"] = latency_metrics.avg;
+    stat["latency_min"] = latency_metrics.min;
+    stat["latency_max"] = latency_metrics.max;
     return stat;
 }
 
-void LatencyMetrics::fill_data(std::vector<double> latencies, size_t percentile_boundary) {
-    if (latencies.empty()) {
-        throw std::logic_error("Latency metrics class expects non-empty vector of latencies at consturction.");
-    }
-    std::sort(latencies.begin(), latencies.end());
-    min = latencies[0];
-    avg = std::accumulate(latencies.begin(), latencies.end(), 0.0) / latencies.size();
-    median_or_percentile = latencies[size_t(latencies.size() / 100.0 * percentile_boundary)];
-    max = latencies.back();
-};
-
 std::string StatisticsVariant::to_string() const {
     switch (type) {
     case INT:
@@ -412,7 +383,7 @@ void StatisticsVariant::write_to_json(nlohmann::json& js) const {
         if (arr.empty()) {
             arr = nlohmann::json::array();
         }
-        arr.push_back(metrics_val.to_json());
+        arr.push_back(to_json(metrics_val));
     } break;
     default:
         throw std::invalid_argument("StatisticsVariant:: json conversion : invalid type is provided");
diff --git a/samples/cpp/benchmark_app/statistics_report.hpp b/samples/cpp/benchmark_app/statistics_report.hpp
index 57d9f79f087..c178ee14f10 100644
--- a/samples/cpp/benchmark_app/statistics_report.hpp
+++ b/samples/cpp/benchmark_app/statistics_report.hpp
@@ -19,6 +19,7 @@
 #include "samples/common.hpp"
 #include "samples/csv_dumper.hpp"
 #include "samples/slog.hpp"
+#include "samples/latency_metrics.hpp"
 #include "utils.hpp"
 // clang-format on
 
@@ -29,35 +30,6 @@ static constexpr char averageCntReport[] = "average_counters";
 static constexpr char detailedCntReport[] = "detailed_counters";
 static constexpr char sortDetailedCntReport[] = "sort_detailed_counters";
 
-/// @brief Responsible for calculating different latency metrics
-class LatencyMetrics {
-public:
-    LatencyMetrics() {}
-
-    LatencyMetrics(const std::vector<double>& latencies,
-                   const std::string& data_shape = "",
-                   size_t percentile_boundary = 50)
-        : data_shape(data_shape),
-          percentile_boundary(percentile_boundary) {
-        fill_data(latencies, percentile_boundary);
-    }
-
-    void write_to_stream(std::ostream& stream) const;
-    void write_to_slog() const;
-    const nlohmann::json to_json() const;
-
-public:
-    double median_or_percentile = 0;
-    double avg = 0;
-    double min = 0;
-    double max = 0;
-    std::string data_shape;
-
-private:
-    void fill_data(std::vector<double> latencies, size_t percentile_boundary);
-    size_t percentile_boundary = 50;
-};
-
 class StatisticsVariant {
 public:
     enum Type { INT, DOUBLE, STRING, ULONGLONG, METRICS };
diff --git a/samples/cpp/benchmark_app/utils.hpp b/samples/cpp/benchmark_app/utils.hpp
index 0c84832dd6c..6bb1eeabf61 100644
--- a/samples/cpp/benchmark_app/utils.hpp
+++ b/samples/cpp/benchmark_app/utils.hpp
@@ -27,12 +27,6 @@ inline double get_duration_ms_till_now(Time::time_point& startTime) {
     return std::chrono::duration_cast<ns>(Time::now() - startTime).count() * 0.000001;
 };
 
-inline std::string double_to_string(const double number) {
-    std::stringstream ss;
-    ss << std::fixed << std::setprecision(2) << number;
-    return ss.str();
-};
-
 namespace benchmark_app {
 struct InputInfo {
     ov::element::Type type;
diff --git a/samples/cpp/classification_sample_async/README.md b/samples/cpp/classification_sample_async/README.md
index d376a9f56be..3038a1e4291 100644
--- a/samples/cpp/classification_sample_async/README.md
+++ b/samples/cpp/classification_sample_async/README.md
@@ -67,10 +67,9 @@ Options:
 Available target devices:
 ```
 
-To run the sample, you need specify a model and image:
-
-- you can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader).
-- you can use images from the media files collection available at https://storage.openvinotoolkit.org/data/test_data.
+To run the sample, you need to specify a model and image:
+- You can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader).
+- You can use images from the media files collection available at https://storage.openvinotoolkit.org/data/test_data.
 
 > **NOTES**:
 >
@@ -84,7 +83,7 @@ To run the sample, you need specify a model and image:
 1. Install the `openvino-dev` Python package to use Open Model Zoo Tools:
 
    ```
-   python -m pip install openvino-dev[caffe,onnx,tensorflow2,pytorch,mxnet]
+   python -m pip install openvino-dev[caffe]
    ```
 
 2. Download a pre-trained model using:
diff --git a/samples/cpp/common/utils/include/samples/common.hpp b/samples/cpp/common/utils/include/samples/common.hpp
index 033a3c9637b..6a4a701190d 100644
--- a/samples/cpp/common/utils/include/samples/common.hpp
+++ b/samples/cpp/common/utils/include/samples/common.hpp
@@ -1054,6 +1054,76 @@ static UNUSED void printPerformanceCounts(ov::InferRequest request,
     printPerformanceCounts(performanceMap, stream, deviceName, bshowHeader);
 }
 
+static inline std::string double_to_string(const double number) {
+    std::stringstream ss;
+    ss << std::fixed << std::setprecision(2) << number;
+    return ss.str();
+}
+
+template <typename T>
+using uniformDistribution = typename std::conditional<
+    std::is_floating_point<T>::value,
+    std::uniform_real_distribution<T>,
+    typename std::conditional<std::is_integral<T>::value, std::uniform_int_distribution<T>, void>::type>::type;
+
+template <typename T, typename T2>
+static inline void fill_random(ov::Tensor& tensor,
+                               T rand_min = std::numeric_limits<uint8_t>::min(),
+                               T rand_max = std::numeric_limits<uint8_t>::max()) {
+    std::mt19937 gen(0);
+    size_t tensor_size = tensor.get_size();
+    if (0 == tensor_size) {
+        throw std::runtime_error(
+            "Models with dynamic shapes aren't supported. Input tensors must have specific shapes before inference");
+    }
+    T* data = tensor.data<T>();
+    uniformDistribution<T2> distribution(rand_min, rand_max);
+    for (size_t i = 0; i < tensor_size; i++) {
+        data[i] = static_cast<T>(distribution(gen));
+    }
+}
+
+static inline void fill_tensor_random(ov::Tensor tensor) {
+    switch (tensor.get_element_type()) {
+    case ov::element::f32:
+        fill_random<float, float>(tensor);
+        break;
+    case ov::element::f64:
+        fill_random<double, double>(tensor);
+        break;
+    case ov::element::f16:
+        fill_random<short, short>(tensor);
+        break;
+    case ov::element::i32:
+        fill_random<int32_t, int32_t>(tensor);
+        break;
+    case ov::element::i64:
+        fill_random<int64_t, int64_t>(tensor);
+        break;
+    case ov::element::u8:
+        // uniform_int_distribution<uint8_t> is not allowed in the C++17
+        // standard and vs2017/19
+        fill_random<uint8_t, uint32_t>(tensor);
+        break;
+    case ov::element::i8:
+        // uniform_int_distribution<int8_t> is not allowed in the C++17 standard
+        // and vs2017/19
+        fill_random<int8_t, int32_t>(tensor, std::numeric_limits<int8_t>::min(), std::numeric_limits<int8_t>::max());
+        break;
+    case ov::element::u16:
+        fill_random<uint16_t, uint16_t>(tensor);
+        break;
+    case ov::element::i16:
+        fill_random<int16_t, int16_t>(tensor);
+        break;
+    case ov::element::boolean:
+        fill_random<uint8_t, uint32_t>(tensor, 0, 1);
+        break;
+    default:
+        throw ov::Exception("Input type is not supported for a tensor");
+    }
+}
+
 static UNUSED void printPerformanceCountsNoSort(std::vector<ov::ProfilingInfo> performanceData,
                                                 std::ostream& stream,
                                                 std::string deviceName,
diff --git a/samples/cpp/common/utils/include/samples/latency_metrics.hpp b/samples/cpp/common/utils/include/samples/latency_metrics.hpp
new file mode 100644
index 00000000000..bca39d0a735
--- /dev/null
+++ b/samples/cpp/common/utils/include/samples/latency_metrics.hpp
@@ -0,0 +1,42 @@
+// Copyright (C) 2022 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#pragma once
+
+#include <cstddef>
+#include <ostream>
+#include <string>
+#include <vector>
+
+// clang-format off
+#include "samples/common.hpp"
+#include "samples/slog.hpp"
+// clang-format on
+
+/// @brief Responsible for calculating different latency metrics
+class LatencyMetrics {
+public:
+    LatencyMetrics() {}
+
+    LatencyMetrics(const std::vector<double>& latencies,
+                   const std::string& data_shape = "",
+                   size_t percentile_boundary = 50)
+        : data_shape(data_shape),
+          percentile_boundary(percentile_boundary) {
+        fill_data(latencies, percentile_boundary);
+    }
+
+    void write_to_stream(std::ostream& stream) const;
+    void write_to_slog() const;
+
+    double median_or_percentile = 0;
+    double avg = 0;
+    double min = 0;
+    double max = 0;
+    std::string data_shape;
+
+private:
+    void fill_data(std::vector<double> latencies, size_t percentile_boundary);
+    size_t percentile_boundary = 50;
+};
diff --git a/samples/cpp/common/utils/src/latency_metrics.cpp b/samples/cpp/common/utils/src/latency_metrics.cpp
new file mode 100644
index 00000000000..d4386f3a43c
--- /dev/null
+++ b/samples/cpp/common/utils/src/latency_metrics.cpp
@@ -0,0 +1,42 @@
+// Copyright (C) 2018-2022 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+
+// clang-format off
+#include <algorithm>
+#include <iomanip>
+#include <iostream>
+#include <numeric>
+#include <vector>
+
+#include "samples/latency_metrics.hpp"
+// clang-format on
+
+void LatencyMetrics::write_to_stream(std::ostream& stream) const {
+    std::ios::fmtflags fmt(std::cout.flags());
+    stream << data_shape << ";" << std::fixed << std::setprecision(2) << median_or_percentile << ";" << avg << ";"
+           << min << ";" << max;
+    std::cout.flags(fmt);
+}
+
+void LatencyMetrics::write_to_slog() const {
+    std::string percentileStr = (percentile_boundary == 50)
+                                    ? " Median: "
+                                    : " " + std::to_string(percentile_boundary) + " percentile: ";
+
+    slog::info << percentileStr << double_to_string(median_or_percentile) << " ms" << slog::endl;
+    slog::info << " Average: " << double_to_string(avg) << " ms" << slog::endl;
+    slog::info << " Min: " << double_to_string(min) << " ms" << slog::endl;
+    slog::info << " Max: " << double_to_string(max) << " ms" << slog::endl;
+}
+
+void LatencyMetrics::fill_data(std::vector<double> latencies, size_t percentile_boundary) {
+    if (latencies.empty()) {
+        throw std::logic_error("Latency metrics class expects non-empty vector of latencies at construction.");
+    }
+    std::sort(latencies.begin(), latencies.end());
+    min = latencies[0];
+    avg = std::accumulate(latencies.begin(), latencies.end(), 0.0) / latencies.size();
+    median_or_percentile = latencies[size_t(latencies.size() / 100.0 * percentile_boundary)];
+    max = latencies.back();
+}
diff --git a/samples/cpp/hello_classification/README.md b/samples/cpp/hello_classification/README.md
index 2871d3a29ab..6c129904a7a 100644
--- a/samples/cpp/hello_classification/README.md
+++ b/samples/cpp/hello_classification/README.md
@@ -38,10 +38,9 @@ To build the sample, please use instructions available at [Build the Sample Appl
 hello_classification
 ```
 
-To run the sample, you need specify a model and image:
-
-- you can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader).
-- you can use images from the media files collection available at https://storage.openvinotoolkit.org/data/test_data.
+To run the sample, you need to specify a model and image:
+- You can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader).
+- You can use images from the media files collection available at https://storage.openvinotoolkit.org/data/test_data. > **NOTES**: > @@ -55,7 +54,7 @@ To run the sample, you need specify a model and image: 1. Install the `openvino-dev` Python package to use Open Model Zoo Tools: ``` - python -m pip install openvino-dev[caffe,onnx,tensorflow2,pytorch,mxnet] + python -m pip install openvino-dev[caffe] ``` 2. Download a pre-trained model using: diff --git a/samples/cpp/hello_nv12_input_classification/README.md b/samples/cpp/hello_nv12_input_classification/README.md index 89d966366de..18ddc547433 100644 --- a/samples/cpp/hello_nv12_input_classification/README.md +++ b/samples/cpp/hello_nv12_input_classification/README.md @@ -37,10 +37,9 @@ To build the sample, please use instructions available at [Build the Sample Appl hello_nv12_input_classification ``` -To run the sample, you need specify a model and image: - -- you can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader). -- you can use images from the media files collection available at https://storage.openvinotoolkit.org/data/test_data. +To run the sample, you need to specify a model and image: +- You can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader). +- You can use images from the media files collection available at https://storage.openvinotoolkit.org/data/test_data. The sample accepts an uncompressed image in the NV12 color format. To run the sample, you need to convert your BGR/RGB image to NV12. To do this, you can use one of the widely available tools such @@ -70,7 +69,7 @@ ffmpeg -i cat.jpg -pix_fmt nv12 car.yuv 1. 
Install openvino-dev python package if you don't have it to use Open Model Zoo Tools: ``` - python -m pip install openvino-dev[caffe,onnx,tensorflow2,pytorch,mxnet] + python -m pip install openvino-dev[caffe] ``` 2. Download a pre-trained model: diff --git a/samples/cpp/hello_reshape_ssd/README.md b/samples/cpp/hello_reshape_ssd/README.md index 19c35e83b05..d683f542fa0 100644 --- a/samples/cpp/hello_reshape_ssd/README.md +++ b/samples/cpp/hello_reshape_ssd/README.md @@ -39,10 +39,9 @@ To build the sample, please use instructions available at [Build the Sample Appl hello_reshape_ssd ``` -To run the sample, you need specify a model and image: - -- you can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader). -- you can use images from the media files collection available at https://storage.openvinotoolkit.org/data/test_data. +To run the sample, you need to specify a model and image: +- You can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader). +- You can use images from the media files collection available at https://storage.openvinotoolkit.org/data/test_data. > **NOTES**: > @@ -56,7 +55,7 @@ To run the sample, you need specify a model and image: 1. Install openvino-dev python package if you don't have it to use Open Model Zoo Tools: ``` - python -m pip install openvino-dev[caffe,onnx,tensorflow2,pytorch,mxnet] + python -m pip install openvino-dev ``` 2. 
Download a pre-trained model using: diff --git a/samples/python/benchmark/bert_benhcmark/README.md b/samples/python/benchmark/bert_benhcmark/README.md new file mode 100644 index 00000000000..cdfa94de3c2 --- /dev/null +++ b/samples/python/benchmark/bert_benhcmark/README.md @@ -0,0 +1,50 @@ +# Bert Benchmark Python* Sample {#openvino_inference_engine_ie_bridges_python_sample_bert_benchmark_README} + +This sample demonstrates how to estimate the performance of a Bert model using the Asynchronous Inference Request API. Unlike [demos](@ref omz_demos), this sample doesn't have configurable command line arguments. Feel free to modify the sample's source code to try out different options. + +The following Python\* API is used in the application: + +| Feature | API | Description | +| :--- | :--- | :--- | +| OpenVINO Runtime Version | [openvino.runtime.get_version] | Get OpenVINO API version | +| Basic Infer Flow | [openvino.runtime.Core], [openvino.runtime.Core.compile_model] | Common API to do inference: compile a model | +| Asynchronous Infer | [openvino.runtime.AsyncInferQueue], [openvino.runtime.AsyncInferQueue.start_async], [openvino.runtime.AsyncInferQueue.wait_all] | Do asynchronous inference | +| Model Operations | [openvino.runtime.CompiledModel.inputs] | Get inputs of a model | + +## How It Works + +The sample downloads a model and a tokenizer, exports the model to ONNX, reads the exported model, reshapes it to enforce dynamic input shapes, compiles the resulting model, downloads a dataset, and runs benchmarking on the dataset. + +You can see the explicit description of +each sample step at [Integration Steps](../../../../docs/OV_Runtime_UG/integrate_with_your_application.md) section of "Integrate OpenVINO™ Runtime with Your Application" guide. 
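The metrics this sample reports (average sequence length and average processing time) reduce to simple arithmetic over the measured duration and per-sentence token counts. A minimal plain-Python sketch of that reporting logic; `fake_tokenize` is a hypothetical stand-in for the real `AutoTokenizer`, used only to keep the sketch self-contained:

```python
from time import perf_counter

def fake_tokenize(sentence):
    # Hypothetical stand-in for transformers.AutoTokenizer: one token per word
    return sentence.split()

sentences = ['hello world', 'openvino runs fast', 'ok']
sum_seq_len = 0
start = perf_counter()
for sentence in sentences:
    tokens = fake_tokenize(sentence)
    sum_seq_len += len(tokens)  # accumulate sequence lengths, as the sample does
    # ... the encoded sentence would be enqueued for async inference here ...
duration = perf_counter() - start
avg_seq_len = sum_seq_len / len(sentences)  # reported as 'Average sequence length'
avg_ms = duration / len(sentences) * 1e3    # reported as 'Average processing time'
```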
+ +## Running + +Install the `openvino` Python package: + +``` +python -m pip install openvino +``` + +Install packages from `requirements.txt`: + +``` +python -m pip install -r requirements.txt +``` + +Run the sample: + +``` +python bert_benhcmark.py +``` + +## Sample Output + +The sample outputs how long it takes to process a dataset. + +## See Also + +- [Integrate the OpenVINO™ Runtime with Your Application](../../../../docs/OV_Runtime_UG/integrate_with_your_application.md) +- [Using OpenVINO™ Toolkit Samples](../../../../docs/OV_Runtime_UG/Samples_Overview.md) +- [Model Downloader](@ref omz_tools_downloader) +- [Model Optimizer](../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) diff --git a/samples/python/benchmark/bert_benhcmark/bert_benhcmark.py b/samples/python/benchmark/bert_benhcmark/bert_benhcmark.py new file mode 100755 index 00000000000..eb50fb7bb52 --- /dev/null +++ b/samples/python/benchmark/bert_benhcmark/bert_benhcmark.py @@ -0,0 +1,78 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +# Copyright (C) 2022 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import logging as log +from pathlib import Path +import sys +import tempfile +from time import perf_counter + +import datasets +from openvino.runtime import Core, get_version, AsyncInferQueue, PartialShape +from transformers import AutoTokenizer +from transformers.onnx import export +from transformers.onnx.features import FeaturesManager + + +def main(): + log.basicConfig(format='[ %(levelname)s ] %(message)s', level=log.INFO, stream=sys.stdout) + log.info('OpenVINO:') + log.info(f"{'Build ':.<39} {get_version()}") + model_name = 'bert-base-uncased' + # Download the model + transformers_model = FeaturesManager.get_model_from_feature('default', model_name) + _, model_onnx_config = FeaturesManager.check_supported_model_or_raise(transformers_model, feature='default') + onnx_config = model_onnx_config(transformers_model.config) + # Download the tokenizer + tokenizer = 
AutoTokenizer.from_pretrained(model_name) + + core = Core() + + with tempfile.TemporaryDirectory() as tmp: + onnx_path = Path(tmp) / f'{model_name}.onnx' + # Export .onnx + export(tokenizer, transformers_model, onnx_config, onnx_config.default_onnx_opset, onnx_path) + # Read .onnx with OpenVINO + model = core.read_model(onnx_path) + + # Enforce dynamic input shape + try: + model.reshape({model_input.any_name: PartialShape([1, '?']) for model_input in model.inputs}) + except RuntimeError: + log.error("Can't set dynamic shape") + raise + # Optimize for throughput. Best throughput can be reached by + # running multiple openvino.runtime.InferRequest instances asynchronously + tput = {'PERFORMANCE_HINT': 'THROUGHPUT'} + # Pick a device by replacing CPU, for example MULTI:CPU(4),GPU(8). + # It is possible to set CUMULATIVE_THROUGHPUT as PERFORMANCE_HINT for AUTO device + compiled_model = core.compile_model(model, 'CPU', tput) + # AsyncInferQueue creates optimal number of InferRequest instances + ireqs = AsyncInferQueue(compiled_model) + + sst2 = datasets.load_dataset('glue', 'sst2') + sst2_sentences = sst2['validation']['sentence'] + # Warm up + encoded_warm_up = dict(tokenizer('Warm up sentence is here.', return_tensors='np')) + for _ in ireqs: + ireqs.start_async(encoded_warm_up) + ireqs.wait_all() + # Benchmark + sum_seq_len = 0 + start = perf_counter() + for sentence in sst2_sentences: + encoded = dict(tokenizer(sentence, return_tensors='np')) + sum_seq_len += next(iter(encoded.values())).size # get sequence length to compute average length + ireqs.start_async(encoded) + ireqs.wait_all() + end = perf_counter() + duration = end - start + log.info(f'Average sequence length: {sum_seq_len / len(sst2_sentences):.2f}') + log.info(f'Average processing time: {duration / len(sst2_sentences) * 1e3:.2f} ms') + log.info(f'Duration: {duration:.2f} seconds') + + +if __name__ == '__main__': + main() diff --git a/samples/python/benchmark/bert_benhcmark/requirements.txt 
b/samples/python/benchmark/bert_benhcmark/requirements.txt new file mode 100644 index 00000000000..0230e74d04e --- /dev/null +++ b/samples/python/benchmark/bert_benhcmark/requirements.txt @@ -0,0 +1,2 @@ +transformers[onnx] +torch diff --git a/samples/python/benchmark/sync_benchmark/README.md b/samples/python/benchmark/sync_benchmark/README.md new file mode 100644 index 00000000000..3d4bda76e8c --- /dev/null +++ b/samples/python/benchmark/sync_benchmark/README.md @@ -0,0 +1,92 @@ +# Sync Benchmark Python* Sample {#openvino_inference_engine_ie_bridges_python_sample_sync_benchmark_README} + +This sample demonstrates how to estimate the performance of a model using the Synchronous Inference Request API. It makes sense to use synchronous inference only in latency-oriented scenarios. Models with static input shapes are supported. Unlike [demos](@ref omz_demos), this sample doesn't have other configurable command line arguments. Feel free to modify the sample's source code to try out different options. + +The following Python\* API is used in the application: + +| Feature | API | Description | +| :--- | :--- | :--- | +| OpenVINO Runtime Version | [openvino.runtime.get_version] | Get OpenVINO API version | +| Basic Infer Flow | [openvino.runtime.Core], [openvino.runtime.Core.compile_model], [openvino.runtime.InferRequest.get_tensor] | Common API to do inference: compile a model, configure input tensors | +| Synchronous Infer | [openvino.runtime.InferRequest.infer] | Do synchronous inference | +| Model Operations | [openvino.runtime.CompiledModel.inputs] | Get inputs of a model | +| Tensor Operations | [openvino.runtime.Tensor.get_shape], [openvino.runtime.Tensor.data] | Get a tensor shape and its data. 
| + +| Options | Values | +| :--- | :--- | +| Validated Models | [alexnet](@ref omz_models_model_alexnet), [googlenet-v1](@ref omz_models_model_googlenet_v1), [yolo-v3-tf](@ref omz_models_model_yolo_v3_tf), [face-detection-0200](@ref omz_models_model_face_detection_0200) | +| Model Format | OpenVINO™ toolkit Intermediate Representation (\*.xml + \*.bin), ONNX (\*.onnx) | +| Supported devices | [All](../../../../docs/OV_Runtime_UG/supported_plugins/Supported_Devices.md) | +| Other language realization | [C++](../../../cpp/benchmark/sync_benchmark/README.md) | + +## How It Works + +The sample compiles a model for a given device, randomly generates input data, and performs synchronous inference multiple times for a given number of seconds. It then processes and reports performance results. + +You can see the explicit description of +each sample step at [Integration Steps](../../../../docs/OV_Runtime_UG/integrate_with_your_application.md) section of "Integrate OpenVINO™ Runtime with Your Application" guide. + +## Running + +``` +python sync_benchmark.py <path_to_model> +``` + +To run the sample, you need to specify a model: +- You can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader). + +> **NOTES**: +> +> - Before running the sample with a trained model, make sure the model is converted to the intermediate representation (IR) format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +> +> - The sample accepts models in ONNX format (.onnx) that do not require preprocessing. + +### Example + +1. Install the `openvino-dev` Python package to use Open Model Zoo Tools: + +``` +python -m pip install openvino-dev[caffe] +``` + +2. Download a pre-trained model using: + +``` +omz_downloader --name googlenet-v1 +``` + +3. 
If a model is not in the IR or ONNX format, it must be converted. You can do this using the model converter: + +``` +omz_converter --name googlenet-v1 +``` + +4. Perform benchmarking using the `googlenet-v1` model on a `CPU`: + +``` +python sync_benchmark.py googlenet-v1.xml +``` + +## Sample Output + +The application outputs performance results. + +``` +[ INFO ] OpenVINO: +[ INFO ] Build ................................. +[ INFO ] Count: 2333 iterations +[ INFO ] Duration: 10003.59 ms +[ INFO ] Latency: +[ INFO ] Median: 3.90 ms +[ INFO ] Average: 4.29 ms +[ INFO ] Min: 3.30 ms +[ INFO ] Max: 10.11 ms +[ INFO ] Throughput: 233.22 FPS +``` + +## See Also + +- [Integrate the OpenVINO™ Runtime with Your Application](../../../../docs/OV_Runtime_UG/integrate_with_your_application.md) +- [Using OpenVINO™ Toolkit Samples](../../../../docs/OV_Runtime_UG/Samples_Overview.md) +- [Model Downloader](@ref omz_tools_downloader) +- [Model Optimizer](../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) diff --git a/samples/python/benchmark/sync_benchmark/sync_benchmark.py b/samples/python/benchmark/sync_benchmark/sync_benchmark.py new file mode 100755 index 00000000000..6ab7df196bd --- /dev/null +++ b/samples/python/benchmark/sync_benchmark/sync_benchmark.py @@ -0,0 +1,78 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +# Copyright (C) 2022 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import logging as log +import statistics +import sys +from time import perf_counter + +import numpy as np +from openvino.runtime import Core, get_version +from openvino.runtime.utils.types import get_dtype + + +def fill_tensor_random(tensor): + dtype = get_dtype(tensor.element_type) + rand_min, rand_max = (0, 1) if dtype == bool else (np.iinfo(np.uint8).min, np.iinfo(np.uint8).max) + # np.random.uniform excludes high: add 1 to have it generated + if np.dtype(dtype).kind in ['i', 'u', 'b']: + rand_max += 1 + rs = 
np.random.RandomState(np.random.MT19937(np.random.SeedSequence(0))) + if 0 == tensor.get_size(): + raise RuntimeError("Models with dynamic shapes aren't supported. Input tensors must have specific shapes before inference") + tensor.data[:] = rs.uniform(rand_min, rand_max, list(tensor.shape)).astype(dtype) + + +def main(): + log.basicConfig(format='[ %(levelname)s ] %(message)s', level=log.INFO, stream=sys.stdout) + log.info('OpenVINO:') + log.info(f"{'Build ':.<39} {get_version()}") + if len(sys.argv) != 2: + log.info(f'Usage: {sys.argv[0]} <path_to_model>') + return 1 + # Optimize for latency. Most of the devices are configured for latency by default, + # but there are exceptions like MYRIAD + latency = {'PERFORMANCE_HINT': 'LATENCY'} + + # Create Core and use it to compile a model. + # Pick a device by replacing CPU, for example AUTO:GPU,CPU. + # Using MULTI device is pointless in sync scenario + # because only one instance of openvino.runtime.InferRequest is used + core = Core() + compiled_model = core.compile_model(sys.argv[1], 'CPU', latency) + ireq = compiled_model.create_infer_request() + # Fill input data for the ireq + for model_input in compiled_model.inputs: + fill_tensor_random(ireq.get_tensor(model_input)) + # Warm up + ireq.infer() + # Benchmark for seconds_to_run seconds and at least niter iterations + seconds_to_run = 10 + niter = 10 + latencies = [] + start = perf_counter() + time_point = start + time_point_to_finish = start + seconds_to_run + while time_point < time_point_to_finish or len(latencies) < niter: + ireq.infer() + iter_end = perf_counter() + latencies.append((iter_end - time_point) * 1e3) + time_point = iter_end + end = time_point + duration = end - start + # Report results + fps = len(latencies) / duration + log.info(f'Count: {len(latencies)} iterations') + log.info(f'Duration: {duration * 1e3:.2f} ms') + log.info('Latency:') + log.info(f' Median: {statistics.median(latencies):.2f} ms') + log.info(f' Average: {sum(latencies) / len(latencies):.2f} 
ms') + log.info(f' Min: {min(latencies):.2f} ms') + log.info(f' Max: {max(latencies):.2f} ms') + log.info(f'Throughput: {fps:.2f} FPS') + + +if __name__ == '__main__': + main() diff --git a/samples/python/benchmark/throughput_benchmark/README.md b/samples/python/benchmark/throughput_benchmark/README.md new file mode 100644 index 00000000000..8e104248e49 --- /dev/null +++ b/samples/python/benchmark/throughput_benchmark/README.md @@ -0,0 +1,94 @@ +# Throughput Benchmark Python* Sample {#openvino_inference_engine_ie_bridges_python_sample_throughput_benchmark_README} + +This sample demonstrates how to estimate the performance of a model using the Asynchronous Inference Request API in throughput mode. Unlike [demos](@ref omz_demos), this sample doesn't have other configurable command line arguments. Feel free to modify the sample's source code to try out different options. + +The reported results may deviate from what [benchmark_app](../../../../tools/benchmark_tool/README.md) reports. One example is model input precision for computer vision tasks. benchmark_app sets uint8, while the sample uses the default model precision, which is usually float32. + +The following Python\* API is used in the application: + +| Feature | API | Description | +| :--- | :--- | :--- | +| OpenVINO Runtime Version | [openvino.runtime.get_version] | Get OpenVINO API version | +| Basic Infer Flow | [openvino.runtime.Core], [openvino.runtime.Core.compile_model], [openvino.runtime.InferRequest.get_tensor] | Common API to do inference: compile a model, configure input tensors | +| Asynchronous Infer | [openvino.runtime.AsyncInferQueue], [openvino.runtime.AsyncInferQueue.start_async], [openvino.runtime.AsyncInferQueue.wait_all], [openvino.runtime.InferRequest.results] | Do asynchronous inference | +| Model Operations | [openvino.runtime.CompiledModel.inputs] | Get inputs of a model | +| Tensor Operations | [openvino.runtime.Tensor.get_shape], [openvino.runtime.Tensor.data] | Get a tensor shape and its data. 
| + +| Options | Values | +| :--- | :--- | +| Validated Models | [alexnet](@ref omz_models_model_alexnet), [googlenet-v1](@ref omz_models_model_googlenet_v1), [yolo-v3-tf](@ref omz_models_model_yolo_v3_tf), [face-detection-0200](@ref omz_models_model_face_detection_0200) | +| Model Format | OpenVINO™ toolkit Intermediate Representation (\*.xml + \*.bin), ONNX (\*.onnx) | +| Supported devices | [All](../../../../docs/OV_Runtime_UG/supported_plugins/Supported_Devices.md) | +| Other language realization | [C++](../../../cpp/benchmark/throughput_benchmark/README.md) | + +## How It Works + +The sample compiles a model for a given device, randomly generates input data, and performs asynchronous inference multiple times for a given number of seconds. It then processes and reports performance results. + +You can see the explicit description of +each sample step at [Integration Steps](../../../../docs/OV_Runtime_UG/integrate_with_your_application.md) section of "Integrate OpenVINO™ Runtime with Your Application" guide. + +## Running + +``` +python throughput_benchmark.py <path_to_model> +``` + +To run the sample, you need to specify a model: +- You can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader). + +> **NOTES**: +> +> - Before running the sample with a trained model, make sure the model is converted to the intermediate representation (IR) format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). +> +> - The sample accepts models in ONNX format (.onnx) that do not require preprocessing. + +### Example + +1. Install the `openvino-dev` Python package to use Open Model Zoo Tools: + +``` +python -m pip install openvino-dev[caffe] +``` + +2. Download a pre-trained model using: + +``` +omz_downloader --name googlenet-v1 +``` + +3. 
If a model is not in the IR or ONNX format, it must be converted. You can do this using the model converter: + +``` +omz_converter --name googlenet-v1 +``` + +4. Perform benchmarking using the `googlenet-v1` model on a `CPU`: + +``` +python throughput_benchmark.py googlenet-v1.xml +``` + +## Sample Output + +The application outputs performance results. + +``` +[ INFO ] OpenVINO: +[ INFO ] Build ................................. +[ INFO ] Count: 2817 iterations +[ INFO ] Duration: 10012.65 ms +[ INFO ] Latency: +[ INFO ] Median: 13.80 ms +[ INFO ] Average: 14.10 ms +[ INFO ] Min: 8.35 ms +[ INFO ] Max: 28.38 ms +[ INFO ] Throughput: 281.34 FPS +``` + +## See Also + +- [Integrate the OpenVINO™ Runtime with Your Application](../../../../docs/OV_Runtime_UG/integrate_with_your_application.md) +- [Using OpenVINO™ Toolkit Samples](../../../../docs/OV_Runtime_UG/Samples_Overview.md) +- [Model Downloader](@ref omz_tools_downloader) +- [Model Optimizer](../../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) diff --git a/samples/python/benchmark/throughput_benchmark/throughput_benchmark.py b/samples/python/benchmark/throughput_benchmark/throughput_benchmark.py new file mode 100755 index 00000000000..3867e785b31 --- /dev/null +++ b/samples/python/benchmark/throughput_benchmark/throughput_benchmark.py @@ -0,0 +1,85 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +# Copyright (C) 2022 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import logging as log +import sys +import statistics +from time import perf_counter + +import numpy as np +from openvino.runtime import Core, get_version, AsyncInferQueue +from openvino.runtime.utils.types import get_dtype + + +def fill_tensor_random(tensor): + dtype = get_dtype(tensor.element_type) + rand_min, rand_max = (0, 1) if dtype == bool else (np.iinfo(np.uint8).min, np.iinfo(np.uint8).max) + # np.random.uniform excludes high: add 1 to have it generated + if np.dtype(dtype).kind in ['i', 'u', 'b']: + rand_max += 1 + 
rs = np.random.RandomState(np.random.MT19937(np.random.SeedSequence(0))) + if 0 == tensor.get_size(): + raise RuntimeError("Models with dynamic shapes aren't supported. Input tensors must have specific shapes before inference") + tensor.data[:] = rs.uniform(rand_min, rand_max, list(tensor.shape)).astype(dtype) + + +def main(): + log.basicConfig(format='[ %(levelname)s ] %(message)s', level=log.INFO, stream=sys.stdout) + log.info('OpenVINO:') + log.info(f"{'Build ':.<39} {get_version()}") + if len(sys.argv) != 2: + log.info(f'Usage: {sys.argv[0]} <path_to_model>') + return 1 + # Optimize for throughput. Best throughput can be reached by + # running multiple openvino.runtime.InferRequest instances asynchronously + tput = {'PERFORMANCE_HINT': 'THROUGHPUT'} + + # Create Core and use it to compile a model. + # Pick a device by replacing CPU, for example MULTI:CPU(4),GPU(8). + # It is possible to set CUMULATIVE_THROUGHPUT as PERFORMANCE_HINT for AUTO device + core = Core() + compiled_model = core.compile_model(sys.argv[1], 'CPU', tput) + # AsyncInferQueue creates optimal number of InferRequest instances + ireqs = AsyncInferQueue(compiled_model) + # Fill input data for ireqs + for ireq in ireqs: + for model_input in compiled_model.inputs: + fill_tensor_random(ireq.get_tensor(model_input)) + # Warm up + for _ in ireqs: + ireqs.start_async() + ireqs.wait_all() + # Benchmark for seconds_to_run seconds and at least niter iterations + seconds_to_run = 10 + niter = 10 + latencies = [] + in_fly = set() + start = perf_counter() + time_point_to_finish = start + seconds_to_run + while perf_counter() < time_point_to_finish or len(latencies) + len(in_fly) < niter: + idle_id = ireqs.get_idle_request_id() + if idle_id in in_fly: + latencies.append(ireqs[idle_id].latency) + else: + in_fly.add(idle_id) + ireqs.start_async() + ireqs.wait_all() + duration = perf_counter() - start + for infer_request_id in in_fly: + latencies.append(ireqs[infer_request_id].latency) + # Report results + fps = len(latencies) 
/ duration + log.info(f'Count: {len(latencies)} iterations') + log.info(f'Duration: {duration * 1e3:.2f} ms') + log.info('Latency:') + log.info(f' Median: {statistics.median(latencies):.2f} ms') + log.info(f' Average: {sum(latencies) / len(latencies):.2f} ms') + log.info(f' Min: {min(latencies):.2f} ms') + log.info(f' Max: {max(latencies):.2f} ms') + log.info(f'Throughput: {fps:.2f} FPS') + + +if __name__ == '__main__': + main() diff --git a/samples/python/classification_sample_async/README.md b/samples/python/classification_sample_async/README.md index bea99b663d8..3d71ca13dfb 100644 --- a/samples/python/classification_sample_async/README.md +++ b/samples/python/classification_sample_async/README.md @@ -70,7 +70,7 @@ To run the sample, you need specify a model and image: 1. Install the `openvino-dev` Python package to use Open Model Zoo Tools: ``` - python -m pip install openvino-dev[caffe,onnx,tensorflow2,pytorch,mxnet] + python -m pip install openvino-dev[caffe] ``` 2. Download a pre-trained model: diff --git a/samples/python/requirements.txt b/samples/python/classification_sample_async/requirements.txt similarity index 60% rename from samples/python/requirements.txt rename to samples/python/classification_sample_async/requirements.txt index 55689907c7a..2d3d6d182d4 100644 --- a/samples/python/requirements.txt +++ b/samples/python/classification_sample_async/requirements.txt @@ -1,2 +1 @@ opencv-python==4.5.* -numpy>=1.16.6 diff --git a/samples/python/hello_classification/README.md b/samples/python/hello_classification/README.md index f28cd18c610..bf3a8796c1f 100644 --- a/samples/python/hello_classification/README.md +++ b/samples/python/hello_classification/README.md @@ -32,9 +32,9 @@ each sample step at [Integration Steps](../../../docs/OV_Runtime_UG/integrate_wi python hello_classification.py ``` -To run the sample, you need specify a model and image: -- you can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained 
models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader). -- you can use images from the media files collection available at https://storage.openvinotoolkit.org/data/test_data. +To run the sample, you need to specify a model and image: +- You can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader). +- You can use images from the media files collection available at https://storage.openvinotoolkit.org/data/test_data. > **NOTES**: > @@ -48,7 +48,7 @@ To run the sample, you need specify a model and image: 1. Install the `openvino-dev` Python package to use Open Model Zoo Tools: ``` - python -m pip install openvino-dev[caffe,onnx,tensorflow2,pytorch,mxnet] + python -m pip install openvino-dev[caffe] ``` 2. Download a pre-trained model: diff --git a/samples/python/hello_classification/requirements.txt b/samples/python/hello_classification/requirements.txt new file mode 100644 index 00000000000..2d3d6d182d4 --- /dev/null +++ b/samples/python/hello_classification/requirements.txt @@ -0,0 +1 @@ +opencv-python==4.5.* diff --git a/samples/python/hello_reshape_ssd/README.md b/samples/python/hello_reshape_ssd/README.md index 0263ce0a2e1..3c7a74990fb 100644 --- a/samples/python/hello_reshape_ssd/README.md +++ b/samples/python/hello_reshape_ssd/README.md @@ -33,9 +33,9 @@ each sample step at [Integration Steps](../../../docs/OV_Runtime_UG/integrate_wi python hello_reshape_ssd.py ``` -To run the sample, you need specify a model and image: -- you can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader). 
-- you can use images from the media files collection available at https://storage.openvinotoolkit.org/data/test_data. +To run the sample, you need to specify a model and image: +- You can use [public](@ref omz_models_group_public) or [Intel's](@ref omz_models_group_intel) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader). +- You can use images from the media files collection available at https://storage.openvinotoolkit.org/data/test_data. > **NOTES**: > @@ -49,7 +49,7 @@ To run the sample, you need specify a model and image: 1. Install the `openvino-dev` Python package to use Open Model Zoo Tools: ``` - python -m pip install openvino-dev[caffe,onnx,tensorflow2,pytorch,mxnet] + python -m pip install openvino-dev[caffe] ``` 2. Download a pre-trained model: diff --git a/samples/python/hello_reshape_ssd/requirements.txt b/samples/python/hello_reshape_ssd/requirements.txt new file mode 100644 index 00000000000..2d3d6d182d4 --- /dev/null +++ b/samples/python/hello_reshape_ssd/requirements.txt @@ -0,0 +1 @@ +opencv-python==4.5.* diff --git a/src/inference/include/openvino/runtime/properties.hpp b/src/inference/include/openvino/runtime/properties.hpp index 52b38e7c974..1522d4305e6 100644 --- a/src/inference/include/openvino/runtime/properties.hpp +++ b/src/inference/include/openvino/runtime/properties.hpp @@ -295,7 +295,7 @@ enum class PerformanceMode { UNDEFINED = -1, //!< Undefined value, performance setting may vary from device to device LATENCY = 1, //!< Optimize for latency THROUGHPUT = 2, //!< Optimize for throughput - CUMULATIVE_THROUGHPUT = 3, //!< Optimize for cumulative throughput + CUMULATIVE_THROUGHPUT = 3, //!< Optimize for cumulative throughput }; /** @cond INTERNAL */ diff --git a/tools/benchmark_tool/openvino/tools/benchmark/benchmark.py b/tools/benchmark_tool/openvino/tools/benchmark/benchmark.py index 2d4ce364803..8a4ecfbabbb 100644 --- 
a/tools/benchmark_tool/openvino/tools/benchmark/benchmark.py +++ b/tools/benchmark_tool/openvino/tools/benchmark/benchmark.py @@ -40,7 +40,7 @@ class Benchmark: def print_version_info(self) -> None: version = get_version() - logger.info("OpenVINO:") + logger.info('OpenVINO:') logger.info(f"{'Build ':.<39} {version}") logger.info("")
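The sync and throughput samples above share one measurement pattern: keep inferring until a wall-clock budget elapses and a minimum iteration count is reached, recording each iteration's latency. A minimal plain-Python sketch of that loop, where `infer` is a caller-supplied callable standing in for `InferRequest.infer` (the names here are illustrative, not part of the diff):

```python
from time import perf_counter

def benchmark(infer, seconds_to_run=10, niter=10):
    """Call `infer` until the time budget elapses and at least `niter`
    iterations have run; return per-iteration latencies (ms) and duration (s)."""
    latencies = []
    start = perf_counter()
    deadline = start + seconds_to_run
    time_point = start
    while time_point < deadline or len(latencies) < niter:
        infer()
        now = perf_counter()
        latencies.append((now - time_point) * 1e3)  # latency of this iteration, ms
        time_point = now
    return latencies, time_point - start

# With a zero-second budget the loop runs exactly `niter` times
latencies, duration = benchmark(lambda: None, seconds_to_run=0, niter=5)
```

The niter floor keeps fast models from producing too few samples for a stable median, while the time budget bounds total runtime for slow ones.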