diff --git a/inference-engine/samples/benchmark_app/README.md b/inference-engine/samples/benchmark_app/README.md
index 3bba703c68b..0f02101720a 100644
--- a/inference-engine/samples/benchmark_app/README.md
+++ b/inference-engine/samples/benchmark_app/README.md
@@ -5,7 +5,7 @@ This topic demonstrates how to use the Benchmark C++ Tool to estimate deep learn
> **NOTE:** This topic describes usage of C++ implementation of the Benchmark Tool. For the Python* implementation, refer to [Benchmark Python* Tool](../../tools/benchmark_tool/README.md).
> **TIP**: You also can work with the Benchmark Tool inside the OpenVINO™ [Deep Learning Workbench](@ref workbench_docs_Workbench_DG_Introduction) (DL Workbench).
-> [DL Workbench](@ref workbench_docs_Workbench_DG_Introduction) is a platform built upon OpenVINO™ and provides a web-based graphical environment that enables you to optimize, fine-tune, analyze, visualize, and compare
+> [DL Workbench](@ref workbench_docs_Workbench_DG_Introduction) is a platform built upon OpenVINO™ and provides a web-based graphical environment that enables you to optimize, fine-tune, analyze, visualize, and compare
> performance of deep learning models on various Intel® architecture
> configurations. In the DL Workbench, you can use most of OpenVINO™ toolkit components.
>
@@ -75,11 +75,11 @@ benchmark_app [OPTION]
Options:
-h, --help Print a usage message
- -m "" Required. Path to an .xml/.onnx/.prototxt file with a trained model or to a .blob files with a trained compiled model.
+  -m ""                    Required. Path to an .xml/.onnx/.prototxt file with a trained model or to a .blob file with a trained compiled model.
-i "" Optional. Path to a folder with images and/or binaries or to specific image or binary file.
-d "" Optional. Specify a target device to infer on (the list of available devices is shown below). Default value is CPU.
Use "-d HETERO:" format to specify HETERO plugin.
- Use "-d MULTI:" format to specify MULTI plugin.
+ Use "-d MULTI:" format to specify MULTI plugin.
The application looks for a suitable plugin for the specified device.
-l "" Required for CPU custom layers. Absolute path to a shared library with the kernels implementations.
Or
@@ -92,14 +92,15 @@ Options:
-t Optional. Time, in seconds, to execute topology.
-progress Optional. Show progress bar (can affect performance measurement). Default values is "false".
-shape Optional. Set shape for input. For example, "input1[1,3,224,224],input2[1,4]" or "[1,3,224,224]" in case of one input size.
+  -layout                  Optional. Specifies how the application should treat network layouts. For example, "input1[NCHW],input2[NC]" or "[NCHW]" in case of one input.
CPU-specific performance options:
-nstreams "" Optional. Number of streams to use for inference on the CPU, GPU or MYRIAD devices
(for HETERO and MULTI device cases use format :,: or just ).
- Default value is determined automatically for a device.
- Please note that although the automatic selection usually provides a reasonable performance,
+ Default value is determined automatically for a device.
+                                  Please note that although the automatic selection usually provides reasonable performance,
it still may be non-optimal for some cases, especially for very small networks.
- Also, using nstreams>1 is inherently throughput-oriented option, while for the best-latency
+                                  Also, using nstreams>1 is an inherently throughput-oriented option, while for the best-latency
estimations the number of streams should be set to 1.
-nthreads "" Optional. Number of threads to use for inference on the CPU (including HETERO and MULTI cases).
-enforcebf16 Optional. Enforcing of floating point operations execution in bfloat16 precision on platforms with native bfloat16 support. By default, this key sets "true" on platforms with native bfloat16 support and "false" for other platforms. Use "-enforcebf16=false" to disable this feature.
@@ -125,12 +126,12 @@ If a model has mixed input types, input folder should contain all required files
To run the tool, you can use [public](@ref omz_models_public_index) or [Intel's](@ref omz_models_intel_index) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader_README).
> **NOTE**: Before running the tool with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](../../../docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md).
->
+>
> The sample accepts models in ONNX format (.onnx) that do not require preprocessing.
## Examples of Running the Tool
-This section provides step-by-step instructions on how to run the Benchmark Tool with the `googlenet-v1` public model on CPU or FPGA devices. As an input, the `car.png` file from the `/deployment_tools/demo/` directory is used.
+This section provides step-by-step instructions on how to run the Benchmark Tool with the `googlenet-v1` public model on CPU or FPGA devices. As an input, the `car.png` file from the `/deployment_tools/demo/` directory is used.
> **NOTE:** The Internet access is required to execute the following steps successfully. If you have access to the Internet through the proxy server only, please make sure that it is configured in your OS environment.
@@ -147,9 +148,9 @@ This section provides step-by-step instructions on how to run the Benchmark Tool
```
```sh
python3 mo.py --input_model /public/googlenet-v1/googlenet-v1.caffemodel --data_type FP32 --output_dir
- ```
+ ```
3. Run the tool with specifying the `/deployment_tools/demo/car.png` file as an input image, the IR of the `googlenet-v1` model and a device to perform inference on. The following commands demonstrate running the Benchmark Tool in the asynchronous mode on CPU and FPGA devices:
-
+
* On CPU:
```sh
./benchmark_app -m /googlenet-v1.xml -i /deployment_tools/demo/car.png -d CPU -api async --progress true
@@ -162,7 +163,7 @@ This section provides step-by-step instructions on how to run the Benchmark Tool
The application outputs the number of executed iterations, total duration of execution, latency, and throughput.
Additionally, if you set the `-report_type` parameter, the application outputs statistics report. If you set the `-pc` parameter, the application outputs performance counters. If you set `-exec_graph_path`, the application reports executable graph information serialized. All measurements including per-layer PM counters are reported in milliseconds.
-Below are fragments of sample output for CPU and FPGA devices:
+Below are fragments of sample output for CPU and FPGA devices:
* For CPU:
```
diff --git a/inference-engine/samples/benchmark_app/benchmark_app.hpp b/inference-engine/samples/benchmark_app/benchmark_app.hpp
index 7f6c1c67e2d..9ac25bb82a1 100644
--- a/inference-engine/samples/benchmark_app/benchmark_app.hpp
+++ b/inference-engine/samples/benchmark_app/benchmark_app.hpp
@@ -1,4 +1,4 @@
-// Copyright (C) 2018-2020 Intel Corporation
+// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
@@ -102,6 +102,9 @@ static const char dump_config_message[] = "Optional. Path to XML/YAML/JSON file
static const char shape_message[] = "Optional. Set shape for input. For example, \"input1[1,3,224,224],input2[1,4]\" or \"[1,3,224,224]\""
" in case of one input size.";
+static const char layout_message[] = "Optional. Specifies how the application should treat network layouts. "
+                                     "For example, \"input1[NCHW],input2[NC]\" or \"[NCHW]\" in case of one input.";
+
// @brief message for quantization bits
static const char gna_qb_message[] = "Optional. Weight bits for quantization: 8 or 16 (default)";
@@ -189,6 +192,9 @@ DEFINE_string(dump_config, "", dump_config_message);
/// @brief Define flag for input shape
DEFINE_string(shape, "", shape_message);
+/// @brief Define flag for layout shape
+DEFINE_string(layout, "", layout_message);
+
/// @brief Define flag for quantization bits (default 16)
DEFINE_int32(qb, 16, gna_qb_message);
@@ -215,6 +221,7 @@ static void showUsage() {
std::cout << " -t " << execution_time_message << std::endl;
std::cout << " -progress " << progress_message << std::endl;
std::cout << " -shape " << shape_message << std::endl;
+ std::cout << " -layout " << layout_message << std::endl;
std::cout << std::endl << " device-specific performance options:" << std::endl;
std::cout << " -nstreams \"\" " << infer_num_streams_message << std::endl;
std::cout << " -nthreads \"\" " << infer_num_threads_message << std::endl;
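
The `-layout` flag added above reuses the bracket syntax of `-shape`: a comma-separated list of `name[LAYOUT]` entries, or a bare `[LAYOUT]` when the network has a single input. The parsing itself lands in `utils.cpp`, which this patch only shows the top of, so the snippet below is a minimal sketch of how such a string can be split into a per-input map; `parseLayoutString` is an illustrative name, not the helper the patch introduces.

```cpp
// Minimal sketch, not the patch's parser: split "input1[NCHW],input2[NC]" or
// "[NCHW]" into a name -> layout map. An empty name means "the only input".
#include <cstddef>
#include <iostream>
#include <map>
#include <string>

std::map<std::string, std::string> parseLayoutString(const std::string& value) {
    std::map<std::string, std::string> layouts;
    std::size_t pos = 0;
    while (pos < value.size()) {
        const std::size_t open = value.find('[', pos);
        const std::size_t close = value.find(']', open);
        if (open == std::string::npos || close == std::string::npos)
            break;  // malformed tail, stop parsing
        const std::string name = value.substr(pos, open - pos);    // may be empty
        layouts[name] = value.substr(open + 1, close - open - 1);  // e.g. "NCHW"
        pos = close + 1;
        if (pos < value.size() && value[pos] == ',')
            ++pos;  // skip the separator between per-input entries
    }
    return layouts;
}

int main() {
    for (const auto& item : parseLayoutString("input1[NCHW],input2[NC]"))
        std::cout << "'" << item.first << "' -> " << item.second << "\n";
    return 0;
}
```
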
diff --git a/inference-engine/samples/benchmark_app/inputs_filling.cpp b/inference-engine/samples/benchmark_app/inputs_filling.cpp
index f3a66b1db58..4742ddb361c 100644
--- a/inference-engine/samples/benchmark_app/inputs_filling.cpp
+++ b/inference-engine/samples/benchmark_app/inputs_filling.cpp
@@ -1,4 +1,4 @@
-// Copyright (C) 2018-2020 Intel Corporation
+// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
@@ -48,7 +48,7 @@ std::vector filterFilesByExtensions(const std::vector&
 void fillBlobImage(Blob::Ptr& inputBlob,
                    const std::vector<std::string>& filePaths,
                    const size_t& batchSize,
-                   const InputInfo& info,
+                   const benchmark_app::InputInfo& app_info,
                    const size_t& requestId,
                    const size_t& inputId,
                    const size_t& inputSize) {
@@ -60,7 +60,6 @@ void fillBlobImage(Blob::Ptr& inputBlob,
// locked memory holder should be alive all time while access to its buffer happens
auto minputHolder = minput->wmap();
     auto inputBlobData = minputHolder.as<uint8_t *>();
- const TensorDesc& inputBlobDesc = inputBlob->getTensorDesc();
/** Collect images data ptrs **/
     std::vector<std::shared_ptr<uint8_t>> vreader;
@@ -77,24 +76,30 @@ void fillBlobImage(Blob::Ptr& inputBlob,
}
/** Getting image data **/
- TensorDesc desc = info.getTensorDesc();
-        std::shared_ptr<uint8_t> imageData(reader->getData(getTensorWidth(desc), getTensorHeight(desc)));
+        std::shared_ptr<uint8_t> imageData(reader->getData(app_info.width(), app_info.height()));
if (imageData) {
vreader.push_back(imageData);
}
}
/** Fill input tensor with images. First b channel, then g and r channels **/
- const size_t numChannels = getTensorChannels(inputBlobDesc);
- const size_t imageSize = getTensorWidth(inputBlobDesc) * getTensorHeight(inputBlobDesc);
+ const size_t numChannels = app_info.channels();
+ const size_t width = app_info.width();
+ const size_t height = app_info.height();
/** Iterate over all input images **/
for (size_t imageId = 0; imageId < vreader.size(); ++imageId) {
- /** Iterate over all pixel in image (b,g,r) **/
- for (size_t pid = 0; pid < imageSize; pid++) {
- /** Iterate over all channels **/
- for (size_t ch = 0; ch < numChannels; ++ch) {
- /** [images stride + channels stride + pixel id ] all in bytes **/
- inputBlobData[imageId * imageSize * numChannels + ch * imageSize + pid] = vreader.at(imageId).get()[pid*numChannels + ch];
+ /** Iterate over all width **/
+ for (size_t w = 0; w < app_info.width(); ++w) {
+ /** Iterate over all height **/
+ for (size_t h = 0; h < app_info.height(); ++h) {
+ /** Iterate over all channels **/
+ for (size_t ch = 0; ch < numChannels; ++ch) {
+ /** [images stride + channels stride + pixel id ] all in bytes **/
+ size_t offset = imageId * numChannels * width * height +
+ (((app_info.layout == "NCHW") || (app_info.layout == "CHW")) ?
+ (ch * width * height + h * width + w) : (h * width * numChannels + w * numChannels + ch));
+ inputBlobData[offset] = vreader.at(imageId).get()[h * width * numChannels + w * numChannels + ch];
+ }
}
}
}
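
The rewritten loop above walks width, height and channels explicitly and derives the destination offset from the requested layout: planar layouts (`NCHW`, `CHW`) use `ch * width * height + h * width + w`, interleaved ones use `h * width * numChannels + w * numChannels + ch`, while the source index is always interleaved because that is how the image reader hands the pixels over. A standalone sketch of just that index arithmetic, with an illustrative helper name that is not part of the patch:

```cpp
// Standalone sketch of the destination-offset arithmetic used in fillBlobImage;
// destinationOffset() is an illustrative helper, not code from the patch.
#include <cassert>
#include <cstddef>
#include <string>

std::size_t destinationOffset(const std::string& layout,
                              std::size_t imageId, std::size_t ch, std::size_t h, std::size_t w,
                              std::size_t channels, std::size_t height, std::size_t width) {
    const std::size_t imageStride = channels * width * height;  // one full image per batch item
    const bool planar = (layout == "NCHW") || (layout == "CHW");
    const std::size_t inImage = planar
        ? ch * width * height + h * width + w                   // channel planes first
        : h * width * channels + w * channels + ch;             // interleaved pixels
    return imageId * imageStride + inImage;
}

int main() {
    // 3-channel 2x2 image: in NCHW the (ch=2, h=1, w=0) element sits in the third plane...
    assert(destinationOffset("NCHW", 0, 2, 1, 0, 3, 2, 2) == 2 * 4 + 1 * 2 + 0);
    // ...while in NHWC it directly follows the B and G values of that pixel.
    assert(destinationOffset("NHWC", 0, 2, 1, 0, 3, 2, 2) == 1 * 2 * 3 + 0 * 3 + 2);
    return 0;
}
```
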
@@ -185,24 +190,23 @@ void fillBlobImInfo(Blob::Ptr& inputBlob,
 void fillBlobs(const std::vector<std::string>& inputFiles,
                const size_t& batchSize,
-               const InferenceEngine::ConstInputsDataMap& info,
+               benchmark_app::InputsInfo& app_inputs_info,
                std::vector<InferReqWrap::Ptr> requests) {
     std::vector<std::pair<size_t, size_t>> input_image_sizes;
- for (const ConstInputsDataMap::value_type& item : info) {
- if (isImage(item.second)) {
- input_image_sizes.push_back(std::make_pair(getTensorWidth(item.second->getTensorDesc()),
- getTensorHeight(item.second->getTensorDesc())));
+ for (auto& item : app_inputs_info) {
+ if (item.second.isImage()) {
+ input_image_sizes.push_back(std::make_pair(item.second.width(), item.second.height()));
}
- slog::info << "Network input '" << item.first << "' precision " << item.second->getTensorDesc().getPrecision()
- << ", dimensions (" << item.second->getTensorDesc().getLayout() << "): ";
- for (const auto& i : item.second->getTensorDesc().getDims()) {
+ slog::info << "Network input '" << item.first << "' precision " << item.second.precision
+ << ", dimensions (" << item.second.layout << "): ";
+ for (const auto& i : item.second.shape) {
slog::info << i << " ";
}
slog::info << slog::endl;
}
size_t imageInputCount = input_image_sizes.size();
- size_t binaryInputCount = info.size() - imageInputCount;
+ size_t binaryInputCount = app_inputs_info.size() - imageInputCount;
     std::vector<std::string> binaryFiles;
     std::vector<std::string> imageFiles;
@@ -258,26 +262,28 @@ void fillBlobs(const std::vector& inputFiles,
size_t imageInputId = 0;
size_t binaryInputId = 0;
- for (const ConstInputsDataMap::value_type& item : info) {
+ for (auto& item : app_inputs_info) {
Blob::Ptr inputBlob = requests.at(requestId)->getBlob(item.first);
- if (isImage(inputBlob)) {
+ auto app_info = app_inputs_info.at(item.first);
+ auto precision = app_info.precision;
+ if (app_info.isImage()) {
if (!imageFiles.empty()) {
// Fill with Images
- fillBlobImage(inputBlob, imageFiles, batchSize, *item.second, requestId, imageInputId++, imageInputCount);
+ fillBlobImage(inputBlob, imageFiles, batchSize, app_info, requestId, imageInputId++, imageInputCount);
continue;
}
} else {
if (!binaryFiles.empty()) {
// Fill with binary files
- if (item.second->getPrecision() == InferenceEngine::Precision::FP32) {
+ if (precision == InferenceEngine::Precision::FP32) {
fillBlobBinary(inputBlob, binaryFiles, batchSize, requestId, binaryInputId++, binaryInputCount);
- } else if (item.second->getPrecision() == InferenceEngine::Precision::FP16) {
+ } else if (precision == InferenceEngine::Precision::FP16) {
fillBlobBinary(inputBlob, binaryFiles, batchSize, requestId, binaryInputId++, binaryInputCount);
- } else if (item.second->getPrecision() == InferenceEngine::Precision::I32) {
+ } else if (precision == InferenceEngine::Precision::I32) {
fillBlobBinary(inputBlob, binaryFiles, batchSize, requestId, binaryInputId++, binaryInputCount);
- } else if (item.second->getPrecision() == InferenceEngine::Precision::I64) {
+ } else if (precision == InferenceEngine::Precision::I64) {
fillBlobBinary(inputBlob, binaryFiles, batchSize, requestId, binaryInputId++, binaryInputCount);
- } else if (item.second->getPrecision() == InferenceEngine::Precision::U8) {
+ } else if (precision == InferenceEngine::Precision::U8) {
fillBlobBinary(inputBlob, binaryFiles, batchSize, requestId, binaryInputId++, binaryInputCount);
} else {
THROW_IE_EXCEPTION << "Input precision is not supported for " << item.first;
@@ -285,18 +291,18 @@ void fillBlobs(const std::vector& inputFiles,
continue;
}
- if (isImageInfo(inputBlob) && (input_image_sizes.size() == 1)) {
+ if (app_info.isImageInfo() && (input_image_sizes.size() == 1)) {
// Most likely it is image info: fill with image information
auto image_size = input_image_sizes.at(0);
slog::info << "Fill input '" << item.first << "' with image size " << image_size.first << "x"
<< image_size.second << slog::endl;
- if (item.second->getPrecision() == InferenceEngine::Precision::FP32) {
+ if (precision == InferenceEngine::Precision::FP32) {
fillBlobImInfo(inputBlob, batchSize, image_size);
- } else if (item.second->getPrecision() == InferenceEngine::Precision::FP16) {
+ } else if (precision == InferenceEngine::Precision::FP16) {
fillBlobImInfo(inputBlob, batchSize, image_size);
- } else if (item.second->getPrecision() == InferenceEngine::Precision::I32) {
+ } else if (precision == InferenceEngine::Precision::I32) {
fillBlobImInfo(inputBlob, batchSize, image_size);
- } else if (item.second->getPrecision() == InferenceEngine::Precision::I64) {
+ } else if (precision == InferenceEngine::Precision::I64) {
fillBlobImInfo(inputBlob, batchSize, image_size);
} else {
THROW_IE_EXCEPTION << "Input precision is not supported for image info!";
@@ -306,23 +312,23 @@ void fillBlobs(const std::vector& inputFiles,
}
// Fill random
slog::info << "Fill input '" << item.first << "' with random values ("
- << std::string((isImage(inputBlob) ? "image" : "some binary data"))
+ << std::string((app_info.isImage() ? "image" : "some binary data"))
<< " is expected)" << slog::endl;
- if (item.second->getPrecision() == InferenceEngine::Precision::FP32) {
+ if (precision == InferenceEngine::Precision::FP32) {
fillBlobRandom(inputBlob);
- } else if (item.second->getPrecision() == InferenceEngine::Precision::FP16) {
+ } else if (precision == InferenceEngine::Precision::FP16) {
fillBlobRandom(inputBlob);
- } else if (item.second->getPrecision() == InferenceEngine::Precision::I32) {
+ } else if (precision == InferenceEngine::Precision::I32) {
fillBlobRandom(inputBlob);
- } else if (item.second->getPrecision() == InferenceEngine::Precision::I64) {
+ } else if (precision == InferenceEngine::Precision::I64) {
fillBlobRandom(inputBlob);
- } else if (item.second->getPrecision() == InferenceEngine::Precision::U8) {
+ } else if (precision == InferenceEngine::Precision::U8) {
fillBlobRandom(inputBlob);
- } else if (item.second->getPrecision() == InferenceEngine::Precision::I8) {
+ } else if (precision == InferenceEngine::Precision::I8) {
fillBlobRandom(inputBlob);
- } else if (item.second->getPrecision() == InferenceEngine::Precision::U16) {
+ } else if (precision == InferenceEngine::Precision::U16) {
fillBlobRandom(inputBlob);
- } else if (item.second->getPrecision() == InferenceEngine::Precision::I16) {
+ } else if (precision == InferenceEngine::Precision::I16) {
fillBlobRandom(inputBlob);
} else {
THROW_IE_EXCEPTION << "Input precision is not supported for " << item.first;
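
Each branch of the chain above fills the blob with random values for one specific input precision. As a rough sketch of what such a per-type filler looks like with the Inference Engine blob API (`fillBlobRandomSketch` is an illustrative stand-in, not the sample's `fillBlobRandom`):

```cpp
// Rough sketch of a per-type random filler like the ones dispatched above;
// fillBlobRandomSketch() is an illustrative stand-in, not the sample's function.
#include <cstddef>
#include <random>
#include <inference_engine.hpp>

template <typename T>
void fillBlobRandomSketch(InferenceEngine::Blob::Ptr& blob) {
    auto mblob = InferenceEngine::as<InferenceEngine::MemoryBlob>(blob);
    auto holder = mblob->wmap();                     // keeps the mapped memory alive
    T* data = holder.as<T*>();
    std::mt19937 gen(0);                             // fixed seed keeps runs comparable
    std::uniform_int_distribution<int> dist(0, 9);
    for (std::size_t i = 0; i < blob->size(); ++i)
        data[i] = static_cast<T>(dist(gen));         // small values fit every precision
}
```
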
diff --git a/inference-engine/samples/benchmark_app/inputs_filling.hpp b/inference-engine/samples/benchmark_app/inputs_filling.hpp
index 8cbc6915860..82ceefdf188 100644
--- a/inference-engine/samples/benchmark_app/inputs_filling.hpp
+++ b/inference-engine/samples/benchmark_app/inputs_filling.hpp
@@ -1,4 +1,4 @@
-// Copyright (C) 2018-2020 Intel Corporation
+// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
@@ -9,29 +9,10 @@
#include
+#include "utils.hpp"
#include "infer_request_wrap.hpp"
-template <typename T>
-static bool isImage(const T &blob) {
- auto descriptor = blob->getTensorDesc();
- if (descriptor.getLayout() != InferenceEngine::NCHW) {
- return false;
- }
- auto channels = descriptor.getDims()[1];
- return channels == 3;
-}
-
-template <typename T>
-static bool isImageInfo(const T &blob) {
- auto descriptor = blob->getTensorDesc();
- if (descriptor.getLayout() != InferenceEngine::NC) {
- return false;
- }
- auto channels = descriptor.getDims()[1];
- return (channels >= 2);
-}
-
 void fillBlobs(const std::vector<std::string>& inputFiles,
                const size_t& batchSize,
-               const InferenceEngine::ConstInputsDataMap& info,
+               benchmark_app::InputsInfo& app_inputs_info,
                std::vector<InferReqWrap::Ptr> requests);
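
The `isImage()`/`isImageInfo()` helpers deleted here move behind `benchmark_app::InputInfo`, pulled in through the new `utils.hpp` include. That header is not part of this patch, so the struct below is only a reconstruction from the members the rest of the diff uses (`precision`, `shape`, `layout`, `width()`, `height()`, `channels()`, `isImage()`, `isImageInfo()`); the method bodies are assumptions that mirror the deleted heuristics.

```cpp
// Assumed shape of benchmark_app::InputInfo, reconstructed from how this patch
// uses it; the real definition lives in utils.hpp, which is not shown here.
#include <cstddef>
#include <map>
#include <string>
#include <inference_engine.hpp>

namespace benchmark_app {
struct InputInfo {
    InferenceEngine::Precision precision;
    InferenceEngine::SizeVector shape;
    std::string layout;                    // e.g. "NCHW", "NHWC", "NC"

    // Mirrors the removed isImage(): a 3-channel input in an image-like layout.
    bool isImage() const {
        if (layout != "NCHW" && layout != "NHWC" && layout != "CHW" && layout != "HWC")
            return false;
        return channels() == 3;
    }
    // Mirrors the removed isImageInfo(): a 2D "NC" input with at least 2 elements.
    bool isImageInfo() const {
        return layout == "NC" && channels() >= 2;
    }
    // The accessors assume the layout actually contains the queried dimension letter.
    std::size_t channels() const { return shape[layout.find('C')]; }
    std::size_t width() const    { return shape[layout.find('W')]; }
    std::size_t height() const   { return shape[layout.find('H')]; }
};
using InputsInfo = std::map<std::string, InputInfo>;
}  // namespace benchmark_app
```
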
diff --git a/inference-engine/samples/benchmark_app/main.cpp b/inference-engine/samples/benchmark_app/main.cpp
index 28c69b0d95c..6302ae0a4b6 100644
--- a/inference-engine/samples/benchmark_app/main.cpp
+++ b/inference-engine/samples/benchmark_app/main.cpp
@@ -1,4 +1,4 @@
-// Copyright (C) 2018-2020 Intel Corporation
+// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
@@ -320,6 +320,8 @@ int main(int argc, char *argv[]) {
size_t batchSize = FLAGS_b;
Precision precision = Precision::UNSPECIFIED;
std::string topology_name = "";
+ benchmark_app::InputsInfo app_inputs_info;
+ std::string output_name;
if (!isNetworkCompiled) {
// ----------------- 4. Reading the Intermediate Representation network ----------------------------------------
next_step();
@@ -345,15 +347,12 @@ int main(int argc, char *argv[]) {
next_step();
batchSize = cnnNetwork.getBatchSize();
// Parse input shapes if specified
- InferenceEngine::ICNNNetwork::InputShapes shapes = cnnNetwork.getInputShapes();
bool reshape = false;
- if (!FLAGS_shape.empty()) {
- reshape |= updateShapes(shapes, FLAGS_shape, inputInfo);
- }
- if ((FLAGS_b != 0) && (batchSize != FLAGS_b)) {
- reshape |= adjustShapesBatch(shapes, FLAGS_b, inputInfo);
- }
+ app_inputs_info = getInputsInfo(FLAGS_shape, FLAGS_layout, FLAGS_b, inputInfo, reshape);
if (reshape) {
+ InferenceEngine::ICNNNetwork::InputShapes shapes = {};
+ for (auto& item : app_inputs_info)
+ shapes[item.first] = item.second.shape;
slog::info << "Reshaping network: " << getShapesString(shapes) << slog::endl;
startTime = Time::now();
cnnNetwork.reshape(shapes);
@@ -365,7 +364,9 @@ int main(int argc, char *argv[]) {
{"reshape network time (ms)", duration_ms}
});
}
- batchSize = cnnNetwork.getBatchSize();
+ // use batch size according to provided layout and shapes
+ batchSize = (!FLAGS_layout.empty()) ? getBatchSize(app_inputs_info) : cnnNetwork.getBatchSize();
+
topology_name = cnnNetwork.getName();
slog::info << (FLAGS_b != 0 ? "Network batch size was changed to: " : "Network batch size: ") << batchSize << slog::endl;
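
With `-layout` set, the batch size is no longer read from `cnnNetwork.getBatchSize()` but derived from the application-level inputs info, presumably by looking up whichever dimension the layout marks as `N`. Since `getBatchSize()` lives in `utils.cpp`, which this patch does not show in full, the following is only a hedged sketch of that lookup; the `Sketch` names are illustrative.

```cpp
// Hedged sketch of the batch-size lookup, assuming the layout string marks the
// batch dimension with 'N'. Not the patch's getBatchSize(), just an illustration.
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct InputInfoSketch {                   // stand-in for benchmark_app::InputInfo
    std::vector<std::size_t> shape;
    std::string layout;                    // e.g. "NCHW", "NC"
};

std::size_t getBatchSizeSketch(const std::map<std::string, InputInfoSketch>& inputs) {
    std::size_t batch = 0;
    for (const auto& item : inputs) {
        const std::size_t pos = item.second.layout.find('N');
        if (pos == std::string::npos)
            continue;                      // this input carries no batch dimension
        const std::size_t b = item.second.shape.at(pos);
        if (batch != 0 && batch != b)
            throw std::logic_error("Inputs have inconsistent batch dimensions");
        batch = b;
    }
    return batch == 0 ? 1 : batch;         // fall back to 1 when no 'N' is present
}
```
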
@@ -373,9 +374,10 @@ int main(int argc, char *argv[]) {
next_step();
for (auto& item : inputInfo) {
- if (isImage(item.second)) {
+ if (app_inputs_info.at(item.first).isImage()) {
/** Set the precision of input data provided by the user, should be called before load of the network to the device **/
- item.second->setPrecision(Precision::U8);
+ app_inputs_info.at(item.first).precision = Precision::U8;
+ item.second->setPrecision(app_inputs_info.at(item.first).precision);
}
}
// ----------------- 7. Loading the model to the device --------------------------------------------------------
@@ -407,6 +409,7 @@ int main(int argc, char *argv[]) {
{
{"import network time (ms)", duration_ms}
});
+ app_inputs_info = getInputsInfo(FLAGS_shape, FLAGS_layout, FLAGS_b, exeNetwork.GetInputsInfo());
if (batchSize == 0) {
batchSize = 1;
}
@@ -485,8 +488,7 @@ int main(int argc, char *argv[]) {
next_step();
InferRequestsQueue inferRequestsQueue(exeNetwork, nireq);
- const InferenceEngine::ConstInputsDataMap info(exeNetwork.GetInputsInfo());
- fillBlobs(inputFiles, batchSize, info, inferRequestsQueue.requests);
+ fillBlobs(inputFiles, batchSize, app_inputs_info, inferRequestsQueue.requests);
// ----------------- 10. Measuring performance ------------------------------------------------------------------
size_t progressCnt = 0;
diff --git a/inference-engine/samples/benchmark_app/utils.cpp b/inference-engine/samples/benchmark_app/utils.cpp
index 1c1186baa5e..22e53bb346b 100644
--- a/inference-engine/samples/benchmark_app/utils.cpp
+++ b/inference-engine/samples/benchmark_app/utils.cpp
@@ -1,4 +1,4 @@
-// Copyright (C) 2018-2020 Intel Corporation
+// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
@@ -8,6 +8,7 @@
#include
#include