Auto Batching impl (#7883)
* auto-batching POC squashed (all commits from auto-batch-2021.3 branch)
(cherry picked from commit d7742f2c747bc514a126cc9a4d5b99f0ff5cbbc7)
* applying/accommodating the API changes after rebase to the master
* replaying modified version of actual batch selection
* early experiments with model mem footprint
* changes from rebasing to the latest master
* experimenting with DG1 on the batch size selection, also collecting the mem footprint
* WIP: moving the auto-batching to the icore to let the MULTI/AUTO support that, ALLOW_AUTO_BATCHING as a conventional config key; still fails hot device swap
* quick-n-dirty batch footprint vs device total mem
* code style
* testing which models perform badly due to kernels and NOT (batched) footprint
* stub pipeline task to communicate the readiness rather than promise/future
* quick-n-dirty timeout impl
* explicit _completionTasks,reverting BA to use the timeout
* inputs outputs copies, works with AUTO and demo now
* accommodate the config per device-id, after rebase to the latest master
* allowing the auto-batching only with tput hint to let more conventional tests pass
* fix the premature timeout restarting via waiting for batch1 requests completion
* moved the batched request starting (along with input copies) to the dedicated thread
* [IE CLDNN] Disable bs_fs_yx_bsv16_fsv16 format for int8 convolution
* code style
* increasing the timeout to test the ssd_* models perf (timeout?) issues
* reducing number of output stuff in BA to avoid bloating the logs in experiments
* more aggressive batching for experiments, not limited to 32 and also 4 as a min
* more accurate timeout debugging info
* getting the reqs limitation from the plugin SetConfig as well
* refactor the reshape logic a bit to accommodate CPU for batching, also added remote context
* let the benchmark_app consume specific batch values for the auto-batching such as BATCH:GPU(4) (see the usage sketch after this commit list)
* auto-batching functional test (with results check vs ref) and GPU instance for that
* fixed arithmetic on blob ptrs
* clang
* handling possible batched network failure
* BATCH as the constants device name in test
* ENABLE_BATCH
* func tests for CPU, also DetectionOutput hetero tests (CPU and GPU)
* DetectionOutput hetero test for the CPU
* reenabling the Auto-Batching in the AUTO
* auto-batching device enabled in the test
* fixed the DO test
* improve the loading loop logic
* brushed the config keys
* allow hetero code-path for explicit device name like BATCH:GPU(4), used in the hetero code-path tests
* fix the test after refactoring
* clang
* moving ThreadSafeQueue to the ie_parallel, as it is re-used in the AUTO/MULTI and BATCH now
* auto-batching hetero test (subgraph with DetectionOutput)
* fixed minor changes that were result of experiments with impl
* code-style
* brushing, disabling CPU's HETERO tests until planned activity for 22.2
* removing home-baked MAX_BATCH_SIZE and switching to the official impl by GPU team
* remote blobs tests for the auto-batching (old API)
* brushed names a bit
* CreateContext and LoadNetwork with context for the Auto-Batching plus remote-blobs tests
* fixed the ieUnitTests with adding CreateContext stub to the MockICore
* clang
* improved remote-blobs tests
* revert the BA back from experiments with AB + device_use_mem
* conformance tests for BATCH, also batch size 1 is default for BATCH:DEVICE
* remote blobs 2.0 tests, issue with context having the orig device name
* debugging DG1 perf drop (presumably due to non-fitting the device-mem)
* disabling WA with batch/=2 for excessive mem footprint, leaving only streams 2
* remote blobs 2.0 tests for different tensor sharing types
* converting assert to throw to accommodate legacy API where the lock() could be called
* revert the timeout back to avoid mixing the studies, fixed the footprint calc
* reverting to estimating the max batch by extrapolating from batch1 size
* more conservative footprint estimation (with batch1), graceful batch 1 handling without duplication
* even more graceful batch 1 handling without duplication
* WA for MAX_BATCH_SIZE failure, removing batch4 as a min for the auto-batching
* AutoBatchPlugin -> ov_auto_batch_plugin
* WA for gcc 4.8
* clang
* fix misprint
* fixed errors resulted from recent OV's Variant to Any transition
* skip auto-batching for already-batched networks
* AUTO_BATCH_TIMEOUT and tests
* GPU-specific L3
* switched to pure config, also improved ALLOW_AUTO_BATCHING config key handling logic
* debugging device info
* enabling the config tests for the GPU and fixing the Auto-batching tests to pass
* making the default cache size (when the driver is not recognized) more aggressive, to accommodate recent HW with old drivers
* skip auto-batching for RNNs and the like (e.g. single CHW input)
* fixed fallback to the batch1 and moved HETERO path under condition to avoid bloating
* brushing
* Auto plugin GetMetric support gpu auto-batch
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* add test case
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* add comments on test
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* brushing the var names, also adding the exception handling
* disabling the auto-batching for the networks with non-batched outputs and faster-rcnn and the like (CVS-74085) to minimize the number of failures
* add try catch
Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
* brushing the code changed in the GPU plugin
* Auto-Batch requests tests
* brushed variables a bit (ref)
* cleaned debug output from the ie_core
* cleaned cmake for the Auto-Batch
* removed batchN estimation from batch1
* cleaned from debug printf
* comments, cleanup
* WA the mock test errors introduced with merging the https://github.com/myshevts/openvino/pull/13
* Adding back removed batchN estimation from batch1 to debug degradations on DG1 (resulted from too optimistic MAX_BATCH_SIZE?). This partially reverts commit e8f1738ac1.
* brushing ie_core.cpp
* fix 32bit compilation
* Code review: ENABLE_AUTO_BATCH
* consolidate the auto-batching logic in ie_core.cpp into a single ApplyAutoBatching
* renamed/brushed the OPTIMAL_BATCH (now with _SIZE) and mimics the MAX_BATCH_SIZE wrt MODEL_PTR
* default value for the OPTIMAL_BATCH_SIZE
* clang
* accommodate new func tests location
* fix shuffle of headers after clang + copyrights
* fixed misprint made during code refactoring
* moving the common thread-safe containers (like ThreadSafeQueue) to the dedicated dev_api header
* switch from the device name to the OPTIMAL_BATCH_SIZE metric presence as a condition to consider Auto-Batching
* switching from the unsafe size() and minimizing time under lock
* code style
* brushed the ApplyAutoBatching
* brushed the metric/config names and descriptions
* completed the core integration tests for the auto-batching
* ExecGraphInfo and check for incorrect cfg
* removed explicit dependencies from cmake file of the plugin
* disabling Auto-Batching thru the tput hint (to preserve current product default); only explicit usage like BATCH:GPU is exercised in the tests
Co-authored-by: Roman Lyamin <roman.lyamin@intel.com>
Co-authored-by: Hu, Yuan2 <yuan2.hu@intel.com>
commit 49b5e5728b (parent bc5da8d522)
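For orientation before the diff, here is a minimal usage sketch of the explicit auto-batching path added by this PR. The "BATCH:GPU(4)" device string and the batch-collection behavior come from the changes below; the model path and everything else in the snippet are illustrative assumptions, not part of the commit.

#include <ie_core.hpp>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");  // placeholder path, any batch-1 model

    // Explicit auto-batching: the virtual "BATCH" device wraps the GPU with batch size 4,
    // matching the BATCH:GPU(4) notation used by the tests and benchmark_app in this PR.
    auto execNet = core.LoadNetwork(network, "BATCH:GPU(4)");

    // Application-side requests are transparently collected into batches of 4
    // (with a timeout-driven batch-1 fallback, per the AUTO_BATCH_TIMEOUT logic below).
    auto request = execNet.CreateInferRequest();
    request.StartAsync();
    request.Wait();
    return 0;
}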
@@ -100,6 +100,8 @@ ie_option (ENABLE_GAPI_PREPROCESSING "Enables G-API preprocessing" ON)
 ie_option (ENABLE_MULTI "Enables MULTI Device Plugin" ON)
 ie_option (ENABLE_AUTO "Enables AUTO Device Plugin" ON)

+ie_option (ENABLE_AUTO_BATCH "Enables Auto-Batching Plugin" ON)
+
 ie_option (ENABLE_HETERO "Enables Hetero Device Plugin" ON)

 ie_option (ENABLE_TEMPLATE "Enable template plugin" ON)
@@ -141,6 +141,9 @@ When specifying key values as raw strings (that is, when using Python API), omit

 @snippet snippets/GPU_Metric1.cpp part1

+* OPTIMAL_BATCH_SIZE : Returns _optimal_ batch size for a given network on the given GPU device. The returned value is aligned to power of 2. Also, MODEL_PTR is the required option for this metric since the optimal batch size highly depends on the model. If the MODEL_PTR is not given, the value of 1 is returned. The example code to set the required and optional configs for this metric is available in the following snippet:
+
+@snippet snippets/GPU_Metric1.cpp part2

 ## GPU Context and Video Memory Sharing RemoteBlob API

 See [RemoteBlob API of GPU Plugin](GPU_RemoteBlob_API.md)
@@ -14,4 +14,12 @@ options.insert(std::make_pair("AVAILABLE_DEVICE_MEM_SIZE", available_device_mem_

 auto max_batch_size = core.GetMetric("GPU", GPU_METRIC_KEY(MAX_BATCH_SIZE), options).as<uint32_t>();
 //! [part1]
+//! [part2]
+std::map<std::string, Parameter> opt = {{"MODEL_PTR", cnnNetwork.getFunction()}};  // Required. Same usage as for the MAX_BATCH_SIZE above. If not set, the OPTIMAL_BATCH_SIZE returns 1.
+// This is not entirely GPU-specific metric (so METRIC_KEY is used rather than GPU_METRIC_KEY below),
+// but the GPU is the only device that supports that at the moment.
+// For the GPU, the metric already accommodates limitation for the on-device memory that the MAX_BATCH_SIZE poses.
+// so OPTIMAL_BATCH_SIZE is always less than MAX_BATCH_SIZE. Unlike the latter it is also aligned to the power of 2.
+auto optimal_batch_size = core.GetMetric("GPU", METRIC_KEY(OPTIMAL_BATCH_SIZE), options).as<unsigned int>();
+//! [part2]
 }
@@ -6,6 +6,7 @@

 #include <string>
 #include <vector>
+#include <tuple>

 namespace cldnn {
 /// @addtogroup cpp_api C++ API
@@ -25,6 +26,10 @@ struct gfx_version {
     uint16_t major;
     uint8_t minor;
     uint8_t revision;
+    friend bool operator < (const gfx_version& l, const gfx_version& r) {
+        return std::tie(l.major, l.minor, l.revision)
+               < std::tie(r.major, r.minor, r.revision);  // same order
+    }
 };

 /// @brief Information about the device properties and capabilities.
@@ -124,6 +124,7 @@ std::map<std::string, std::vector<InferenceEngine::Blob::Ptr>> getRemoteInputBlo
         }

         auto blob = InferenceEngine::gpu::make_shared_blob(desc, context, clBuffer.back());
+        blob->allocate();
         remoteBlobs[name].push_back(blob);
     };

@@ -109,8 +109,10 @@ std::vector<float> splitFloat(const std::string& s, char delim) {

 std::vector<std::string> parseDevices(const std::string& device_string) {
     std::string comma_separated_devices = device_string;
-    if (comma_separated_devices.find(":") != std::string::npos) {
-        comma_separated_devices = comma_separated_devices.substr(comma_separated_devices.find(":") + 1);
+    auto colon = comma_separated_devices.find(":");
+    if (colon != std::string::npos) {
+        auto bracket = comma_separated_devices.find("(");  // e.g. in BATCH:GPU(4)
+        comma_separated_devices = comma_separated_devices.substr(colon + 1, bracket - colon - 1);
     }
     if ((comma_separated_devices == "MULTI") || (comma_separated_devices == "HETERO"))
         return std::vector<std::string>();
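A quick sanity sketch of what the updated trimming is intended to do; this is a standalone re-implementation for illustration only, not the benchmark_app code itself.

#include <cassert>
#include <string>

// Mirrors the trimming above: "BATCH:GPU(4)" should yield the device list "GPU".
static std::string trimDeviceString(const std::string& device_string) {
    std::string devices = device_string;
    auto colon = devices.find(":");
    if (colon != std::string::npos) {
        auto bracket = devices.find("(");  // e.g. in BATCH:GPU(4)
        devices = devices.substr(colon + 1, bracket - colon - 1);
    }
    return devices;
}

int main() {
    assert(trimDeviceString("BATCH:GPU(4)") == "GPU");
    assert(trimDeviceString("GPU") == "GPU");  // no prefix -> unchanged
    return 0;
}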
@@ -26,6 +26,10 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
     add_dependencies(${TARGET_NAME} ov_auto_plugin)
 endif()

+if(ENABLE_AUTO_BATCH)
+    add_dependencies(${TARGET_NAME} ov_auto_batch_plugin)
+endif()
+
 if(ENABLE_INTEL_CPU)
     add_dependencies(${TARGET_NAME} ov_intel_cpu_plugin)
 endif()
@@ -16,6 +16,7 @@
 #include "cpp/ie_cnn_network.h"
 #include "cpp_interfaces/interface/ie_iexecutable_network_internal.hpp"
 #include "ie_parameter.hpp"
+#include "ie_remote_context.hpp"
 #include "threading/ie_itask_executor.hpp"

 namespace InferenceEngine {
@@ -60,6 +61,22 @@ public:
                                                     const std::string& deviceName,
                                                     const std::map<std::string, std::string>& config = {}) = 0;

+    /**
+     * @brief Creates an executable network from a network object.
+     *
+     * Users can create as many networks as they need and use
+     * them simultaneously (up to the limitation of the hardware resources)
+     *
+     * @param network CNNNetwork object acquired from Core::ReadNetwork
+     * @param remoteCtx "Remote" (non-CPU) accelerator device-specific execution context to use
+     * @param config Optional map of pairs: (config parameter name, config parameter value) relevant only for this load
+     * operation
+     * @return An executable network reference
+     */
+    virtual SoExecutableNetworkInternal LoadNetwork(const CNNNetwork& network,
+                                                    const RemoteContext::Ptr& remoteCtx,
+                                                    const std::map<std::string, std::string>& config = {}) = 0;
+
     /**
      * @brief Creates an executable network from a model file.
      *
@@ -142,6 +159,16 @@ public:
      */
     virtual bool DeviceSupportsImportExport(const std::string& deviceName) const = 0;

+    /**
+     * @brief Create a new shared context object on specified accelerator device
+     * using specified plugin-specific low level device API parameters (device handle, pointer, etc.)
+     * @param deviceName Name of a device to create new shared context on.
+     * @param params Map of device-specific shared context parameters.
+     * @return A shared pointer to a created remote context.
+     */
+    virtual InferenceEngine::RemoteContext::Ptr CreateContext(const std::string& deviceName,
+                                                              const InferenceEngine::ParamMap&) = 0;
+
     virtual bool isNewAPI() const = 0;

     /**
@@ -165,6 +192,7 @@ public:

     static std::vector<std::string> getHeteroDevices(std::string fallbackDevice);
     static std::vector<std::string> getMultiDevices(std::string devicesList);
+    static std::string getBatchDevice(std::string devicesList);
 };

 } // namespace InferenceEngine
@@ -23,14 +23,12 @@ struct MemBandwidthPressure {

 static MemBandwidthPressure MemBandwidthPressureTolerance(
     const std::shared_ptr<ngraph::Function> nGraphFunc,
-    const float L2_cache_size,
-    const float L3_cache_size,
+    const float cache_size,
     const float memThresholdAssumeLimited = MemBandwidthPressure::LIMITED) {
     int total_convs = 0, mem_limited_convs = 0, compute_convs = 0, total_gemms = 0, mem_limited_gemms = 0,
         total_deconvs = 0, compute_deconvs = 0, mem_limited_deconvs = 0;
-    auto memLimitedFactor = [&](int size_data_moved, int datatype_size) -> float {
-        return (L2_cache_size * 1.0f /*util factor, tbd */
-                / (size_data_moved * datatype_size));
+    auto memLimitedFactor = [&](int size_data_moved, int datatype_size = 4) -> float {
+        return (cache_size / (size_data_moved * datatype_size));
     };
     auto isLowPrecision = [&](ngraph::element::Type type) -> bool {
         return (type == ngraph::element::i8) || (type == ngraph::element::u8);
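To make the heuristic above concrete: memLimitedFactor divides the cache size by the bytes a layer moves, so small values mark the layer as memory-bound. A tiny worked sketch; the cache size and threshold here are purely illustrative, the real values come from the plugin.

#include <cstdio>

int main() {
    const float cache_size = 2.0f * 1024 * 1024;  // assume a 2 MB cache budget (illustrative)
    auto memLimitedFactor = [&](int size_data_moved, int datatype_size = 4) -> float {
        return cache_size / (size_data_moved * datatype_size);
    };
    // A layer moving 1M fp32 elements needs 4 MB, twice the assumed cache:
    // factor = 2 MB / 4 MB = 0.5, i.e. below an assumed threshold of 1.0 -> memory-bound.
    std::printf("factor = %f\n", memLimitedFactor(1024 * 1024));
    return 0;
}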
@@ -0,0 +1,86 @@ (new file; all lines added)
// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

///////////////////////////////////////////////////////////////////////////////////////////////////
#pragma once

#include <cstddef>
#include <mutex>
#include <queue>
#include <type_traits>

#include "ie_parallel.hpp"
#if ((IE_THREAD == IE_THREAD_TBB) || (IE_THREAD == IE_THREAD_TBB_AUTO))
#    include <tbb/concurrent_queue.h>
#endif

namespace InferenceEngine {

template <typename T>
class ThreadSafeQueueWithSize {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(_mutex);
        _queue.push(std::move(value));
    }
    bool try_pop(T& value) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (!_queue.empty()) {
            value = std::move(_queue.front());
            _queue.pop();
            return true;
        } else {
            return false;
        }
    }
    size_t size() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _queue.size();
    }

protected:
    std::queue<T> _queue;
    std::mutex _mutex;
};
#if ((IE_THREAD == IE_THREAD_TBB) || (IE_THREAD == IE_THREAD_TBB_AUTO))
template <typename T>
using ThreadSafeQueue = tbb::concurrent_queue<T>;
template <typename T>
using ThreadSafeBoundedQueue = tbb::concurrent_bounded_queue<T>;
#else
template <typename T>
using ThreadSafeQueue = ThreadSafeQueueWithSize<T>;
template <typename T>
class ThreadSafeBoundedQueue {
public:
    ThreadSafeBoundedQueue() = default;
    bool try_push(T value) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_capacity) {
            _queue.push(std::move(value));
        }
        return _capacity;
    }
    bool try_pop(T& value) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_capacity && !_queue.empty()) {
            value = std::move(_queue.front());
            _queue.pop();
            return true;
        } else {
            return false;
        }
    }
    void set_capacity(std::size_t newCapacity) {
        std::lock_guard<std::mutex> lock(_mutex);
        _capacity = newCapacity;
    }

protected:
    std::queue<T> _queue;
    std::mutex _mutex;
    bool _capacity = false;
};
#endif
}  // namespace InferenceEngine
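A minimal usage sketch for the non-TBB ThreadSafeBoundedQueue fallback declared in the new header above (the include path is the one used later by the MULTI plugin in this PR). Note that in this fallback the capacity acts as an on/off switch rather than a real bound; under TBB the tbb::concurrent_bounded_queue alias behaves differently.

#include <cassert>

#include "threading/ie_thread_safe_containers.hpp"

int main() {
    InferenceEngine::ThreadSafeBoundedQueue<int> queue;

    int value = 0;
    assert(!queue.try_push(42));    // capacity not set yet -> push is rejected (non-TBB fallback)
    queue.set_capacity(1);          // any non-zero capacity enables the queue
    assert(queue.try_push(42));
    assert(queue.try_pop(value) && value == 42);
    assert(!queue.try_pop(value));  // queue is empty again
    return 0;
}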
@@ -118,6 +118,18 @@ DECLARE_METRIC_VALUE(BATCHED_BLOB);
 * String value for metric name is "RANGE_FOR_STREAMS".
 */
 DECLARE_METRIC_KEY(RANGE_FOR_STREAMS, std::tuple<unsigned int, unsigned int>);
+/**
+ * @brief Metric to query information optimal batch size for the given device and the network
+ *
+ * Metric returns a value of unsigned int type,
+ * Returns optimal batch size for a given network on the given device. The returned value is aligned to power of 2.
+ * Also, MODEL_PTR is the required option for this metric since the optimal batch size depends on the model,
+ * so if the MODEL_PTR is not given, the result of the metric is always 1.
+ * For the GPU the metric is queried automatically whenever the OpenVINO performance hint for the throughput is used,
+ * so that the result (>1) governs the automatic batching (transparently to the application).
+ * The automatic batching can be disabled with ALLOW_AUTO_BATCHING set to NO
+ */
+DECLARE_METRIC_KEY(OPTIMAL_BATCH_SIZE, unsigned int);

 /**
 * @brief Metric to provide a hint for a range for number of async infer requests. If device supports streams,
@@ -250,6 +262,15 @@ DECLARE_CONFIG_KEY(PERFORMANCE_HINT_NUM_REQUESTS);
 DECLARE_CONFIG_VALUE(YES);
 DECLARE_CONFIG_VALUE(NO);

+/**
+ * @brief Auto-batching configuration, string for the device + batch size, e.g. "GPU(4)"
+ */
+DECLARE_CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG);
+/**
+ * @brief Auto-batching configuration: string with timeout (in ms), e.g. "100"
+ */
+DECLARE_CONFIG_KEY(AUTO_BATCH_TIMEOUT);
+
 /**
 * @brief Limit `#threads` that are used by Inference Engine for inference on the CPU.
 */
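A hedged sketch of how the two new keys are meant to be consumed from application code; the key names are the ones declared above, while the model path and the concrete values ("GPU(4)", "100") are illustrative.

#include <ie_core.hpp>
#include <ie_plugin_config.hpp>
#include <map>
#include <string>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");  // placeholder model path

    // The BATCH virtual device reads the target device plus batch size from
    // AUTO_BATCH_DEVICE_CONFIG and the request-collection timeout (ms) from AUTO_BATCH_TIMEOUT.
    std::map<std::string, std::string> config = {
        {CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), "GPU(4)"},
        {CONFIG_KEY(AUTO_BATCH_TIMEOUT), "100"},
    };
    auto execNet = core.LoadNetwork(network, "BATCH", config);
    (void)execNet;
    return 0;
}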
@@ -46,6 +46,7 @@
 #endif

 using namespace InferenceEngine::PluginConfigParams;
+using namespace InferenceEngine;
 using namespace std::placeholders;

 namespace ov {
@@ -94,6 +95,9 @@ Parsed<T> parseDeviceNameIntoConfig(const std::string& deviceName, const std::ma
             config_[ie::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES] =
                 deviceName.substr(std::string("AUTO:").size());
         }
+    } else if (deviceName_.find("BATCH:") == 0) {
+        deviceName_ = "BATCH";
+        config_[CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)] = deviceName.substr(6);
     } else {
         ie::DeviceIDParser parser(deviceName_);
         deviceName_ = parser.getDeviceName();
@@ -480,14 +484,22 @@ public:
         return newAPI;
     }

-    ov::runtime::SoPtr<ie::IExecutableNetworkInternal> LoadNetwork(const ie::CNNNetwork& network,
-                                                                    const std::shared_ptr<ie::RemoteContext>& context,
-                                                                    const std::map<std::string, std::string>& config) {
+    ov::runtime::SoPtr<ie::IExecutableNetworkInternal> LoadNetwork(
+        const ie::CNNNetwork& network,
+        const std::shared_ptr<ie::RemoteContext>& context,
+        const std::map<std::string, std::string>& config) override {
         OV_ITT_SCOPE(FIRST_INFERENCE, ie::itt::domains::IE_LT, "Core::LoadNetwork::RemoteContext");
         if (context == nullptr) {
             IE_THROW() << "Remote context is null";
         }
+        // have to deduce the device name/config from the context first
         auto parsed = parseDeviceNameIntoConfig(context->getDeviceName(), config);
+        std::string& deviceName = parsed._deviceName;
+        std::map<std::string, std::string>& config_with_batch = parsed._config;
+        // if auto-batching is applicable, the below function will patch the device name and config accordingly:
+        ApplyAutoBatching(network, deviceName, config_with_batch);
+        parsed = parseDeviceNameIntoConfig(deviceName, config_with_batch);
+
         auto plugin = GetCPPPluginByName(parsed._deviceName);
         ov::runtime::SoPtr<ie::IExecutableNetworkInternal> res;
         auto cacheManager = coreConfig.getCacheConfig()._cacheManager;
@@ -508,12 +520,59 @@ public:
         return res;
     }

+    void ApplyAutoBatching(const ie::CNNNetwork& network,
+                           std::string& deviceName,
+                           std::map<std::string, std::string>& config_with_batch) {
+        if (deviceName.find("BATCH") != std::string::npos) {
+            // explicitly enabled Auto-Batching e.g. in the tests
+            auto pos = deviceName.find_first_of(":");
+            if (pos != std::string::npos) {
+                auto deviceNameWithBatchSize = deviceName.substr(pos + 1);
+                auto deviceNameWithoutBatch = DeviceIDParser::getBatchDevice(deviceNameWithBatchSize);
+                auto function = network.getFunction();
+                // have to execute the DetectionOutput separately (without batching)
+                // as this layer mix-in the values from the different inputs (batch id)
+                bool bDetectionOutput = false;
+                const std::string detectionOutputOpName = ngraph::op::DetectionOutput::get_type_info_static().name;
+                const std::string resultOpName = ngraph::op::Result::get_type_info_static().name;
+                for (auto&& node : function->get_ops()) {
+                    auto isDetectionOutputParent = [&detectionOutputOpName](decltype(node)& nd) {
+                        for (size_t n = 0; n < nd->get_input_size(); n++) {
+                            if (detectionOutputOpName == nd->get_input_node_ptr(n)->get_type_info().name)
+                                return true;
+                        }
+                        return false;
+                    };
+
+                    if ((detectionOutputOpName == node->get_type_info().name) ||
+                        ((resultOpName == node->get_type_info().name) && isDetectionOutputParent(node))) {
+                        node->get_rt_info()["affinity"] = deviceNameWithoutBatch;
+                        bDetectionOutput = true;
+                    } else {
+                        node->get_rt_info()["affinity"] = "BATCH";
+                    }
+                }
+                if (bDetectionOutput) {
+                    deviceName = "HETERO:BATCH," + deviceNameWithoutBatch;
+                    config_with_batch[CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)] = deviceNameWithBatchSize;
+                } else {
+                    deviceName = "BATCH:" + deviceNameWithBatchSize;
+                }
+            }
+        }
+    }
+
     ie::SoExecutableNetworkInternal LoadNetwork(const ie::CNNNetwork& network,
-                                                const std::string& deviceName,
+                                                const std::string& deviceNameOrig,
                                                 const std::map<std::string, std::string>& config) override {
         OV_ITT_SCOPE(FIRST_INFERENCE, ie::itt::domains::IE_LT, "Core::LoadNetwork::CNN");
-        bool forceDisableCache = config.count(CONFIG_KEY_INTERNAL(FORCE_DISABLE_CACHE)) > 0;
-        auto parsed = parseDeviceNameIntoConfig(deviceName, config);
+        std::string deviceName = deviceNameOrig;
+        std::map<std::string, std::string> config_with_batch = config;
+        // if auto-batching is applicable, the below function will patch the device name and config accordingly:
+        ApplyAutoBatching(network, deviceName, config_with_batch);
+
+        bool forceDisableCache = config_with_batch.count(CONFIG_KEY_INTERNAL(FORCE_DISABLE_CACHE)) > 0;
+        auto parsed = parseDeviceNameIntoConfig(deviceName, config_with_batch);
         if (forceDisableCache) {
             // remove this config key from parsed as plugins can throw unsupported exception
             parsed._config.erase(CONFIG_KEY_INTERNAL(FORCE_DISABLE_CACHE));
@@ -732,6 +791,19 @@ public:
         return devices;
     }

+    /**
+     * @brief Create a new shared context object on specified accelerator device
+     * using specified plugin-specific low level device API parameters (device handle, pointer, etc.)
+     * @param deviceName Name of a device to create new shared context on.
+     * @param params Map of device-specific shared context parameters.
+     * @return A shared pointer to a created remote context.
+     */
+    InferenceEngine::RemoteContext::Ptr CreateContext(const std::string& deviceName,
+                                                      const InferenceEngine::ParamMap& params) override {
+        auto parsed = ov::runtime::parseDeviceNameIntoConfig(deviceName, params);
+        return GetCPPPluginByName(parsed._deviceName).create_context(parsed._config)._ptr;
+    }
+
     /**
      * @brief Returns reference to CPP plugin wrapper by a device name
      * @param deviceName A name of device
@@ -1030,6 +1102,12 @@ public:
             deviceNames = ie::DeviceIDParser::getMultiDevices(deviceName.substr(pos + 1));
         }
         deviceNames.emplace_back("AUTO");
+    } else if (deviceName.find("BATCH") == 0) {
+        auto pos = deviceName.find_first_of(":");
+        if (pos != std::string::npos) {
+            deviceNames = {ie::DeviceIDParser::getBatchDevice(deviceName.substr(pos + 1))};
+        }
+        deviceNames.push_back("BATCH");
     } else {
         deviceNames.push_back(deviceName);
     }
@@ -1120,8 +1198,8 @@ std::vector<std::string> DeviceIDParser::getHeteroDevices(std::string fallbackDe
 }

 std::vector<std::string> DeviceIDParser::getMultiDevices(std::string devicesList) {
-    std::vector<std::string> deviceNames;
-    auto trim_request_info = [](std::string device_with_requests) {
+    std::set<std::string> deviceNames;
+    auto trim_request_info = [](const std::string& device_with_requests) {
         auto opening_bracket = device_with_requests.find_first_of('(');
         return device_with_requests.substr(0, opening_bracket);
     };
@@ -1132,14 +1210,36 @@ std::vector<std::string> DeviceIDParser::getMultiDevices(std::string devicesList
     // we skip the #requests info here
     while ((pos = devicesList.find(delimiter)) != std::string::npos) {
         auto d = devicesList.substr(0, pos);
-        deviceNames.push_back(trim_request_info(d));
+        if (d.find("BATCH") == 0) {
+            deviceNames.insert("BATCH");
+            auto p = d.find_first_of(":");
+            if (p != std::string::npos)
+                deviceNames.insert(DeviceIDParser::getBatchDevice(d.substr(p + 1)));
+        } else {
+            deviceNames.insert(trim_request_info(d));
+        }
         devicesList.erase(0, pos + 1);
     }

-    if (!devicesList.empty())
-        deviceNames.push_back(trim_request_info(devicesList));
-    return deviceNames;
+    if (!devicesList.empty()) {
+        if (devicesList.find("BATCH") == 0) {
+            deviceNames.insert("BATCH");
+            auto p = devicesList.find_first_of(":");
+            if (p != std::string::npos)
+                deviceNames.insert(DeviceIDParser::getBatchDevice(devicesList.substr(p + 1)));
+        } else {
+            deviceNames.insert(trim_request_info(devicesList));
+        }
+    }
+    return std::vector<std::string>(deviceNames.begin(), deviceNames.end());
+}
+
+std::string DeviceIDParser::getBatchDevice(std::string device) {
+    auto trim_request_info = [](const std::string& device_with_requests) {
+        auto opening_bracket = device_with_requests.find_first_of('(');
+        return device_with_requests.substr(0, opening_bracket);
+    };
+    return trim_request_info(device);
 }

 class Core::Impl : public ov::runtime::CoreImpl {
@@ -1207,18 +1307,7 @@ ExecutableNetwork Core::LoadNetwork(const std::string& modelPath, const std::map
 }

 RemoteContext::Ptr Core::CreateContext(const std::string& deviceName, const ParamMap& params) {
-    if (deviceName.find("HETERO") == 0) {
-        IE_THROW() << "HETERO device does not support remote context";
-    }
-    if (deviceName.find("MULTI") == 0) {
-        IE_THROW() << "MULTI device does not support remote context";
-    }
-    if (deviceName.find("AUTO") == 0) {
-        IE_THROW() << "AUTO device does not support remote context";
-    }
-
-    auto parsed = ov::runtime::parseDeviceNameIntoConfig(deviceName, params);
-    return _impl->GetCPPPluginByName(parsed._deviceName).create_context(parsed._config)._ptr;
+    return _impl->CreateContext(deviceName, params);
 }

 RemoteContext::Ptr Core::GetDefaultContext(const std::string& deviceName) {
@@ -21,3 +21,7 @@ endif()
 if(ENABLE_AUTO OR ENABLE_MULTI)
     add_subdirectory(auto)
 endif()
+
+if(ENABLE_AUTO_BATCH)
+    add_subdirectory(auto_batch)
+endif()
@@ -156,7 +156,8 @@ MultiDeviceExecutableNetwork::MultiDeviceExecutableNetwork(const std::string&
     , _needPerfCounters(needPerfCounters)
     , _multiPlugin(plugin)
     , _context(context)
-    , _workModeIsAUTO(true) {
+    , _workModeIsAUTO(true)
+    , _network(network) {
     if (_multiPlugin->GetCore() == nullptr) {
         IE_THROW() << "Please, work with " << _multiPlugin->GetName() << " device via InferencEngine::Core object";
     }
@@ -667,10 +668,30 @@ InferenceEngine::Parameter MultiDeviceExecutableNetwork::GetMetric(const std::st
             real = _loadContext[ACTUALDEVICE].
                 executableNetwork->GetMetric(name).as<unsigned int>();
         } else {
+            IE_ASSERT(_loadContext[CPU].isAlready == true);
             real = _loadContext[CPU].
                 executableNetwork->GetMetric(name).as<unsigned int>();
+            std::unique_lock<std::mutex> lock(_confMutex);
+            auto deviceInfo = _loadContext[ACTUALDEVICE].deviceInfo;
+            lock.unlock();
+            if (deviceInfo.deviceName.find("GPU") != std::string::npos) {
+                const auto& mode = deviceInfo.config.find(CONFIG_KEY(PERFORMANCE_HINT));
+                if (mode != deviceInfo.config.end() && mode->second == CONFIG_VALUE(THROUGHPUT)) {
+                    std::map<std::string, InferenceEngine::Parameter> options;
+                    options["MODEL_PTR"] = _network.getFunction();  // CNNntework
+                    try {
+                        auto optimalBatchSize = _core->GetMetric(deviceInfo.deviceName,
+                            METRIC_KEY(OPTIMAL_BATCH_SIZE), options).as<unsigned int>();
+                        auto rangeOfStreams = _core->GetMetric(deviceInfo.deviceName,
+                            METRIC_KEY(RANGE_FOR_STREAMS), options).as<std::tuple<unsigned int, unsigned int>>();
+                        real = (std::max)(real, std::get<1>(rangeOfStreams) * optimalBatchSize);
+                    } catch (const InferenceEngine::Exception &iie) {
+                        LOG_WARNING("[AUTOPLUGIN]get optimal infer requset num for GPU auto-batch failed :%s", iie.what());
+                    }
+                }
+            }
         }
-        unsigned int res = std::max(8u, real);
+        unsigned int res = (std::max)(8u, real);
         IE_SET_METRIC_RETURN(OPTIMAL_NUMBER_OF_INFER_REQUESTS, res);
     }

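From the application side, the effect of this change is only observable through the existing metric; a minimal query sketch follows (the device choice and model are illustrative, the metric and hint names are the ones used above).

#include <ie_core.hpp>
#include <ie_plugin_config.hpp>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork("model.xml");  // placeholder

    // With the THROUGHPUT hint, AUTO may pick the GPU and scale the suggested request
    // count by OPTIMAL_BATCH_SIZE, as implemented in the hunk above.
    auto execNet = core.LoadNetwork(network, "AUTO",
        {{CONFIG_KEY(PERFORMANCE_HINT), CONFIG_VALUE(THROUGHPUT)}});

    auto nireq = execNet.GetMetric(METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as<unsigned int>();
    (void)nireq;
    return 0;
}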
@@ -7,22 +7,17 @@

 #include <atomic>
 #include <mutex>
-#include <queue>
 #include <unordered_map>
 #include <map>
 #include <vector>
 #include <string>

-#include <cpp_interfaces/impl/ie_executable_network_thread_safe_default.hpp>
-#include <ie_parallel.hpp>
-#include <threading/ie_itask_executor.hpp>
-#include <threading/ie_executor_manager.hpp>
+#include "cpp_interfaces/impl/ie_executable_network_thread_safe_default.hpp"
+#include "threading/ie_thread_safe_containers.hpp"
+#include "threading/ie_itask_executor.hpp"
+#include "threading/ie_executor_manager.hpp"
 #include "ie_icore.hpp"

-#if (IE_THREAD == IE_THREAD_TBB || IE_THREAD == IE_THREAD_TBB_AUTO)
-#    include <tbb/concurrent_queue.h>
-#endif
-
 #ifdef MULTIUNITTEST
 #define MOCKTESTMACRO virtual
 #define MultiDevicePlugin MockMultiDevicePlugin
@@ -79,66 +74,6 @@ enum AutoLoadContextIndex {
 template<typename T>
 using DeviceMap = std::unordered_map<DeviceName, T>;

-#if ((IE_THREAD == IE_THREAD_TBB) || (IE_THREAD == IE_THREAD_TBB_AUTO))
-template <typename T>
-using ThreadSafeQueue = tbb::concurrent_queue<T>;
-template <typename T>
-using ThreadSafeBoundedQueue = tbb::concurrent_bounded_queue<T>;
-#else
-template <typename T>
-class ThreadSafeQueue {
-public:
-    void push(T value) {
-        std::lock_guard<std::mutex> lock(_mutex);
-        _queue.push(std::move(value));
-    }
-    bool try_pop(T& value) {
-        std::lock_guard<std::mutex> lock(_mutex);
-        if (!_queue.empty()) {
-            value = std::move(_queue.front());
-            _queue.pop();
-            return true;
-        } else {
-            return false;
-        }
-    }
-protected:
-    std::queue<T> _queue;
-    std::mutex _mutex;
-};
-template <typename T>
-class ThreadSafeBoundedQueue {
-public:
-    ThreadSafeBoundedQueue() = default;
-    bool try_push(T value) {
-        std::lock_guard<std::mutex> lock(_mutex);
-        if (_capacity) {
-            _queue.push(std::move(value));
-        }
-        return _capacity;
-    }
-    bool try_pop(T& value) {
-        std::lock_guard<std::mutex> lock(_mutex);
-        if (_capacity && !_queue.empty()) {
-            value = std::move(_queue.front());
-            _queue.pop();
-            return true;
-        } else {
-            return false;
-        }
-    }
-    void set_capacity(std::size_t newCapacity) {
-        std::lock_guard<std::mutex> lock(_mutex);
-        _capacity = newCapacity;
-    }
-
-protected:
-    std::queue<T> _queue;
-    std::mutex _mutex;
-    bool _capacity = false;
-};
-#endif
-
 class MultiDeviceExecutableNetwork : public InferenceEngine::ExecutableNetworkThreadSafeDefault,
                                      public InferenceEngine::ITaskExecutor {
 public:
@@ -148,7 +83,7 @@ public:
         InferenceEngine::Task _task;
         std::exception_ptr _exceptionPtr = nullptr;
     };
-    using NotBusyWorkerRequests = ThreadSafeBoundedQueue<WorkerInferRequest*>;
+    using NotBusyWorkerRequests = InferenceEngine::ThreadSafeBoundedQueue<WorkerInferRequest*>;

     explicit MultiDeviceExecutableNetwork(const DeviceMap<InferenceEngine::SoExecutableNetworkInternal>& networksPerDevice,
                                           const std::vector<DeviceInformation>& networkDevices,
@@ -186,8 +121,8 @@ public:
     std::vector<DeviceInformation> _devicePriorities;
     const std::vector<DeviceInformation> _devicePrioritiesInitial;
     DeviceMap<InferenceEngine::SoExecutableNetworkInternal> _networksPerDevice;
-    ThreadSafeQueue<InferenceEngine::Task> _inferPipelineTasks;
-    DeviceMap<std::unique_ptr<ThreadSafeQueue<InferenceEngine::Task>>> _inferPipelineTasksDeviceSpecific;
+    InferenceEngine::ThreadSafeQueue<InferenceEngine::Task> _inferPipelineTasks;
+    DeviceMap<std::unique_ptr<InferenceEngine::ThreadSafeQueue<InferenceEngine::Task>>> _inferPipelineTasksDeviceSpecific;
     DeviceMap<NotBusyWorkerRequests> _idleWorkerRequests;
     DeviceMap<std::vector<WorkerInferRequest>> _workerRequests;
     std::unordered_map<std::string, InferenceEngine::Parameter> _config;
@@ -217,6 +152,7 @@ private:
     std::promise<void> _firstLoadPromise;
     mutable AutoLoadContext _loadContext[CONTEXTNUM];
     mutable std::mutex _confMutex;
+    const InferenceEngine::CNNNetwork _network;
 };

 } // namespace MultiDevicePlugin
src/plugins/auto_batch/CMakeLists.txt (new file, 20 lines)
@@ -0,0 +1,20 @@
# Copyright (C) 2018-2021 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#

set(TARGET_NAME "ov_auto_batch_plugin")

file(GLOB SOURCES ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)

file(GLOB HEADERS ${CMAKE_CURRENT_SOURCE_DIR}/*.hpp)

ie_add_plugin(NAME ${TARGET_NAME}
              DEVICE_NAME "BATCH"
              SOURCES ${SOURCES} ${HEADERS}
              VERSION_DEFINES_FOR auto_batch.cpp ADD_CLANG_FORMAT)

target_link_libraries(${TARGET_NAME} PRIVATE Threads::Threads)

ie_add_api_validator_post_build_step(TARGET ${TARGET_NAME})

set_target_properties(${TARGET_NAME} PROPERTIES INTERPROCEDURAL_OPTIMIZATION_RELEASE ${ENABLE_LTO})
731
src/plugins/auto_batch/auto_batch.cpp
Normal file
731
src/plugins/auto_batch/auto_batch.cpp
Normal file
@ -0,0 +1,731 @@
|
|||||||
|
// Copyright (C) 2018-2021 Intel Corporation
|
||||||
|
// SPDX-License-Identifier: Apache-2.0
|
||||||
|
//
|
||||||
|
|
||||||
|
///////////////////////////////////////////////////////////////////////////////////////////////////
|
||||||
|
#include "auto_batch.hpp"
|
||||||
|
|
||||||
|
#include <cpp_interfaces/interface/ie_internal_plugin_config.hpp>
|
||||||
|
#include <ie_icore.hpp>
|
||||||
|
#include <ie_ngraph_utils.hpp>
|
||||||
|
#include <ie_performance_hints.hpp>
|
||||||
|
#include <iostream>
|
||||||
|
#include <map>
|
||||||
|
#include <memory>
|
||||||
|
#include <string>
|
||||||
|
#include <unordered_map>
|
||||||
|
#include <unordered_set>
|
||||||
|
#include <utility>
|
||||||
|
#include <vector>
|
||||||
|
|
||||||
|
namespace AutoBatchPlugin {
|
||||||
|
using namespace InferenceEngine;
|
||||||
|
|
||||||
|
std::vector<std::string> supported_configKeys = {CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), CONFIG_KEY(AUTO_BATCH_TIMEOUT)};
|
||||||
|
|
||||||
|
template <Precision::ePrecision precision>
|
||||||
|
Blob::Ptr create_shared_blob_on_top_of_batched_blob(Blob::Ptr batched_blob, size_t batch_id, size_t batch_num) {
|
||||||
|
typedef typename PrecisionTrait<precision>::value_type TYPE;
|
||||||
|
typedef typename std::add_pointer<TYPE>::type TYPEPTR;
|
||||||
|
auto ptr = batched_blob->buffer().as<TYPEPTR>();
|
||||||
|
auto sizePerBatch = batched_blob->size() / batch_num;
|
||||||
|
auto layout = batched_blob->getTensorDesc().getLayout();
|
||||||
|
SizeVector dims = batched_blob->getTensorDesc().getDims();
|
||||||
|
// the below code is a placeholder for the WIP (22.1) functionality
|
||||||
|
// that will check the reshaping by the batch is robust (CVS-51744)
|
||||||
|
if (layout == InferenceEngine::Layout::NC || layout == InferenceEngine::Layout::NCDHW ||
|
||||||
|
layout == InferenceEngine::Layout::NCHW || layout == InferenceEngine::Layout::NHWC ||
|
||||||
|
layout == InferenceEngine::Layout::NDHWC) {
|
||||||
|
dims[0] = 1;
|
||||||
|
assert(batched_blob->getTensorDesc().getPrecision() == precision);
|
||||||
|
return make_shared_blob<TYPE>({precision, dims, batched_blob->getTensorDesc().getLayout()},
|
||||||
|
ptr + sizePerBatch * batch_id,
|
||||||
|
sizePerBatch);
|
||||||
|
} else {
|
||||||
|
// same blob for all requests (e.g. constants)
|
||||||
|
return make_shared_blob<TYPE>({precision, dims, batched_blob->getTensorDesc().getLayout()}, ptr);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// ------------------------------AutoBatchInferRequest----------------------------
|
||||||
|
AutoBatchInferRequest::AutoBatchInferRequest(const InputsDataMap& networkInputs,
|
||||||
|
const OutputsDataMap& networkOutputs,
|
||||||
|
AutoBatchExecutableNetwork::WorkerInferRequest& workerRequestPtr,
|
||||||
|
int batch_id,
|
||||||
|
int num_batch,
|
||||||
|
bool needPerfCounters)
|
||||||
|
: IInferRequestInternal(networkInputs, networkOutputs),
|
||||||
|
_myBatchedRequestWrapper(workerRequestPtr),
|
||||||
|
_needPerfCounters(needPerfCounters),
|
||||||
|
_batchId(batch_id),
|
||||||
|
_batchSize(num_batch) {
|
||||||
|
// Allocate all input blobs
|
||||||
|
for (const auto& it : networkInputs) {
|
||||||
|
auto blob = _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first);
|
||||||
|
Blob::Ptr res;
|
||||||
|
switch (it.second->getTensorDesc().getPrecision()) {
|
||||||
|
case InferenceEngine::Precision::FP32:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::FP32>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
case InferenceEngine::Precision::I32:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I32>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
case InferenceEngine::Precision::I8:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I8>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
case InferenceEngine::Precision::U16:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::U16>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
|
||||||
|
case InferenceEngine::Precision::I16:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I16>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
|
||||||
|
break;
|
||||||
|
case InferenceEngine::Precision::U8:
|
||||||
|
case InferenceEngine::Precision::BOOL:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::U8>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
IE_THROW() << "Unsupported input precision " << it.second->getTensorDesc().getPrecision();
|
||||||
|
}
|
||||||
|
_inputs[it.first] = res;
|
||||||
|
}
|
||||||
|
// Allocate all output blobs
|
||||||
|
for (const auto& it : networkOutputs) {
|
||||||
|
auto blob = _myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first);
|
||||||
|
Blob::Ptr res;
|
||||||
|
switch (it.second->getTensorDesc().getPrecision()) {
|
||||||
|
case InferenceEngine::Precision::FP32:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::FP32>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
case InferenceEngine::Precision::I32:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I32>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
case InferenceEngine::Precision::I8:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I8>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
case InferenceEngine::Precision::U16:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::U16>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
|
||||||
|
case InferenceEngine::Precision::I16:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::I16>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
|
||||||
|
break;
|
||||||
|
case InferenceEngine::Precision::U8:
|
||||||
|
case InferenceEngine::Precision::BOOL:
|
||||||
|
res = create_shared_blob_on_top_of_batched_blob<InferenceEngine::Precision::U8>(
|
||||||
|
_myBatchedRequestWrapper._inferRequestBatched->GetBlob(it.first),
|
||||||
|
batch_id,
|
||||||
|
num_batch);
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
IE_THROW(NotImplemented) << "Unsupported input precision " << it.second->getTensorDesc().getPrecision();
|
||||||
|
}
|
||||||
|
_outputs[it.first] = res;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void AutoBatchInferRequest::SetBlobsToAnotherRequest(SoIInferRequestInternal& req) {
|
||||||
|
for (const auto& it : _networkInputs) {
|
||||||
|
auto& name = it.first;
|
||||||
|
// this request is already in BUSY state, so using the internal functions safely
|
||||||
|
auto blob = GetBlob(name);
|
||||||
|
if (req->GetBlob(name) != blob)
|
||||||
|
req->SetBlob(name, blob);
|
||||||
|
}
|
||||||
|
for (const auto& it : _networkOutputs) {
|
||||||
|
auto& name = it.first;
|
||||||
|
// this request is already in BUSY state, so using the internal functions safely
|
||||||
|
auto blob = GetBlob(name);
|
||||||
|
if (req->GetBlob(name) != blob)
|
||||||
|
req->SetBlob(name, blob);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void AutoBatchInferRequest::CopyInputsIfNeeded() {
|
||||||
|
for (const auto& it : _networkInputs) {
|
||||||
|
auto& name = it.first;
|
||||||
|
// this request is already in BUSY state, so using the internal functions safely
|
||||||
|
CopyBlobIfNeeded(GetBlob(name), _myBatchedRequestWrapper._inferRequestBatched->GetBlob(name), true);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void AutoBatchInferRequest::CopyBlobIfNeeded(InferenceEngine::Blob::CPtr src,
|
||||||
|
InferenceEngine::Blob::Ptr dst,
|
||||||
|
bool bInput) {
|
||||||
|
auto bufferDst = dst->buffer();
|
||||||
|
auto ptrDst = bufferDst.as<char*>();
|
||||||
|
auto bufferSrc = src->cbuffer();
|
||||||
|
auto ptrSrc = bufferSrc.as<const char*>();
|
||||||
|
ptrdiff_t szDst = dst->byteSize();
|
||||||
|
ptrdiff_t szSrc = src->byteSize();
|
||||||
|
if (bInput) {
|
||||||
|
ptrdiff_t offset = szSrc != szDst ? _batchId * szDst / _batchSize : 0;
|
||||||
|
if ((ptrDst + offset) == ptrSrc)
|
||||||
|
return;
|
||||||
|
else
|
||||||
|
memcpy(ptrDst + offset, ptrSrc, szSrc);
|
||||||
|
} else {
|
||||||
|
ptrdiff_t offset = szSrc != szDst ? _batchId * szSrc / _batchSize : 0;
|
||||||
|
if ((ptrSrc + offset) == ptrDst)
|
||||||
|
return;
|
||||||
|
else
|
||||||
|
memcpy(ptrDst, ptrSrc + offset, szDst);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void AutoBatchInferRequest::CopyOutputsIfNeeded() {
|
||||||
|
for (const auto& it : _networkOutputs) {
|
||||||
|
auto& name = it.first;
|
||||||
|
// this request is already in BUSY state, so using the internal functions safely
|
||||||
|
CopyBlobIfNeeded(_myBatchedRequestWrapper._inferRequestBatched->GetBlob(name), GetBlob(name), false);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> AutoBatchInferRequest::GetPerformanceCounts() const {
|
||||||
|
return _perfMap;
|
||||||
|
}
|
||||||
|
|
||||||
|
AutoBatchAsyncInferRequest::AutoBatchAsyncInferRequest(
    const AutoBatchInferRequest::Ptr& inferRequest,
    const bool needPerfCounters,
    InferenceEngine::SoIInferRequestInternal& inferRequestWithoutBatch,
    const ITaskExecutor::Ptr& callbackExecutor)
    : AsyncInferRequestThreadSafeDefault(inferRequest, nullptr, callbackExecutor),
      _inferRequestWithoutBatch(inferRequestWithoutBatch),
      _inferRequest{inferRequest} {
    // this executor starts the inference while the task (checking the result) is passed to the next stage
    struct ThisRequestExecutor : public ITaskExecutor {
        explicit ThisRequestExecutor(AutoBatchAsyncInferRequest* _this_) : _this{_this_} {}
        void run(Task task) override {
            auto& workerInferRequest = _this->_inferRequest->_myBatchedRequestWrapper;
            std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task> t;
            t.first = _this;
            t.second = std::move(task);
            workerInferRequest._tasks.push(t);
            // it is ok to call size() here as the queue only grows (and the bulk removal happens under the mutex)
            const int sz = workerInferRequest._tasks.size();
            if (sz == workerInferRequest._batchSize) {
                workerInferRequest._cond.notify_one();
            }
        };
        AutoBatchAsyncInferRequest* _this = nullptr;
    };
    _pipeline = {
        {/*TaskExecutor*/ std::make_shared<ThisRequestExecutor>(this), /*task*/ [this, needPerfCounters] {
             if (this->_inferRequest->_exceptionPtr)  // if the exception happened in the batch1 fallback
                 std::rethrow_exception(this->_inferRequest->_exceptionPtr);
             if (this->_inferRequest->_myBatchedRequestWrapper._exceptionPtr)  // when the batchN execution failed
                 std::rethrow_exception(this->_inferRequest->_myBatchedRequestWrapper._exceptionPtr);
             this->_inferRequest->CopyOutputsIfNeeded();
         }}};
}

void AutoBatchAsyncInferRequest::Infer_ThreadUnsafe() {
    InferUsingAsync();
}

AutoBatchAsyncInferRequest::~AutoBatchAsyncInferRequest() {
    StopAndWait();
}

// ------------------------------AutoBatchExecutableNetwork----------------------------
AutoBatchExecutableNetwork::AutoBatchExecutableNetwork(
    const InferenceEngine::SoExecutableNetworkInternal& networkWithBatch,
    const InferenceEngine::SoExecutableNetworkInternal& networkWithoutBatch,
    const DeviceInformation& networkDevice,
    const std::unordered_map<std::string, InferenceEngine::Parameter>& config,
    const bool needPerfCounters)
    : InferenceEngine::ExecutableNetworkThreadSafeDefault(nullptr,
                                                          std::make_shared<InferenceEngine::ImmediateExecutor>()),
      _network{networkWithBatch},
      _networkWithoutBatch{networkWithoutBatch},
      _config{config},
      _needPerfCounters{needPerfCounters} {
    // WA for gcc 4.8 ( fails compilation with member init-list)
    _device = networkDevice;
    auto time_out = config.find(CONFIG_KEY(AUTO_BATCH_TIMEOUT));
    if (time_out != config.end())
        _timeOut = ParseTimeoutValue(time_out->second.as<std::string>());
}

AutoBatchExecutableNetwork::~AutoBatchExecutableNetwork() {
    _terminate = true;
    for (auto w : _workerRequests) {
        w->_thread.join();
    }
    _workerRequests.clear();
}

unsigned int AutoBatchExecutableNetwork::ParseTimeoutValue(const std::string& s) {
    auto val = std::stoi(s);
    if (val < 0)
        IE_THROW(ParameterMismatch) << "Value for the " << CONFIG_KEY(AUTO_BATCH_TIMEOUT) << " should be unsigned int";
    return val;
}

std::shared_ptr<InferenceEngine::RemoteContext> AutoBatchExecutableNetwork::GetContext() const {
    return _network->GetContext();
}

InferenceEngine::IInferRequestInternal::Ptr AutoBatchExecutableNetwork::CreateInferRequestImpl(
    InferenceEngine::InputsDataMap networkInputs,
    InferenceEngine::OutputsDataMap networkOutputs) {
    // todo : guard request creation from another thread/on-the-fly
    auto num = _numRequestsCreated++;
    auto batch_id = num % _device.batchForDevice;
    if (!batch_id) {  // need new request
        _workerRequests.push_back(std::make_shared<WorkerInferRequest>());
        auto workerRequestPtr = _workerRequests.back();
        workerRequestPtr->_inferRequestBatched = {_network->CreateInferRequest(), _network._so};
        workerRequestPtr->_batchSize = _device.batchForDevice;
        workerRequestPtr->_completionTasks.resize(workerRequestPtr->_batchSize);
        workerRequestPtr->_inferRequestBatched->SetCallback(
            [workerRequestPtr, this](std::exception_ptr exceptionPtr) mutable {
                if (exceptionPtr)
                    workerRequestPtr->_exceptionPtr = exceptionPtr;
                IE_ASSERT(workerRequestPtr->_completionTasks.size() == (size_t)workerRequestPtr->_batchSize);
                // notify the individual requests on the completion
                for (int c = 0; c < workerRequestPtr->_batchSize; c++) {
                    workerRequestPtr->_completionTasks[c]();
                }
                // reset the timeout
                workerRequestPtr->_cond.notify_one();
            });

        workerRequestPtr->_thread = std::thread([workerRequestPtr, this] {
            while (1) {
                std::cv_status status;
                {
                    std::unique_lock<std::mutex> lock(workerRequestPtr->_mutex);
                    status = workerRequestPtr->_cond.wait_for(lock, std::chrono::milliseconds(_timeOut));
                }
                if (_terminate) {
                    break;
                } else {
                    // as we pop the tasks from the queue only here
                    // it is ok to call size() (as the _tasks can only grow in parallel)
                    const int sz = workerRequestPtr->_tasks.size();
                    if (sz == workerRequestPtr->_batchSize) {
                        std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task> t;
                        for (int n = 0; n < sz; n++) {
                            IE_ASSERT(workerRequestPtr->_tasks.try_pop(t));
                            workerRequestPtr->_completionTasks[n] = std::move(t.second);
                            t.first->_inferRequest->CopyInputsIfNeeded();
                        }
                        workerRequestPtr->_inferRequestBatched->StartAsync();
                    } else if ((status == std::cv_status::timeout) && sz) {
                        // timeout to collect the batch is over, have to execute the requests in the batch1 mode
                        std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task> t;
                        // popping all tasks collected by the moment of the time-out and execute each with batch1
                        std::atomic<int> arrived = {0};
                        std::promise<void> all_completed;
                        auto all_completed_future = all_completed.get_future();
                        for (int n = 0; n < sz; n++) {
                            IE_ASSERT(workerRequestPtr->_tasks.try_pop(t));
                            t.first->_inferRequestWithoutBatch->SetCallback(
                                [t, sz, &arrived, &all_completed](std::exception_ptr p) {
                                    if (p)
                                        t.first->_inferRequest->_exceptionPtr = p;
                                    t.second();
                                    if (sz == ++arrived)
                                        all_completed.set_value();
                                });
                            t.first->_inferRequest->SetBlobsToAnotherRequest(t.first->_inferRequestWithoutBatch);
                            t.first->_inferRequestWithoutBatch->StartAsync();
                        }
                        all_completed_future.get();
                        // now when all the tasks for this batch are completed, start waiting for the timeout again
                    }
                }
            }
        });
    }
    return std::make_shared<AutoBatchInferRequest>(networkInputs,
                                                   networkOutputs,
                                                   *_workerRequests.back(),
                                                   batch_id,
                                                   _device.batchForDevice,
                                                   _needPerfCounters);
}

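// [Editor's note - illustrative summary, not part of the original change]
// Sketch of how the worker above batches the traffic (assuming the default AUTO_BATCH_TIMEOUT of 1000 ms):
//  - each AutoBatchAsyncInferRequest pushes its task into the worker's _tasks queue (see ThisRequestExecutor);
//  - once the queue holds _batchSize tasks, the worker thread copies the per-request inputs into the
//    batched request and issues a single StartAsync() for the whole batch;
//  - if the timeout expires first with only a partial batch, every collected task is re-run individually on the
//    spare batch-1 request (_inferRequestWithoutBatch), so a single request's latency is bounded by the timeout.
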
InferenceEngine::IInferRequestInternal::Ptr AutoBatchExecutableNetwork::CreateInferRequest() {
    auto syncRequestImpl = CreateInferRequestImpl(_networkInputs, _networkOutputs);
    syncRequestImpl->setPointerToExecutableNetworkInternal(shared_from_this());
    InferenceEngine::SoIInferRequestInternal inferRequestWithoutBatch = {_networkWithoutBatch->CreateInferRequest(),
                                                                         _networkWithoutBatch._so};
    return std::make_shared<AutoBatchAsyncInferRequest>(
        std::static_pointer_cast<AutoBatchInferRequest>(syncRequestImpl),
        _needPerfCounters,
        inferRequestWithoutBatch,
        _callbackExecutor);
}

std::shared_ptr<ngraph::Function> AutoBatchExecutableNetwork::GetExecGraphInfo() {
    return _network->GetExecGraphInfo() ? _network->GetExecGraphInfo() : _networkWithoutBatch->GetExecGraphInfo();
}

void AutoBatchExecutableNetwork::SetConfig(const std::map<std::string, InferenceEngine::Parameter>& config) {
    auto timeout = config.find(CONFIG_KEY(AUTO_BATCH_TIMEOUT));
    if (timeout == config.end() || config.size() > 1) {
        IE_THROW() << "The only config that can be changed on the fly for the AutoBatching is the "
                   << CONFIG_KEY(AUTO_BATCH_TIMEOUT);
    } else {
        _timeOut = ParseTimeoutValue(timeout->second.as<std::string>());
    }
}

InferenceEngine::Parameter AutoBatchExecutableNetwork::GetConfig(const std::string& name) const {
    auto it = _config.find(name);
    if (it != _config.end()) {
        return it->second;
    } else {
        // find config key among networks config keys
        auto param = _network->GetMetric(METRIC_KEY(SUPPORTED_CONFIG_KEYS));
        for (auto&& configKey : param.as<std::vector<std::string>>()) {
            if (configKey == name) {
                return _network->GetConfig(configKey);
            }
        }
        IE_THROW(NotFound) << name << " not found in the ExecutableNetwork config";
    }
}

InferenceEngine::Parameter AutoBatchExecutableNetwork::GetMetric(const std::string& name) const {
    if (name == METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)) {
        auto reqs = 0;
        try {
            auto hint = _network->GetConfig(CONFIG_KEY(PERFORMANCE_HINT_NUM_REQUESTS)).as<std::string>();
            reqs = InferenceEngine::PerfHintsConfig::CheckPerformanceHintRequestValue(hint);
            if (!reqs)  // no limitations from user, let's deduce the full blown #requests
                // (multiplied by the devices capabilities to run multiple <batched> requests for further perf)
                reqs = _device.batchForDevice *
                       _network->GetMetric(METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as<unsigned int>();
        } catch (const InferenceEngine::Exception& iie) {
        }
        reqs = std::max(reqs, _device.batchForDevice);  // round up to the possible user's value
        IE_SET_METRIC_RETURN(OPTIMAL_NUMBER_OF_INFER_REQUESTS, reqs);
    } else if (name == METRIC_KEY(NETWORK_NAME)) {
        IE_SET_METRIC_RETURN(NETWORK_NAME, _network->GetMetric(METRIC_KEY(NETWORK_NAME)).as<std::string>());
    } else if (name == METRIC_KEY(SUPPORTED_METRICS)) {
        IE_SET_METRIC_RETURN(SUPPORTED_METRICS,
                             {METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS),
                              METRIC_KEY(SUPPORTED_METRICS),
                              METRIC_KEY(NETWORK_NAME),
                              METRIC_KEY(SUPPORTED_CONFIG_KEYS)});
    } else if (name == METRIC_KEY(SUPPORTED_CONFIG_KEYS)) {
        IE_SET_METRIC_RETURN(SUPPORTED_CONFIG_KEYS,
                             {CONFIG_KEY(AUTO_BATCH_TIMEOUT)});  // only timeout can be changed on the fly
    } else {
        IE_THROW() << "Unsupported Network metric: " << name;
    }
}

// ------------------------------AutoBatchInferencePlugin----------------------------

namespace {

std::map<std::string, std::string> mergeConfigs(std::map<std::string, std::string> config,
                                                const std::map<std::string, std::string>& local) {
    for (auto&& kvp : local) {
        config[kvp.first] = kvp.second;
    }
    return config;
}

}  // namespace

std::map<std::string, std::string> AutoBatchInferencePlugin::GetSupportedConfig(
    const std::map<std::string, std::string>& config,
    const std::string& deviceName) const {
    std::vector<std::string> supportedConfigKeys = GetCore()->GetMetric(deviceName, METRIC_KEY(SUPPORTED_CONFIG_KEYS));
    std::map<std::string, std::string> supportedConfig;
    for (auto&& key : supportedConfigKeys) {
        auto itKey = config.find(key);
        if (config.end() != itKey) {
            supportedConfig[key] = itKey->second;
        }
    }
    return supportedConfig;
}

DeviceInformation AutoBatchInferencePlugin::ParseBatchDevice(const std::string& deviceWithBatch) {
    auto&& d = deviceWithBatch;
    auto openingBracket = d.find_first_of('(');
    auto closingBracket = d.find_first_of(')', openingBracket);
    auto deviceName = d.substr(0, openingBracket);

    int batch = 1;
    if (closingBracket != std::string::npos && openingBracket < closingBracket) {
        batch = std::stol(d.substr(openingBracket + 1, closingBracket - 1));

        if (batch <= 0) {
            IE_THROW() << "Batch value for '" << deviceName << "' must be > 0, while " << batch << " is passed";
        }
    }
    return {deviceName, {{}}, batch};
}

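// [Editor's note - illustrative examples, not part of the original change]
// Assuming the parsing above, the device-with-batch string maps to DeviceInformation as follows:
//   ParseBatchDevice("GPU(4)")  ->  {deviceName: "GPU", config: {}, batchForDevice: 4}
//   ParseBatchDevice("CPU")     ->  {deviceName: "CPU", config: {}, batchForDevice: 1}  // no brackets: default batch of 1
//   ParseBatchDevice("GPU(0)")  ->  throws, since the batch value must be > 0
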
DeviceInformation AutoBatchInferencePlugin::ParseMetaDevice(const std::string& devicesBatchCfg,
                                                            const std::map<std::string, std::string>& config) const {
    auto getDeviceConfig = [&](const DeviceName& deviceWithID) {
        DeviceIDParser deviceParser(deviceWithID);
        std::string deviceName = deviceParser.getDeviceName();
        std::map<std::string, std::string> tconfig = mergeConfigs(_config, config);

        // set device ID if any
        std::string deviceIDLocal = deviceParser.getDeviceID();
        if (!deviceIDLocal.empty()) {
            tconfig[PluginConfigParams::KEY_DEVICE_ID] = deviceIDLocal;
        }

        return GetSupportedConfig(tconfig, deviceName);
    };

    auto metaDevice = ParseBatchDevice(devicesBatchCfg);
    metaDevice.config = getDeviceConfig(metaDevice.deviceName);

    auto cfg = config;
    // check that no irrelevant config-keys left
    for (auto k : config) {
        const auto& name = k.first;
        auto found_in_supported_cfg = std::find(supported_configKeys.begin(), supported_configKeys.end(), k.first);
        auto found_in_device_cfg = metaDevice.config.find(k.first);
        if (found_in_device_cfg == metaDevice.config.end() && found_in_supported_cfg == supported_configKeys.end()) {
            IE_THROW() << "Unsupported config key: " << name;
        }
    }
    return metaDevice;
}

RemoteContext::Ptr AutoBatchInferencePlugin::CreateContext(const InferenceEngine::ParamMap& config) {
    auto cfg = config;
    auto it = cfg.find(CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG));
    if (it == cfg.end())
        IE_THROW() << "Value for KEY_AUTO_BATCH is not set";

    auto val = it->second;
    auto metaDevice = ParseMetaDevice(val, std::map<std::string, std::string>());
    cfg.erase(it);
    return GetCore()->CreateContext(metaDevice.deviceName, cfg);
}

Parameter AutoBatchInferencePlugin::GetConfig(const std::string& name,
                                              const std::map<std::string, Parameter>& options) const {
    if (supported_configKeys.end() != std::find(supported_configKeys.begin(), supported_configKeys.end(), name)) {
        auto it = _config.find(name);
        if (it == _config.end()) {
            IE_THROW() << "Value for " << name << " is not set";
        } else {
            return {it->second};
        }
    } else {
        IE_THROW() << "Unsupported config key: " << name;
    }
}

void AutoBatchInferencePlugin::CheckConfig(const std::map<std::string, std::string>& config) {
    for (auto&& kvp : config) {
        const auto name = kvp.first;
        const auto val = kvp.second;
        if (supported_configKeys.end() == std::find(supported_configKeys.begin(), supported_configKeys.end(), name))
            IE_THROW() << "Unsupported config key: " << name;
        if (name == CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)) {
            ParseBatchDevice(val);
        } else if (name == CONFIG_KEY(AUTO_BATCH_TIMEOUT)) {
            try {
                auto t = std::stoi(val);
                if (t < 0)
                    IE_THROW(ParameterMismatch);
            } catch (const std::exception& e) {
                IE_THROW(ParameterMismatch)
                    << " Expecting unsigned int value for " << CONFIG_KEY(AUTO_BATCH_TIMEOUT) << " got " << val;
            }
        }
    }
}

void AutoBatchInferencePlugin::SetConfig(const std::map<std::string, std::string>& config) {
    CheckConfig(config);
    for (auto&& kvp : config) {
        _config[kvp.first] = kvp.second;
    }
}

static const Version version = {{2, 1}, CI_BUILD_NUMBER, "AutoBatchPlugin"};
IE_DEFINE_PLUGIN_CREATE_FUNCTION(AutoBatchInferencePlugin, version)

AutoBatchInferencePlugin::AutoBatchInferencePlugin() {
    _pluginName = "BATCH";
}

InferenceEngine::Parameter AutoBatchInferencePlugin::GetMetric(
    const std::string& name,
    const std::map<std::string, InferenceEngine::Parameter>& options) const {
    if (name == METRIC_KEY(SUPPORTED_METRICS)) {
        std::vector<std::string> metrics;
        metrics.push_back(METRIC_KEY(SUPPORTED_METRICS));
        metrics.push_back(METRIC_KEY(FULL_DEVICE_NAME));
        metrics.push_back(METRIC_KEY(SUPPORTED_CONFIG_KEYS));
        IE_SET_METRIC_RETURN(SUPPORTED_METRICS, metrics);
    } else if (name == METRIC_KEY(FULL_DEVICE_NAME)) {
        IE_SET_METRIC_RETURN(FULL_DEVICE_NAME, _pluginName);
    } else if (name == METRIC_KEY(SUPPORTED_CONFIG_KEYS)) {
        IE_SET_METRIC_RETURN(SUPPORTED_CONFIG_KEYS, supported_configKeys);
    } else {
        IE_THROW(NotFound) << "Unsupported metric key " << name;
    }
}

IExecutableNetworkInternal::Ptr AutoBatchInferencePlugin::LoadExeNetworkImpl(
    const InferenceEngine::CNNNetwork& network,
    const std::map<std::string, std::string>& config) {
    return LoadNetworkImpl(network, nullptr, config);
}

InferenceEngine::IExecutableNetworkInternal::Ptr AutoBatchInferencePlugin::LoadNetworkImpl(
    const InferenceEngine::CNNNetwork& network,
    const std::shared_ptr<InferenceEngine::RemoteContext> ctx,
    const std::map<std::string, std::string>& config) {
    if (GetCore() == nullptr) {
        IE_THROW() << "Please, work with the BATCH device via the InferenceEngine::Core object";
    }

    auto fullConfig = mergeConfigs(_config, config);
    auto device_batch = fullConfig.find(CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG));
    if (device_batch == fullConfig.end()) {
        IE_THROW() << "KEY_AUTO_BATCH key is not set for BATCH device";
    }

    auto metaDevice = ParseMetaDevice(device_batch->second, fullConfig);
    const auto& deviceName = metaDevice.deviceName;
    const auto& deviceConfig = metaDevice.config;
    const auto perfConfig = fullConfig.find(PluginConfigParams::KEY_PERF_COUNT);
    const bool enablePerfCounters = (fullConfig.end() != perfConfig) && (perfConfig->second == PluginConfigParams::YES);

    auto report_footprint = [](std::shared_ptr<ICore> pCore, std::string device) -> size_t {
        size_t footprint = 0;
        // TODO: use the per-network metric (22.2) rather than plugin-level
        auto stats = pCore->GetMetric(device, GPU_METRIC_KEY(MEMORY_STATISTICS)).as<std::map<std::string, uint64_t>>();
        for (auto s : stats)
            if (s.first.find("_current") != std::string::npos)
                footprint += s.second;
        return footprint;
    };

    size_t batch1_footprint = 0;
    if (deviceName.find("GPU") != std::string::npos)
        batch1_footprint = report_footprint(GetCore(), deviceName);
    auto executableNetworkWithoutBatch = ctx ? GetCore()->LoadNetwork(network, ctx, deviceConfig)
                                             : GetCore()->LoadNetwork(network, deviceName, deviceConfig);
    if (deviceName.find("GPU") != std::string::npos) {
        batch1_footprint = report_footprint(GetCore(), deviceName) - batch1_footprint;
        if (batch1_footprint) {
            const uint64_t total_mem = GetCore()->GetMetric(deviceName, GPU_METRIC_KEY(DEVICE_TOTAL_MEM_SIZE));
            const int estimated_batch = (total_mem - batch1_footprint) / batch1_footprint;
            int closest = pow(2, floor(log(estimated_batch) / log(2)));
            closest = std::max(1, closest);
            metaDevice.batchForDevice = std::min(metaDevice.batchForDevice, closest);
        }
    }
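    // [Editor's note - illustrative arithmetic, not part of the original change]
    // Example of the estimation above with assumed numbers: if DEVICE_TOTAL_MEM_SIZE reports 4096 MB and
    // loading the batch-1 network grew the GPU memory statistics by 300 MB, then
    // estimated_batch = (4096 - 300) / 300 = 12 and the closest lower power of two is 8,
    // so an explicit BATCH:GPU(16) would be capped to 8, while BATCH:GPU(4) stays at 4.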
    // auto-batch settings
    std::unordered_map<std::string, InferenceEngine::Parameter> networkConfig;
    for (auto c : fullConfig) {
        if (supported_configKeys.end() != std::find(supported_configKeys.begin(), supported_configKeys.end(), c.first))
            networkConfig.insert(c);
    }

    InferenceEngine::SoExecutableNetworkInternal executableNetworkWithBatch;
    if (metaDevice.batchForDevice > 1) {
        try {
            CNNNetwork clonedNetwork(InferenceEngine::details::cloneNetwork(network));
            const InputsDataMap inputInfo = clonedNetwork.getInputsInfo();
            ICNNNetwork::InputShapes shapes = clonedNetwork.getInputShapes();
            for (const InputsDataMap::value_type& item : inputInfo) {
                auto layout = item.second->getTensorDesc().getLayout();
                // the below code is a placeholder for the WIP (22.1) functionality
                // that will check the reshaping by the batch is robust (CVS-51744)
                if (layout == InferenceEngine::Layout::NC || layout == InferenceEngine::Layout::NCDHW ||
                    layout == InferenceEngine::Layout::NCHW || layout == InferenceEngine::Layout::NHWC ||
                    layout == InferenceEngine::Layout::NDHWC) {
                    assert(1 == shapes[item.first][0]);  // do not reshape/re-batch originally batched networks
                    shapes[item.first][0] = metaDevice.batchForDevice;
                }
            }
            clonedNetwork.reshape(shapes);
            executableNetworkWithBatch =
                ctx ? GetCore()->LoadNetwork(CNNNetwork{clonedNetwork}, ctx, deviceConfig)
                    : GetCore()->LoadNetwork(CNNNetwork{clonedNetwork}, deviceName, deviceConfig);
        } catch (...) {
            executableNetworkWithBatch = {nullptr, nullptr};
        }
    }

    if (!executableNetworkWithBatch) {
        executableNetworkWithBatch = executableNetworkWithoutBatch;
        metaDevice.batchForDevice = 1;
    }

    return std::make_shared<AutoBatchExecutableNetwork>(executableNetworkWithBatch,
                                                        executableNetworkWithoutBatch,
                                                        metaDevice,
                                                        networkConfig,
                                                        enablePerfCounters);
}

InferenceEngine::IExecutableNetworkInternal::Ptr AutoBatchInferencePlugin::LoadExeNetworkImpl(
    const InferenceEngine::CNNNetwork& network,
    const std::shared_ptr<InferenceEngine::RemoteContext>& context,
    const std::map<std::string, std::string>& config) {
    return LoadNetworkImpl(network, context, config);
}

InferenceEngine::QueryNetworkResult AutoBatchInferencePlugin::QueryNetwork(
    const InferenceEngine::CNNNetwork& network,
    const std::map<std::string, std::string>& config) const {
    auto cfg = config;
    for (auto c : cfg) {
        if (c.first == CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG)) {
            auto val = c.second;
            cfg.erase(c.first);
            auto metaDevice = ParseMetaDevice(val, cfg);
            return GetCore()->QueryNetwork(network, metaDevice.deviceName, cfg);
        }
    }
    IE_THROW() << "Value for KEY_AUTO_BATCH is not set";
}
}  // namespace AutoBatchPlugin

src/plugins/auto_batch/auto_batch.hpp (new file, 159 lines)
@@ -0,0 +1,159 @@
// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

///////////////////////////////////////////////////////////////////////////////////////////////////
#pragma once

#include <atomic>
#include <map>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

#include "cpp_interfaces/impl/ie_executable_network_thread_safe_default.hpp"
#include "cpp_interfaces/impl/ie_infer_async_request_thread_safe_default.hpp"
#include "cpp_interfaces/interface/ie_iplugin_internal.hpp"
#include "ie_metric_helpers.hpp"
#include "threading/ie_thread_safe_containers.hpp"

namespace AutoBatchPlugin {

using DeviceName = std::string;

struct DeviceInformation {
    DeviceName deviceName;
    std::map<std::string, std::string> config;
    int batchForDevice;
};

class AutoBatchAsyncInferRequest;
class AutoBatchExecutableNetwork : public InferenceEngine::ExecutableNetworkThreadSafeDefault {
public:
    using Ptr = std::shared_ptr<AutoBatchExecutableNetwork>;
    struct WorkerInferRequest {
        using Ptr = std::shared_ptr<WorkerInferRequest>;
        InferenceEngine::SoIInferRequestInternal _inferRequestBatched;
        int _batchSize;
        InferenceEngine::ThreadSafeQueueWithSize<std::pair<AutoBatchAsyncInferRequest*, InferenceEngine::Task>> _tasks;
        std::vector<InferenceEngine::Task> _completionTasks;
        std::thread _thread;
        std::condition_variable _cond;
        std::mutex _mutex;
        std::exception_ptr _exceptionPtr;
    };

    explicit AutoBatchExecutableNetwork(
        const InferenceEngine::SoExecutableNetworkInternal& networkForDevice,
        const InferenceEngine::SoExecutableNetworkInternal& networkForDeviceWithoutBatch,
        const DeviceInformation& networkDevices,
        const std::unordered_map<std::string, InferenceEngine::Parameter>& config,
        const bool needPerfCounters = false);

    void SetConfig(const std::map<std::string, InferenceEngine::Parameter>& config) override;
    InferenceEngine::Parameter GetConfig(const std::string& name) const override;
    InferenceEngine::Parameter GetMetric(const std::string& name) const override;
    InferenceEngine::IInferRequestInternal::Ptr CreateInferRequest() override;
    InferenceEngine::IInferRequestInternal::Ptr CreateInferRequestImpl(
        InferenceEngine::InputsDataMap networkInputs,
        InferenceEngine::OutputsDataMap networkOutputs) override;
    std::shared_ptr<InferenceEngine::RemoteContext> GetContext() const override;
    std::shared_ptr<ngraph::Function> GetExecGraphInfo() override;
    virtual ~AutoBatchExecutableNetwork();

protected:
    static unsigned int ParseTimeoutValue(const std::string&);
    std::atomic_bool _terminate = {false};
    DeviceInformation _device;
    InferenceEngine::SoExecutableNetworkInternal _network;
    InferenceEngine::SoExecutableNetworkInternal _networkWithoutBatch;
    std::vector<WorkerInferRequest::Ptr> _workerRequests;
    std::unordered_map<std::string, InferenceEngine::Parameter> _config;
    bool _needPerfCounters = false;
    std::atomic_size_t _numRequestsCreated = {0};
    std::atomic_int _timeOut = {1000};  // in ms
};

class AutoBatchInferRequest : public InferenceEngine::IInferRequestInternal {
public:
    using Ptr = std::shared_ptr<AutoBatchInferRequest>;
    explicit AutoBatchInferRequest(const InferenceEngine::InputsDataMap& networkInputs,
                                   const InferenceEngine::OutputsDataMap& networkOutputs,
                                   AutoBatchExecutableNetwork::WorkerInferRequest& workerRequestPtr,
                                   int batch_id,
                                   int num_batch,
                                   bool _needPerfCounters = false);
    std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> GetPerformanceCounts() const override;

    // Batch-Device impl specific: sets the data (blobs from the device request to the batched device request)
    void SetBlobsToAnotherRequest(InferenceEngine::SoIInferRequestInternal& req);
    void CopyInputsIfNeeded();
    void CopyOutputsIfNeeded();
    AutoBatchExecutableNetwork::WorkerInferRequest& _myBatchedRequestWrapper;
    std::exception_ptr _exceptionPtr;

protected:
    std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> _perfMap;
    bool _needPerfCounters = false;
    void CopyBlobIfNeeded(InferenceEngine::Blob::CPtr src, InferenceEngine::Blob::Ptr dst, bool bInput);
    size_t _batchId;
    size_t _batchSize;
};

class AutoBatchAsyncInferRequest : public InferenceEngine::AsyncInferRequestThreadSafeDefault {
public:
    using Ptr = std::shared_ptr<AutoBatchAsyncInferRequest>;

    explicit AutoBatchAsyncInferRequest(const AutoBatchInferRequest::Ptr& inferRequest,
                                        const bool needPerfCounters,
                                        InferenceEngine::SoIInferRequestInternal& inferRequestWithoutBatch,
                                        const InferenceEngine::ITaskExecutor::Ptr& callbackExecutor);
    void Infer_ThreadUnsafe() override;
    virtual ~AutoBatchAsyncInferRequest();

    InferenceEngine::SoIInferRequestInternal _inferRequestWithoutBatch;
    AutoBatchInferRequest::Ptr _inferRequest;
};

class AutoBatchInferencePlugin : public InferenceEngine::IInferencePlugin {
public:
    AutoBatchInferencePlugin();
    virtual ~AutoBatchInferencePlugin() = default;
    InferenceEngine::IExecutableNetworkInternal::Ptr LoadExeNetworkImpl(
        const InferenceEngine::CNNNetwork& network,
        const std::map<std::string, std::string>& config) override;
    InferenceEngine::IExecutableNetworkInternal::Ptr LoadExeNetworkImpl(
        const InferenceEngine::CNNNetwork& network,
        const std::shared_ptr<InferenceEngine::RemoteContext>& context,
        const std::map<std::string, std::string>& config) override;

    void SetConfig(const std::map<std::string, std::string>& config) override;
    void CheckConfig(const std::map<std::string, std::string>& config);

    InferenceEngine::Parameter GetConfig(
        const std::string& name,
        const std::map<std::string, InferenceEngine::Parameter>& options) const override;
    InferenceEngine::QueryNetworkResult QueryNetwork(const InferenceEngine::CNNNetwork& network,
                                                     const std::map<std::string, std::string>& config) const override;
    InferenceEngine::Parameter GetMetric(
        const std::string& name,
        const std::map<std::string, InferenceEngine::Parameter>& options) const override;
    InferenceEngine::RemoteContext::Ptr CreateContext(const InferenceEngine::ParamMap&) override;

protected:
    DeviceInformation ParseMetaDevice(const std::string& devicesBatchCfg,
                                      const std::map<std::string, std::string>& config) const;

    std::map<std::string, std::string> GetSupportedConfig(const std::map<std::string, std::string>& config,
                                                          const DeviceName& deviceName) const;
    static DeviceInformation ParseBatchDevice(const std::string& deviceWithBatch);

    InferenceEngine::IExecutableNetworkInternal::Ptr LoadNetworkImpl(
        const InferenceEngine::CNNNetwork& network,
        const std::shared_ptr<InferenceEngine::RemoteContext> context,
        const std::map<std::string, std::string>& config);
};

}  // namespace AutoBatchPlugin

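Editor's note (illustrative usage sketch, not part of the change): with the plugin above registered under the
"BATCH" device name, an application drives it through the regular Core API, either with an explicit batch such
as "BATCH:GPU(4)" or with a bare "BATCH:GPU":

    InferenceEngine::Core core;
    // explicit batch of 4 on the underlying GPU device (capped by the device-memory estimate above)
    auto execNetBatched = core.LoadNetwork(network, "BATCH:GPU(4)");
    // bare "BATCH:GPU" keeps batch 1 - the full machinery is exercised but no real batching happens
    auto execNetBatch1 = core.LoadNetwork(network, "BATCH:GPU");
    auto request = execNetBatched.CreateInferRequest();
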
@@ -609,11 +609,9 @@ Engine::LoadExeNetworkImpl(const InferenceEngine::CNNNetwork &network, const std
     // the more "capable" the CPU in general, the more streams we may want to keep to keep it utilized
     const float memThresholdAssumeLimitedForISA = ov::MemBandwidthPressure::LIMITED/isaSpecificThreshold;
     const float L2_cache_size = mkldnn::utils::get_cache_size(2 /*level*/, true /*per core */);
-    const float L3_cache_size = mkldnn::utils::get_cache_size(3, false);
     ov::MemBandwidthPressure networkToleranceForLowCache = ov::MemBandwidthPressureTolerance(
         clonedNetwork.getFunction(),
-        L2_cache_size, L3_cache_size,
-        memThresholdAssumeLimitedForISA);
+        L2_cache_size, memThresholdAssumeLimitedForISA);
     // num of phys CPU cores (most aggressive value for #streams)
     const auto num_cores = getNumberOfCPUCores();
     // less aggressive
@@ -28,6 +28,7 @@
 
 #include "intel_gpu/runtime/device_query.hpp"
 #include "intel_gpu/runtime/debug_configuration.hpp"
+#include <performance_heuristics.hpp>
 #ifdef __linux__
 # include <dlfcn.h>
 #endif
@@ -681,6 +682,7 @@ Parameter Plugin::GetMetric(const std::string& name, const std::map<std::string,
     metrics.push_back(METRIC_KEY(RANGE_FOR_STREAMS));
     metrics.push_back(METRIC_KEY(DEVICE_TYPE));
     metrics.push_back(METRIC_KEY(DEVICE_GOPS));
+    metrics.push_back(METRIC_KEY(OPTIMAL_BATCH_SIZE));
     metrics.push_back(GPU_METRIC_KEY(MAX_BATCH_SIZE));
     metrics.push_back(GPU_METRIC_KEY(DEVICE_TOTAL_MEM_SIZE));
     metrics.push_back(GPU_METRIC_KEY(UARCH_VERSION));
@@ -716,6 +718,76 @@ Parameter Plugin::GetMetric(const std::string& name, const std::map<std::string,
            << static_cast<int>(device_info.gfx_ver.revision);
     }
     IE_SET_METRIC_RETURN(GPU_UARCH_VERSION, s.str());
+    } else if (name == METRIC_KEY(OPTIMAL_BATCH_SIZE)) {
+        auto next_pow_of_2 = [] (float x) {
+            return pow(2, ceil(log(x)/log(2)));
+        };
+        auto closest_pow_of_2 = [] (float x) {
+            return pow(2, floor(log(x)/log(2)));
+        };
+        auto model_param = options.find("MODEL_PTR");
+        if (model_param == options.end()) {
+            GPU_DEBUG_IF(debug_config->verbose >= 1) {
+                GPU_DEBUG_COUT << "[GPU_OPTIMAL_BATCH_SIZE] MODELS_PTR is not set: return 1" << std::endl;
+            }
+            IE_SET_METRIC_RETURN(OPTIMAL_BATCH_SIZE, static_cast<unsigned int>(1));
+        }
+        std::shared_ptr<ngraph::Function> model;
+        try {
+            model = model_param->second.as<std::shared_ptr<ngraph::Function>>();
+        } catch (...) {
+            IE_THROW() << "[GPU_OPTIMAL_BATCH_SIZE] MODEL_PTR should be std::shared_ptr<ngraph::Function> type";
+        }
+        GPU_DEBUG_IF(debug_config->verbose >= 1) {
+            GPU_DEBUG_COUT << "DEVICE_INFO:"
+                           << "gfx_version.major, " << device_info.gfx_ver.major
+                           << "gfx_version.minor " << std::to_string(device_info.gfx_ver.minor) << std::endl;
+        }
+        static std::map<cldnn::gfx_version, size_t> gen_kbytes_per_bank = {
+            {{12, 0, 0}, 480},  // TGL
+            {{12, 1, 0}, 2048}, // DG1
+            {{12, 5, 0}, 320},
+            {{12, 7, 0}, 512},
+        };
+        size_t L3_cache_size = device_info.gfx_ver.major && (device_info.gfx_ver.major <= 9)
+                ? 768 * 1024      // Gen9
+                : 2 * 768 * 1024; // reasonable default when no arch has been detected (e.g. due to old driver ver)
+        cldnn::gfx_version gen = {device_info.gfx_ver.major, device_info.gfx_ver.minor, 0 /*ignore the revision*/};
+        auto val = gen_kbytes_per_bank.find(gen);
+        if (gen_kbytes_per_bank.end() != val) {
+            auto kbytes_per_bank = val->second;
+            auto num_banks_per_slice = device_info.num_sub_slices_per_slice > 4
+                    ? next_pow_of_2(device_info.num_sub_slices_per_slice)
+                    : 2 * device_info.num_sub_slices_per_slice;
+            L3_cache_size = kbytes_per_bank * 1024 * num_banks_per_slice * device_info.num_slices;
+            GPU_DEBUG_IF(debug_config->verbose >= 1) {
+                GPU_DEBUG_COUT << "DEVICE_INFO:"
+                               << "num_slices " << device_info.num_slices
+                               << ", num_sub_slices_per_slice " << device_info.num_sub_slices_per_slice
+                               << ", num_banks_per_slice " << num_banks_per_slice
+                               << ", gen_kbytes_per_bank : " << kbytes_per_bank
+                               << ", L3_cache_size is (MB): " << float(L3_cache_size) / 1024 / 1024 << std::endl;
+            }
+        }
+        Config config = _impl->m_configs.GetConfig(device_id);
+        auto networkCloned = CloneAndTransformNetwork(CNNNetwork(model), config);
+        ov::MemBandwidthPressure memPressure = ov::MemBandwidthPressureTolerance(networkCloned.getFunction(), L3_cache_size);
+        unsigned int batch = 1;
+        if (memPressure.max_mem_tolerance != ov::MemBandwidthPressure::UNKNOWN)
+            batch = std::max(1.0, 16 * closest_pow_of_2(memPressure.max_mem_tolerance));
+        std::map<std::string, InferenceEngine::Parameter> options_for_max_batch;
+        options_for_max_batch["MODEL_PTR"] = model;
+        options_for_max_batch["GPU_THROUGHPUT_STREAMS"] = CONFIG_VALUE(GPU_THROUGHPUT_AUTO);
+        auto max_batch_size = GetMetric(GPU_METRIC_KEY(MAX_BATCH_SIZE), options_for_max_batch).as<unsigned int>();
+        unsigned int closest = closest_pow_of_2(max_batch_size);
+        batch = std::min(closest, batch);
+        batch = std::min(256u, batch);  // batch 256 is a max
+        GPU_DEBUG_IF(debug_config->verbose >= 1) {
+            GPU_DEBUG_COUT << memPressure.max_mem_tolerance << std::endl;
+            GPU_DEBUG_COUT << "MAX_BATCH: " << max_batch_size << std::endl;
+            GPU_DEBUG_COUT << "ACTUAL OPTIMAL BATCH: " << batch << std::endl;
+        }
+        IE_SET_METRIC_RETURN(OPTIMAL_BATCH_SIZE, batch);
     } else if (name == METRIC_KEY(FULL_DEVICE_NAME)) {
         auto deviceName = StringRightTrim(device_info.dev_name, "NEO", false);
         deviceName += std::string(" (") + (device_info.dev_type == cldnn::device_type::discrete_gpu ? "dGPU" : "iGPU") + ")";
@@ -885,7 +957,7 @@ Parameter Plugin::GetMetric(const std::string& name, const std::map<std::string,
     TransformationsPipeline transformations(config, device_info);
     transformations.apply(nGraphFunc);
     program = std::make_shared<Program>(cloned_network, engine, config, false, true);
     std::pair<int64_t, int64_t> device_memory_usage = program->GetCompiledProgram(0)->get_estimated_device_mem_usage();
     int64_t mem_for_general = std::max(static_cast<int64_t>(1L),
                                        static_cast<int64_t>(static_cast<int64_t>(available_device_mem) - device_memory_usage.first));
     int64_t mem_per_batch = std::max(static_cast<int64_t>(1L), (device_memory_usage.second / static_cast<int64_t>(base_batch_size)));
@@ -48,6 +48,10 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
     list(APPEND DEPENDENCIES ov_auto_plugin)
 endif()
+
+if(ENABLE_AUTO_BATCH)
+    list(APPEND DEPENDENCIES ov_auto_batch_plugin)
+endif()
 
 if (NOT ENABLE_OV_ONNX_FRONTEND)
     list(APPEND EXCLUDED_SOURCE_PATHS "${CMAKE_CURRENT_SOURCE_DIR}/onnx_reader")
 endif()
@@ -24,6 +24,7 @@ inline const std::string getPluginLibNameByDevice(const std::string& deviceName)
         { "GNA", "ov_intel_gna_plugin" },
         { "GPU", "ov_intel_gpu_plugin" },
         { "HETERO", "ov_hetero_plugin" },
+        { "BATCH", "ov_auto_batch_plugin" },
         { "MULTI", "ov_multi_plugin" },
         { "MYRIAD", "myriadPlugin" },
         { "TEMPLATE", "ov_template_plugin" },
|
|||||||
return { "TARGET_FALLBACK" , ConformanceTests::targetDevice };
|
return { "TARGET_FALLBACK" , ConformanceTests::targetDevice };
|
||||||
}
|
}
|
||||||
|
|
||||||
|
inline const std::pair<std::string, std::string> generateDefaultBatchConfig() {
|
||||||
|
// auto-batching with batch 1 (no real batching in fact, but full machinery is in action)
|
||||||
|
return { CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , std::string(ConformanceTests::targetDevice)};
|
||||||
|
}
|
||||||
|
|
||||||
inline const std::vector<std::map<std::string, std::string>> generateConfigs(const std::string& targetDevice,
|
inline const std::vector<std::map<std::string, std::string>> generateConfigs(const std::string& targetDevice,
|
||||||
const std::vector<std::map<std::string, std::string>>& config = {}) {
|
const std::vector<std::map<std::string, std::string>>& config = {}) {
|
||||||
std::pair<std::string, std::string> defaultConfig;
|
std::pair<std::string, std::string> defaultConfig;
|
||||||
@ -49,6 +55,8 @@ inline const std::vector<std::map<std::string, std::string>> generateConfigs(con
|
|||||||
defaultConfig = generateDefaultMultiConfig();
|
defaultConfig = generateDefaultMultiConfig();
|
||||||
} else if (targetDevice == std::string(CommonTestUtils::DEVICE_HETERO)) {
|
} else if (targetDevice == std::string(CommonTestUtils::DEVICE_HETERO)) {
|
||||||
defaultConfig = generateDefaultHeteroConfig();
|
defaultConfig = generateDefaultHeteroConfig();
|
||||||
|
} else if (targetDevice == std::string(CommonTestUtils::DEVICE_BATCH)) {
|
||||||
|
defaultConfig = generateDefaultBatchConfig();
|
||||||
} else {
|
} else {
|
||||||
throw std::runtime_error("Incorrect target device: " + targetDevice);
|
throw std::runtime_error("Incorrect target device: " + targetDevice);
|
||||||
}
|
}
|
||||||
@@ -70,7 +78,8 @@ inline const std::string generateComplexDeviceName(const std::string& deviceName
 
 inline const std::vector<std::string> returnAllPossibleDeviceCombination() {
     std::vector<std::string> res{ConformanceTests::targetDevice};
-    std::vector<std::string> devices{CommonTestUtils::DEVICE_HETERO, CommonTestUtils::DEVICE_AUTO, CommonTestUtils::DEVICE_MULTI};
+    std::vector<std::string> devices{CommonTestUtils::DEVICE_HETERO, CommonTestUtils::DEVICE_AUTO,
+                                     CommonTestUtils::DEVICE_BATCH, CommonTestUtils::DEVICE_MULTI};
     for (const auto& device : devices) {
         res.emplace_back(generateComplexDeviceName(device));
     }
@@ -33,4 +33,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestCallbackTests,
         ::testing::Values(CommonTestUtils::DEVICE_HETERO),
         ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
     InferRequestCallbackTests::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestCallbackTests,
+    ::testing::Combine(
+        ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+        ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
+    InferRequestCallbackTests::getTestCaseName);
 }  // namespace
@@ -36,4 +36,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestIOBBlobTest,
         ::testing::Values(CommonTestUtils::DEVICE_HETERO),
         ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
     InferRequestIOBBlobTest::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestIOBBlobTest,
+    ::testing::Combine(
+        ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+        ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
+    InferRequestIOBBlobTest::getTestCaseName);
 }  // namespace
@@ -38,4 +38,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestMultithreadingTests,
         ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
     InferRequestMultithreadingTests::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestMultithreadingTests,
+    ::testing::Combine(
+        ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+        ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
+    InferRequestMultithreadingTests::getTestCaseName);
 
 }  // namespace
@@ -46,4 +46,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Behavior_Hetero, InferRequestSetBlobByType,
         ::testing::Values(CommonTestUtils::DEVICE_HETERO),
         ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
     InferRequestSetBlobByType::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_Behavior_Batch, InferRequestSetBlobByType,
+    ::testing::Combine(::testing::ValuesIn(setBlobTypes),
+        ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+        ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
+    InferRequestSetBlobByType::getTestCaseName);
 }  // namespace
@@ -37,4 +37,9 @@ INSTANTIATE_TEST_SUITE_P(smoke_Hetero_BehaviorTests, InferRequestWaitTests,
         ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_HETERO))),
     InferRequestWaitTests::getTestCaseName);
 
+INSTANTIATE_TEST_SUITE_P(smoke_Batch_BehaviorTests, InferRequestWaitTests,
+    ::testing::Combine(
+        ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+        ::testing::ValuesIn(generateConfigs(CommonTestUtils::DEVICE_BATCH))),
+    InferRequestWaitTests::getTestCaseName);
 }  // namespace
@@ -0,0 +1,31 @@
// Copyright (C) 2018-2021 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include <auto_batching/auto_batching_tests.hpp>

const std::vector<bool> get_vs_set{ true, false };
const std::vector<size_t> num_streams{ 1, 2 };
const std::vector<size_t> num_requests{ 1, 3, 8, 9, 16, 64 };
const std::vector<size_t> num_batch{ 1, 4, 8, 16, 32, 64, 128, 256 };
using namespace AutoBatchingTests;

namespace {
INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_CPU, AutoBatching_Test,
                         ::testing::Combine(
                                 ::testing::Values(CommonTestUtils::DEVICE_CPU),
                                 ::testing::ValuesIn(get_vs_set),
                                 ::testing::ValuesIn(num_streams),
                                 ::testing::ValuesIn(num_requests),
                                 ::testing::ValuesIn(num_batch)),
                         AutoBatching_Test::getTestCaseName);
// TODO: for 22.2 (CVS-68949)
//INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_CPU, AutoBatching_Test_DetectionOutput,
//                         ::testing::Combine(
//                                 ::testing::Values(CommonTestUtils::DEVICE_CPU),
//                                 ::testing::ValuesIn(get_vs_set),
//                                 ::testing::ValuesIn(num_streams),
//                                 ::testing::ValuesIn(num_requests),
//                                 ::testing::ValuesIn(num_batch)),
//                         AutoBatching_Test_DetectionOutput::getTestCaseName);

} // namespace
@ -21,16 +21,27 @@ using namespace ::testing;
|
|||||||
using namespace InferenceEngine;
|
using namespace InferenceEngine;
|
||||||
using namespace InferenceEngine::gpu;
|
using namespace InferenceEngine::gpu;
|
||||||
|
|
||||||
class RemoteBlob_Test : public CommonTestUtils::TestsCommon {
|
class RemoteBlob_Test : public CommonTestUtils::TestsCommon, public testing::WithParamInterface<bool> {
|
||||||
protected:
|
protected:
|
||||||
std::shared_ptr<ngraph::Function> fn_ptr;
|
std::shared_ptr<ngraph::Function> fn_ptr;
|
||||||
|
std::string deviceName;
|
||||||
|
|
||||||
|
public:
|
||||||
void SetUp() override {
|
void SetUp() override {
|
||||||
fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
|
fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
|
||||||
|
deviceName = CommonTestUtils::DEVICE_GPU;
|
||||||
|
auto with_auto_batching = this->GetParam();
|
||||||
|
if (with_auto_batching) { // BATCH:GPU
|
||||||
|
deviceName = std::string(CommonTestUtils::DEVICE_BATCH) + ":" + deviceName;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
static std::string getTestCaseName(const testing::TestParamInfo<bool>& obj) {
|
||||||
|
auto with_auto_batch = obj.param;
|
||||||
|
return std::string("RemoteBlob_Test") + (with_auto_batch ? "_WITH_AUTO_BATCHING": "");
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
-TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) {
+TEST_P(RemoteBlob_Test, smoke_canInputUserBlob) {
 #if defined(ANDROID)
     GTEST_SKIP();
 #endif
@@ -41,7 +52,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) {
 
     // TODO: Issue: investigate issue with IECore
     auto ie = InferenceEngine::Core();
-    auto exec_net = ie.LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
+    auto exec_net = ie.LoadNetwork(net, deviceName);
 
     // regular inference
     auto inf_req_regular = exec_net.CreateInferRequest();
@@ -70,6 +81,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) {
 
     Blob::Ptr shared_blob = make_shared_blob(net.getInputsInfo().begin()->second->getTensorDesc(), cldnn_context,
                                              shared_buffer);
+    shared_blob->allocate();
     inf_req_shared.SetBlob(net.getInputsInfo().begin()->first, shared_blob);
 
     inf_req_shared.Infer();
@@ -85,7 +97,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputUserBlob) {
 }
 
 
-TEST_F(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) {
+TEST_P(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) {
 #if defined(ANDROID)
     GTEST_SKIP();
 #endif
@@ -96,7 +108,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) {
 
     // TODO: Issue: investigate issue with IECore
     auto ie = InferenceEngine::Core();
-    auto exec_net = ie.LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
+    auto exec_net = ie.LoadNetwork(net, deviceName);
 
     // regular inference
     auto inf_req_regular = exec_net.CreateInferRequest();
@@ -139,7 +151,7 @@ TEST_F(RemoteBlob_Test, smoke_canInputPluginRemoteBlob) {
 }
 
 
-TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) {
+TEST_P(RemoteBlob_Test, smoke_canInferOnUserContext) {
     auto fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
     CNNNetwork net(fn_ptr);
 
@@ -149,7 +161,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) {
     auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
 
     auto ie = PluginCache::get().ie();
-    auto exec_net_regular = ie->LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
+    auto exec_net_regular = ie->LoadNetwork(net, deviceName);
 
     // regular inference
     auto inf_req_regular = exec_net_regular.CreateInferRequest();
@@ -161,7 +173,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) {
 
     // inference using remote blob
     auto ocl_instance = std::make_shared<OpenCL>();
-    auto remote_context = make_shared_context(*ie, CommonTestUtils::DEVICE_GPU, ocl_instance->_context.get());
+    auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_context.get());
     auto exec_net_shared = ie->LoadNetwork(net, remote_context);
     auto inf_req_shared = exec_net_shared.CreateInferRequest();
     inf_req_shared.SetBlob(net.getInputsInfo().begin()->first, fakeImageData);
@@ -178,7 +190,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserContext) {
     }
 }
 
-TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
+TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
 #if defined _WIN32
     GTEST_SKIP();
 #endif
@@ -191,7 +203,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
     auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
 
     auto ie = PluginCache::get().ie();
-    auto exec_net_regular = ie->LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
+    auto exec_net_regular = ie->LoadNetwork(net, deviceName);
 
     // regular inference
     auto inf_req_regular = exec_net_regular.CreateInferRequest();
@@ -214,7 +226,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
 
     // In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
     // without calling thread blocks
-    auto remote_context = make_shared_context(*ie, CommonTestUtils::DEVICE_GPU, ocl_instance->_queue.get());
+    auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
     auto exec_net_shared = ie->LoadNetwork(net, remote_context);
     auto inf_req_shared = exec_net_shared.CreateInferRequest();
 
@@ -270,7 +282,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
     }
 }
 
-TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
+TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
 #if defined _WIN32
     GTEST_SKIP();
 #endif
@@ -283,7 +295,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
     auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
 
     auto ie = PluginCache::get().ie();
-    auto exec_net_regular = ie->LoadNetwork(net, CommonTestUtils::DEVICE_GPU);
+    auto exec_net_regular = ie->LoadNetwork(net, deviceName);
 
     // regular inference
     auto inf_req_regular = exec_net_regular.CreateInferRequest();
@@ -307,7 +319,7 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
 
     // In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
     // without calling thread blocks
-    auto remote_context = make_shared_context(*ie, CommonTestUtils::DEVICE_GPU, ocl_instance->_queue.get());
+    auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
     auto exec_net_shared = ie->LoadNetwork(net, remote_context);
     auto inf_req_shared = exec_net_shared.CreateInferRequest();
 
@@ -358,6 +370,10 @@ TEST_F(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
     }
 }
 
+std::vector<bool> with_auto_batching {true, false};
+INSTANTIATE_TEST_SUITE_P(smoke_RemoteBlob, RemoteBlob_Test, ::testing::ValuesIn(with_auto_batching),
+                         RemoteBlob_Test::getTestCaseName);
 
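Note for reviewers: the new bool parameter above only decides which device string the fixture targets. A minimal sketch of that mapping, assuming the usual GPU target (the helper name below is illustrative, not part of the patch):

```cpp
#include <string>

// Hypothetical helper mirroring what the parameterized fixtures do in SetUp():
// prepend the auto-batching meta-device, e.g. "GPU" -> "BATCH:GPU".
inline std::string makeTestDeviceName(bool with_auto_batching, const std::string& target = "GPU") {
    return with_auto_batching ? ("BATCH:" + target) : target;
}
```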
 class BatchedBlob_Test : public CommonTestUtils::TestsCommon, public testing::WithParamInterface<size_t> {
     void SetUp() override {
         num_batch = this->GetParam();
@@ -30,6 +30,7 @@ protected:
     }
 };
 
+std::vector<bool> ov_with_auto_batching {true, false};
 enum class RemoteTensorSharingType {
     USER_CL_TENSOR = 0,
     PLUGIN_CL_TENSOR = 1,
@@ -54,17 +55,34 @@ std::ostream& operator<<(std::ostream& stream, RemoteTensorSharingType sharing_t
     return stream;
 }
 
-class OVRemoteTensorInputBlob_Test : public OVRemoteTensor_Test, public testing::WithParamInterface<RemoteTensorSharingType> {
+using RemoteTensorSharingTestOptionsParams = std::tuple<RemoteTensorSharingType, bool /*auto-batching*/>;
+
+class OVRemoteTensorInputBlob_Test : public OVRemoteTensor_Test,
+        public testing::WithParamInterface<RemoteTensorSharingTestOptionsParams> {
+protected:
+    std::shared_ptr<ngraph::Function> fn_ptr;
+    std::string deviceName;
+
 public:
     void SetUp() override {
         fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
+        deviceName = CommonTestUtils::DEVICE_GPU;
+        RemoteTensorSharingType sharing_type;
+        bool with_auto_batching;
+        std::tie(sharing_type, with_auto_batching) = this->GetParam();
+        if (with_auto_batching) // BATCH:GPU
+            deviceName = std::string(CommonTestUtils::DEVICE_BATCH) + ":" + deviceName;
     }
-    static std::string getTestCaseName(testing::TestParamInfo<RemoteTensorSharingType> obj) {
-        RemoteTensorSharingType sharing_type = obj.param;
+    static std::string getTestCaseName(const testing::TestParamInfo<RemoteTensorSharingTestOptionsParams>& obj) {
+        RemoteTensorSharingType sharing_type;
+        bool with_auto_batching;
+        std::tie(sharing_type, with_auto_batching) = obj.param;
 
         std::ostringstream result;
+        result << "OVRemoteTensorInputBlob_Test_";
         result << sharing_type;
+        if (with_auto_batching)
+            result << "_WITH_AUTO_BATCHING";
         return result.str();
     }
 };
@@ -81,9 +99,17 @@ TEST_P(OVRemoteTensorInputBlob_Test, smoke_canInputRemoteTensor) {
     p.input().preprocess().convert_element_type(ov::element::f32);
 
     auto function = p.build();
-    auto exec_net = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
-    RemoteTensorSharingType sharing_type = GetParam();
+    RemoteTensorSharingType sharing_type;
+    bool with_auto_batching;
+    std::tie(sharing_type, with_auto_batching) = GetParam();
+
+    // auto-batching relies on availability of the lock() for the tensor (and the *USM_DEVICE is not lockable)
+    if (with_auto_batching
+            && (RemoteTensorSharingType::USER_USM_DEVICE_TENSOR == sharing_type
+                || RemoteTensorSharingType::PLUGIN_USM_DEVICE_TENSOR == sharing_type))
+        GTEST_SKIP();
+
+    auto exec_net = ie.compile_model(function, deviceName);
 
     // regular inference
     auto inf_req_regular = exec_net.create_infer_request();
@@ -244,6 +270,7 @@ TEST_P(OVRemoteTensorInputBlob_Test, smoke_canInputRemoteTensor) {
 INSTANTIATE_TEST_SUITE_P(
     smoke_GPU,
     OVRemoteTensorInputBlob_Test,
+    ::testing::Combine(
     ::testing::ValuesIn(std::vector<RemoteTensorSharingType>{RemoteTensorSharingType::USER_CL_TENSOR,
                                                              RemoteTensorSharingType::PLUGIN_CL_TENSOR,
                                                              RemoteTensorSharingType::USER_USM_HOST_TENSOR,
@@ -251,9 +278,29 @@ INSTANTIATE_TEST_SUITE_P(
                                                              RemoteTensorSharingType::PLUGIN_USM_HOST_TENSOR,
                                                              RemoteTensorSharingType::PLUGIN_USM_DEVICE_TENSOR,
                                                              RemoteTensorSharingType::PLUGIN_HOST_TENSOR}),
+    ::testing::ValuesIn(ov_with_auto_batching)),
     OVRemoteTensorInputBlob_Test::getTestCaseName);
 
-TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContext) {
+class OVRemoteTensor_TestsWithContext : public OVRemoteTensor_Test, public testing::WithParamInterface<bool> {
+protected:
+    std::shared_ptr<ngraph::Function> fn_ptr;
+    std::string deviceName;
+public:
+    void SetUp() override {
+        fn_ptr = ngraph::builder::subgraph::makeSplitMultiConvConcat();
+        deviceName = CommonTestUtils::DEVICE_GPU;
+        auto with_auto_batching = this->GetParam();
+        if (with_auto_batching) { // BATCH:GPU
+            deviceName = std::string(CommonTestUtils::DEVICE_BATCH) + ":" + deviceName;
+        }
+    }
+    static std::string getTestCaseName(const testing::TestParamInfo<bool>& obj) {
+        auto with_auto_batch = obj.param;
+        return std::string("RemoteTensor_Test") + (with_auto_batch ? "_WITH_AUTO_BATCHING": "");
+    }
+};
+
+TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserContext) {
     auto ie = ov::runtime::Core();
 
     using namespace ov::preprocess;
@@ -262,7 +309,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContext) {
     p.input().preprocess().convert_element_type(ov::element::f32);
     auto function = p.build();
 
-    auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
+    auto exec_net_regular = ie.compile_model(function, deviceName);
     auto input = function->get_parameters().at(0);
     auto output = function->get_results().at(0);
 
@@ -296,7 +343,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContext) {
     }
 }
 
-TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContextWithMultipleDevices) {
+TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserContextWithMultipleDevices) {
     auto ie = ov::runtime::Core();
 
     using namespace ov::preprocess;
@@ -305,7 +352,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContextWithMultipleDevices) {
     p.input().preprocess().convert_element_type(ov::element::f32);
     auto function = p.build();
 
-    auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
+    auto exec_net_regular = ie.compile_model(function, deviceName);
     auto input = function->get_parameters().at(0);
     auto output = function->get_results().at(0);
 
@@ -344,7 +391,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserContextWithMultipleDevices) {
     }
 }
 
-TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_out_of_order) {
+TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserQueue_out_of_order) {
     auto ie = ov::runtime::Core();
 
     using namespace ov::preprocess;
@@ -353,7 +400,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_out_of_order) {
     p.input().preprocess().convert_element_type(ov::element::f32);
     auto function = p.build();
 
-    auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
+    auto exec_net_regular = ie.compile_model(function, deviceName);
     auto input = function->get_parameters().at(0);
     auto output = function->get_results().at(0);
 
@@ -423,7 +470,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_out_of_order) {
     }
 }
 
-TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_in_order) {
+TEST_P(OVRemoteTensor_TestsWithContext, smoke_canInferOnUserQueue_in_order) {
     auto ie = ov::runtime::Core();
 
     using namespace ov::preprocess;
@@ -432,7 +479,7 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_in_order) {
     p.input().preprocess().convert_element_type(ov::element::f32);
     auto function = p.build();
 
-    auto exec_net_regular = ie.compile_model(function, CommonTestUtils::DEVICE_GPU);
+    auto exec_net_regular = ie.compile_model(function, deviceName);
     auto input = function->get_parameters().at(0);
     auto output = function->get_results().at(0);
 
@@ -498,6 +545,9 @@ TEST_F(OVRemoteTensor_Test, smoke_canInferOnUserQueue_in_order) {
     }
 }
 
+INSTANTIATE_TEST_SUITE_P(smoke_RemoteTensor, OVRemoteTensor_TestsWithContext, ::testing::ValuesIn(ov_with_auto_batching),
+                         OVRemoteTensor_TestsWithContext::getTestCaseName);
+
 TEST_F(OVRemoteTensor_Test, NV12toBGR_image) {
 #if defined(ANDROID)
     GTEST_SKIP();
@@ -0,0 +1,31 @@
+// Copyright (C) 2018-2021 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+#include <auto_batching/auto_batching_tests.hpp>
+
+const std::vector<size_t> num_streams{ 2 };
+const std::vector<bool> get_vs_set{ true, false };
+const std::vector<size_t> num_requests{ 1, 8, 16, 64 };
+const std::vector<size_t> num_batch{ 1, 8, 32, 256 };
+using namespace AutoBatchingTests;
+
+namespace AutoBatchingTests {
+
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_GPU, AutoBatching_Test,
+                         ::testing::Combine(
+                                 ::testing::Values(CommonTestUtils::DEVICE_GPU),
+                                 ::testing::ValuesIn(get_vs_set),
+                                 ::testing::ValuesIn(num_streams),
+                                 ::testing::ValuesIn(num_requests),
+                                 ::testing::ValuesIn(num_batch)),
+                         AutoBatching_Test::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatching_GPU, AutoBatching_Test_DetectionOutput,
+                         ::testing::Combine(
+                                 ::testing::Values(CommonTestUtils::DEVICE_GPU),
+                                 ::testing::ValuesIn(get_vs_set),
+                                 ::testing::ValuesIn(num_streams),
+                                 ::testing::ValuesIn(num_requests),
+                                 ::testing::ValuesIn(num_batch)),
+                         AutoBatching_Test_DetectionOutput::getTestCaseName);
+} // namespace AutoBatchingTests
@@ -52,6 +52,10 @@ const std::vector<std::map<std::string, std::string>> autoConfig = {
     {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}},
 };
 
+const std::vector<std::map<std::string, std::string>> autoBatchConfig = {
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
+};
+
 INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, ExecNetSetPrecision,
                          ::testing::Combine(
                                  ::testing::ValuesIn(netPrecisions),
@@ -72,4 +76,11 @@ INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, ExecNetSetPrecision,
                                  ::testing::Values(CommonTestUtils::DEVICE_AUTO),
                                  ::testing::ValuesIn(autoConfig)),
                          ExecNetSetPrecision::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, ExecNetSetPrecision,
+                         ::testing::Combine(
+                                 ::testing::ValuesIn(netPrecisions),
+                                 ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                                 ::testing::ValuesIn(autoBatchConfig)),
+                         ExecNetSetPrecision::getTestCaseName);
 } // namespace
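The autoBatchConfig entries reach the BATCH plugin through the config key rather than through a "BATCH:GPU" device string. Roughly what such an instantiation ends up doing, as a sketch (function and variable names are placeholders):

```cpp
#include <map>
#include <string>
#include <ie_core.hpp>

// Sketch: load a network on the auto-batching meta-device and point it at GPU
// via AUTO_BATCH_DEVICE_CONFIG, the same key the test configs above pass.
InferenceEngine::ExecutableNetwork loadOnAutoBatch(InferenceEngine::Core& ie,
                                                   InferenceEngine::CNNNetwork& net) {
    std::map<std::string, std::string> config = {
        {CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), "GPU"}
    };
    return ie.LoadNetwork(net, "BATCH", config);
}
```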
@@ -22,27 +22,27 @@ namespace {
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_OPTIMAL_NUMBER_OF_INFER_REQUESTS,
-        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
+        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
 );
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_SUPPORTED_CONFIG_KEYS,
-        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
+        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
 );
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_SUPPORTED_METRICS,
-        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
+        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
 );
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_NETWORK_NAME,
-        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
+        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
 );
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassExecutableNetworkGetMetricTest, IEClassExecutableNetworkGetMetricTest_ThrowsUnsupported,
-        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU")
+        ::testing::Values("GPU", "MULTI:GPU", "HETERO:GPU", "AUTO:GPU,CPU", "BATCH:GPU")
 );
 
 //
@@ -19,6 +19,10 @@ const std::vector<std::map<std::string, std::string>> autoConfigs = {
     {InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU}}
 };
 
+const std::vector<std::map<std::string, std::string>> autoBatchConfigs = {
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
+};
+
 INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, InferRequestCallbackTests,
                          ::testing::Combine(
                                  ::testing::Values(CommonTestUtils::DEVICE_GPU),
@@ -36,4 +40,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, InferRequestCallbackTests,
                                  ::testing::Values(CommonTestUtils::DEVICE_AUTO),
                                  ::testing::ValuesIn(autoConfigs)),
                          InferRequestCallbackTests::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, InferRequestCallbackTests,
+                         ::testing::Combine(
+                                 ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                                 ::testing::ValuesIn(autoBatchConfigs)),
+                         InferRequestCallbackTests::getTestCaseName);
 } // namespace
@@ -18,6 +18,10 @@ const std::vector<std::map<std::string, std::string>> autoconfigs = {
     {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES, std::string(CommonTestUtils::DEVICE_CPU) + "," + CommonTestUtils::DEVICE_GPU}}
 };
 
+const std::vector<std::map<std::string, std::string>> auto_batch_configs = {
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
+};
+
 INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, InferRequestMultithreadingTests,
                          ::testing::Combine(
                                  ::testing::Values(CommonTestUtils::DEVICE_GPU),
@@ -36,4 +40,10 @@ INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, InferRequestMultithreadingTes
                                  ::testing::ValuesIn(autoconfigs)),
                          InferRequestMultithreadingTests::getTestCaseName);
 
+
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, InferRequestMultithreadingTests,
+                         ::testing::Combine(
+                                 ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                                 ::testing::ValuesIn(auto_batch_configs)),
+                         InferRequestMultithreadingTests::getTestCaseName);
 } // namespace
@@ -19,6 +19,11 @@ namespace {
             CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU}}
 };
 
+
+const std::vector<std::map<std::string, std::string>> autoBatchConfigs = {
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
+};
+
 INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, InferRequestWaitTests,
                          ::testing::Combine(
                                  ::testing::Values(CommonTestUtils::DEVICE_GPU),
@@ -32,9 +37,15 @@ namespace {
                          InferRequestWaitTests::getTestCaseName);
 
 INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, InferRequestWaitTests,
                          ::testing::Combine(
                                  ::testing::Values(CommonTestUtils::DEVICE_AUTO),
                                  ::testing::ValuesIn(autoConfigs)),
                          InferRequestWaitTests::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, InferRequestWaitTests,
+                         ::testing::Combine(
+                                 ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                                 ::testing::ValuesIn(autoBatchConfigs)),
+                         InferRequestWaitTests::getTestCaseName);
 
 } // namespace
@@ -30,11 +30,11 @@ INSTANTIATE_TEST_SUITE_P(nightly_OVClassNetworkTestP, OVClassNetworkTestP, ::tes
 
 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
                          OVClassGetMetricTest_SUPPORTED_CONFIG_KEYS,
-                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
+                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));
 
 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
                          OVClassGetMetricTest_SUPPORTED_METRICS,
-                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
+                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));
 
 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
                          OVClassGetMetricTest_AVAILABLE_DEVICES,
@@ -42,7 +42,7 @@ INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
 
 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
                          OVClassGetMetricTest_FULL_DEVICE_NAME,
-                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
+                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));
 
 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
                          OVClassGetMetricTest_OPTIMIZATION_CAPABILITIES,
@@ -62,11 +62,11 @@ INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
 
 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetMetricTest,
                          OVClassGetMetricTest_ThrowUnsupported,
-                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
+                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));
 
 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetConfigTest,
                          OVClassGetConfigTest_ThrowUnsupported,
-                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO"));
+                         ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH"));
 
 INSTANTIATE_TEST_SUITE_P(nightly_OVClassGetAvailableDevices, OVClassGetAvailableDevices, ::testing::Values("GPU"));
 
@@ -104,6 +104,29 @@ namespace {
             CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU},
         {InferenceEngine::MultiDeviceConfigParams::KEY_AUTO_NETWORK_PRIORITY, "should be int"}}
 };
 
+
+const std::vector<std::map<std::string, std::string>> auto_batch_inconfigs = {
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), CommonTestUtils::DEVICE_GPU},
+        {CONFIG_KEY(AUTO_BATCH_TIMEOUT), "-1"}},
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), CommonTestUtils::DEVICE_GPU},
+        {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, "DOESN'T EXIST"}},
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+        {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::LATENCY},
+        {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT_NUM_REQUESTS, "-1"}},
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+        {InferenceEngine::PluginConfigParams::KEY_PERF_COUNT, "ON"}},
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+        {InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE, "unknown_file"}},
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+        {InferenceEngine::PluginConfigParams::KEY_DUMP_KERNELS, "ON"}},
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+        {InferenceEngine::PluginConfigParams::KEY_TUNING_MODE, "TUNING_UNKNOWN_MODE"}},
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+        {InferenceEngine::PluginConfigParams::KEY_DEVICE_ID, "DEVICE_UNKNOWN"}},
+};
+
 IE_SUPPRESS_DEPRECATED_END
 
 INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, IncorrectConfigTests,
@@ -125,6 +148,12 @@ namespace {
                          IncorrectConfigTests::getTestCaseName);
 
 
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, IncorrectConfigTests,
+                         ::testing::Combine(
+                                 ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                                 ::testing::ValuesIn(auto_batch_inconfigs)),
+                         IncorrectConfigTests::getTestCaseName);
+
 const std::vector<std::map<std::string, std::string>> conf = {
     {}
 };
@@ -167,17 +196,6 @@ namespace {
 };
 IE_SUPPRESS_DEPRECATED_END
 
-const std::vector<std::map<std::string, std::string>> multiconf = {
-    {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}},
-    {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU},
-        {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::THROUGHPUT}},
-    {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU},
-        {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::LATENCY}},
-    {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU},
-        {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT, InferenceEngine::PluginConfigParams::LATENCY},
-        {InferenceEngine::PluginConfigParams::KEY_PERFORMANCE_HINT_NUM_REQUESTS, "1"}}
-};
-
 const std::vector<std::map<std::string, std::string>> autoConfigs = {
     {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU}},
     {{InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES , CommonTestUtils::DEVICE_GPU},
@@ -232,6 +250,12 @@ namespace {
         {InferenceEngine::MultiDeviceConfigParams::KEY_AUTO_NETWORK_PRIORITY, "2"}}
 };
 
+const std::vector<std::map<std::string, std::string>> auto_batch_configs = {
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU}},
+    {{CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG) , CommonTestUtils::DEVICE_GPU},
+        {CONFIG_KEY(AUTO_BATCH_TIMEOUT) , "1"}},
+};
+
 INSTANTIATE_TEST_SUITE_P(smoke_BehaviorTests, DefaultValuesConfigTests,
                          ::testing::Combine(
                                  ::testing::Values(CommonTestUtils::DEVICE_GPU),
@@ -255,4 +279,15 @@ namespace {
                                  ::testing::Values(CommonTestUtils::DEVICE_AUTO),
                                  ::testing::ValuesIn(autoinconfigs)),
                          IncorrectConfigAPITests::getTestCaseName);
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, IncorrectConfigAPITests,
+                         ::testing::Combine(
+                                 ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                                 ::testing::ValuesIn(auto_batch_inconfigs)),
+                         IncorrectConfigAPITests::getTestCaseName);
+
+INSTANTIATE_TEST_SUITE_P(smoke_AutoBatch_BehaviorTests, CorrectConfigTests,
+                         ::testing::Combine(
+                                 ::testing::Values(CommonTestUtils::DEVICE_BATCH),
+                                 ::testing::ValuesIn(auto_batch_configs)),
+                         CorrectConfigTests::getTestCaseName);
 } // namespace
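On the timeout entries above: "-1" sits in the incorrect-config list while "1" sits in the correct one, so AUTO_BATCH_TIMEOUT is expected to be a non-negative integer. A sketch of the two cases side by side (assuming the usual InferenceEngine config headers are included):

```cpp
#include <map>
#include <string>

// Accepted by the new CorrectConfigTests instantiation.
const std::map<std::string, std::string> valid_auto_batch_config = {
    {CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), "GPU"},
    {CONFIG_KEY(AUTO_BATCH_TIMEOUT), "1"},   // non-negative timeout
};

// Expected to be rejected by the IncorrectConfigTests instantiation.
const std::map<std::string, std::string> invalid_auto_batch_config = {
    {CONFIG_KEY(AUTO_BATCH_DEVICE_CONFIG), "GPU"},
    {CONFIG_KEY(AUTO_BATCH_TIMEOUT), "-1"},  // negative timeout is treated as invalid
};
```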
@@ -35,12 +35,12 @@ INSTANTIATE_TEST_SUITE_P(
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetMetricTest, IEClassGetMetricTest_SUPPORTED_CONFIG_KEYS,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetMetricTest, IEClassGetMetricTest_SUPPORTED_METRICS,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );
 
 INSTANTIATE_TEST_SUITE_P(
@@ -50,7 +50,7 @@ INSTANTIATE_TEST_SUITE_P(
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetMetricTest, IEClassGetMetricTest_FULL_DEVICE_NAME,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );
 
 INSTANTIATE_TEST_SUITE_P(
@@ -80,12 +80,12 @@ INSTANTIATE_TEST_SUITE_P(
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetMetricTest, IEClassGetMetricTest_ThrowUnsupported,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );
 
 INSTANTIATE_TEST_SUITE_P(
         nightly_IEClassGetConfigTest, IEClassGetConfigTest_ThrowUnsupported,
-        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO")
+        ::testing::Values("GPU", "MULTI", "HETERO", "AUTO", "BATCH")
 );
 
 INSTANTIATE_TEST_SUITE_P(
@@ -115,6 +115,26 @@ INSTANTIATE_TEST_SUITE_P(
         ::testing::Values("GPU")
 );
 
+using IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE = BehaviorTestsUtils::IEClassBaseTestP;
+TEST_P(IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE, GetMetricAndPrintNoThrow) {
+    SKIP_IF_CURRENT_TEST_IS_DISABLED()
+    InferenceEngine::Core ie;
+    InferenceEngine::Parameter p;
+
+    std::map<std::string, InferenceEngine::Parameter> _options = {{"MODEL_PTR", simpleCnnNetwork.getFunction()}};
+    ASSERT_NO_THROW(p = ie.GetMetric(deviceName, METRIC_KEY(OPTIMAL_BATCH_SIZE), _options).as<unsigned int>());
+    unsigned int t = p;
+
+    std::cout << "GPU device optimal batch size: " << t << std::endl;
+
+    ASSERT_METRIC_SUPPORTED_IE(METRIC_KEY(OPTIMAL_BATCH_SIZE));
+}
+
+INSTANTIATE_TEST_SUITE_P(
+        nightly_IEClassExecutableNetworkGetMetricTest, IEClassGetMetricTest_GPU_OPTIMAL_BATCH_SIZE,
+        ::testing::Values("GPU")
+);
+
 using IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_DEFAULT = BehaviorTestsUtils::IEClassBaseTestP;
 TEST_P(IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_DEFAULT, GetMetricAndPrintNoThrow) {
     SKIP_IF_CURRENT_TEST_IS_DISABLED()
@@ -135,6 +155,7 @@ INSTANTIATE_TEST_SUITE_P(
         ::testing::Values("GPU")
 );
 
+
 using IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_STREAM_DEVICE_MEM = BehaviorTestsUtils::IEClassBaseTestP;
 TEST_P(IEClassGetMetricTest_GPU_MAX_BATCH_SIZE_STREAM_DEVICE_MEM, GetMetricAndPrintNoThrow) {
     SKIP_IF_CURRENT_TEST_IS_DISABLED()
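The new OPTIMAL_BATCH_SIZE test passes the model via the MODEL_PTR option because the metric is model-dependent. A hedged sketch of how a client could fold that metric into an explicit BATCH device string; the clamping policy here is illustrative, not what the plugin itself does:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <ie_core.hpp>

// Sketch: query the GPU's model-specific optimal batch and build a "BATCH:GPU(N)" name.
std::string batchDeviceFor(InferenceEngine::Core& ie, InferenceEngine::CNNNetwork& net,
                           unsigned int cap = 32) {
    std::map<std::string, InferenceEngine::Parameter> options = {{"MODEL_PTR", net.getFunction()}};
    unsigned int n = ie.GetMetric("GPU", METRIC_KEY(OPTIMAL_BATCH_SIZE), options).as<unsigned int>();
    return "BATCH:GPU(" + std::to_string(std::min(n, cap)) + ")";
}
```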
@@ -16,6 +16,11 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
     list(APPEND DEPENDENCIES ov_auto_plugin)
 endif()
 
+if(ENABLE_AUTO_BATCH)
+    list(APPEND DEPENDENCIES ov_auto_batch_plugin)
+endif()
+
+
 # remove once CVS-69781 is fixed
 if(ENABLE_OV_IR_FRONTEND)
     list(APPEND DEPENDENCIES ov_ir_frontend)
@@ -0,0 +1,161 @@
+// Copyright (C) 2018-2021 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#include <string>
+#include <utility>
+#include <vector>
+#include <memory>
+
+#include <gpu/gpu_config.hpp>
+#include <common_test_utils/test_common.hpp>
+#include <functional_test_utils/plugin_cache.hpp>
+
+#include "ngraph_functions/subgraph_builders.hpp"
+#include "functional_test_utils/blob_utils.hpp"
+
+using namespace ::testing;
+using namespace InferenceEngine;
+
+namespace AutoBatchingTests {
+using AutoBatchTwoNetsParams = std::tuple<
+        std::string,  // device name
+        bool,         // get or set blob
+        size_t,       // number of streams
+        size_t,       // number of requests
+        size_t>;      // batch size
+
+class AutoBatching_Test : public CommonTestUtils::TestsCommon,
+                          public testing::WithParamInterface<AutoBatchTwoNetsParams> {
+    void SetUp() override {
+        std::tie(device_name, use_get_blob, num_streams, num_requests, num_batch) = this->GetParam();
+        fn_ptrs = {ngraph::builder::subgraph::makeSingleConv(),
+                   ngraph::builder::subgraph::makeMultiSingleConv()};
+    };
+public:
+    static std::string getTestCaseName(const testing::TestParamInfo<AutoBatchTwoNetsParams> &obj) {
+        size_t streams, requests, batch;
+        bool use_get_blob;
+        std::string device_name;
+        std::tie(device_name, use_get_blob, streams, requests, batch) = obj.param;
+        return device_name + std::string(use_get_blob ? "_get_blob" : "_set_blob") + "_batch_size_" +
+               std::to_string(batch) +
+               "_num_streams_" + std::to_string(streams) + "_num_req_" + std::to_string(requests);
+    }
+
+protected:
+    std::string device_name;
+    bool use_get_blob;
+    size_t num_streams;
+    size_t num_requests;
+    size_t num_batch;
+    std::vector<std::shared_ptr<ngraph::Function>> fn_ptrs;
+
+    void TestAutoBatch() {
+        std::vector<InferenceEngine::CNNNetwork> nets;
+        for (auto &fn_ptr : fn_ptrs) {
+            nets.push_back(CNNNetwork(fn_ptr));
+        }
+
+        auto ie = InferenceEngine::Core();
+        std::vector<std::string> outputs;
+        std::vector<InferRequest> irs;
+        std::vector<std::vector<uint8_t>> ref;
+        std::vector<int> outElementsCount;
+
+        for (size_t i = 0; i < nets.size(); ++i) {
+            auto net = nets[i];
+            auto inputs = net.getInputsInfo();
+            for (auto n : inputs) {
+                n.second->setPrecision(Precision::FP32);
+            }
+            std::map<std::string, std::string> config;
+            if (device_name.find("GPU") != std::string::npos)
+                config[CONFIG_KEY(GPU_THROUGHPUT_STREAMS)] = std::to_string(num_streams);
+            if (device_name.find("CPU") != std::string::npos)
+                config[CONFIG_KEY(CPU_THROUGHPUT_STREAMS)] = std::to_string(num_streams);
+            // minimize timeout to reduce test time
+            config[CONFIG_KEY(AUTO_BATCH_TIMEOUT)] = std::to_string(1);
+            auto exec_net_ref = ie.LoadNetwork(net, std::string(CommonTestUtils::DEVICE_BATCH) + ":" +
+                                                    device_name + "(" + std::to_string(num_batch) + ")",
+                                               config);
+
+            for (size_t j = 0; j < num_requests; j++) {
+                outputs.push_back(net.getOutputsInfo().begin()->first);  // single output
+                outElementsCount.push_back(
+                        std::accumulate(begin(fn_ptrs[i]->get_output_shape(0)), end(fn_ptrs[i]->get_output_shape(0)), 1,
+                                        std::multiplies<size_t>()));
+
+                auto inf_req = exec_net_ref.CreateInferRequest();
+                irs.push_back(inf_req);
+
+                std::vector<std::vector<uint8_t>> inData;
+                for (auto n : inputs) {
+                    auto blob = FuncTestUtils::createAndFillBlob(n.second->getTensorDesc());
+                    if (use_get_blob)
+                        memcpy(reinterpret_cast<void *>(inf_req.GetBlob(n.first)->buffer().as<uint8_t*>()),
+                               reinterpret_cast<const void *>(blob->cbuffer().as<uint8_t*>()), blob->byteSize());
+                    else
+                        inf_req.SetBlob(n.first, blob);
+
+                    const auto inBlob = inf_req.GetBlob(n.first);
+                    const auto blobSize = inBlob->byteSize();
+                    const auto inBlobBuf = inBlob->cbuffer().as<uint8_t *>();
+                    inData.push_back(std::vector<uint8_t>(inBlobBuf, inBlobBuf + blobSize));
+                }
+                auto refOutData = ngraph::helpers::interpreterFunction(fn_ptrs[i], {inData}).front().second;
+                ref.push_back(refOutData);
+            }
+        }
+
+        const int niter = 1;
+        for (int i = 0; i < niter; i++) {
+            for (auto ir : irs) {
+                ir.StartAsync();
+            }
+
+            for (auto ir : irs) {
+                ir.Wait(InferRequest::RESULT_READY);
+            }
+        }
+
+        auto thr = FuncTestUtils::GetComparisonThreshold(InferenceEngine::Precision::FP32);
+        for (size_t i = 0; i < irs.size(); ++i) {
+            const auto &refBuffer = ref[i].data();
+            ASSERT_EQ(outElementsCount[i], irs[i].GetBlob(outputs[i])->size());
+            FuncTestUtils::compareRawBuffers(irs[i].GetBlob(outputs[i])->buffer().as<float *>(),
                                             reinterpret_cast<const float *>(refBuffer), outElementsCount[i],
+                                             outElementsCount[i],
+                                             thr);
+        }
+    }
+};
+
+class AutoBatching_Test_DetectionOutput : public AutoBatching_Test {
+public:
+    void SetUp() override {
+        std::tie(device_name, use_get_blob, num_streams, num_requests, num_batch) = this->GetParam();
+        fn_ptrs = {ngraph::builder::subgraph::makeEltwisePlusDetectionOutput(),
+                   ngraph::builder::subgraph::makeEltwisePlusDetectionOutput()};
+    };
+
+    static std::string getTestCaseName(const testing::TestParamInfo<AutoBatchTwoNetsParams> &obj) {
+        size_t streams, requests, batch;
+        bool use_get_blob;
+        std::string device_name;
+        std::tie(device_name, use_get_blob, streams, requests, batch) = obj.param;
+        return "DetectionOutput_HETERO_" + device_name + std::string(use_get_blob ? "_get_blob" : "_set_blob") +
+               "_batch_size_" + std::to_string(batch) +
+               "_num_streams_" + std::to_string(streams) + "_num_req_" + std::to_string(requests);
+    }
+};
+
+TEST_P(AutoBatching_Test, compareAutoBatchingToSingleBatch) {
+    TestAutoBatch();
+}
+
+TEST_P(AutoBatching_Test_DetectionOutput, compareAutoBatchingToSingleBatch) {
+    TestAutoBatch();
+}
+
+} // namespace AutoBatchingTests
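TestAutoBatch() above drives everything through the explicit device string plus a short timeout. Outside the test harness, the same two knobs look roughly like this (batch size 4 is just an example value):

```cpp
#include <map>
#include <string>
#include <ie_core.hpp>

// Sketch: explicit batch size in the device name, short AUTO_BATCH_TIMEOUT in the config,
// mirroring what TestAutoBatch() composes for GPU.
InferenceEngine::ExecutableNetwork loadLikeTheTest(InferenceEngine::Core& ie,
                                                   InferenceEngine::CNNNetwork& net) {
    std::map<std::string, std::string> config = {
        {CONFIG_KEY(AUTO_BATCH_TIMEOUT), "1"}
    };
    return ie.LoadNetwork(net, "BATCH:GPU(4)", config);
}
```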
@@ -10,6 +10,7 @@ const char DEVICE_AUTO[] = "AUTO";
 const char DEVICE_CPU[] = "CPU";
 const char DEVICE_GNA[] = "GNA";
 const char DEVICE_GPU[] = "GPU";
+const char DEVICE_BATCH[] = "BATCH";
 const char DEVICE_HDDL[] = "HDDL";
 const char DEVICE_MYRIAD[] = "MYRIAD";
 const char DEVICE_KEEMBAY[] = "VPUX";
@@ -26,6 +26,9 @@ public:
     MOCK_METHOD3(ImportNetwork, InferenceEngine::SoExecutableNetworkInternal(
             std::istream&, const std::shared_ptr<InferenceEngine::RemoteContext>&, const std::map<std::string, std::string>&));
 
+    MOCK_METHOD2(CreateContext, InferenceEngine::RemoteContext::Ptr(const std::string& deviceName,
+            const InferenceEngine::ParamMap& params));
+
     MOCK_CONST_METHOD3(QueryNetwork, InferenceEngine::QueryNetworkResult(
             const InferenceEngine::CNNNetwork&, const std::string&, const std::map<std::string, std::string>&));
 
@@ -242,6 +242,44 @@ inline std::shared_ptr<ngraph::Function> makeSingleConv(std::vector<size_t> inpu
     return fn_ptr;
 }
 
+inline std::shared_ptr<ngraph::Function> makeEltwisePlusDetectionOutput(std::vector<std::vector<size_t>> inShapes =
+                                                                                {{1, 60}, {1, 165}, {1, 1, 75}},
+                                                                        ngraph::element::Type_t type = ngraph::element::Type_t::f32) {
+    // adding Eltwise so that we can test Auto-Batching's HETERO code-path that splits the DetectionOutput and the rest of the network
+    auto params = ngraph::builder::makeParams(ngraph::element::f32, inShapes);
+    auto paramOuts = ngraph::helpers::convert2OutputVector(
+            ngraph::helpers::castOps2Nodes<ngraph::opset3::Parameter>(params));
+    ngraph::OutputVector outs;
+    for (size_t i = 0; i < inShapes.size(); i++) {
+        auto shape = inShapes[i];
+        auto p = std::make_shared<ngraph::opset3::Parameter>(ngraph::element::f32, ngraph::Shape{shape});
+        auto add = ngraph::builder::makeEltwise(paramOuts[i], p, ngraph::helpers::EltwiseTypes::ADD);
+        params.push_back(p);
+        outs.push_back(add->output(0));
+    }
+    ngraph::op::DetectionOutput::Attributes attr;
+    attr.num_classes = 11;
+    attr.background_label_id = 0;
+    attr.top_k = 75;
+    attr.variance_encoded_in_target = true;
+    attr.keep_top_k = {50};
+    attr.code_type = std::string{"caffe.PriorBoxParameter.CORNER"};
+    attr.share_location = true;
+    attr.nms_threshold = 0.5f;
+    attr.confidence_threshold = 0.5f;
+    attr.clip_after_nms = false;
+    attr.clip_before_nms = false;
+    attr.decrease_label_id = false;
+    attr.normalized = false;
+    attr.input_height = 1;
+    attr.input_width = 1;
+    attr.objectness_score = 0.4f;
+
+    auto detOut = ngraph::builder::makeDetectionOutput(outs, attr);
+    ngraph::ResultVector results{std::make_shared<ngraph::opset3::Result>(detOut)};
+    return std::make_shared<ngraph::Function>(results, params, "EltWiseWithDetectionOutput");
+}
+
 inline std::shared_ptr<ngraph::Function> makeMultiSingleConv(std::vector<size_t> inputShape = {1, 3, 24, 24},
                                                              ngraph::element::Type type = ngraph::element::Type_t::f32) {
     auto param0 = std::make_shared<ngraph::opset1::Parameter>(type, ngraph::Shape(inputShape));
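The builder added above is what the DetectionOutput auto-batching tests feed through the HETERO code-path mentioned in its comment. A short usage sketch with the default shapes (device string and batch value are only examples):

```cpp
#include <ie_core.hpp>
#include "ngraph_functions/subgraph_builders.hpp"

// Sketch: wrap the new subgraph builder into a CNNNetwork and load it through the BATCH device,
// as AutoBatching_Test_DetectionOutput does.
void runDetectionOutputSubgraph(InferenceEngine::Core& ie) {
    auto fn = ngraph::builder::subgraph::makeEltwisePlusDetectionOutput();
    InferenceEngine::CNNNetwork net(fn);
    auto exec_net = ie.LoadNetwork(net, "BATCH:GPU(4)");
}
```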
|
@ -38,6 +38,7 @@ using Config = std::map<std::string, std::string>;
|
|||||||
using namespace MockMultiDevice;
|
using namespace MockMultiDevice;
|
||||||
|
|
||||||
using ConfigParams = std::tuple<
|
using ConfigParams = std::tuple<
|
||||||
|
bool, // if THROUGHPUT
|
||||||
unsigned int, // cpu OPTIMAL_NUMBER_OF_INFER_REQUESTS
|
unsigned int, // cpu OPTIMAL_NUMBER_OF_INFER_REQUESTS
|
||||||
int, // cpu infer requet num of customer want
|
int, // cpu infer requet num of customer want
|
||||||
bool, // if cpu sleep, cpu device will load slow
|
bool, // if cpu sleep, cpu device will load slow
|
||||||
@ -77,12 +78,18 @@ public:
|
|||||||
unsigned int expectOptimalNum;
|
unsigned int expectOptimalNum;
|
||||||
bool cpuSleep;
|
bool cpuSleep;
|
||||||
bool gpuSleep;
|
bool gpuSleep;
|
||||||
std::tie(cpuOptimalNum, cpuCustomerNum, cpuSleep,
|
bool isThroughput;
|
||||||
|
std::tie(isThroughput, cpuOptimalNum, cpuCustomerNum, cpuSleep,
|
||||||
gpuOptimalNum, gpuCustomerNum, gpuSleep, expectOptimalNum) = obj.param;
|
gpuOptimalNum, gpuCustomerNum, gpuSleep, expectOptimalNum) = obj.param;
|
||||||
std::ostringstream result;
|
std::ostringstream result;
|
||||||
result << "cpuOptimalNum_" << cpuOptimalNum << "cpuCustomerNum_" << cpuCustomerNum;
|
result << "cpuOptimalNum_" << cpuOptimalNum << "cpuCustomerNum_" << cpuCustomerNum;
|
||||||
result << "gpuOptimalNum_" << gpuOptimalNum << "gpuCustomerNum_" << gpuCustomerNum;
|
result << "gpuOptimalNum_" << gpuOptimalNum << "gpuCustomerNum_" << gpuCustomerNum;
|
||||||
result << "expectOptimalNum_" << expectOptimalNum;
|
result << "expectOptimalNum_" << expectOptimalNum;
|
||||||
|
if (isThroughput) {
|
||||||
|
result << "_isThroughput" << "true";
|
||||||
|
} else {
|
||||||
|
result << "__isThroughput" << "false";
|
||||||
|
}
|
||||||
if (cpuSleep) {
|
if (cpuSleep) {
|
||||||
result << "_cpuSleep_" << "true";
|
result << "_cpuSleep_" << "true";
|
||||||
} else {
|
} else {
|
||||||
@ -147,7 +154,7 @@ public:
|
|||||||
IE_SET_METRIC(SUPPORTED_CONFIG_KEYS, supportConfigs, {});
|
IE_SET_METRIC(SUPPORTED_CONFIG_KEYS, supportConfigs, {});
|
||||||
ON_CALL(*core, GetMetric(_, StrEq(METRIC_KEY(SUPPORTED_CONFIG_KEYS)), _))
|
ON_CALL(*core, GetMetric(_, StrEq(METRIC_KEY(SUPPORTED_CONFIG_KEYS)), _))
|
||||||
.WillByDefault(RETURN_MOCK_VALUE(supportConfigs));
|
.WillByDefault(RETURN_MOCK_VALUE(supportConfigs));
|
||||||
EXPECT_CALL(*core, GetMetric(_, StrEq(METRIC_KEY(SUPPORTED_CONFIG_KEYS)), _)).Times(AnyNumber());
|
EXPECT_CALL(*core, GetMetric(_, _, _)).Times(AnyNumber());
|
||||||
|
|
||||||
// test auto plugin
|
// test auto plugin
|
||||||
config.insert({CONFIG_KEY_INTERNAL(MULTI_WORK_MODE_AS_AUTO), InferenceEngine::PluginConfigParams::YES});
|
config.insert({CONFIG_KEY_INTERNAL(MULTI_WORK_MODE_AS_AUTO), InferenceEngine::PluginConfigParams::YES});
|
||||||
@@ -168,11 +175,24 @@ TEST_P(ExecNetworkGetMetric, OPTIMAL_NUMBER_OF_INFER_REQUESTS) {
     unsigned int expectOptimalNum;
     bool cpuSleep;
     bool gpuSleep;
-    std::tie(cpuOptimalNum, cpuCustomerNum, cpuSleep,
+    bool isThroughput;
+    std::tie(isThroughput, cpuOptimalNum, cpuCustomerNum, cpuSleep,
              gpuOptimalNum, gpuCustomerNum, gpuSleep, expectOptimalNum) = this->GetParam();
-    metaDevices.push_back({CommonTestUtils::DEVICE_CPU, {}, cpuCustomerNum, ""});
-    metaDevices.push_back({CommonTestUtils::DEVICE_GPU, {}, gpuCustomerNum, ""});
+    if (isThroughput) {
+        metaDevices.push_back({CommonTestUtils::DEVICE_CPU, {{CONFIG_KEY(PERFORMANCE_HINT),
+                               InferenceEngine::PluginConfigParams::THROUGHPUT}}, cpuCustomerNum, ""});
+        metaDevices.push_back({CommonTestUtils::DEVICE_GPU, {{CONFIG_KEY(PERFORMANCE_HINT),
+                               InferenceEngine::PluginConfigParams::THROUGHPUT}}, gpuCustomerNum, ""});
+        IE_SET_METRIC(OPTIMAL_BATCH_SIZE, optimalBatchNum, 256);
+        IE_SET_METRIC(RANGE_FOR_STREAMS, rangeOfStreams, std::make_tuple<unsigned int, unsigned int>(1, 2));
+        ON_CALL(*core.get(), GetMetric(StrEq(CommonTestUtils::DEVICE_GPU), StrEq(METRIC_KEY(OPTIMAL_BATCH_SIZE)), _))
+            .WillByDefault(RETURN_MOCK_VALUE(optimalBatchNum));
+        ON_CALL(*core.get(), GetMetric(StrEq(CommonTestUtils::DEVICE_GPU), StrEq(METRIC_KEY(RANGE_FOR_STREAMS)), _))
+            .WillByDefault(RETURN_MOCK_VALUE(rangeOfStreams));
+    } else {
+        metaDevices.push_back({CommonTestUtils::DEVICE_CPU, {}, cpuCustomerNum, ""});
+        metaDevices.push_back({CommonTestUtils::DEVICE_GPU, {}, gpuCustomerNum, ""});
+    }
     ON_CALL(*plugin, SelectDevice(_, _, _)).WillByDefault(Return(metaDevices[1]));
     ON_CALL(*plugin, ParseMetaDevices(_, _)).WillByDefault(Return(metaDevices));
     EXPECT_CALL(*plugin, ParseMetaDevices(_, _)).Times(1);
||||||
@ -241,27 +261,28 @@ TEST_P(ExecNetworkGetMetric, OPTIMAL_NUMBER_OF_INFER_REQUESTS) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
// ConfigParams {unsigned int, int, bool,
|
// ConfigParams {bool, unsigned int, int, bool,
|
||||||
// unsigned int, int, bool, unsigned int}
|
// unsigned int, int, bool, unsigned int}
|
||||||
//
|
//
|
||||||
// every element for ConfigParams
|
// every element for ConfigParams
|
||||||
// {cpuOptimalNum, customer hope for cpu infer requset num, if cpu sleep when load,
|
// {is throughput mode, cpuOptimalNum, customer hope for cpu infer requset num, if cpu sleep when load,
|
||||||
// gpuOptimalNum, customer hope for gpu infer requset num, if gpu sleep when load,
|
// gpuOptimalNum, customer hope for gpu infer requset num, if gpu sleep when load,
|
||||||
// expectOptimalNum of Auto ExecNetwork}
|
// expectOptimalNum of Auto ExecNetwork}
|
||||||
//
|
//
|
||||||
const std::vector<ConfigParams> testConfigs = {
|
const std::vector<ConfigParams> testConfigs = {
|
||||||
ConfigParams {1, -1, false, 2, -1, true, 8},
|
ConfigParams {false, 1, -1, false, 2, -1, true, 8},
|
||||||
ConfigParams {1, -1, false, 10, -1, true, 8},
|
ConfigParams {false, 1, -1, false, 10, -1, true, 8},
|
||||||
ConfigParams {12, -1, false, 2, -1, true, 12},
|
ConfigParams {false, 12, -1, false, 2, -1, true, 12},
|
||||||
ConfigParams {12, -1, false, 10, -1, true, 12},
|
ConfigParams {false, 12, -1, false, 10, -1, true, 12},
|
||||||
ConfigParams {1, -1, true, 2, -1, false, 8},
|
ConfigParams {false, 1, -1, true, 2, -1, false, 8},
|
||||||
ConfigParams {1, -1, true, 10, -1, false, 10},
|
ConfigParams {false, 1, -1, true, 10, -1, false, 10},
|
||||||
ConfigParams {6, -1, true, 2, -1, false, 8},
|
ConfigParams {false, 6, -1, true, 2, -1, false, 8},
|
||||||
ConfigParams {6, -1, true, 10, -1, false, 10},
|
ConfigParams {false, 6, -1, true, 10, -1, false, 10},
|
||||||
ConfigParams {6, 4, false, 2, 3, true, 8},
|
ConfigParams {false, 6, 4, false, 2, 3, true, 8},
|
||||||
ConfigParams {6, 4, false, 10, 3, true, 8},
|
ConfigParams {false, 6, 4, false, 10, 3, true, 8},
|
||||||
ConfigParams {1, 4, true, 2, 3, false, 8},
|
ConfigParams {false, 1, 4, true, 2, 3, false, 8},
|
||||||
ConfigParams {1, 4, true, 10, 3, false, 10}
|
ConfigParams {false, 1, 4, true, 10, 3, false, 10},
|
||||||
|
ConfigParams {true, 1, 4, false, 10, 3, true, 512}
|
||||||
};
|
};
|
||||||
|
|
||||||
INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, ExecNetworkGetMetric,
|
INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, ExecNetworkGetMetric,
|
||||||
|
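The INSTANTIATE_TEST_SUITE_P call is cut off at the hunk boundary above. Purely for illustration (the argument list below is a hypothetical completion based on standard GoogleTest usage, not copied from the sources), the instantiation with these parameters would typically look like:

    // Hypothetical completion for illustration only; the real arguments are not shown in this diff.
    INSTANTIATE_TEST_SUITE_P(smoke_Auto_BehaviorTests, ExecNetworkGetMetric,
                             ::testing::ValuesIn(testConfigs),
                             ExecNetworkGetMetric::getTestCaseName);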
@@ -14,6 +14,11 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
     add_dependencies(${TARGET_NAME} ov_auto_plugin)
 endif()

+if(ENABLE_AUTO_BATCH)
+    add_dependencies(${TARGET_NAME} ov_auto_batch_plugin)
+endif()
+
 target_include_directories(${TARGET_NAME} PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}/plugin_tests")

 target_link_libraries(${TARGET_NAME} PUBLIC
@@ -25,6 +25,10 @@ if(ENABLE_AUTO OR ENABLE_MULTI)
     add_dependencies(${TARGET_NAME} ov_auto_plugin)
 endif()

+if(ENABLE_AUTO_BATCH)
+    add_dependencies(${TARGET_NAME} ov_auto_batch_plugin)
+endif()
+
 set_ie_threading_interface_for(${TARGET_NAME})

 ie_faster_build(${TARGET_NAME}