Added common ov::execution_mode hint (#15048)
* [GPU] Added common exec mode hint and gpu support
* Add ov::inference_precision and update usages. Deprecate ov::hint::inference_precision property
* [GPU] Execution mode tests and fixes
* Fixed code style
* Moved execution_mode test to common. Fixes for python API
* Remove deprecations for hint::inference_precision and just keep both
* Fix test
This commit is contained in:
parent 53e699eaba
commit 2201a5f83e
@@ -11,7 +11,7 @@ For an in-depth description of CPU plugin, see:

## Device Name
The `CPU` device name is used for the CPU plugin. Even though there can be more than one physical socket on a platform, only one device of this kind is listed by OpenVINO.
On multi-socket platforms, load balancing and memory usage distribution between NUMA nodes are handled automatically.
In order to use CPU for inference, the device name should be passed to the `ov::Core::compile_model()` method:

@sphinxtabset
@@ -38,7 +38,7 @@ CPU plugin supports the following data types as inference precision of internal

- u8
- i8
- u1

[Hello Query Device C++ Sample](../../../samples/cpp/hello_query_device/README.md) can be used to print out supported data types for all detected devices.

### Quantized Data Types Specifics
@@ -60,7 +60,7 @@ For more details about the `bfloat16` format, see the [BFLOAT16 – Hardware Num

Using the `bf16` precision provides the following performance benefits:

- Faster multiplication of two `bfloat16` numbers because of shorter mantissa of the `bfloat16` data.
- Reduced memory consumption since `bfloat16` data is half the size of 32-bit float.

To check if the CPU device can support the `bfloat16` data type, use the [query device properties interface](./config_properties.md) to query `ov::device::capabilities` property, which should contain `BF16` in the list of CPU capabilities:

@@ -76,11 +76,11 @@ To check if the CPU device can support the `bfloat16` data type, use the [query

@endsphinxtabset

If the model has been converted to `bf16`, the `ov::hint::inference_precision` is set to `ov::element::bf16` and can be checked via the `ov::CompiledModel::get_property` call. The code below demonstrates how to get the element type:
If the model has been converted to `bf16`, the `ov::inference_precision` is set to `ov::element::bf16` and can be checked via the `ov::CompiledModel::get_property` call. The code below demonstrates how to get the element type:

@snippet snippets/cpu/Bfloat16Inference1.cpp part1

To infer the model in `f32` precision instead of `bf16` on targets with native `bf16` support, set the `ov::hint::inference_precision` to `ov::element::f32`.
To infer the model in `f32` precision instead of `bf16` on targets with native `bf16` support, set the `ov::inference_precision` to `ov::element::f32`.

@sphinxtabset
@@ -95,12 +95,12 @@ To infer the model in `f32` precision instead of `bf16` on targets with native `

@endsphinxtabset

The `Bfloat16` software simulation mode is available on CPUs with Intel® AVX-512 instruction set that do not support the native `avx512_bf16` instruction. This mode is used for development purposes and it does not guarantee good performance.
To enable the simulation, the `ov::hint::inference_precision` has to be explicitly set to `ov::element::bf16`.
To enable the simulation, the `ov::inference_precision` has to be explicitly set to `ov::element::bf16`.

> **NOTE**: If ov::hint::inference_precision is set to ov::element::bf16 on a CPU without native bfloat16 support or bfloat16 simulation mode, an exception is thrown.
> **NOTE**: If ov::inference_precision is set to ov::element::bf16 on a CPU without native bfloat16 support or bfloat16 simulation mode, an exception is thrown.

> **NOTE**: Due to the reduced mantissa size of the `bfloat16` data type, the resulting `bf16` inference accuracy may differ from the `f32` inference, especially for models that were not trained using the `bfloat16` data type. If the `bf16` inference accuracy is not acceptable, it is recommended to switch to the `f32` precision.
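For illustration, a minimal sketch of forcing the inference precision on the CPU device and reading back the effective value; the model path is a placeholder, and `ov::hint::inference_precision` remains usable as an alias of `ov::inference_precision` after this change:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("sample.xml");  // placeholder model

    // Request bf16 execution explicitly; as noted above, this throws on CPUs
    // without native bfloat16 support or the simulation mode.
    auto compiled = core.compile_model(model, "CPU",
                                       ov::inference_precision(ov::element::bf16));

    // Read back the precision the compiled model actually uses.
    auto precision = compiled.get_property(ov::inference_precision);
    return precision == ov::element::bf16 ? 0 : 1;
}
```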
## Supported Features

### Multi-device Execution

@@ -204,7 +204,7 @@ The plugin supports the following properties:

All parameters must be set before calling `ov::Core::compile_model()` in order to take effect, or passed as an additional argument to `ov::Core::compile_model()` (see the sketch after this list):

- `ov::enable_profiling`
- `ov::hint::inference_precision`
- `ov::inference_precision`
- `ov::hint::performance_mode`
- `ov::hint::num_requests`
- `ov::num_streams`
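A hedged sketch of the second option, passing the properties directly to `ov::Core::compile_model()`; the model path and the chosen values are placeholders:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("sample.xml");  // placeholder model

    // Properties passed this way apply only to the resulting compiled model.
    auto compiled = core.compile_model(model, "CPU",
                                       ov::enable_profiling(true),
                                       ov::num_streams(4),
                                       ov::inference_precision(ov::element::f32));
    return 0;
}
```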
@@ -51,7 +51,7 @@ For details, see a description of the `ov::intel_gna::execution_mode` property.

GNA is designed for real-time workloads, i.e., noise reduction.
For such workloads, processing should be time constrained. Otherwise, extra delays may cause undesired effects such as
*audio glitches*. The GNA driver provides a Quality of Service (QoS) mechanism to ensure that processing can satisfy real-time requirements.
The mechanism interrupts requests that might cause high-priority Windows audio processes to miss
the schedule. As a result, long running GNA tasks terminate early.

@@ -101,7 +101,7 @@ GNA plugin supports the `i16` and `i8` quantized data types as inference precisi

* Accuracy (i16 weights)
* Performance (i8 weights)

For POT quantized model, the `ov::hint::inference_precision` property has no effect except cases described in <a href="#support-for-2d-convolutions-using-pot">Support for 2D Convolutions using POT</a>.
For POT quantized model, the `ov::inference_precision` property has no effect except cases described in <a href="#support-for-2d-convolutions-using-pot">Support for 2D Convolutions using POT</a>.
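As a rough illustration (assuming a GNA-capable platform; the values mirror the plugin tests further down in this diff), the runtime precision is selected through the same property:

```cpp
#include <openvino/openvino.hpp>
#include <openvino/runtime/intel_gna/properties.hpp>

int main() {
    ov::Core core;
    // i8 corresponds to the performance-oriented mode, i16 to the accuracy-oriented one.
    core.set_property("GNA", ov::inference_precision(ov::element::i8));
    // Software-exact emulation mode, as used in the tests below.
    core.set_property("GNA", ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT));
    return 0;
}
```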
## Supported Features

@@ -206,7 +206,7 @@ In order to take effect, the following parameters must be set before model compi

- ov::cache_dir
- ov::enable_profiling
- ov::hint::inference_precision
- ov::inference_precision
- ov::hint::num_requests
- ov::intel_gna::compile_target
- ov::intel_gna::firmware_model_image_path
@@ -272,7 +272,7 @@ The following tables provide a more explicit representation of the Intel(R) GNA

For POT to successfully work with the models including GNA3.0 2D convolutions, the following requirements must be met:
* All convolution parameters are natively supported by HW (see tables above).
* The runtime precision is explicitly set by the `ov::hint::inference_precision` property as `i8` for the models produced by the `performance mode` of POT, and as `i16` for the models produced by the `accuracy mode` of POT.
* The runtime precision is explicitly set by the `ov::inference_precision` property as `i8` for the models produced by the `performance mode` of POT, and as `i16` for the models produced by the `accuracy mode` of POT.

### Batch Size Limitation

@@ -332,4 +332,4 @@ Increasing batch size only improves efficiency of `MatMul` layers.

* [Supported Devices](Supported_Devices.md)
* [Converting Model](../../MO_DG/prepare_model/convert_model/Converting_Model.md)
* [Convert model from Kaldi](../../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md)
@@ -138,7 +138,7 @@ It is done by specifying `MULTI:GPU.1,GPU.0` as a target device.

For more details, see the [Multi-device execution](../multi_device.md).

### Automatic Batching
The GPU plugin is capable of reporting `ov::max_batch_size` and `ov::optimal_batch_size` metrics with respect to the current hardware
platform and model. Therefore, automatic batching is enabled by default when `ov::optimal_batch_size` is `> 1` and `ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT)` is set.
Alternatively, it can be enabled explicitly via the device notion, for example `BATCH:GPU`.
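A hedged sketch of both ways to get automatic batching on the GPU; the model path is a placeholder and the actual batch size is chosen by the plugin:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("sample.xml");  // placeholder model

    // Implicit: the THROUGHPUT hint enables batching when ov::optimal_batch_size > 1.
    auto tput = core.compile_model(model, "GPU",
                                   ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

    // Explicit: the BATCH virtual device wraps the GPU.
    auto batched = core.compile_model(model, "BATCH:GPU");
    return 0;
}
```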
@@ -238,7 +238,7 @@ For usage examples, refer to the [RemoteTensor API](./GPU_RemoteTensor_API.md).

For more details, see the [preprocessing API](../preprocessing_overview.md).

### Model Caching
Cache for the GPU plugin may be enabled via the common OpenVINO `ov::cache_dir` property. GPU plugin implementation supports only caching of compiled kernels, so all plugin-specific model transformations are executed on each `ov::Core::compile_model()` call regardless of the `cache_dir` option.
Still, since kernel compilation is a bottleneck in the model loading process, a significant load time reduction can be achieved with the `ov::cache_dir` property enabled.

> **NOTE**: Full model caching support is currently implemented as a preview feature. To activate it, set the OV_GPU_CACHE_MODEL environment variable to 1.
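A minimal sketch of enabling the kernel cache; the cache directory and model path are placeholders:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Any writable directory works; compiled kernels are reused on the next
    // compile_model() call for the same model and configuration.
    core.set_property(ov::cache_dir("gpu_cache"));  // placeholder path

    auto model = core.read_model("sample.xml");     // placeholder model
    auto compiled = core.compile_model(model, "GPU");
    return 0;
}
```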
@@ -262,8 +262,9 @@ All parameters must be set before calling `ov::Core::compile_model()` in order t

- ov::enable_profiling
- ov::hint::model_priority
- ov::hint::performance_mode
- ov::hint::execution_mode
- ov::hint::num_requests
- ov::hint::inference_precision
- ov::inference_precision
- ov::num_streams
- ov::compilation_num_threads
- ov::device::id
@@ -17,11 +17,11 @@

@endsphinxdirective

Runtime optimization, or deployment optimization, focuses on tuning inference parameters and execution means (e.g., the optimum number of requests executed simultaneously). Unlike model-level optimizations, they are highly specific to the hardware and case they are used for, and often come at a cost.
`ov::hint::inference_precision` is a "typical runtime configuration" which trades accuracy for performance, allowing `fp16/bf16` execution for the layers that remain in `fp32` after quantization of the original `fp32` model.
`ov::inference_precision` is a "typical runtime configuration" which trades accuracy for performance, allowing `fp16/bf16` execution for the layers that remain in `fp32` after quantization of the original `fp32` model.

Therefore, optimization should start with defining the use case. For example, if it is about processing millions of samples by overnight jobs in data centers, throughput could be prioritized over latency. On the other hand, real-time usages would likely trade off throughput to deliver the results at minimal latency. A combined scenario is also possible, targeting the highest possible throughput, while maintaining a specific latency threshold.

It is also important to understand how the full-stack application would use the inference component "end-to-end." For example, to know what stages need to be orchestrated to save workload devoted to fetching and preparing input data.

For more information on this topic, see the following articles:
* [feature support by device](@ref features_support_matrix)
@@ -30,28 +30,28 @@ For more information on this topic, see the following articles:

* [The 'get_tensor' Idiom](@ref tensor_idiom)
* For variably-sized inputs, consider [dynamic shapes](../OV_Runtime_UG/ov_dynamic_shapes.md)

See the [latency](./dldt_deployment_optimization_latency.md) and [throughput](./dldt_deployment_optimization_tput.md) optimization guides, for **use-case-specific optimizations**

## Writing Performance-Portable Inference Applications
Although inference performed in OpenVINO Runtime can be configured with a multitude of low-level performance settings, it is not recommended in most cases. Firstly, achieving the best performance with such adjustments requires deep understanding of device architecture and the inference engine.

Secondly, such optimization may not translate well to other device-model combinations. In other words, one set of execution parameters is likely to result in different performance when used under different conditions. For example:
* both the CPU and GPU support the notion of [streams](./dldt_deployment_optimization_tput_advanced.md), yet they deduce their optimal number very differently.
* Even among devices of the same type, different execution configurations can be considered optimal, as in the case of instruction sets or the number of cores for the CPU and the batch size for the GPU.
* Different models have different optimal parameter configurations, considering factors such as compute vs memory-bandwidth, inference precision, and possible model quantization.
* Execution "scheduling" impacts performance strongly and is highly device-specific, for example, GPU-oriented optimizations like batching, combining multiple inputs to achieve the optimal throughput, [do not always map well to the CPU](dldt_deployment_optimization_internals.md).

To make the configuration process much easier and its performance optimization more portable, the option of [Performance Hints](../OV_Runtime_UG/performance_hints.md) has been introduced. It comprises two high-level "presets" focused on either **latency** or **throughput** and, essentially, makes execution specifics irrelevant.

The Performance Hints functionality makes configuration transparent to the application, for example, anticipates the need for explicit (application-side) batching or streams, and facilitates parallel processing of separate infer requests for different input sources.
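For instance, a hedged sketch of relying on a hint instead of low-level tuning; the device name and model path are placeholders:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("sample.xml");  // placeholder model

    // One high-level preset instead of device-specific stream/batch tuning;
    // the plugin derives the low-level settings itself.
    auto compiled = core.compile_model(model, "GPU",
                                       ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

    // Optionally query how many requests to keep in flight for this configuration.
    auto nireq = compiled.get_property(ov::optimal_number_of_infer_requests);
    (void)nireq;  // e.g., size an infer-request pool with this value
    return 0;
}
```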
## Additional Resources

* [Using Async API and running multiple inference requests in parallel to leverage throughput](@ref throughput_app_design).
* [The throughput approach implementation details for specific devices](dldt_deployment_optimization_internals.md)
* [Details on throughput](dldt_deployment_optimization_tput.md)
* [Details on latency](dldt_deployment_optimization_latency.md)
* [API examples and details](../OV_Runtime_UG/performance_hints.md).
@@ -6,7 +6,7 @@ using namespace InferenceEngine;

ov::Core core;
auto network = core.read_model("sample.xml");
auto exec_network = core.compile_model(network, "CPU");
auto inference_precision = exec_network.get_property(ov::hint::inference_precision);
auto inference_precision = exec_network.get_property(ov::inference_precision);
//! [part1]

return 0;
@@ -4,7 +4,7 @@ int main() {

using namespace InferenceEngine;
//! [part2]
ov::Core core;
core.set_property("CPU", ov::hint::inference_precision(ov::element::f32));
core.set_property("CPU", ov::inference_precision(ov::element::f32));
//! [part2]

return 0;
@@ -49,7 +49,7 @@ auto compiled_model = core.compile_model(model, "HETERO",

// profiling is enabled only for GPU
ov::device::properties("GPU", ov::enable_profiling(true)),
// FP32 inference precision only for CPU
ov::device::properties("CPU", ov::hint::inference_precision(ov::element::f32))
ov::device::properties("CPU", ov::inference_precision(ov::element::f32))
);
//! [configure_fallback_devices]
}
@@ -19,7 +19,7 @@ auto model = core.read_model("sample.xml");

//! [compile_model_with_property]
auto compiled_model = core.compile_model(model, "CPU",
ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
ov::hint::inference_precision(ov::element::f32));
ov::inference_precision(ov::element::f32));
//! [compile_model_with_property]
}
@@ -25,7 +25,7 @@ auto model = core.read_model("sample.xml");

auto compiled_model = core.compile_model(model, "MULTI",
ov::device::priorities("GPU", "CPU"),
ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
ov::hint::inference_precision(ov::element::f32));
ov::inference_precision(ov::element::f32));
//! [core_compile_model]

//! [compiled_model_set_property]
@ -500,13 +500,13 @@ int main(int argc, char* argv[]) {
|
||||
auto it_device_infer_precision = device_infer_precision.find(device);
|
||||
if (it_device_infer_precision != device_infer_precision.end()) {
|
||||
// set to user defined value
|
||||
if (supported(ov::hint::inference_precision.name())) {
|
||||
device_config.emplace(ov::hint::inference_precision(it_device_infer_precision->second));
|
||||
if (supported(ov::inference_precision.name())) {
|
||||
device_config.emplace(ov::inference_precision(it_device_infer_precision->second));
|
||||
} else if (device == "MULTI" || device == "AUTO") {
|
||||
// check if the element contains the hardware device property
|
||||
auto value_vec = split(it_device_infer_precision->second, ' ');
|
||||
if (value_vec.size() == 1) {
|
||||
auto key = ov::hint::inference_precision.name();
|
||||
auto key = ov::inference_precision.name();
|
||||
device_config[key] = it_device_infer_precision->second;
|
||||
} else {
|
||||
// set device inference_precison properties in the AUTO/MULTI plugin
|
||||
@ -523,16 +523,16 @@ int main(int argc, char* argv[]) {
|
||||
is_dev_set_property[it.first] = false;
|
||||
device_config.erase(it.first);
|
||||
device_config.insert(
|
||||
ov::device::properties(it.first, ov::hint::inference_precision(it.second)));
|
||||
ov::device::properties(it.first, ov::inference_precision(it.second)));
|
||||
} else {
|
||||
auto& property = device_config[it.first].as<ov::AnyMap>();
|
||||
property.emplace(ov::hint::inference_precision(it.second));
|
||||
property.emplace(ov::inference_precision(it.second));
|
||||
}
|
||||
}
|
||||
}
|
||||
} else {
|
||||
throw std::logic_error("Device " + device + " doesn't support config key '" +
|
||||
ov::hint::inference_precision.name() + "'! " +
|
||||
ov::inference_precision.name() + "'! " +
|
||||
"Please specify -infer_precision for correct devices in format "
|
||||
"<dev1>:<infer_precision1>,<dev2>:<infer_precision2>" +
|
||||
" or via configuration file.");
|
||||
|
@ -220,7 +220,7 @@ int main(int argc, char* argv[]) {
|
||||
gnaPluginConfig[ov::intel_gna::scale_factors_per_input.name()] = scale_factors_per_input;
|
||||
}
|
||||
}
|
||||
gnaPluginConfig[ov::hint::inference_precision.name()] = (FLAGS_qb == 8) ? ov::element::i8 : ov::element::i16;
|
||||
gnaPluginConfig[ov::inference_precision.name()] = (FLAGS_qb == 8) ? ov::element::i8 : ov::element::i16;
|
||||
auto parse_target = [&](const std::string& target) -> ov::intel_gna::HWGeneration {
|
||||
auto hw_target = ov::intel_gna::HWGeneration::UNDEFINED;
|
||||
|
||||
|
@ -38,6 +38,7 @@ void regmodule_properties(py::module m) {
|
||||
wrap_property_RO(m_properties, ov::optimal_batch_size, "optimal_batch_size");
|
||||
wrap_property_RO(m_properties, ov::max_batch_size, "max_batch_size");
|
||||
wrap_property_RO(m_properties, ov::range_for_async_infer_requests, "range_for_async_infer_requests");
|
||||
wrap_property_RW(m_properties, ov::inference_precision, "inference_precision");
|
||||
|
||||
// Submodule hint
|
||||
py::module m_hint =
|
||||
|
@ -199,6 +199,7 @@ def test_properties_ro(ov_property_ro, expected_value):
|
||||
((properties.Affinity.NONE, properties.Affinity.NONE),),
|
||||
),
|
||||
(properties.force_tbb_terminate, "FORCE_TBB_TERMINATE", ((True, True),)),
|
||||
(properties.inference_precision, "INFERENCE_PRECISION_HINT", ((Type.f32, Type.f32),)),
|
||||
(properties.hint.inference_precision, "INFERENCE_PRECISION_HINT", ((Type.f32, Type.f32),)),
|
||||
(
|
||||
properties.hint.model_priority,
|
||||
@ -362,7 +363,7 @@ def test_single_property_setting(device):
|
||||
properties.cache_dir("./"),
|
||||
properties.inference_num_threads(9),
|
||||
properties.affinity(properties.Affinity.NONE),
|
||||
properties.hint.inference_precision(Type.f32),
|
||||
properties.inference_precision(Type.f32),
|
||||
properties.hint.performance_mode(properties.hint.PerformanceMode.LATENCY),
|
||||
properties.hint.num_requests(12),
|
||||
properties.streams.num(5),
|
||||
@ -374,7 +375,7 @@ def test_single_property_setting(device):
|
||||
properties.cache_dir(): "./",
|
||||
properties.inference_num_threads(): 9,
|
||||
properties.affinity(): properties.Affinity.NONE,
|
||||
properties.hint.inference_precision(): Type.f32,
|
||||
properties.inference_precision(): Type.f32,
|
||||
properties.hint.performance_mode(): properties.hint.PerformanceMode.LATENCY,
|
||||
properties.hint.num_requests(): 12,
|
||||
properties.streams.num(): 5,
|
||||
|
@@ -228,16 +228,22 @@ static constexpr Property<std::string, PropertyMutability::RO> model_name{"NETWO

static constexpr Property<uint32_t, PropertyMutability::RO> optimal_number_of_infer_requests{
    "OPTIMAL_NUMBER_OF_INFER_REQUESTS"};

/**
 * @brief Hint for device to use specified precision for inference
 * @ingroup ov_runtime_cpp_prop_api
 */
static constexpr Property<element::Type, PropertyMutability::RW> inference_precision{"INFERENCE_PRECISION_HINT"};

/**
 * @brief Namespace with hint properties
 */
namespace hint {

/**
 * @brief Hint for device to use specified precision for inference
 * @brief An alias for inference_precision property for backward compatibility
 * @ingroup ov_runtime_cpp_prop_api
 */
static constexpr Property<element::Type, PropertyMutability::RW> inference_precision{"INFERENCE_PRECISION_HINT"};
using ov::inference_precision;

/**
 * @brief Enum to define possible priorities hints
@@ -360,6 +366,56 @@ static constexpr Property<std::shared_ptr<ov::Model>> model{"MODEL_PTR"};

 * @ingroup ov_runtime_cpp_prop_api
 */
static constexpr Property<bool, PropertyMutability::RW> allow_auto_batching{"ALLOW_AUTO_BATCHING"};

/**
 * @brief Enum to define possible execution mode hints
 * @ingroup ov_runtime_cpp_prop_api
 */
enum class ExecutionMode {
    UNDEFINED = -1,  //!< Undefined value, settings may vary from device to device
    PERFORMANCE = 1, //!< Optimize for max performance
    ACCURACY = 2,    //!< Optimize for max accuracy
};

/** @cond INTERNAL */
inline std::ostream& operator<<(std::ostream& os, const ExecutionMode& mode) {
    switch (mode) {
    case ExecutionMode::UNDEFINED:
        return os << "UNDEFINED";
    case ExecutionMode::PERFORMANCE:
        return os << "PERFORMANCE";
    case ExecutionMode::ACCURACY:
        return os << "ACCURACY";
    default:
        throw ov::Exception{"Unsupported execution mode hint"};
    }
}

inline std::istream& operator>>(std::istream& is, ExecutionMode& mode) {
    std::string str;
    is >> str;
    if (str == "PERFORMANCE") {
        mode = ExecutionMode::PERFORMANCE;
    } else if (str == "ACCURACY") {
        mode = ExecutionMode::ACCURACY;
    } else if (str == "UNDEFINED") {
        mode = ExecutionMode::UNDEFINED;
    } else {
        throw ov::Exception{"Unsupported execution mode: " + str};
    }
    return is;
}
/** @endcond */

/**
 * @brief High-level OpenVINO Execution hint
 * unlike low-level properties that are individual (per-device), the hints are something that every device accepts
 * and turns into device-specific settings
 * Execution mode hint controls preferred optimization targets (performance or accuracy) for given model
 * @ingroup ov_runtime_cpp_prop_api
 */
static constexpr Property<ExecutionMode> execution_mode{"EXECUTION_MODE_HINT"};

} // namespace hint

/**
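To illustrate how the new hint is meant to be used (a hedged sketch; the device name and model path are placeholders, and the resulting precisions follow the GPU defaults configured later in this diff):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("sample.xml");  // placeholder model

    // ACCURACY biases the plugin toward f32 execution; PERFORMANCE lets it pick
    // a faster precision such as f16 where the hardware supports it.
    auto accurate = core.compile_model(model, "GPU",
                                       ov::hint::execution_mode(ov::hint::ExecutionMode::ACCURACY));

    // An explicitly set inference precision takes precedence over the value
    // derived from the execution mode.
    auto mixed = core.compile_model(model, "GPU",
                                    ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE),
                                    ov::inference_precision(ov::element::f32));
    return 0;
}
```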
@ -150,7 +150,7 @@ void Config::readProperties(const std::map<std::string, std::string> &prop) {
|
||||
IE_THROW() << "Wrong value for property key " << PluginConfigParams::KEY_ENFORCE_BF16
|
||||
<< ". Expected only YES/NO";
|
||||
}
|
||||
} else if (key == ov::hint::inference_precision.name()) {
|
||||
} else if (key == ov::inference_precision.name()) {
|
||||
if (val == "bf16") {
|
||||
if (dnnl::impl::cpu::x64::mayiuse(dnnl::impl::cpu::x64::avx512_core)) {
|
||||
enforceBF16 = true;
|
||||
@ -162,7 +162,7 @@ void Config::readProperties(const std::map<std::string, std::string> &prop) {
|
||||
enforceBF16 = false;
|
||||
manualEnforceBF16 = false;
|
||||
} else {
|
||||
IE_THROW() << "Wrong value for property key " << ov::hint::inference_precision.name()
|
||||
IE_THROW() << "Wrong value for property key " << ov::inference_precision.name()
|
||||
<< ". Supported values: bf16, f32";
|
||||
}
|
||||
} else if (key == PluginConfigParams::KEY_CACHE_DIR) {
|
||||
@ -266,4 +266,3 @@ void Config::updateProperties() {
|
||||
|
||||
} // namespace intel_cpu
|
||||
} // namespace ov
|
||||
|
||||
|
@ -305,7 +305,7 @@ InferenceEngine::Parameter ExecNetwork::GetMetric(const std::string &name) const
|
||||
RO_property(ov::affinity.name()),
|
||||
RO_property(ov::inference_num_threads.name()),
|
||||
RO_property(ov::enable_profiling.name()),
|
||||
RO_property(ov::hint::inference_precision.name()),
|
||||
RO_property(ov::inference_precision.name()),
|
||||
RO_property(ov::hint::performance_mode.name()),
|
||||
RO_property(ov::hint::num_requests.name()),
|
||||
RO_property(ov::execution_devices.name()),
|
||||
@ -341,10 +341,10 @@ InferenceEngine::Parameter ExecNetwork::GetMetric(const std::string &name) const
|
||||
} else if (name == ov::enable_profiling.name()) {
|
||||
const bool perfCount = config.collectPerfCounters;
|
||||
return decltype(ov::enable_profiling)::value_type(perfCount);
|
||||
} else if (name == ov::hint::inference_precision) {
|
||||
} else if (name == ov::inference_precision) {
|
||||
const auto enforceBF16 = config.enforceBF16;
|
||||
const auto inference_precision = enforceBF16 ? ov::element::bf16 : ov::element::f32;
|
||||
return decltype(ov::hint::inference_precision)::value_type(inference_precision);
|
||||
return decltype(ov::inference_precision)::value_type(inference_precision);
|
||||
} else if (name == ov::hint::performance_mode) {
|
||||
const auto perfHint = ov::util::from_string(config.perfHintsConfig.ovPerfHint, ov::hint::performance_mode);
|
||||
return perfHint;
|
||||
|
@ -505,10 +505,10 @@ Parameter Engine::GetConfig(const std::string& name, const std::map<std::string,
|
||||
} else if (name == ov::enable_profiling.name()) {
|
||||
const bool perfCount = engConfig.collectPerfCounters;
|
||||
return decltype(ov::enable_profiling)::value_type(perfCount);
|
||||
} else if (name == ov::hint::inference_precision) {
|
||||
} else if (name == ov::inference_precision) {
|
||||
const auto enforceBF16 = engConfig.enforceBF16;
|
||||
const auto inference_precision = enforceBF16 ? ov::element::bf16 : ov::element::f32;
|
||||
return decltype(ov::hint::inference_precision)::value_type(inference_precision);
|
||||
return decltype(ov::inference_precision)::value_type(inference_precision);
|
||||
} else if (name == ov::hint::performance_mode) {
|
||||
const auto perfHint = ov::util::from_string(engConfig.perfHintsConfig.ovPerfHint, ov::hint::performance_mode);
|
||||
return perfHint;
|
||||
@ -594,7 +594,7 @@ Parameter Engine::GetMetric(const std::string& name, const std::map<std::string,
|
||||
RW_property(ov::affinity.name()),
|
||||
RW_property(ov::inference_num_threads.name()),
|
||||
RW_property(ov::enable_profiling.name()),
|
||||
RW_property(ov::hint::inference_precision.name()),
|
||||
RW_property(ov::inference_precision.name()),
|
||||
RW_property(ov::hint::performance_mode.name()),
|
||||
RW_property(ov::hint::num_requests.name()),
|
||||
};
|
||||
|
@ -224,14 +224,21 @@ TEST(OVClassBasicTest, smoke_SetConfigHintInferencePrecision) {
|
||||
auto value = ov::element::f32;
|
||||
const auto precision = InferenceEngine::with_cpu_x86_bfloat16() ? ov::element::bf16 : ov::element::f32;
|
||||
|
||||
OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::hint::inference_precision));
|
||||
OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::inference_precision));
|
||||
ASSERT_EQ(precision, value);
|
||||
|
||||
const auto forcedPrecision = ov::element::f32;
|
||||
|
||||
OV_ASSERT_NO_THROW(ie.set_property("CPU", ov::hint::inference_precision(forcedPrecision)));
|
||||
OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::hint::inference_precision));
|
||||
OV_ASSERT_NO_THROW(ie.set_property("CPU", ov::inference_precision(forcedPrecision)));
|
||||
OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::inference_precision));
|
||||
ASSERT_EQ(value, forcedPrecision);
|
||||
|
||||
OPENVINO_SUPPRESS_DEPRECATED_START
|
||||
const auto forced_precision_deprecated = ov::element::f32;
|
||||
OV_ASSERT_NO_THROW(ie.set_property("CPU", ov::hint::inference_precision(forced_precision_deprecated)));
|
||||
OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::hint::inference_precision));
|
||||
ASSERT_EQ(value, forced_precision_deprecated);
|
||||
OPENVINO_SUPPRESS_DEPRECATED_END
|
||||
}
|
||||
|
||||
TEST(OVClassBasicTest, smoke_SetConfigEnableProfiling) {
|
||||
|
@ -185,7 +185,7 @@ OPENVINO_SUPPRESS_DEPRECATED_END
|
||||
}
|
||||
} else if (key == ov::hint::performance_mode) {
|
||||
performance_mode = ov::util::from_string(value, ov::hint::performance_mode);
|
||||
} else if (key == ov::hint::inference_precision) {
|
||||
} else if (key == ov::inference_precision) {
|
||||
std::stringstream ss(value);
|
||||
ss >> inference_precision;
|
||||
if ((inference_precision != ov::element::i8) && (inference_precision != ov::element::i16)) {
|
||||
@ -194,7 +194,7 @@ OPENVINO_SUPPRESS_DEPRECATED_END
|
||||
}
|
||||
gnaPrecision = (inference_precision == ov::element::i8) ? Precision::I8 : Precision::I16;
|
||||
} else if (key == GNA_CONFIG_KEY(PRECISION)) {
|
||||
check_compatibility(ov::hint::inference_precision.name());
|
||||
check_compatibility(ov::inference_precision.name());
|
||||
auto precision = Precision::FromStr(value);
|
||||
if (precision != Precision::I8 && precision != Precision::I16) {
|
||||
THROW_GNA_EXCEPTION << "Unsupported precision of GNA hardware, should be Int16 or Int8, but was: "
|
||||
@ -329,7 +329,7 @@ void Config::AdjustKeyMapValues() {
|
||||
gnaFlags.exclusive_async_requests ? PluginConfigParams::YES: PluginConfigParams::NO;
|
||||
keyConfigMap[ov::hint::performance_mode.name()] = ov::util::to_string(performance_mode);
|
||||
if (inference_precision != ov::element::undefined) {
|
||||
keyConfigMap[ov::hint::inference_precision.name()] = ov::util::to_string(inference_precision);
|
||||
keyConfigMap[ov::inference_precision.name()] = ov::util::to_string(inference_precision);
|
||||
} else {
|
||||
keyConfigMap[GNA_CONFIG_KEY(PRECISION)] = gnaPrecision.name();
|
||||
}
|
||||
@ -370,7 +370,7 @@ Parameter Config::GetParameter(const std::string& name) const {
|
||||
ov::intel_gna::HWGeneration::UNDEFINED);
|
||||
} else if (name == ov::hint::performance_mode) {
|
||||
return performance_mode;
|
||||
} else if (name == ov::hint::inference_precision) {
|
||||
} else if (name == ov::inference_precision) {
|
||||
return inference_precision;
|
||||
} else {
|
||||
auto result = keyConfigMap.find(name);
|
||||
@ -399,7 +399,7 @@ const Parameter Config::GetSupportedProperties(bool compiled) {
|
||||
{ ov::intel_gna::pwl_design_algorithm.name(), model_mutability },
|
||||
{ ov::intel_gna::pwl_max_error_percent.name(), model_mutability },
|
||||
{ ov::hint::performance_mode.name(), ov::PropertyMutability::RW },
|
||||
{ ov::hint::inference_precision.name(), model_mutability },
|
||||
{ ov::inference_precision.name(), model_mutability },
|
||||
{ ov::hint::num_requests.name(), model_mutability },
|
||||
{ ov::log::level.name(), ov::PropertyMutability::RW },
|
||||
{ ov::execution_devices.name(), ov::PropertyMutability::RO },
|
||||
|
@ -173,7 +173,7 @@ INSTANTIATE_TEST_SUITE_P(
|
||||
::testing::Combine(
|
||||
::testing::Values("GNA"),
|
||||
::testing::Values(ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8),
|
||||
ov::inference_precision(ngraph::element::i8),
|
||||
ov::hint::num_requests(2),
|
||||
ov::intel_gna::pwl_design_algorithm(ov::intel_gna::PWLDesignAlgorithm::UNIFORM_DISTRIBUTION),
|
||||
ov::intel_gna::pwl_max_error_percent(0.2),
|
||||
|
@ -110,30 +110,35 @@ TEST(OVClassBasicTest, smoke_SetConfigAfterCreatedPrecisionHint) {
|
||||
ov::Core core;
|
||||
ov::element::Type precision;
|
||||
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
|
||||
ASSERT_EQ(ov::element::undefined, precision);
|
||||
|
||||
OV_ASSERT_NO_THROW(core.set_property("GNA", ov::inference_precision(ov::element::i8)));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
|
||||
ASSERT_EQ(ov::element::i8, precision);
|
||||
|
||||
OPENVINO_SUPPRESS_DEPRECATED_START
|
||||
OV_ASSERT_NO_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::i8)));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
|
||||
ASSERT_EQ(ov::element::i8, precision);
|
||||
OPENVINO_SUPPRESS_DEPRECATED_END
|
||||
|
||||
OV_ASSERT_NO_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::i16)));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
|
||||
OV_ASSERT_NO_THROW(core.set_property("GNA", ov::inference_precision(ov::element::i16)));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
|
||||
ASSERT_EQ(ov::element::i16, precision);
|
||||
|
||||
OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::hint::inference_precision.name(), "I8"}}));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
|
||||
OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::inference_precision.name(), "I8"}}));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
|
||||
ASSERT_EQ(ov::element::i8, precision);
|
||||
|
||||
OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::hint::inference_precision.name(), "I16"}}));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
|
||||
OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::inference_precision.name(), "I16"}}));
|
||||
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
|
||||
ASSERT_EQ(ov::element::i16, precision);
|
||||
|
||||
ASSERT_THROW(core.set_property("GNA", { ov::hint::inference_precision(ov::element::i8),
|
||||
ASSERT_THROW(core.set_property("GNA", { ov::inference_precision(ov::element::i8),
|
||||
{ GNA_CONFIG_KEY(PRECISION), "I16"}}), ov::Exception);
|
||||
ASSERT_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::i32)), ov::Exception);
|
||||
ASSERT_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::undefined)), ov::Exception);
|
||||
ASSERT_THROW(core.set_property("GNA", {{ov::hint::inference_precision.name(), "ABC"}}), ov::Exception);
|
||||
ASSERT_THROW(core.set_property("GNA", ov::inference_precision(ov::element::i32)), ov::Exception);
|
||||
ASSERT_THROW(core.set_property("GNA", ov::inference_precision(ov::element::undefined)), ov::Exception);
|
||||
ASSERT_THROW(core.set_property("GNA", {{ov::inference_precision.name(), "ABC"}}), ov::Exception);
|
||||
}
|
||||
|
||||
TEST(OVClassBasicTest, smoke_SetConfigAfterCreatedPerformanceHint) {
|
||||
|
@ -169,7 +169,7 @@ protected:
|
||||
TEST_F(GNAExportImportTest, ExportImportI16) {
|
||||
const ov::AnyMap gna_config = {
|
||||
ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::hint::inference_precision(ngraph::element::i16)
|
||||
ov::inference_precision(ngraph::element::i16)
|
||||
};
|
||||
exported_file_name = "export_test.bin";
|
||||
ExportModel(exported_file_name, gna_config);
|
||||
@ -179,7 +179,7 @@ TEST_F(GNAExportImportTest, ExportImportI16) {
|
||||
TEST_F(GNAExportImportTest, ExportImportI8) {
|
||||
const ov::AnyMap gna_config = {
|
||||
ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::hint::inference_precision(ngraph::element::i8)
|
||||
ov::inference_precision(ngraph::element::i8)
|
||||
};
|
||||
exported_file_name = "export_test.bin";
|
||||
ExportModel(exported_file_name, gna_config);
|
||||
@ -202,4 +202,4 @@ TEST_F(GNAExportImportTest, ShowLibVersionFromModelInLogDebugMode) {
|
||||
const ov::AnyMap gna_config = {ov::log::level(ov::log::Level::DEBUG)};
|
||||
EXPECT_THAT(ExportImportModelWithLogLevel(gna_config),
|
||||
HasSubstr(ov::intel_gna::common::get_openvino_version_string()));
|
||||
}
|
||||
}
|
||||
|
@ -90,7 +90,7 @@ TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestDefault) {
|
||||
TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI16) {
|
||||
Run({
|
||||
ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::hint::inference_precision(ngraph::element::i16)
|
||||
ov::inference_precision(ngraph::element::i16)
|
||||
});
|
||||
compare(ngraph::element::i16, ngraph::element::i32, sizeof(int16_t), sizeof(uint32_t));
|
||||
}
|
||||
@ -98,7 +98,7 @@ TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI16) {
|
||||
TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI8) {
|
||||
Run({
|
||||
ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::hint::inference_precision(ngraph::element::i8)
|
||||
ov::inference_precision(ngraph::element::i8)
|
||||
});
|
||||
compare(ngraph::element::i16, ngraph::element::i32, sizeof(int8_t), Precision::fromType<gna_compound_bias_t>().size());
|
||||
}
|
||||
@ -106,7 +106,7 @@ TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI8) {
|
||||
TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI8LP) {
|
||||
Run({
|
||||
ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::hint::inference_precision(ngraph::element::i8)
|
||||
ov::inference_precision(ngraph::element::i8)
|
||||
}, true);
|
||||
compare(ngraph::element::i8, ngraph::element::i32, sizeof(int8_t), sizeof(int8_t));
|
||||
}
|
||||
|
@ -117,13 +117,13 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestFp32to
|
||||
::testing::ValuesIn(std::vector<ov::AnyMap> { // gna config map
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 8.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 0.125f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
}),
|
||||
::testing::Values(true), // gna device
|
||||
::testing::Values(false), // use low precision
|
||||
@ -141,13 +141,13 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestFp32to
|
||||
::testing::ValuesIn(std::vector<ov::AnyMap> { // gna config map
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 4.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 0.25f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
}),
|
||||
::testing::Values(true), // gna device
|
||||
::testing::Values(true), // use low precision
|
||||
@ -189,13 +189,13 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestI16toI
|
||||
::testing::ValuesIn(std::vector<ov::AnyMap> { // gna config map
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 4.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 0.25f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
}),
|
||||
::testing::Values(true), // gna device
|
||||
::testing::Values(false), // use low precision
|
||||
@ -214,13 +214,13 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestI16toI
|
||||
::testing::ValuesIn(std::vector<ov::AnyMap> { // gna config map,
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 10.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 20.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
}),
|
||||
::testing::Values(true), // gna device
|
||||
::testing::Values(true), // use low precision
|
||||
@ -239,10 +239,10 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestU8toI1
|
||||
::testing::ValuesIn(std::vector<ov::AnyMap> { // gna config map
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 8.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
}),
|
||||
::testing::Values(true), // gna device
|
||||
::testing::Values(false), // use low precision
|
||||
@ -261,10 +261,10 @@ INSTANTIATE_TEST_SUITE_P(GNAInputPrecisionTestSuite, GNAInputPrecisionTestU8toI8
|
||||
::testing::ValuesIn(std::vector<ov::AnyMap> { // gna config map
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 4.0f}}),
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
}),
|
||||
::testing::Values(true), // gna device
|
||||
::testing::Values(true), // use low precision
|
||||
|
@@ -142,6 +142,7 @@ public:

protected:
    void apply_hints(const cldnn::device_info& info);
    void apply_execution_hints(const cldnn::device_info& info);
    void apply_performance_hints(const cldnn::device_info& info);
    void apply_priority_hints(const cldnn::device_info& info);
    void apply_debug_options(const cldnn::device_info& info);
@@ -473,10 +473,11 @@ InferenceEngine::Parameter CompiledModel::GetMetric(const std::string &name) con

    ov::PropertyName{ov::intel_gpu::enable_loop_unrolling.name(), PropertyMutability::RO},
    ov::PropertyName{ov::cache_dir.name(), PropertyMutability::RO},
    ov::PropertyName{ov::hint::performance_mode.name(), PropertyMutability::RO},
    ov::PropertyName{ov::hint::execution_mode.name(), PropertyMutability::RO},
    ov::PropertyName{ov::compilation_num_threads.name(), PropertyMutability::RO},
    ov::PropertyName{ov::num_streams.name(), PropertyMutability::RO},
    ov::PropertyName{ov::hint::num_requests.name(), PropertyMutability::RO},
    ov::PropertyName{ov::hint::inference_precision.name(), PropertyMutability::RO},
    ov::PropertyName{ov::inference_precision.name(), PropertyMutability::RO},
    ov::PropertyName{ov::device::id.name(), PropertyMutability::RO},
    ov::PropertyName{ov::execution_devices.name(), PropertyMutability::RO}
};
@@ -13,7 +13,7 @@ bool LegacyAPIHelper::is_new_api_property(const std::pair<std::string, ov::Any>&

static const std::vector<std::string> new_properties_list = {
    ov::intel_gpu::hint::queue_priority.name(),
    ov::intel_gpu::hint::queue_throttle.name(),
    ov::hint::inference_precision.name(),
    ov::inference_precision.name(),
    ov::compilation_num_threads.name(),
    ov::num_streams.name(),
};
@@ -581,10 +581,11 @@ std::vector<ov::PropertyName> Plugin::get_supported_properties() const {

    ov::PropertyName{ov::intel_gpu::enable_loop_unrolling.name(), PropertyMutability::RW},
    ov::PropertyName{ov::cache_dir.name(), PropertyMutability::RW},
    ov::PropertyName{ov::hint::performance_mode.name(), PropertyMutability::RW},
    ov::PropertyName{ov::hint::execution_mode.name(), PropertyMutability::RW},
    ov::PropertyName{ov::compilation_num_threads.name(), PropertyMutability::RW},
    ov::PropertyName{ov::num_streams.name(), PropertyMutability::RW},
    ov::PropertyName{ov::hint::num_requests.name(), PropertyMutability::RW},
    ov::PropertyName{ov::hint::inference_precision.name(), PropertyMutability::RW},
    ov::PropertyName{ov::inference_precision.name(), PropertyMutability::RW},
    ov::PropertyName{ov::device::id.name(), PropertyMutability::RW},
};
@@ -206,7 +206,7 @@ void TransformationsPipeline::apply(std::shared_ptr<ov::Model> func) {

    };

    // Add conversion from FP data types to infer precision if it's specified
    auto infer_precision = config.get_property(ov::hint::inference_precision);
    auto infer_precision = config.get_property(ov::inference_precision);
    if (infer_precision != ov::element::undefined) {
        if (!fp_precision_supported(infer_precision))
            infer_precision = fallback_precision;
@@ -40,9 +40,10 @@ void ExecutionConfig::set_default() {

std::make_tuple(ov::cache_dir, ""),
std::make_tuple(ov::num_streams, 1),
std::make_tuple(ov::compilation_num_threads, std::max(1, static_cast<int>(std::thread::hardware_concurrency()))),
std::make_tuple(ov::hint::inference_precision, ov::element::f16, InferencePrecisionValidator()),
std::make_tuple(ov::inference_precision, ov::element::f16, InferencePrecisionValidator()),
std::make_tuple(ov::hint::model_priority, ov::hint::Priority::MEDIUM),
std::make_tuple(ov::hint::performance_mode, ov::hint::PerformanceMode::LATENCY, PerformanceModeValidator()),
std::make_tuple(ov::hint::execution_mode, ov::hint::ExecutionMode::PERFORMANCE),
std::make_tuple(ov::hint::num_requests, 0),

std::make_tuple(ov::intel_gpu::hint::host_task_priority, ov::hint::Priority::MEDIUM),
@@ -119,6 +120,22 @@ Any ExecutionConfig::get_property(const std::string& name) const {

    return internal_properties.at(name);
}

void ExecutionConfig::apply_execution_hints(const cldnn::device_info& info) {
    if (is_set_by_user(ov::hint::execution_mode)) {
        const auto mode = get_property(ov::hint::execution_mode);
        if (!is_set_by_user(ov::inference_precision)) {
            if (mode == ov::hint::ExecutionMode::ACCURACY) {
                set_property(ov::inference_precision(ov::element::f32));
            } else if (mode == ov::hint::ExecutionMode::PERFORMANCE) {
                if (info.supports_fp16)
                    set_property(ov::inference_precision(ov::element::f16));
                else
                    set_property(ov::inference_precision(ov::element::f32));
            }
        }
    }
}

void ExecutionConfig::apply_performance_hints(const cldnn::device_info& info) {
    if (is_set_by_user(ov::hint::performance_mode)) {
        const auto mode = get_property(ov::hint::performance_mode);
@@ -158,6 +175,7 @@ void ExecutionConfig::apply_debug_options(const cldnn::device_info& info) {

}

void ExecutionConfig::apply_hints(const cldnn::device_info& info) {
    apply_execution_hints(info);
    apply_performance_hints(info);
    apply_priority_hints(info);
    apply_debug_options(info);
@@ -37,9 +37,9 @@ TEST_P(InferencePrecisionTests, smoke_canSetInferencePrecisionAndInfer) {

    ov::element::Type model_precision;
    ov::element::Type inference_precision;
    std::tie(model_precision, inference_precision) = GetParam();
    auto function = ov::test::behavior::getDefaultNGraphFunctionForTheDevice("GPU", {1, 1, 32, 32}, model_precision);
    auto function = ov::test::behavior::getDefaultNGraphFunctionForTheDevice(CommonTestUtils::DEVICE_GPU, {1, 1, 32, 32}, model_precision);
    ov::CompiledModel compiled_model;
    OV_ASSERT_NO_THROW(compiled_model = core->compile_model(function, "GPU", ov::hint::inference_precision(inference_precision)));
    OV_ASSERT_NO_THROW(compiled_model = core->compile_model(function, CommonTestUtils::DEVICE_GPU, ov::inference_precision(inference_precision)));
    auto req = compiled_model.create_infer_request();
    OV_ASSERT_NO_THROW(req.infer());
}
@@ -52,3 +52,35 @@ static const std::vector<params> test_params = {

};

INSTANTIATE_TEST_SUITE_P(smoke_GPU_BehaviorTests, InferencePrecisionTests, ::testing::ValuesIn(test_params), InferencePrecisionTests::getTestCaseName);

TEST(InferencePrecisionTests, CantSetInvalidInferencePrecision) {
    ov::Core core;

    ASSERT_NO_THROW(core.get_property(CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision));
    ASSERT_ANY_THROW(core.set_property(CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::bf16)));
    ASSERT_ANY_THROW(core.set_property(CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::undefined)));
}

TEST(ExecutionModeTest, SetCompileGetInferPrecisionAndExecMode) {
    ov::Core core;

    core.set_property(CommonTestUtils::DEVICE_GPU, ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE));
    auto model = ngraph::builder::subgraph::makeConvPoolRelu();
    {
        auto compiled_model = core.compile_model(model, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
        ASSERT_EQ(ov::hint::ExecutionMode::PERFORMANCE, compiled_model.get_property(ov::hint::execution_mode));
        ASSERT_EQ(ov::element::f32, compiled_model.get_property(ov::hint::inference_precision));
    }

    {
        auto compiled_model = core.compile_model(model, CommonTestUtils::DEVICE_GPU, ov::hint::execution_mode(ov::hint::ExecutionMode::ACCURACY));
        ASSERT_EQ(ov::hint::ExecutionMode::ACCURACY, compiled_model.get_property(ov::hint::execution_mode));
        ASSERT_EQ(ov::element::f32, compiled_model.get_property(ov::hint::inference_precision));
    }

    {
        auto compiled_model = core.compile_model(model, CommonTestUtils::DEVICE_GPU);
        ASSERT_EQ(ov::hint::ExecutionMode::PERFORMANCE, compiled_model.get_property(ov::hint::execution_mode));
        ASSERT_EQ(ov::element::f16, compiled_model.get_property(ov::hint::inference_precision));
    }
}
@ -55,7 +55,7 @@ TEST_P(OVConcurrencyTest, canInferTwoExecNets) {
|
||||
auto fn = fn_ptrs[i];
|
||||
|
||||
auto exec_net = ie.compile_model(fn_ptrs[i], CommonTestUtils::DEVICE_GPU,
|
||||
ov::num_streams(num_streams), ov::hint::inference_precision(ov::element::f32));
|
||||
ov::num_streams(num_streams), ov::inference_precision(ov::element::f32));
|
||||
|
||||
auto input = fn_ptrs[i]->get_parameters().at(0);
|
||||
auto output = fn_ptrs[i]->get_results().at(0);
|
||||
@ -115,7 +115,7 @@ TEST(canSwapTensorsBetweenInferRequests, inputs) {
    auto fn = ngraph::builder::subgraph::makeSplitMultiConvConcat();

    auto ie = ov::Core();
    auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
    auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));

    const int infer_requests_num = 2;
    ov::InferRequest infer_request1 = compiled_model.create_infer_request();
@ -193,7 +193,7 @@ TEST(smoke_InferRequestDeviceMemoryAllocation, usmHostIsNotChanged) {
    auto fn = ngraph::builder::subgraph::makeDetectionOutput(ngraph::element::Type_t::f32);

    auto ie = ov::Core();
    auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
    auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));

    ov::InferRequest infer_request1 = compiled_model.create_infer_request();
    ov::InferRequest infer_request2 = compiled_model.create_infer_request();
@ -232,7 +232,7 @@ TEST(smoke_InferRequestDeviceMemoryAllocation, canSetSystemHostTensor) {
    auto fn = ngraph::builder::subgraph::makeDetectionOutput(ngraph::element::Type_t::f32);

    auto ie = ov::Core();
    auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
    auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));

    ov::InferRequest infer_request1 = compiled_model.create_infer_request();
    ov::InferRequest infer_request2 = compiled_model.create_infer_request();
@ -258,7 +258,7 @@ TEST(canSwapTensorsBetweenInferRequests, outputs) {
    auto fn = ngraph::builder::subgraph::makeSplitMultiConvConcat();

    auto ie = ov::Core();
    auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
    auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));

    const int infer_requests_num = 2;
    ov::InferRequest infer_request1 = compiled_model.create_infer_request();
@ -40,7 +40,7 @@ public:
                 {CONFIG_KEY(AUTO_BATCH_TIMEOUT) , "0"},
                };
        }
        config.insert({ov::hint::inference_precision.name(), "f32"});
        config.insert({ov::inference_precision.name(), "f32"});
        fn_ptr = ov::test::behavior::getDefaultNGraphFunctionForTheDevice(with_auto_batching ? CommonTestUtils::DEVICE_BATCH : deviceName);
    }
    static std::string getTestCaseName(const testing::TestParamInfo<bool>& obj) {
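The untyped name/value form inserted here is interchangeable with the typed property form. A small sketch of the equivalence, assuming an ov::AnyMap configuration like the one in the surrounding fixture (not part of the commit):

ov::AnyMap typed_cfg   = {ov::hint::inference_precision(ov::element::f32)};
ov::AnyMap untyped_cfg = {{ov::hint::inference_precision.name(), "f32"}};
// Either map can be handed to ov::Core::compile_model(); the string "f32"
// is interpreted as ov::element::f32 when the property is read back.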
@ -230,7 +230,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserContext) {
    auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());

    auto ie = PluginCache::get().ie();
    auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
    auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});

    // regular inference
    auto inf_req_regular = exec_net_regular.CreateInferRequest();
@ -277,7 +277,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
    auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());

    auto ie = PluginCache::get().ie();
    auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
    auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});

    // regular inference
    auto inf_req_regular = exec_net_regular.CreateInferRequest();
@ -305,7 +305,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
    // In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
    // without calling thread blocks
    auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
    auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::hint::inference_precision.name(), "f32"}});
    auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::inference_precision.name(), "f32"}});
    auto inf_req_shared = exec_net_shared.CreateInferRequest();

    // Allocate shared buffers for input and output data which will be set to infer request
@ -375,7 +375,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
    auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());

    auto ie = PluginCache::get().ie();
    auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
    auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});

    // regular inference
    auto inf_req_regular = exec_net_regular.CreateInferRequest();
@ -404,7 +404,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
    // In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
    // without calling thread blocks
    auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
    auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::hint::inference_precision.name(), "f32"}});
    auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::inference_precision.name(), "f32"}});
    auto inf_req_shared = exec_net_shared.CreateInferRequest();

    // Allocate shared buffers for input and output data which will be set to infer request
@ -469,7 +469,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_infer_call_many_times) {
    auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());

    auto ie = PluginCache::get().ie();
    auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
    auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});

    // regular inference
    auto inf_req_regular = exec_net_regular.CreateInferRequest();
@ -498,7 +498,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_infer_call_many_times) {
    // In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
    // without calling thread blocks
    auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
    auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::hint::inference_precision.name(), "f32"}});
    auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::inference_precision.name(), "f32"}});
    auto inf_req_shared = exec_net_shared.CreateInferRequest();

    // Allocate shared buffers for input and output data which will be set to infer request
@ -601,7 +601,7 @@ TEST_P(BatchedBlob_Test, canInputNV12) {

    /* XXX: is it correct to set KEY_CLDNN_NV12_TWO_INPUTS in case of remote blob? */
    auto exec_net_b = ie.LoadNetwork(net_remote, CommonTestUtils::DEVICE_GPU,
                { { GPUConfigParams::KEY_GPU_NV12_TWO_INPUTS, PluginConfigParams::YES}, {ov::hint::inference_precision.name(), "f32"} });
                { { GPUConfigParams::KEY_GPU_NV12_TWO_INPUTS, PluginConfigParams::YES}, {ov::inference_precision.name(), "f32"} });
    auto inf_req_remote = exec_net_b.CreateInferRequest();
    auto cldnn_context = exec_net_b.GetContext();
    cl_context ctx = std::dynamic_pointer_cast<ClContext>(cldnn_context)->get();
@ -670,7 +670,7 @@ TEST_P(BatchedBlob_Test, canInputNV12) {
    net_local.getInputsInfo().begin()->second->setPrecision(Precision::U8);
    net_local.getInputsInfo().begin()->second->getPreProcess().setColorFormat(ColorFormat::NV12);

    auto exec_net_b1 = ie.LoadNetwork(net_local, CommonTestUtils::DEVICE_GPU, {{ov::hint::inference_precision.name(), "f32"}});
    auto exec_net_b1 = ie.LoadNetwork(net_local, CommonTestUtils::DEVICE_GPU, {{ov::inference_precision.name(), "f32"}});

    auto inf_req_local = exec_net_b1.CreateInferRequest();
@ -742,7 +742,7 @@ TEST_P(TwoNets_Test, canInferTwoExecNets) {

        auto exec_net = ie.LoadNetwork(net, CommonTestUtils::DEVICE_GPU,
                                       {{PluginConfigParams::KEY_GPU_THROUGHPUT_STREAMS, std::to_string(num_streams)},
                                        {ov::hint::inference_precision.name(), "f32"}});
                                        {ov::inference_precision.name(), "f32"}});

        for (int j = 0; j < num_streams * num_requests; j++) {
            outputs.push_back(net.getOutputsInfo().begin()->first);
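The legacy InferenceEngine path used throughout these tests takes a plain string map, so the new property name can sit next to the old plugin keys. A short sketch of that pattern, reusing the `ie`, `net`, and `num_streams` names from the surrounding fixture (illustrative only, not part of the commit):

std::map<std::string, std::string> cfg = {
    {PluginConfigParams::KEY_GPU_THROUGHPUT_STREAMS, std::to_string(num_streams)},
    {ov::hint::inference_precision.name(), "f32"},  // new-style property, string value
};
auto exec_net = ie.LoadNetwork(net, CommonTestUtils::DEVICE_GPU, cfg);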
@ -87,6 +87,10 @@ INSTANTIATE_TEST_SUITE_P(
        smoke_OVClassSetModelPriorityConfigTest, OVClassSetModelPriorityConfigTest,
        ::testing::Values("MULTI", "AUTO"));

INSTANTIATE_TEST_SUITE_P(
        smoke_OVClassSetExecutionModeHintConfigTest, OVClassSetExecutionModeHintConfigTest,
        ::testing::Values(CommonTestUtils::DEVICE_GPU));

INSTANTIATE_TEST_SUITE_P(
        smoke_OVClassSetTBBForceTerminatePropertyTest, OVClassSetTBBForceTerminatePropertyTest,
        ::testing::Values("CPU", "GPU"));
@ -346,14 +350,21 @@ TEST_P(OVClassGetPropertyTest_GPU, GetAndSetInferencePrecisionNoThrow) {
    auto value = ov::element::undefined;
    const auto expected_default_precision = ov::element::f16;

    OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::hint::inference_precision));
    OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::inference_precision));
    ASSERT_EQ(expected_default_precision, value);

    const auto forced_precision = ov::element::f32;

    OV_ASSERT_NO_THROW(ie.set_property(target_device, ov::hint::inference_precision(forced_precision)));
    OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::hint::inference_precision));
    OV_ASSERT_NO_THROW(ie.set_property(target_device, ov::inference_precision(forced_precision)));
    OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::inference_precision));
    ASSERT_EQ(value, forced_precision);

    OPENVINO_SUPPRESS_DEPRECATED_START
    const auto forced_precision_deprecated = ov::element::f16;
    OV_ASSERT_NO_THROW(ie.set_property(target_device, ov::hint::inference_precision(forced_precision_deprecated)));
    OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::hint::inference_precision));
    ASSERT_EQ(value, forced_precision_deprecated);
    OPENVINO_SUPPRESS_DEPRECATED_END
}

TEST_P(OVClassGetPropertyTest_GPU, GetAndSetModelPriorityNoThrow) {
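The get/set round trip exercised above is also the plain user-facing way to override the GPU default precision globally. A minimal sketch with the device string hard-coded for illustration (not part of the commit):

ov::Core core;
// GPU reports f16 as its default inference precision (see the expected default above).
auto current = core.get_property("GPU", ov::hint::inference_precision);
// Force f32 for every model compiled on GPU afterwards.
core.set_property("GPU", ov::hint::inference_precision(ov::element::f32));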
@ -715,6 +726,9 @@ const std::vector<ov::AnyMap> gpuCorrectConfigs = {

auto gpuCorrectConfigsWithSecondaryProperties = []() {
    return std::vector<ov::AnyMap>{
        {ov::device::properties(CommonTestUtils::DEVICE_GPU,
                                ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE),
                                ov::inference_precision(ov::element::f32))},
        {ov::device::properties(CommonTestUtils::DEVICE_GPU,
                                ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
                                ov::hint::allow_auto_batching(false))},
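Secondary (device-scoped) properties like the ones above are meant to be nested inside a higher-level configuration, for example when compiling through a virtual device. A hedged sketch of that usage, assuming the AUTO device is available and reusing `core` and `model` from context (not part of the commit):

auto compiled = core.compile_model(model, "AUTO",
    ov::device::priorities("GPU", "CPU"),
    ov::device::properties("GPU",
        ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE),
        ov::hint::inference_precision(ov::element::f32)));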
@ -119,6 +119,7 @@ using OVClassLoadNetworkAfterCoreRecreateTest = OVClassBaseTestP;
using OVClassLoadNetworkTest = OVClassQueryNetworkTest;
using OVClassSetGlobalConfigTest = OVClassBaseTestP;
using OVClassSetModelPriorityConfigTest = OVClassBaseTestP;
using OVClassSetExecutionModeHintConfigTest = OVClassBaseTestP;
using OVClassSetTBBForceTerminatePropertyTest = OVClassBaseTestP;
using OVClassSetLogLevelConfigTest = OVClassBaseTestP;
using OVClassSpecificDeviceTestSetConfig = OVClassBaseTestP;
@ -430,6 +431,22 @@ TEST_P(OVClassSetModelPriorityConfigTest, SetConfigNoThrow) {
    EXPECT_EQ(value, ov::hint::Priority::HIGH);
}

TEST_P(OVClassSetExecutionModeHintConfigTest, SetConfigNoThrow) {
    ov::Core ie = createCoreWithTemplate();

    OV_ASSERT_PROPERTY_SUPPORTED(ov::hint::execution_mode);

    ov::hint::ExecutionMode defaultMode{};
    ASSERT_NO_THROW(defaultMode = ie.get_property(target_device, ov::hint::execution_mode));

    ie.set_property(target_device, ov::hint::execution_mode(ov::hint::ExecutionMode::UNDEFINED));
    ASSERT_EQ(ov::hint::ExecutionMode::UNDEFINED, ie.get_property(target_device, ov::hint::execution_mode));
    ie.set_property(target_device, ov::hint::execution_mode(ov::hint::ExecutionMode::ACCURACY));
    ASSERT_EQ(ov::hint::ExecutionMode::ACCURACY, ie.get_property(target_device, ov::hint::execution_mode));
    ie.set_property(target_device, ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE));
    ASSERT_EQ(ov::hint::ExecutionMode::PERFORMANCE, ie.get_property(target_device, ov::hint::execution_mode));
}

TEST_P(OVClassSetDevicePriorityConfigTest, SetConfigAndCheckGetConfigNoThrow) {
    ov::Core ie = createCoreWithTemplate();
    std::string devicePriority;
@ -36,7 +36,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeAppliedForNonContiguousAxes
    auto core = ov::Core();
    ov::AnyMap config;
    if (device_name == CommonTestUtils::DEVICE_GPU)
        config.insert(ov::hint::inference_precision(ov::element::f32));
        config.insert(ov::inference_precision(ov::element::f32));
    const auto compiled_model = core.compile_model(model, device_name, config);

    ASSERT_TRUE(model->get_ops().size() < compiled_model.get_runtime_model()->get_ops().size()); // decomposition applied
@ -56,7 +56,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeAppliedForNormalizeOverAllA
    auto core = ov::Core();
    ov::AnyMap config;
    if (device_name == CommonTestUtils::DEVICE_GPU)
        config.insert(ov::hint::inference_precision(ov::element::f32));
        config.insert(ov::inference_precision(ov::element::f32));
    const auto compiled_model = core.compile_model(model, device_name, config);

    ASSERT_TRUE(model->get_ops().size() < compiled_model.get_runtime_model()->get_ops().size()); // decomposition applied
@ -76,7 +76,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeNotAppliedForNotSorted) {
    auto core = ov::Core();
    ov::AnyMap config;
    if (device_name == CommonTestUtils::DEVICE_GPU)
        config.insert(ov::hint::inference_precision(ov::element::f32));
        config.insert(ov::inference_precision(ov::element::f32));
    const auto compiled_model = core.compile_model(model, device_name, config);

    ASSERT_TRUE(model->get_ops().size() >= compiled_model.get_runtime_model()->get_ops().size()); // decomposition not applied
@ -96,7 +96,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeNotAppliedForSingleAxis) {
    auto core = ov::Core();
    ov::AnyMap config;
    if (device_name == CommonTestUtils::DEVICE_GPU)
        config.insert(ov::hint::inference_precision(ov::element::f32));
        config.insert(ov::inference_precision(ov::element::f32));
    const auto compiled_model = core.compile_model(model, device_name, config);

    ASSERT_TRUE(model->get_ops().size() >= compiled_model.get_runtime_model()->get_ops().size()); // decomposition not applied
@ -225,7 +225,7 @@ void SubgraphBaseTest::compile_model() {
                break;
            }
        }
        configuration.insert({ov::hint::inference_precision.name(), hint});
        configuration.insert({ov::inference_precision.name(), hint});
    }

    compiledModel = core->compile_model(function, targetDevice, configuration);
@ -54,7 +54,7 @@ void SnippetsTestsCommon::validateOriginalLayersNamesByType(const std::string& l
    ASSERT_TRUE(false) << "Layer type '" << layerType << "' was not found in compiled model";
}
void SnippetsTestsCommon::setInferenceType(ov::element::Type type) {
    configuration.emplace(ov::hint::inference_precision(type));
    configuration.emplace(ov::inference_precision(type));
}

} // namespace test