Revert inference precision to be a hint (#16634)
This commit is contained in:
parent
7d8f4af78a
commit
0250f62d11
@@ -105,14 +105,14 @@ to query ``ov::device::capabilities`` property, which should contain ``BF16`` in
    :fragment: [part0]

-If the model has been converted to ``bf16``, the ``ov::inference_precision`` is set to ``ov::element::bf16`` and can be checked via
+If the model has been converted to ``bf16``, the ``ov::hint::inference_precision`` is set to ``ov::element::bf16`` and can be checked via
 the ``ov::CompiledModel::get_property`` call. The code below demonstrates how to get the element type:

 .. doxygensnippet:: snippets/cpu/Bfloat16Inference1.cpp
    :language: py
    :fragment: [part1]

-To infer the model in ``f32`` precision instead of ``bf16`` on targets with native ``bf16`` support, set the ``ov::inference_precision`` to ``ov::element::f32``.
+To infer the model in ``f32`` precision instead of ``bf16`` on targets with native ``bf16`` support, set the ``ov::hint::inference_precision`` to ``ov::element::f32``.

 .. tab-set::

@@ -134,11 +134,11 @@ To infer the model in ``f32`` precision instead of ``bf16`` on targets with nati

 The ``Bfloat16`` software simulation mode is available on CPUs with Intel® AVX-512 instruction set that do not support the
 native ``avx512_bf16`` instruction. This mode is used for development purposes and it does not guarantee good performance.
-To enable the simulation, the ``ov::inference_precision`` has to be explicitly set to ``ov::element::bf16``.
+To enable the simulation, the ``ov::hint::inference_precision`` has to be explicitly set to ``ov::element::bf16``.

 .. note::

-   If ``ov::inference_precision`` is set to ``ov::element::bf16`` on a CPU without native bfloat16 support or bfloat16 simulation mode, an exception is thrown.
+   If ``ov::hint::inference_precision`` is set to ``ov::element::bf16`` on a CPU without native bfloat16 support or bfloat16 simulation mode, an exception is thrown.

 .. note::

@@ -292,7 +292,7 @@ Read-write Properties

 All parameters must be set before calling ``ov::Core::compile_model()`` in order to take effect or passed as additional argument to ``ov::Core::compile_model()``

 - ``ov::enable_profiling``
-- ``ov::inference_precision``
+- ``ov::hint::inference_precision``
 - ``ov::hint::performance_mode``
 - ``ov::hint::num_request``
 - ``ov::num_streams``
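The hunk above restates the rule that these read-write properties only take effect if set before (or passed to) ``ov::Core::compile_model()``. As a minimal, hypothetical sketch of that contract — not the real OpenVINO implementation, and the class name is invented — a config holder can simply refuse changes once compilation has "frozen" it:

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical sketch: a config that is frozen once the model is "compiled",
// mirroring the documented rule that properties such as
// INFERENCE_PRECISION_HINT take effect only before ov::Core::compile_model().
class CompileTimeConfig {
public:
    void set(const std::string& key, const std::string& value) {
        if (frozen_)
            throw std::logic_error("property '" + key + "' must be set before compile_model()");
        values_[key] = value;
    }
    void freeze() { frozen_ = true; }  // would be called when the model is compiled
    std::string get(const std::string& key) const { return values_.at(key); }

private:
    std::map<std::string, std::string> values_;
    bool frozen_ = false;
};
```

A property set after `freeze()` throws instead of silently having no effect; the real API instead ignores or rejects late settings depending on the plugin.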
@@ -140,7 +140,7 @@ quantization hints based on statistics for the provided dataset.

 * Accuracy (i16 weights)
 * Performance (i8 weights)

-For POT quantized models, the ``ov::inference_precision`` property has no effect except in cases described in the
+For POT quantized models, the ``ov::hint::inference_precision`` property has no effect except in cases described in the
 :ref:`Model and Operation Limitations section <#model-and-operation-limitations>`.

@@ -268,7 +268,7 @@ In order to take effect, the following parameters must be set before model compi

 - ov::cache_dir
 - ov::enable_profiling
-- ov::inference_precision
+- ov::hint::inference_precision
 - ov::hint::num_requests
 - ov::intel_gna::compile_target
 - ov::intel_gna::firmware_model_image_path

@@ -354,7 +354,7 @@ Support for 2D Convolutions using POT

 For POT to successfully work with the models including GNA3.0 2D convolutions, the following requirements must be met:

 * All convolution parameters are natively supported by HW (see tables above).
-* The runtime precision is explicitly set by the ``ov::inference_precision`` property as ``i8`` for the models produced by
+* The runtime precision is explicitly set by the ``ov::hint::inference_precision`` property as ``i8`` for the models produced by
   the ``performance mode`` of POT, and as ``i16`` for the models produced by the ``accuracy mode`` of POT.

@@ -327,7 +327,7 @@ All parameters must be set before calling ``ov::Core::compile_model()`` in order

 - ov::hint::performance_mode
 - ov::hint::execution_mode
 - ov::hint::num_requests
-- ov::inference_precision
+- ov::hint::inference_precision
 - ov::num_streams
 - ov::compilation_num_threads
 - ov::device::id

@@ -16,7 +16,7 @@

 Runtime optimization, or deployment optimization, focuses on tuning inference parameters and execution means (e.g., the optimum number of requests executed simultaneously). Unlike model-level optimizations, they are highly specific to the hardware and case they are used for, and often come at a cost.
-`ov::inference_precision <groupov_runtime_cpp_prop_api.html#doxid-group-ov-runtime-cpp-prop-api-1gad605a888f3c9b7598ab55023fbf44240>`__ is a "typical runtime configuration" which trades accuracy for performance, allowing ``fp16/bf16`` execution for the layers that remain in ``fp32`` after quantization of the original ``fp32`` model.
+`ov::hint::inference_precision <groupov_runtime_cpp_prop_api.html#doxid-group-ov-runtime-cpp-prop-api-1gad605a888f3c9b7598ab55023fbf44240>`__ is a "typical runtime configuration" which trades accuracy for performance, allowing ``fp16/bf16`` execution for the layers that remain in ``fp32`` after quantization of the original ``fp32`` model.

 Therefore, optimization should start with defining the use case. For example, if it is about processing millions of samples by overnight jobs in data centers, throughput could be prioritized over latency. On the other hand, real-time usages would likely trade off throughput to deliver the results at minimal latency. A combined scenario is also possible, targeting the highest possible throughput, while maintaining a specific latency threshold.
@@ -6,7 +6,7 @@ using namespace InferenceEngine;
 ov::Core core;
 auto network = core.read_model("sample.xml");
 auto exec_network = core.compile_model(network, "CPU");
-auto inference_precision = exec_network.get_property(ov::inference_precision);
+auto inference_precision = exec_network.get_property(ov::hint::inference_precision);
 //! [part1]

 return 0;

@@ -4,7 +4,7 @@ int main() {
 using namespace InferenceEngine;
 //! [part2]
 ov::Core core;
-core.set_property("CPU", ov::inference_precision(ov::element::f32));
+core.set_property("CPU", ov::hint::inference_precision(ov::element::f32));
 //! [part2]

 return 0;

@@ -49,7 +49,7 @@ auto compiled_model = core.compile_model(model, "HETERO",
     // profiling is enabled only for GPU
     ov::device::properties("GPU", ov::enable_profiling(true)),
     // FP32 inference precision only for CPU
-    ov::device::properties("CPU", ov::inference_precision(ov::element::f32))
+    ov::device::properties("CPU", ov::hint::inference_precision(ov::element::f32))
 );
 //! [configure_fallback_devices]
 }

@@ -19,7 +19,7 @@ auto model = core.read_model("sample.xml");
 //! [compile_model_with_property]
 auto compiled_model = core.compile_model(model, "CPU",
     ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
-    ov::inference_precision(ov::element::f32));
+    ov::hint::inference_precision(ov::element::f32));
 //! [compile_model_with_property]
 }

@@ -25,7 +25,7 @@ auto model = core.read_model("sample.xml");
 auto compiled_model = core.compile_model(model, "MULTI",
     ov::device::priorities("GPU", "CPU"),
     ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
-    ov::inference_precision(ov::element::f32));
+    ov::hint::inference_precision(ov::element::f32));
 //! [core_compile_model]

 //! [compiled_model_set_property]

@@ -327,7 +327,7 @@ DEFINE_string(nstreams, "", infer_num_streams_message);
 /// @brief Define flag for inference only mode <br>
 DEFINE_bool(inference_only, true, inference_only_message);

-/// @brief Define flag for inference precision
+/// @brief Define flag for inference precision hint
 DEFINE_string(infer_precision, "", inference_precision_message);

 /// @brief Specify precision for all input layers of the network

@@ -481,17 +481,17 @@ int main(int argc, char* argv[]) {
         auto it_device_infer_precision = device_infer_precision.find(device);
         if (it_device_infer_precision != device_infer_precision.end()) {
             // set to user defined value
-            if (supported(ov::inference_precision.name())) {
-                device_config.emplace(ov::inference_precision(it_device_infer_precision->second));
+            if (supported(ov::hint::inference_precision.name())) {
+                device_config.emplace(ov::hint::inference_precision(it_device_infer_precision->second));
             } else if (is_virtual_device(device)) {
                 update_device_config_for_virtual_device(it_device_infer_precision->second,
                                                         device_config,
-                                                        ov::inference_precision,
+                                                        ov::hint::inference_precision,
                                                         is_dev_set_property,
                                                         is_load_config);
             } else {
                 throw std::logic_error("Device " + device + " doesn't support config key '" +
-                                       ov::inference_precision.name() + "'! " +
+                                       ov::hint::inference_precision.name() + "'! " +
                                        "Please specify -infer_precision for correct devices in format "
                                        "<dev1>:<infer_precision1>,<dev2>:<infer_precision2>" +
                                        " or via configuration file.");
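The error message above describes the ``<dev1>:<infer_precision1>,<dev2>:<infer_precision2>`` format that feeds the ``device_infer_precision`` map being queried. A rough, simplified sketch of how such a flag could be split into a per-device map (the real benchmark_app parser is more involved and reports malformed entries):

```cpp
#include <map>
#include <sstream>
#include <string>

// Simplified sketch: split "CPU:f32,GPU:bf16" into a device -> precision map,
// the shape that device_infer_precision.find(device) is queried with above.
std::map<std::string, std::string> parse_infer_precision(const std::string& flag) {
    std::map<std::string, std::string> result;
    std::istringstream stream(flag);
    std::string entry;
    while (std::getline(stream, entry, ',')) {
        const auto colon = entry.find(':');
        if (colon == std::string::npos)
            continue;  // the real tool rejects malformed entries instead of skipping
        result[entry.substr(0, colon)] = entry.substr(colon + 1);
    }
    return result;
}
```

For example, `parse_infer_precision("CPU:f32,GPU:bf16")` yields a two-entry map keyed by device name.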
@@ -200,7 +200,7 @@ void update_device_config_for_virtual_device(const std::string& value,
     const auto& device_value = it.second;
     if (device_config.find(ov::device::properties.name()) == device_config.end() ||
         (is_load_config && is_dev_set_property[device_name])) {
-        // Create ov::device::properties with ov::num_stream/ov::inference_precision and
+        // Create ov::device::properties with ov::num_stream/ov::hint::inference_precision and
         // 1. Insert this ov::device::properties into device config if this
         // ov::device::properties isn't existed. Otherwise,
         // 2. Replace the existed ov::device::properties within device config.

@@ -220,7 +220,7 @@ int main(int argc, char* argv[]) {
             gnaPluginConfig[ov::intel_gna::scale_factors_per_input.name()] = scale_factors_per_input;
         }
     }
-    gnaPluginConfig[ov::inference_precision.name()] = (FLAGS_qb == 8) ? ov::element::i8 : ov::element::i16;
+    gnaPluginConfig[ov::hint::inference_precision.name()] = (FLAGS_qb == 8) ? ov::element::i8 : ov::element::i16;
     const std::unordered_map<std::string, ov::intel_gna::HWGeneration> StringHWGenerationMap{
         {"GNA_TARGET_1_0", ov::intel_gna::HWGeneration::GNA_1_0},
         {"GNA_TARGET_2_0", ov::intel_gna::HWGeneration::GNA_2_0},

@@ -39,7 +39,6 @@ void regmodule_properties(py::module m) {
     wrap_property_RO(m_properties, ov::optimal_batch_size, "optimal_batch_size");
     wrap_property_RO(m_properties, ov::max_batch_size, "max_batch_size");
     wrap_property_RO(m_properties, ov::range_for_async_infer_requests, "range_for_async_infer_requests");
-    wrap_property_RW(m_properties, ov::inference_precision, "inference_precision");

     // Submodule hint
     py::module m_hint =

@@ -215,7 +215,6 @@ def test_properties_ro(ov_property_ro, expected_value):
         ((properties.Affinity.NONE, properties.Affinity.NONE),),
     ),
     (properties.force_tbb_terminate, "FORCE_TBB_TERMINATE", ((True, True),)),
-    (properties.inference_precision, "INFERENCE_PRECISION_HINT", ((Type.f32, Type.f32),)),
+    (properties.hint.inference_precision, "INFERENCE_PRECISION_HINT", ((Type.f32, Type.f32),)),
     (
         properties.hint.model_priority,

@@ -342,12 +341,12 @@ def test_properties_device_properties():
           {"CPU": {"NUM_STREAMS": 2}})
     check({"CPU": make_dict(properties.streams.num(2))},
           {"CPU": {"NUM_STREAMS": properties.streams.Num(2)}})
-    check({"GPU": make_dict(properties.inference_precision(Type.f32))},
+    check({"GPU": make_dict(properties.hint.inference_precision(Type.f32))},
           {"GPU": {"INFERENCE_PRECISION_HINT": Type.f32}})
-    check({"CPU": make_dict(properties.streams.num(2), properties.inference_precision(Type.f32))},
+    check({"CPU": make_dict(properties.streams.num(2), properties.hint.inference_precision(Type.f32))},
           {"CPU": {"INFERENCE_PRECISION_HINT": Type.f32, "NUM_STREAMS": properties.streams.Num(2)}})
-    check({"CPU": make_dict(properties.streams.num(2), properties.inference_precision(Type.f32)),
-           "GPU": make_dict(properties.streams.num(1), properties.inference_precision(Type.f16))},
+    check({"CPU": make_dict(properties.streams.num(2), properties.hint.inference_precision(Type.f32)),
+           "GPU": make_dict(properties.streams.num(1), properties.hint.inference_precision(Type.f16))},
          {"CPU": {"INFERENCE_PRECISION_HINT": Type.f32, "NUM_STREAMS": properties.streams.Num(2)},
           "GPU": {"INFERENCE_PRECISION_HINT": Type.f16, "NUM_STREAMS": properties.streams.Num(1)}})

@@ -420,7 +419,7 @@ def test_single_property_setting(device):
         properties.cache_dir("./"),
         properties.inference_num_threads(9),
         properties.affinity(properties.Affinity.NONE),
-        properties.inference_precision(Type.f32),
+        properties.hint.inference_precision(Type.f32),
         properties.hint.performance_mode(properties.hint.PerformanceMode.LATENCY),
         properties.hint.scheduling_core_type(properties.hint.SchedulingCoreType.PCORE_ONLY),
         properties.hint.use_hyper_threading(True),

@@ -434,7 +433,7 @@ def test_single_property_setting(device):
         properties.cache_dir(): "./",
         properties.inference_num_threads(): 9,
         properties.affinity(): properties.Affinity.NONE,
-        properties.inference_precision(): Type.f32,
+        properties.hint.inference_precision(): Type.f32,
         properties.hint.performance_mode(): properties.hint.PerformanceMode.LATENCY,
         properties.hint.scheduling_core_type(): properties.hint.SchedulingCoreType.PCORE_ONLY,
         properties.hint.use_hyper_threading(): True,

@@ -233,22 +233,16 @@ static constexpr Property<std::string, PropertyMutability::RO> model_name{"NETWO
 static constexpr Property<uint32_t, PropertyMutability::RO> optimal_number_of_infer_requests{
     "OPTIMAL_NUMBER_OF_INFER_REQUESTS"};

-/**
- * @brief Hint for device to use specified precision for inference
- * @ingroup ov_runtime_cpp_prop_api
- */
-static constexpr Property<element::Type, PropertyMutability::RW> inference_precision{"INFERENCE_PRECISION_HINT"};
-
 /**
  * @brief Namespace with hint properties
  */
 namespace hint {

 /**
- * @brief An alias for inference_precision property for backward compatibility
+ * @brief Hint for device to use specified precision for inference
  * @ingroup ov_runtime_cpp_prop_api
  */
-using ov::inference_precision;
+static constexpr Property<element::Type, PropertyMutability::RW> inference_precision{"INFERENCE_PRECISION_HINT"};

 /**
  * @brief Enum to define possible priorities hints
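The header change above moves the property definition from the top-level namespace into ``hint``, dropping the old ``using``-declaration alias that went the other way. The mechanism can be sketched with a minimal stand-in (the ``sketch`` namespace and the trivial ``Property`` struct are invented for illustration; they are far simpler than OpenVINO's real ``ov::Property``):

```cpp
#include <string>

namespace sketch {

// Minimal stand-in for the property pattern in the header above: a typed key
// whose name() is the string used in config maps.
template <typename T>
struct Property {
    const char* key;
    std::string name() const { return key; }
};

namespace hint {
// The property is defined in the hint namespace, as after this commit...
constexpr Property<int> inference_precision{"INFERENCE_PRECISION_HINT"};
}  // namespace hint

// ...while a using-declaration can expose the same object one namespace up,
// which is how old call sites can be kept compiling during such a rename.
using hint::inference_precision;

}  // namespace sketch
```

Both `sketch::inference_precision` and `sketch::hint::inference_precision` then refer to the same key, which is the back-compat trick the reverted alias relied on.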
@@ -271,7 +265,7 @@ inline std::ostream& operator<<(std::ostream& os, const Priority& priority) {
     case Priority::HIGH:
         return os << "HIGH";
     default:
-        OPENVINO_THROW("Unsupported performance measure hint");
+        OPENVINO_THROW("Unsupported model priority value");
     }
 }

@@ -176,7 +176,7 @@ void Config::readProperties(const std::map<std::string, std::string> &prop) {
         if (!device_id.empty()) {
             IE_THROW() << "CPU plugin supports only '' as device id";
         }
-    } else if (key == ov::inference_precision.name()) {
+    } else if (key == ov::hint::inference_precision.name()) {
         if (val == "bf16") {
             if (dnnl::impl::cpu::x64::mayiuse(dnnl::impl::cpu::x64::avx512_core)) {
                 enforceBF16 = true;

@@ -186,7 +186,7 @@ void Config::readProperties(const std::map<std::string, std::string> &prop) {
         } else if (val == "f32") {
             enforceBF16 = false;
         } else {
-            IE_THROW() << "Wrong value for property key " << ov::inference_precision.name()
+            IE_THROW() << "Wrong value for property key " << ov::hint::inference_precision.name()
                        << ". Supported values: bf16, f32";
         }
     } else if (PluginConfigInternalParams::KEY_CPU_RUNTIME_CACHE_CAPACITY == key) {
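The two hunks above validate the CPU plugin's precision value: only ``bf16`` and ``f32`` are accepted, and ``bf16`` additionally depends on an ``avx512_core`` capability check. A self-contained sketch of that decision (the function name is invented, the CPU-feature query is modelled as a plain flag, and the behavior for ``bf16`` without AVX-512 is an assumption, since that branch falls between the two hunks):

```cpp
#include <stdexcept>
#include <string>

// Simplified sketch of the validation above: only "bf16" and "f32" are valid
// values for INFERENCE_PRECISION_HINT on CPU; "bf16" also needs avx512_core.
bool parse_enforce_bf16(const std::string& value, bool has_avx512_core) {
    if (value == "bf16") {
        if (!has_avx512_core)  // assumed behavior; the real branch is not shown here
            throw std::runtime_error("bf16 is not supported on this platform");
        return true;
    }
    if (value == "f32")
        return false;
    throw std::runtime_error(
        "Wrong value for property key INFERENCE_PRECISION_HINT. Supported values: bf16, f32");
}
```

So `"bf16"` on capable hardware enables BF16 enforcement, `"f32"` disables it, and anything else is rejected, mirroring the `IE_THROW()` in the diff.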
@@ -309,7 +309,7 @@ InferenceEngine::Parameter ExecNetwork::GetMetric(const std::string &name) const
         RO_property(ov::affinity.name()),
         RO_property(ov::inference_num_threads.name()),
         RO_property(ov::enable_profiling.name()),
-        RO_property(ov::inference_precision.name()),
+        RO_property(ov::hint::inference_precision.name()),
         RO_property(ov::hint::performance_mode.name()),
         RO_property(ov::hint::num_requests.name()),
         RO_property(ov::hint::scheduling_core_type.name()),

@@ -347,10 +347,10 @@ InferenceEngine::Parameter ExecNetwork::GetMetric(const std::string &name) const
     } else if (name == ov::enable_profiling.name()) {
         const bool perfCount = config.collectPerfCounters;
         return decltype(ov::enable_profiling)::value_type(perfCount);
-    } else if (name == ov::inference_precision) {
+    } else if (name == ov::hint::inference_precision) {
         const auto enforceBF16 = config.enforceBF16;
         const auto inference_precision = enforceBF16 ? ov::element::bf16 : ov::element::f32;
-        return decltype(ov::inference_precision)::value_type(inference_precision);
+        return decltype(ov::hint::inference_precision)::value_type(inference_precision);
     } else if (name == ov::hint::performance_mode) {
         const auto perfHint = ov::util::from_string(config.perfHintsConfig.ovPerfHint, ov::hint::performance_mode);
         return perfHint;

@@ -577,10 +577,10 @@ Parameter Engine::GetConfig(const std::string& name, const std::map<std::string,
     } else if (name == ov::enable_profiling.name()) {
         const bool perfCount = engConfig.collectPerfCounters;
         return decltype(ov::enable_profiling)::value_type(perfCount);
-    } else if (name == ov::inference_precision) {
+    } else if (name == ov::hint::inference_precision) {
         const auto enforceBF16 = engConfig.enforceBF16;
         const auto inference_precision = enforceBF16 ? ov::element::bf16 : ov::element::f32;
-        return decltype(ov::inference_precision)::value_type(inference_precision);
+        return decltype(ov::hint::inference_precision)::value_type(inference_precision);
     } else if (name == ov::hint::performance_mode) {
         const auto perfHint = ov::util::from_string(engConfig.perfHintsConfig.ovPerfHint, ov::hint::performance_mode);
         return perfHint;

@@ -675,7 +675,7 @@ Parameter Engine::GetMetric(const std::string& name, const std::map<std::string,
         RW_property(ov::affinity.name()),
         RW_property(ov::inference_num_threads.name()),
         RW_property(ov::enable_profiling.name()),
-        RW_property(ov::inference_precision.name()),
+        RW_property(ov::hint::inference_precision.name()),
         RW_property(ov::hint::performance_mode.name()),
         RW_property(ov::hint::num_requests.name()),
         RW_property(ov::hint::scheduling_core_type.name()),

@@ -279,13 +279,13 @@ TEST(OVClassBasicTest, smoke_SetConfigHintInferencePrecision) {
     auto value = ov::element::f32;
     const auto precision = InferenceEngine::with_cpu_x86_bfloat16() ? ov::element::bf16 : ov::element::f32;

-    OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::inference_precision));
+    OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::hint::inference_precision));
     ASSERT_EQ(precision, value);

     const auto forcedPrecision = ov::element::f32;

-    OV_ASSERT_NO_THROW(ie.set_property("CPU", ov::inference_precision(forcedPrecision)));
-    OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::inference_precision));
+    OV_ASSERT_NO_THROW(ie.set_property("CPU", ov::hint::inference_precision(forcedPrecision)));
+    OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::hint::inference_precision));
     ASSERT_EQ(value, forcedPrecision);

     OPENVINO_SUPPRESS_DEPRECATED_START

@@ -172,7 +172,7 @@ void Config::UpdateFromMap(const std::map<std::string, std::string>& config) {
         }
     } else if (key == ov::hint::performance_mode) {
         performance_mode = ov::util::from_string(value, ov::hint::performance_mode);
-    } else if (key == ov::inference_precision) {
+    } else if (key == ov::hint::inference_precision) {
         inference_precision = ov::util::from_string<ov::element::Type>(value);
         if ((inference_precision != ov::element::i8) && (inference_precision != ov::element::i16)) {
             THROW_GNA_EXCEPTION << "Unsupported precision of GNA hardware, should be I16 or I8, but was: " << value;

@@ -187,7 +187,7 @@ void Config::UpdateFromMap(const std::map<std::string, std::string>& config) {
                 << value;
         }
         // Update gnaPrecision basing on execution_mode only if inference_precision is not set
-        if (config.count(ov::inference_precision.name()) == 0) {
+        if (config.count(ov::hint::inference_precision.name()) == 0) {
             gnaPrecision = execution_mode == ov::hint::ExecutionMode::PERFORMANCE ? InferenceEngine::Precision::I8
                                                                                   : InferenceEngine::Precision::I16;
         }
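The GNA hunk above encodes a precedence rule: an explicitly set inference precision wins, and only when the key is absent does the plugin derive ``i8``/``i16`` from the execution mode. A compact sketch of that fallback (the function name is invented and precisions are modelled as strings rather than ``InferenceEngine::Precision``):

```cpp
#include <map>
#include <string>

// Sketch of the fallback above: the precision follows the performance/accuracy
// execution mode only when INFERENCE_PRECISION_HINT was not set by the user.
std::string resolve_gna_precision(const std::map<std::string, std::string>& config,
                                  bool performance_mode) {
    auto it = config.find("INFERENCE_PRECISION_HINT");
    if (it != config.end())
        return it->second;                   // explicit user value wins
    return performance_mode ? "i8" : "i16";  // derived from the execution mode
}
```

With no hint set, PERFORMANCE mode resolves to `i8` and ACCURACY to `i16`; once the hint is present, the execution mode no longer influences the precision.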
@@ -320,7 +320,7 @@ void Config::AdjustKeyMapValues() {
         gnaFlags.exclusive_async_requests ? PluginConfigParams::YES : PluginConfigParams::NO;
     keyConfigMap[ov::hint::performance_mode.name()] = ov::util::to_string(performance_mode);
     if (inference_precision != ov::element::undefined) {
-        keyConfigMap[ov::inference_precision.name()] = ov::util::to_string(inference_precision);
+        keyConfigMap[ov::hint::inference_precision.name()] = ov::util::to_string(inference_precision);
     } else {
         keyConfigMap[GNA_CONFIG_KEY(PRECISION)] = gnaPrecision.name();
     }

@@ -355,7 +355,7 @@ Parameter Config::GetParameter(const std::string& name) const {
         return DeviceToHwGeneration(target->get_user_set_compile_target());
     } else if (name == ov::hint::performance_mode) {
         return performance_mode;
-    } else if (name == ov::inference_precision) {
+    } else if (name == ov::hint::inference_precision) {
         return inference_precision;
     } else {
         auto result = keyConfigMap.find(name);

@@ -375,7 +375,7 @@ const Parameter Config::GetImpactingModelCompilationProperties(bool compiled) {
         {ov::intel_gna::compile_target.name(), model_mutability},
         {ov::intel_gna::pwl_design_algorithm.name(), model_mutability},
         {ov::intel_gna::pwl_max_error_percent.name(), model_mutability},
-        {ov::inference_precision.name(), model_mutability},
+        {ov::hint::inference_precision.name(), model_mutability},
         {ov::hint::execution_mode.name(), model_mutability},
         {ov::hint::num_requests.name(), model_mutability},
     };

@@ -193,7 +193,7 @@ INSTANTIATE_TEST_SUITE_P(
     ::testing::Combine(
         ::testing::Values("GNA"),
         ::testing::Values(ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
-                          ov::inference_precision(ngraph::element::i8),
+                          ov::hint::inference_precision(ngraph::element::i8),
                           ov::hint::num_requests(2),
                           ov::intel_gna::pwl_design_algorithm(ov::intel_gna::PWLDesignAlgorithm::UNIFORM_DISTRIBUTION),
                           ov::intel_gna::pwl_max_error_percent(0.2),

@@ -221,8 +221,8 @@ INSTANTIATE_TEST_SUITE_P(
         ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_FP32),
         ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::AUTO),
         ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"input", 1.0f}}),
-        ov::inference_precision(ov::element::i8),
-        ov::inference_precision(ov::element::i16),
+        ov::hint::inference_precision(ov::element::i8),
+        ov::hint::inference_precision(ov::element::i16),
         ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY),
         ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
         ov::hint::performance_mode(ov::hint::PerformanceMode::UNDEFINED),

@@ -116,11 +116,11 @@ TEST(OVClassBasicTest, smoke_SetConfigAfterCreatedPrecisionHint) {
     ov::Core core;
     ov::element::Type precision;

-    OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
+    OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
     ASSERT_EQ(ov::element::undefined, precision);

-    OV_ASSERT_NO_THROW(core.set_property("GNA", ov::inference_precision(ov::element::i8)));
-    OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
+    OV_ASSERT_NO_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::i8)));
+    OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
     ASSERT_EQ(ov::element::i8, precision);

     OPENVINO_SUPPRESS_DEPRECATED_START

@@ -128,23 +128,23 @@ TEST(OVClassBasicTest, smoke_SetConfigAfterCreatedPrecisionHint) {
     OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
     OPENVINO_SUPPRESS_DEPRECATED_END

-    OV_ASSERT_NO_THROW(core.set_property("GNA", ov::inference_precision(ov::element::i16)));
-    OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
+    OV_ASSERT_NO_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::i16)));
+    OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
     ASSERT_EQ(ov::element::i16, precision);

-    OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::inference_precision.name(), "I8"}}));
-    OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
+    OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::hint::inference_precision.name(), "I8"}}));
+    OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
     ASSERT_EQ(ov::element::i8, precision);

-    OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::inference_precision.name(), "I16"}}));
-    OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
+    OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::hint::inference_precision.name(), "I16"}}));
+    OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
     ASSERT_EQ(ov::element::i16, precision);

     OV_ASSERT_NO_THROW(
-        core.set_property("GNA", {ov::inference_precision(ov::element::i8), {GNA_CONFIG_KEY(PRECISION), "I16"}}));
-    ASSERT_THROW(core.set_property("GNA", ov::inference_precision(ov::element::i32)), ov::Exception);
-    ASSERT_THROW(core.set_property("GNA", ov::inference_precision(ov::element::undefined)), ov::Exception);
-    ASSERT_THROW(core.set_property("GNA", {{ov::inference_precision.name(), "ABC"}}), ov::Exception);
+        core.set_property("GNA", {ov::hint::inference_precision(ov::element::i8), {GNA_CONFIG_KEY(PRECISION), "I16"}}));
+    ASSERT_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::i32)), ov::Exception);
+    ASSERT_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::undefined)), ov::Exception);
+    ASSERT_THROW(core.set_property("GNA", {{ov::hint::inference_precision.name(), "ABC"}}), ov::Exception);
 }

 TEST(OVClassBasicTest, smoke_SetConfigAfterCreatedPerformanceHint) {

@@ -169,7 +169,7 @@ protected:

 TEST_F(GNAExportImportTest, ExportImportI16) {
     const ov::AnyMap gna_config = {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
-                                   ov::inference_precision(ngraph::element::i16)};
+                                   ov::hint::inference_precision(ngraph::element::i16)};
     exported_file_name = "export_test.bin";
     ExportModel(exported_file_name, gna_config);
     ImportModel(exported_file_name, gna_config);

@@ -177,7 +177,7 @@ TEST_F(GNAExportImportTest, ExportImportI16) {

 TEST_F(GNAExportImportTest, ExportImportI8) {
     const ov::AnyMap gna_config = {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
-                                   ov::inference_precision(ngraph::element::i8)};
+                                   ov::hint::inference_precision(ngraph::element::i8)};
     exported_file_name = "export_test.bin";
     ExportModel(exported_file_name, gna_config);
     ImportModel(exported_file_name, gna_config);

@@ -85,13 +85,13 @@ TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestDefault) {

 TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI16) {
     Run({ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
-         ov::inference_precision(ngraph::element::i16)});
+         ov::hint::inference_precision(ngraph::element::i16)});
     compare(ngraph::element::i16, ngraph::element::i32, sizeof(int16_t), sizeof(uint32_t));
 }

 TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI8) {
     Run({ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
-         ov::inference_precision(ngraph::element::i8)});
+         ov::hint::inference_precision(ngraph::element::i8)});
     compare(ngraph::element::i16,
             ngraph::element::i32,
             sizeof(int8_t),

@@ -100,7 +100,7 @@ TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI8) {

 TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI8LP) {
     Run({ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
-         ov::inference_precision(ngraph::element::i8)},
+         ov::hint::inference_precision(ngraph::element::i8)},
         true);
     compare(ngraph::element::i8, ngraph::element::i32, sizeof(int8_t), sizeof(int8_t));
 }

@@ -122,13 +122,13 @@ INSTANTIATE_TEST_SUITE_P(
         // gna config map
         {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
          ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
-         ov::inference_precision(ngraph::element::i16)},
+         ov::hint::inference_precision(ngraph::element::i16)},
         {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
          ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 8.0f}}),
-         ov::inference_precision(ngraph::element::i16)},
+         ov::hint::inference_precision(ngraph::element::i16)},
         {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
          ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 0.125f}}),
-         ov::inference_precision(ngraph::element::i16)},
+         ov::hint::inference_precision(ngraph::element::i16)},
     }),
     ::testing::Values(true),   // gna device
     ::testing::Values(false),  // use low precision

@@ -148,13 +148,13 @@ INSTANTIATE_TEST_SUITE_P(
        // gna config map
        {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
         ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
-        ov::inference_precision(ngraph::element::i8)},
+        ov::hint::inference_precision(ngraph::element::i8)},
        {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
         ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 4.0f}}),
-        ov::inference_precision(ngraph::element::i8)},
+        ov::hint::inference_precision(ngraph::element::i8)},
        {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
         ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 0.25f}}),
-        ov::inference_precision(ngraph::element::i8)},
+        ov::hint::inference_precision(ngraph::element::i8)},
    }),
    ::testing::Values(true),  // gna device
    ::testing::Values(true),  // use low precision

@@ -200,13 +200,13 @@ INSTANTIATE_TEST_SUITE_P(
        // gna config map
        {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
         ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
-        ov::inference_precision(ngraph::element::i16)},
+        ov::hint::inference_precision(ngraph::element::i16)},
        {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
         ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 4.0f}}),
-        ov::inference_precision(ngraph::element::i16)},
+        ov::hint::inference_precision(ngraph::element::i16)},
        {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
         ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 0.25f}}),
-        ov::inference_precision(ngraph::element::i16)},
+        ov::hint::inference_precision(ngraph::element::i16)},
    }),
    ::testing::Values(true),   // gna device
    ::testing::Values(false),  // use low precision

@@ -227,13 +227,13 @@ INSTANTIATE_TEST_SUITE_P(
        // gna config map,
        {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
         ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
-        ov::inference_precision(ngraph::element::i8)},
+        ov::hint::inference_precision(ngraph::element::i8)},
        {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 10.0f}}),
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 20.0f}}),
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
}),
|
||||
::testing::Values(true), // gna device
|
||||
::testing::Values(true), // use low precision
|
||||
@ -254,10 +254,10 @@ INSTANTIATE_TEST_SUITE_P(
|
||||
// gna config map
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 8.0f}}),
|
||||
ov::inference_precision(ngraph::element::i16)},
|
||||
ov::hint::inference_precision(ngraph::element::i16)},
|
||||
}),
|
||||
::testing::Values(true), // gna device
|
||||
::testing::Values(false), // use low precision
|
||||
@ -278,10 +278,10 @@ INSTANTIATE_TEST_SUITE_P(
|
||||
// gna config map
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
|
||||
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 4.0f}}),
|
||||
ov::inference_precision(ngraph::element::i8)},
|
||||
ov::hint::inference_precision(ngraph::element::i8)},
|
||||
}),
|
||||
::testing::Values(true), // gna device
|
||||
::testing::Values(true), // use low precision
|
||||
|
@@ -247,9 +247,9 @@ TEST_F(GNAPluginConfigTest, GnaConfigExecutionModeUpdatesGnaPrecision) {
 }
 
 TEST_F(GNAPluginConfigTest, GnaConfigInferencePrecisionUpdatesGnaPrecision) {
-    SetAndCompare(ov::inference_precision.name(), ov::util::to_string<ov::element::Type>(ov::element::i8));
+    SetAndCompare(ov::hint::inference_precision.name(), ov::util::to_string<ov::element::Type>(ov::element::i8));
     EXPECT_EQ(config.gnaPrecision, InferenceEngine::Precision::I8);
-    SetAndCompare(ov::inference_precision.name(), ov::util::to_string<ov::element::Type>(ov::element::i16));
+    SetAndCompare(ov::hint::inference_precision.name(), ov::util::to_string<ov::element::Type>(ov::element::i16));
     EXPECT_EQ(config.gnaPrecision, InferenceEngine::Precision::I16);
 }
 
@@ -257,7 +257,7 @@ TEST_F(GNAPluginConfigTest, GnaConfigInferencePrecisionHasHigherPriorityI16) {
     SetAndCompare(GNA_CONFIG_KEY(PRECISION), Precision(Precision::I8).name());
     SetAndCompare(ov::hint::execution_mode.name(),
                   ov::util::to_string<ov::hint::ExecutionMode>(ov::hint::ExecutionMode::PERFORMANCE));
-    SetAndCompare(ov::inference_precision.name(), ov::util::to_string<ov::element::Type>(ov::element::i16));
+    SetAndCompare(ov::hint::inference_precision.name(), ov::util::to_string<ov::element::Type>(ov::element::i16));
     EXPECT_EQ(config.gnaPrecision, InferenceEngine::Precision::I16);
 }
 
@@ -265,6 +265,6 @@ TEST_F(GNAPluginConfigTest, GnaConfigInferencePrecisionHasHigherPriorityI8) {
     SetAndCompare(GNA_CONFIG_KEY(PRECISION), Precision(Precision::I16).name());
     SetAndCompare(ov::hint::execution_mode.name(),
                   ov::util::to_string<ov::hint::ExecutionMode>(ov::hint::ExecutionMode::ACCURACY));
-    SetAndCompare(ov::inference_precision.name(), ov::util::to_string<ov::element::Type>(ov::element::i8));
+    SetAndCompare(ov::hint::inference_precision.name(), ov::util::to_string<ov::element::Type>(ov::element::i8));
     EXPECT_EQ(config.gnaPrecision, InferenceEngine::Precision::I8);
 }

@@ -325,7 +325,7 @@ InferenceEngine::Parameter CompiledModel::GetMetric(const std::string &name) con
        ov::PropertyName{ov::compilation_num_threads.name(), PropertyMutability::RO},
        ov::PropertyName{ov::num_streams.name(), PropertyMutability::RO},
        ov::PropertyName{ov::hint::num_requests.name(), PropertyMutability::RO},
-       ov::PropertyName{ov::inference_precision.name(), PropertyMutability::RO},
+       ov::PropertyName{ov::hint::inference_precision.name(), PropertyMutability::RO},
        ov::PropertyName{ov::device::id.name(), PropertyMutability::RO},
        ov::PropertyName{ov::execution_devices.name(), PropertyMutability::RO}
    };

@@ -14,7 +14,7 @@ bool LegacyAPIHelper::is_new_api_property(const std::pair<std::string, ov::Any>&
    static const std::vector<std::string> new_properties_list = {
        ov::intel_gpu::hint::queue_priority.name(),
        ov::intel_gpu::hint::queue_throttle.name(),
-       ov::inference_precision.name(),
+       ov::hint::inference_precision.name(),
        ov::compilation_num_threads.name(),
        ov::num_streams.name(),
    };

@@ -671,7 +671,7 @@ Parameter Plugin::GetMetric(const std::string& name, const std::map<std::string,
        cachingProperties.push_back(ov::PropertyName(ov::device::architecture.name(), PropertyMutability::RO));
        cachingProperties.push_back(ov::PropertyName(ov::intel_gpu::execution_units_count.name(), PropertyMutability::RO));
        cachingProperties.push_back(ov::PropertyName(ov::intel_gpu::driver_version.name(), PropertyMutability::RO));
-       cachingProperties.push_back(ov::PropertyName(ov::inference_precision.name(), PropertyMutability::RW));
+       cachingProperties.push_back(ov::PropertyName(ov::hint::inference_precision.name(), PropertyMutability::RW));
        cachingProperties.push_back(ov::PropertyName(ov::hint::execution_mode.name(), PropertyMutability::RW));
        return decltype(ov::caching_properties)::value_type(cachingProperties);
    } else if (name == ov::intel_gpu::driver_version) {
@@ -730,7 +730,7 @@ std::vector<ov::PropertyName> Plugin::get_supported_properties() const {
        ov::PropertyName{ov::compilation_num_threads.name(), PropertyMutability::RW},
        ov::PropertyName{ov::num_streams.name(), PropertyMutability::RW},
        ov::PropertyName{ov::hint::num_requests.name(), PropertyMutability::RW},
-       ov::PropertyName{ov::inference_precision.name(), PropertyMutability::RW},
+       ov::PropertyName{ov::hint::inference_precision.name(), PropertyMutability::RW},
        ov::PropertyName{ov::device::id.name(), PropertyMutability::RW},
    };
 

@@ -40,7 +40,7 @@ void ExecutionConfig::set_default() {
        std::make_tuple(ov::cache_dir, ""),
        std::make_tuple(ov::num_streams, 1),
        std::make_tuple(ov::compilation_num_threads, std::max(1, static_cast<int>(std::thread::hardware_concurrency()))),
-       std::make_tuple(ov::inference_precision, ov::element::f16, InferencePrecisionValidator()),
+       std::make_tuple(ov::hint::inference_precision, ov::element::f16, InferencePrecisionValidator()),
        std::make_tuple(ov::hint::model_priority, ov::hint::Priority::MEDIUM),
        std::make_tuple(ov::hint::performance_mode, ov::hint::PerformanceMode::LATENCY, PerformanceModeValidator()),
        std::make_tuple(ov::hint::execution_mode, ov::hint::ExecutionMode::PERFORMANCE),
@@ -123,14 +123,14 @@ Any ExecutionConfig::get_property(const std::string& name) const {
 void ExecutionConfig::apply_execution_hints(const cldnn::device_info& info) {
    if (is_set_by_user(ov::hint::execution_mode)) {
        const auto mode = get_property(ov::hint::execution_mode);
-       if (!is_set_by_user(ov::inference_precision)) {
+       if (!is_set_by_user(ov::hint::inference_precision)) {
            if (mode == ov::hint::ExecutionMode::ACCURACY) {
-               set_property(ov::inference_precision(ov::element::f32));
+               set_property(ov::hint::inference_precision(ov::element::f32));
            } else if (mode == ov::hint::ExecutionMode::PERFORMANCE) {
                if (info.supports_fp16)
-                   set_property(ov::inference_precision(ov::element::f16));
+                   set_property(ov::hint::inference_precision(ov::element::f16));
                else
-                   set_property(ov::inference_precision(ov::element::f32));
+                   set_property(ov::hint::inference_precision(ov::element::f32));
            }
        }
    }

@@ -39,7 +39,7 @@ TEST_P(InferencePrecisionTests, smoke_canSetInferencePrecisionAndInfer) {
    std::tie(model_precision, inference_precision) = GetParam();
    auto function = ov::test::behavior::getDefaultNGraphFunctionForTheDevice(CommonTestUtils::DEVICE_GPU, {1, 1, 32, 32}, model_precision);
    ov::CompiledModel compiled_model;
-   OV_ASSERT_NO_THROW(compiled_model = core->compile_model(function, CommonTestUtils::DEVICE_GPU, ov::inference_precision(inference_precision)));
+   OV_ASSERT_NO_THROW(compiled_model = core->compile_model(function, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(inference_precision)));
    auto req = compiled_model.create_infer_request();
    OV_ASSERT_NO_THROW(req.infer());
 }
@@ -67,7 +67,7 @@ TEST(ExecutionModeTest, SetCompileGetInferPrecisionAndExecMode) {
    core.set_property(CommonTestUtils::DEVICE_GPU, ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE));
    auto model = ngraph::builder::subgraph::makeConvPoolRelu();
    {
-       auto compiled_model = core.compile_model(model, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
+       auto compiled_model = core.compile_model(model, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
        ASSERT_EQ(ov::hint::ExecutionMode::PERFORMANCE, compiled_model.get_property(ov::hint::execution_mode));
        ASSERT_EQ(ov::element::f32, compiled_model.get_property(ov::hint::inference_precision));
    }

@@ -55,7 +55,7 @@ TEST_P(OVConcurrencyTest, canInferTwoExecNets) {
        auto fn = fn_ptrs[i];
 
        auto exec_net = ie.compile_model(fn_ptrs[i], CommonTestUtils::DEVICE_GPU,
-                                        ov::num_streams(num_streams), ov::inference_precision(ov::element::f32));
+                                        ov::num_streams(num_streams), ov::hint::inference_precision(ov::element::f32));
 
        auto input = fn_ptrs[i]->get_parameters().at(0);
        auto output = fn_ptrs[i]->get_results().at(0);
@@ -115,7 +115,7 @@ TEST(canSwapTensorsBetweenInferRequests, inputs) {
    auto fn = ngraph::builder::subgraph::makeSplitMultiConvConcat();
 
    auto ie = ov::Core();
-   auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
+   auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
 
    const int infer_requests_num = 2;
    ov::InferRequest infer_request1 = compiled_model.create_infer_request();
@@ -193,7 +193,7 @@ TEST(smoke_InferRequestDeviceMemoryAllocation, usmHostIsNotChanged) {
    auto fn = ngraph::builder::subgraph::makeDetectionOutput(ngraph::element::Type_t::f32);
 
    auto ie = ov::Core();
-   auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
+   auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
 
    ov::InferRequest infer_request1 = compiled_model.create_infer_request();
    ov::InferRequest infer_request2 = compiled_model.create_infer_request();
@@ -232,7 +232,7 @@ TEST(smoke_InferRequestDeviceMemoryAllocation, canSetSystemHostTensor) {
    auto fn = ngraph::builder::subgraph::makeDetectionOutput(ngraph::element::Type_t::f32);
 
    auto ie = ov::Core();
-   auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
+   auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
 
    ov::InferRequest infer_request1 = compiled_model.create_infer_request();
    ov::InferRequest infer_request2 = compiled_model.create_infer_request();
@@ -258,7 +258,7 @@ TEST(canSwapTensorsBetweenInferRequests, outputs) {
    auto fn = ngraph::builder::subgraph::makeSplitMultiConvConcat();
 
    auto ie = ov::Core();
-   auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
+   auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
 
    const int infer_requests_num = 2;
    ov::InferRequest infer_request1 = compiled_model.create_infer_request();

@@ -40,7 +40,7 @@ public:
                {CONFIG_KEY(AUTO_BATCH_TIMEOUT) , "0"},
            };
        }
-       config.insert({ov::inference_precision.name(), "f32"});
+       config.insert({ov::hint::inference_precision.name(), "f32"});
        fn_ptr = ov::test::behavior::getDefaultNGraphFunctionForTheDevice(with_auto_batching ? CommonTestUtils::DEVICE_BATCH : deviceName);
    }
    static std::string getTestCaseName(const testing::TestParamInfo<bool>& obj) {
@@ -230,7 +230,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserContext) {
    auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
 
    auto ie = PluginCache::get().ie();
-   auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});
+   auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
 
    // regular inference
    auto inf_req_regular = exec_net_regular.CreateInferRequest();
@@ -277,7 +277,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
    auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
 
    auto ie = PluginCache::get().ie();
-   auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});
+   auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
 
    // regular inference
    auto inf_req_regular = exec_net_regular.CreateInferRequest();
@@ -305,7 +305,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
    // In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
    // without calling thread blocks
    auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
-   auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::inference_precision.name(), "f32"}});
+   auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::hint::inference_precision.name(), "f32"}});
    auto inf_req_shared = exec_net_shared.CreateInferRequest();
 
    // Allocate shared buffers for input and output data which will be set to infer request
@@ -375,7 +375,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
    auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
 
    auto ie = PluginCache::get().ie();
-   auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});
+   auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
 
    // regular inference
    auto inf_req_regular = exec_net_regular.CreateInferRequest();
@@ -404,7 +404,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
    // In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
    // without calling thread blocks
    auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
-   auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::inference_precision.name(), "f32"}});
+   auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::hint::inference_precision.name(), "f32"}});
    auto inf_req_shared = exec_net_shared.CreateInferRequest();
 
    // Allocate shared buffers for input and output data which will be set to infer request
@@ -469,7 +469,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_infer_call_many_times) {
    auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
 
    auto ie = PluginCache::get().ie();
-   auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});
+   auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
 
    // regular inference
    auto inf_req_regular = exec_net_regular.CreateInferRequest();
@@ -498,7 +498,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_infer_call_many_times) {
    // In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
    // without calling thread blocks
    auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
-   auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::inference_precision.name(), "f32"}});
+   auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::hint::inference_precision.name(), "f32"}});
    auto inf_req_shared = exec_net_shared.CreateInferRequest();
 
    // Allocate shared buffers for input and output data which will be set to infer request
@@ -601,7 +601,7 @@ TEST_P(BatchedBlob_Test, canInputNV12) {
 
    /* XXX: is it correct to set KEY_CLDNN_NV12_TWO_INPUTS in case of remote blob? */
    auto exec_net_b = ie.LoadNetwork(net_remote, CommonTestUtils::DEVICE_GPU,
-               { { GPUConfigParams::KEY_GPU_NV12_TWO_INPUTS, PluginConfigParams::YES}, {ov::inference_precision.name(), "f32"} });
+               { { GPUConfigParams::KEY_GPU_NV12_TWO_INPUTS, PluginConfigParams::YES}, {ov::hint::inference_precision.name(), "f32"} });
    auto inf_req_remote = exec_net_b.CreateInferRequest();
    auto cldnn_context = exec_net_b.GetContext();
    cl_context ctx = std::dynamic_pointer_cast<ClContext>(cldnn_context)->get();
@@ -670,7 +670,7 @@ TEST_P(BatchedBlob_Test, canInputNV12) {
    net_local.getInputsInfo().begin()->second->setPrecision(Precision::U8);
    net_local.getInputsInfo().begin()->second->getPreProcess().setColorFormat(ColorFormat::NV12);
 
-   auto exec_net_b1 = ie.LoadNetwork(net_local, CommonTestUtils::DEVICE_GPU, {{ov::inference_precision.name(), "f32"}});
+   auto exec_net_b1 = ie.LoadNetwork(net_local, CommonTestUtils::DEVICE_GPU, {{ov::hint::inference_precision.name(), "f32"}});
 
    auto inf_req_local = exec_net_b1.CreateInferRequest();
 
@@ -742,7 +742,7 @@ TEST_P(TwoNets_Test, canInferTwoExecNets) {
 
        auto exec_net = ie.LoadNetwork(net, CommonTestUtils::DEVICE_GPU,
                                       {{PluginConfigParams::KEY_GPU_THROUGHPUT_STREAMS, std::to_string(num_streams)},
-                                       {ov::inference_precision.name(), "f32"}});
+                                       {ov::hint::inference_precision.name(), "f32"}});
 
        for (int j = 0; j < num_streams * num_requests; j++) {
            outputs.push_back(net.getOutputsInfo().begin()->first);

@@ -350,13 +350,13 @@ TEST_P(OVClassGetPropertyTest_GPU, GetAndSetInferencePrecisionNoThrow) {
    auto value = ov::element::undefined;
    const auto expected_default_precision = ov::element::f16;
 
-   OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::inference_precision));
+   OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::hint::inference_precision));
    ASSERT_EQ(expected_default_precision, value);
 
    const auto forced_precision = ov::element::f32;
 
-   OV_ASSERT_NO_THROW(ie.set_property(target_device, ov::inference_precision(forced_precision)));
-   OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::inference_precision));
+   OV_ASSERT_NO_THROW(ie.set_property(target_device, ov::hint::inference_precision(forced_precision)));
+   OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::hint::inference_precision));
    ASSERT_EQ(value, forced_precision);
 
    OPENVINO_SUPPRESS_DEPRECATED_START
@@ -728,7 +728,7 @@ auto gpuCorrectConfigsWithSecondaryProperties = []() {
    return std::vector<ov::AnyMap>{
        {ov::device::properties(CommonTestUtils::DEVICE_GPU,
                                ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE),
-                               ov::inference_precision(ov::element::f32))},
+                               ov::hint::inference_precision(ov::element::f32))},
        {ov::device::properties(CommonTestUtils::DEVICE_GPU,
                                ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
                                ov::hint::allow_auto_batching(false))},
@@ -821,7 +821,7 @@ TEST_P(OVClassGetMetricTest_CACHING_PROPERTIES, GetMetricAndPrintNoThrow) {
        ov::device::architecture.name(),
        ov::intel_gpu::execution_units_count.name(),
        ov::intel_gpu::driver_version.name(),
-       ov::inference_precision.name(),
+       ov::hint::inference_precision.name(),
        ov::hint::execution_mode.name(),
    };
 

@@ -36,7 +36,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeAppliedForNonContiguousAxes
    auto core = ov::Core();
    ov::AnyMap config;
    if (device_name == CommonTestUtils::DEVICE_GPU)
-       config.insert(ov::inference_precision(ov::element::f32));
+       config.insert(ov::hint::inference_precision(ov::element::f32));
    const auto compiled_model = core.compile_model(model, device_name, config);
 
    ASSERT_TRUE(model->get_ops().size() < compiled_model.get_runtime_model()->get_ops().size()); // decomposition applied
@@ -56,7 +56,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeAppliedForNormalizeOverAllA
    auto core = ov::Core();
    ov::AnyMap config;
    if (device_name == CommonTestUtils::DEVICE_GPU)
-       config.insert(ov::inference_precision(ov::element::f32));
+       config.insert(ov::hint::inference_precision(ov::element::f32));
    const auto compiled_model = core.compile_model(model, device_name, config);
 
    ASSERT_TRUE(model->get_ops().size() < compiled_model.get_runtime_model()->get_ops().size()); // decomposition applied
@@ -76,7 +76,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeNotAppliedForNotSorted) {
    auto core = ov::Core();
    ov::AnyMap config;
    if (device_name == CommonTestUtils::DEVICE_GPU)
-       config.insert(ov::inference_precision(ov::element::f32));
+       config.insert(ov::hint::inference_precision(ov::element::f32));
    const auto compiled_model = core.compile_model(model, device_name, config);
 
    ASSERT_TRUE(model->get_ops().size() >= compiled_model.get_runtime_model()->get_ops().size()); // decomposition not applied
@@ -96,7 +96,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeNotAppliedForSingleAxis) {
    auto core = ov::Core();
    ov::AnyMap config;
    if (device_name == CommonTestUtils::DEVICE_GPU)
-       config.insert(ov::inference_precision(ov::element::f32));
+       config.insert(ov::hint::inference_precision(ov::element::f32));
    const auto compiled_model = core.compile_model(model, device_name, config);
 
    ASSERT_TRUE(model->get_ops().size() >= compiled_model.get_runtime_model()->get_ops().size()); // decomposition not applied

@@ -226,7 +226,7 @@ void SubgraphBaseTest::compile_model() {
                break;
            }
        }
-       configuration.insert({ov::inference_precision.name(), hint});
+       configuration.insert({ov::hint::inference_precision.name(), hint});
    }
 
    compiledModel = core->compile_model(function, targetDevice, configuration);

@@ -54,7 +54,7 @@ void SnippetsTestsCommon::validateOriginalLayersNamesByType(const std::string& l
    ASSERT_TRUE(false) << "Layer type '" << layerType << "' was not found in compiled model";
 }
 void SnippetsTestsCommon::setInferenceType(ov::element::Type type) {
-   configuration.emplace(ov::inference_precision(type));
+   configuration.emplace(ov::hint::inference_precision(type));
 }
 
 } // namespace test