Revert inference precision to be a hint (#16634)

Ilya Lavrenov 2023-03-29 18:59:33 +04:00 committed by GitHub
parent 7d8f4af78a
commit 0250f62d11
38 changed files with 126 additions and 134 deletions

View File

@ -105,14 +105,14 @@ to query ``ov::device::capabilities`` property, which should contain ``BF16`` in
:fragment: [part0]
If the model has been converted to ``bf16``, the ``ov::inference_precision`` is set to ``ov::element::bf16`` and can be checked via
If the model has been converted to ``bf16``, the ``ov::hint::inference_precision`` is set to ``ov::element::bf16`` and can be checked via
the ``ov::CompiledModel::get_property`` call. The code below demonstrates how to get the element type:
.. doxygensnippet:: snippets/cpu/Bfloat16Inference1.cpp
:language: cpp
:fragment: [part1]
To infer the model in ``f32`` precision instead of ``bf16`` on targets with native ``bf16`` support, set the ``ov::inference_precision`` to ``ov::element::f32``.
To infer the model in ``f32`` precision instead of ``bf16`` on targets with native ``bf16`` support, set the ``ov::hint::inference_precision`` to ``ov::element::f32``.
.. tab-set::
@ -134,11 +134,11 @@ To infer the model in ``f32`` precision instead of ``bf16`` on targets with nati
The ``Bfloat16`` software simulation mode is available on CPUs with Intel® AVX-512 instruction set that do not support the
native ``avx512_bf16`` instruction. This mode is used for development purposes and it does not guarantee good performance.
To enable the simulation, the ``ov::inference_precision`` has to be explicitly set to ``ov::element::bf16``.
To enable the simulation, the ``ov::hint::inference_precision`` has to be explicitly set to ``ov::element::bf16``.
.. note::
If ``ov::inference_precision`` is set to ``ov::element::bf16`` on a CPU without native bfloat16 support or bfloat16 simulation mode, an exception is thrown.
If ``ov::hint::inference_precision`` is set to ``ov::element::bf16`` on a CPU without native bfloat16 support or bfloat16 simulation mode, an exception is thrown.
.. note::
@ -292,7 +292,7 @@ Read-write Properties
All parameters must be set before calling ``ov::Core::compile_model()`` in order to take effect, or be passed as an additional argument to ``ov::Core::compile_model()``
- ``ov::enable_profiling``
- ``ov::inference_precision``
- ``ov::hint::inference_precision``
- ``ov::hint::performance_mode``
- ``ov::hint::num_requests``
- ``ov::num_streams``
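
A minimal end-to-end sketch of the workflow above, assuming the standard OpenVINO 2.x C++ API and a placeholder ``sample.xml`` model: the precision hint is forced to ``f32`` before compilation and then read back from the compiled model.

#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;
    auto model = core.read_model("sample.xml");  // placeholder model path

    // Force f32 execution even on CPUs with native bf16 support.
    core.set_property("CPU", ov::hint::inference_precision(ov::element::f32));

    auto compiled_model = core.compile_model(model, "CPU");

    // Read back the effective precision hint from the compiled model.
    auto precision = compiled_model.get_property(ov::hint::inference_precision);
    std::cout << "Inference precision: " << precision << std::endl;
    return 0;
}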

View File

@ -140,7 +140,7 @@ quantization hints based on statistics for the provided dataset.
* Accuracy (i16 weights)
* Performance (i8 weights)
For POT quantized models, the ``ov::inference_precision`` property has no effect except in cases described in the
For POT quantized models, the ``ov::hint::inference_precision`` property has no effect except in cases described in the
:ref:`Model and Operation Limitations section <#model-and-operation-limitations>`.
@ -268,7 +268,7 @@ In order to take effect, the following parameters must be set before model compi
- ov::cache_dir
- ov::enable_profiling
- ov::inference_precision
- ov::hint::inference_precision
- ov::hint::num_requests
- ov::intel_gna::compile_target
- ov::intel_gna::firmware_model_image_path
@ -354,7 +354,7 @@ Support for 2D Convolutions using POT
For POT to successfully work with the models including GNA3.0 2D convolutions, the following requirements must be met:
* All convolution parameters are natively supported by HW (see tables above).
* The runtime precision is explicitly set by the ``ov::inference_precision`` property as ``i8`` for the models produced by
* The runtime precision is explicitly set by the ``ov::hint::inference_precision`` property as ``i8`` for the models produced by
the ``performance mode`` of POT, and as ``i16`` for the models produced by the ``accuracy mode`` of POT.
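
A short sketch of selecting the GNA runtime precision, assuming a hypothetical ``pot_quantized.xml`` model produced by the POT performance mode; swap ``i8`` for ``i16`` for accuracy-mode models.

#include <openvino/openvino.hpp>
#include <openvino/runtime/intel_gna/properties.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("pot_quantized.xml");  // hypothetical POT-quantized model

    // i8 runtime precision for a performance-mode POT model; use ov::element::i16
    // for models produced by the accuracy mode of POT.
    auto compiled_model = core.compile_model(model, "GNA",
        ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
        ov::hint::inference_precision(ov::element::i8));
    return 0;
}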

View File

@ -327,7 +327,7 @@ All parameters must be set before calling ``ov::Core::compile_model()`` in order
- ov::hint::performance_mode
- ov::hint::execution_mode
- ov::hint::num_requests
- ov::inference_precision
- ov::hint::inference_precision
- ov::num_streams
- ov::compilation_num_threads
- ov::device::id
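
A hedged sketch of passing several of the properties listed above at compilation time on GPU, with a placeholder model path.

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("sample.xml");  // placeholder model path

    // Properties take effect only when supplied before or at compile_model() time.
    auto compiled_model = core.compile_model(model, "GPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
        ov::hint::inference_precision(ov::element::f32),
        ov::num_streams(2));
    return 0;
}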

View File

@ -16,7 +16,7 @@
Runtime optimization, or deployment optimization, focuses on tuning inference parameters and execution means (e.g., the optimum number of requests executed simultaneously). Unlike model-level optimizations, these optimizations are highly specific to the hardware and the use case they target, and they often come at a cost.
`ov::inference_precision <groupov_runtime_cpp_prop_api.html#doxid-group-ov-runtime-cpp-prop-api-1gad605a888f3c9b7598ab55023fbf44240>`__ is a "typical runtime configuration" which trades accuracy for performance, allowing ``fp16/bf16`` execution for the layers that remain in ``fp32`` after quantization of the original ``fp32`` model.
`ov::hint::inference_precision <groupov_runtime_cpp_prop_api.html#doxid-group-ov-runtime-cpp-prop-api-1gad605a888f3c9b7598ab55023fbf44240>`__ is a "typical runtime configuration" which trades accuracy for performance, allowing ``fp16/bf16`` execution for the layers that remain in ``fp32`` after quantization of the original ``fp32`` model.
Therefore, optimization should start with defining the use case. For example, if it is about processing millions of samples by overnight jobs in data centers, throughput could be prioritized over latency. On the other hand, real-time usages would likely trade off throughput to deliver the results at minimal latency. A combined scenario is also possible, targeting the highest possible throughput, while maintaining a specific latency threshold.
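
To make the trade-off concrete, a brief sketch (under the same assumptions as the snippets above, with a placeholder model) comparing the device default with an explicit ``f32`` override on GPU.

#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;
    auto model = core.read_model("sample.xml");  // placeholder model path

    // Default: the device may execute fp32 layers in fp16/bf16 for performance.
    auto fast = core.compile_model(model, "GPU");

    // Explicit override: keep fp32 everywhere, trading performance for accuracy.
    auto accurate = core.compile_model(model, "GPU",
        ov::hint::inference_precision(ov::element::f32));

    std::cout << fast.get_property(ov::hint::inference_precision) << " vs "
              << accurate.get_property(ov::hint::inference_precision) << std::endl;
    return 0;
}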

View File

@ -6,7 +6,7 @@ using namespace InferenceEngine;
ov::Core core;
auto network = core.read_model("sample.xml");
auto exec_network = core.compile_model(network, "CPU");
auto inference_precision = exec_network.get_property(ov::inference_precision);
auto inference_precision = exec_network.get_property(ov::hint::inference_precision);
//! [part1]
return 0;

View File

@ -4,7 +4,7 @@ int main() {
using namespace InferenceEngine;
//! [part2]
ov::Core core;
core.set_property("CPU", ov::inference_precision(ov::element::f32));
core.set_property("CPU", ov::hint::inference_precision(ov::element::f32));
//! [part2]
return 0;

View File

@ -49,7 +49,7 @@ auto compiled_model = core.compile_model(model, "HETERO",
// profiling is enabled only for GPU
ov::device::properties("GPU", ov::enable_profiling(true)),
// FP32 inference precision only for CPU
ov::device::properties("CPU", ov::inference_precision(ov::element::f32))
ov::device::properties("CPU", ov::hint::inference_precision(ov::element::f32))
);
//! [configure_fallback_devices]
}

View File

@ -19,7 +19,7 @@ auto model = core.read_model("sample.xml");
//! [compile_model_with_property]
auto compiled_model = core.compile_model(model, "CPU",
ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
ov::inference_precision(ov::element::f32));
ov::hint::inference_precision(ov::element::f32));
//! [compile_model_with_property]
}

View File

@ -25,7 +25,7 @@ auto model = core.read_model("sample.xml");
auto compiled_model = core.compile_model(model, "MULTI",
ov::device::priorities("GPU", "CPU"),
ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
ov::inference_precision(ov::element::f32));
ov::hint::inference_precision(ov::element::f32));
//! [core_compile_model]
//! [compiled_model_set_property]

View File

@ -327,7 +327,7 @@ DEFINE_string(nstreams, "", infer_num_streams_message);
/// @brief Define flag for inference only mode <br>
DEFINE_bool(inference_only, true, inference_only_message);
/// @brief Define flag for inference precision
/// @brief Define flag for inference precision hint
DEFINE_string(infer_precision, "", inference_precision_message);
/// @brief Specify precision for all input layers of the network

View File

@ -481,17 +481,17 @@ int main(int argc, char* argv[]) {
auto it_device_infer_precision = device_infer_precision.find(device);
if (it_device_infer_precision != device_infer_precision.end()) {
// set to user defined value
if (supported(ov::inference_precision.name())) {
device_config.emplace(ov::inference_precision(it_device_infer_precision->second));
if (supported(ov::hint::inference_precision.name())) {
device_config.emplace(ov::hint::inference_precision(it_device_infer_precision->second));
} else if (is_virtual_device(device)) {
update_device_config_for_virtual_device(it_device_infer_precision->second,
device_config,
ov::inference_precision,
ov::hint::inference_precision,
is_dev_set_property,
is_load_config);
} else {
throw std::logic_error("Device " + device + " doesn't support config key '" +
ov::inference_precision.name() + "'! " +
ov::hint::inference_precision.name() + "'! " +
"Please specify -infer_precision for correct devices in format "
"<dev1>:<infer_precision1>,<dev2>:<infer_precision2>" +
" or via configuration file.");

View File

@ -200,7 +200,7 @@ void update_device_config_for_virtual_device(const std::string& value,
const auto& device_value = it.second;
if (device_config.find(ov::device::properties.name()) == device_config.end() ||
(is_load_config && is_dev_set_property[device_name])) {
// Create ov::device::properties with ov::num_stream/ov::inference_precision and
// Create ov::device::properties with ov::num_stream/ov::hint::inference_precision and
// 1. Insert this ov::device::properties into device config if this
// ov::device::properties isn't existed. Otherwise,
// 2. Replace the existed ov::device::properties within device config.

View File

@ -220,7 +220,7 @@ int main(int argc, char* argv[]) {
gnaPluginConfig[ov::intel_gna::scale_factors_per_input.name()] = scale_factors_per_input;
}
}
gnaPluginConfig[ov::inference_precision.name()] = (FLAGS_qb == 8) ? ov::element::i8 : ov::element::i16;
gnaPluginConfig[ov::hint::inference_precision.name()] = (FLAGS_qb == 8) ? ov::element::i8 : ov::element::i16;
const std::unordered_map<std::string, ov::intel_gna::HWGeneration> StringHWGenerationMap{
{"GNA_TARGET_1_0", ov::intel_gna::HWGeneration::GNA_1_0},
{"GNA_TARGET_2_0", ov::intel_gna::HWGeneration::GNA_2_0},

View File

@ -39,7 +39,6 @@ void regmodule_properties(py::module m) {
wrap_property_RO(m_properties, ov::optimal_batch_size, "optimal_batch_size");
wrap_property_RO(m_properties, ov::max_batch_size, "max_batch_size");
wrap_property_RO(m_properties, ov::range_for_async_infer_requests, "range_for_async_infer_requests");
wrap_property_RW(m_properties, ov::inference_precision, "inference_precision");
// Submodule hint
py::module m_hint =

View File

@ -215,7 +215,6 @@ def test_properties_ro(ov_property_ro, expected_value):
((properties.Affinity.NONE, properties.Affinity.NONE),),
),
(properties.force_tbb_terminate, "FORCE_TBB_TERMINATE", ((True, True),)),
(properties.inference_precision, "INFERENCE_PRECISION_HINT", ((Type.f32, Type.f32),)),
(properties.hint.inference_precision, "INFERENCE_PRECISION_HINT", ((Type.f32, Type.f32),)),
(
properties.hint.model_priority,
@ -342,12 +341,12 @@ def test_properties_device_properties():
{"CPU": {"NUM_STREAMS": 2}})
check({"CPU": make_dict(properties.streams.num(2))},
{"CPU": {"NUM_STREAMS": properties.streams.Num(2)}})
check({"GPU": make_dict(properties.inference_precision(Type.f32))},
check({"GPU": make_dict(properties.hint.inference_precision(Type.f32))},
{"GPU": {"INFERENCE_PRECISION_HINT": Type.f32}})
check({"CPU": make_dict(properties.streams.num(2), properties.inference_precision(Type.f32))},
check({"CPU": make_dict(properties.streams.num(2), properties.hint.inference_precision(Type.f32))},
{"CPU": {"INFERENCE_PRECISION_HINT": Type.f32, "NUM_STREAMS": properties.streams.Num(2)}})
check({"CPU": make_dict(properties.streams.num(2), properties.inference_precision(Type.f32)),
"GPU": make_dict(properties.streams.num(1), properties.inference_precision(Type.f16))},
check({"CPU": make_dict(properties.streams.num(2), properties.hint.inference_precision(Type.f32)),
"GPU": make_dict(properties.streams.num(1), properties.hint.inference_precision(Type.f16))},
{"CPU": {"INFERENCE_PRECISION_HINT": Type.f32, "NUM_STREAMS": properties.streams.Num(2)},
"GPU": {"INFERENCE_PRECISION_HINT": Type.f16, "NUM_STREAMS": properties.streams.Num(1)}})
@ -420,7 +419,7 @@ def test_single_property_setting(device):
properties.cache_dir("./"),
properties.inference_num_threads(9),
properties.affinity(properties.Affinity.NONE),
properties.inference_precision(Type.f32),
properties.hint.inference_precision(Type.f32),
properties.hint.performance_mode(properties.hint.PerformanceMode.LATENCY),
properties.hint.scheduling_core_type(properties.hint.SchedulingCoreType.PCORE_ONLY),
properties.hint.use_hyper_threading(True),
@ -434,7 +433,7 @@ def test_single_property_setting(device):
properties.cache_dir(): "./",
properties.inference_num_threads(): 9,
properties.affinity(): properties.Affinity.NONE,
properties.inference_precision(): Type.f32,
properties.hint.inference_precision(): Type.f32,
properties.hint.performance_mode(): properties.hint.PerformanceMode.LATENCY,
properties.hint.scheduling_core_type(): properties.hint.SchedulingCoreType.PCORE_ONLY,
properties.hint.use_hyper_threading(): True,

View File

@ -233,22 +233,16 @@ static constexpr Property<std::string, PropertyMutability::RO> model_name{"NETWO
static constexpr Property<uint32_t, PropertyMutability::RO> optimal_number_of_infer_requests{
"OPTIMAL_NUMBER_OF_INFER_REQUESTS"};
/**
* @brief Hint for device to use specified precision for inference
* @ingroup ov_runtime_cpp_prop_api
*/
static constexpr Property<element::Type, PropertyMutability::RW> inference_precision{"INFERENCE_PRECISION_HINT"};
/**
* @brief Namespace with hint properties
*/
namespace hint {
/**
* @brief An alias for inference_precision property for backward compatibility
* @brief Hint for device to use specified precision for inference
* @ingroup ov_runtime_cpp_prop_api
*/
using ov::inference_precision;
static constexpr Property<element::Type, PropertyMutability::RW> inference_precision{"INFERENCE_PRECISION_HINT"};
/**
* @brief Enum to define possible priorities hints
@ -271,7 +265,7 @@ inline std::ostream& operator<<(std::ostream& os, const Priority& priority) {
case Priority::HIGH:
return os << "HIGH";
default:
OPENVINO_THROW("Unsupported performance measure hint");
OPENVINO_THROW("Unsupported model priority value");
}
}
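
Note that only the C++ symbol moves into the ``ov::hint`` namespace; the underlying key string stays ``INFERENCE_PRECISION_HINT``, so string-based configuration should keep working unchanged. A small sketch under that assumption:

#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;

    // The key string is unchanged, so an AnyMap keyed by name() behaves as before.
    ov::AnyMap config = {{ov::hint::inference_precision.name(), "f32"}};
    core.set_property("CPU", config);

    std::cout << core.get_property("CPU", ov::hint::inference_precision) << std::endl;
    return 0;
}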

View File

@ -176,7 +176,7 @@ void Config::readProperties(const std::map<std::string, std::string> &prop) {
if (!device_id.empty()) {
IE_THROW() << "CPU plugin supports only '' as device id";
}
} else if (key == ov::inference_precision.name()) {
} else if (key == ov::hint::inference_precision.name()) {
if (val == "bf16") {
if (dnnl::impl::cpu::x64::mayiuse(dnnl::impl::cpu::x64::avx512_core)) {
enforceBF16 = true;
@ -186,7 +186,7 @@ void Config::readProperties(const std::map<std::string, std::string> &prop) {
} else if (val == "f32") {
enforceBF16 = false;
} else {
IE_THROW() << "Wrong value for property key " << ov::inference_precision.name()
IE_THROW() << "Wrong value for property key " << ov::hint::inference_precision.name()
<< ". Supported values: bf16, f32";
}
} else if (PluginConfigInternalParams::KEY_CPU_RUNTIME_CACHE_CAPACITY == key) {

View File

@ -309,7 +309,7 @@ InferenceEngine::Parameter ExecNetwork::GetMetric(const std::string &name) const
RO_property(ov::affinity.name()),
RO_property(ov::inference_num_threads.name()),
RO_property(ov::enable_profiling.name()),
RO_property(ov::inference_precision.name()),
RO_property(ov::hint::inference_precision.name()),
RO_property(ov::hint::performance_mode.name()),
RO_property(ov::hint::num_requests.name()),
RO_property(ov::hint::scheduling_core_type.name()),
@ -347,10 +347,10 @@ InferenceEngine::Parameter ExecNetwork::GetMetric(const std::string &name) const
} else if (name == ov::enable_profiling.name()) {
const bool perfCount = config.collectPerfCounters;
return decltype(ov::enable_profiling)::value_type(perfCount);
} else if (name == ov::inference_precision) {
} else if (name == ov::hint::inference_precision) {
const auto enforceBF16 = config.enforceBF16;
const auto inference_precision = enforceBF16 ? ov::element::bf16 : ov::element::f32;
return decltype(ov::inference_precision)::value_type(inference_precision);
return decltype(ov::hint::inference_precision)::value_type(inference_precision);
} else if (name == ov::hint::performance_mode) {
const auto perfHint = ov::util::from_string(config.perfHintsConfig.ovPerfHint, ov::hint::performance_mode);
return perfHint;

View File

@ -577,10 +577,10 @@ Parameter Engine::GetConfig(const std::string& name, const std::map<std::string,
} else if (name == ov::enable_profiling.name()) {
const bool perfCount = engConfig.collectPerfCounters;
return decltype(ov::enable_profiling)::value_type(perfCount);
} else if (name == ov::inference_precision) {
} else if (name == ov::hint::inference_precision) {
const auto enforceBF16 = engConfig.enforceBF16;
const auto inference_precision = enforceBF16 ? ov::element::bf16 : ov::element::f32;
return decltype(ov::inference_precision)::value_type(inference_precision);
return decltype(ov::hint::inference_precision)::value_type(inference_precision);
} else if (name == ov::hint::performance_mode) {
const auto perfHint = ov::util::from_string(engConfig.perfHintsConfig.ovPerfHint, ov::hint::performance_mode);
return perfHint;
@ -675,7 +675,7 @@ Parameter Engine::GetMetric(const std::string& name, const std::map<std::string,
RW_property(ov::affinity.name()),
RW_property(ov::inference_num_threads.name()),
RW_property(ov::enable_profiling.name()),
RW_property(ov::inference_precision.name()),
RW_property(ov::hint::inference_precision.name()),
RW_property(ov::hint::performance_mode.name()),
RW_property(ov::hint::num_requests.name()),
RW_property(ov::hint::scheduling_core_type.name()),

View File

@ -279,13 +279,13 @@ TEST(OVClassBasicTest, smoke_SetConfigHintInferencePrecision) {
auto value = ov::element::f32;
const auto precision = InferenceEngine::with_cpu_x86_bfloat16() ? ov::element::bf16 : ov::element::f32;
OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::inference_precision));
OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::hint::inference_precision));
ASSERT_EQ(precision, value);
const auto forcedPrecision = ov::element::f32;
OV_ASSERT_NO_THROW(ie.set_property("CPU", ov::inference_precision(forcedPrecision)));
OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::inference_precision));
OV_ASSERT_NO_THROW(ie.set_property("CPU", ov::hint::inference_precision(forcedPrecision)));
OV_ASSERT_NO_THROW(value = ie.get_property("CPU", ov::hint::inference_precision));
ASSERT_EQ(value, forcedPrecision);
OPENVINO_SUPPRESS_DEPRECATED_START

View File

@ -172,7 +172,7 @@ void Config::UpdateFromMap(const std::map<std::string, std::string>& config) {
}
} else if (key == ov::hint::performance_mode) {
performance_mode = ov::util::from_string(value, ov::hint::performance_mode);
} else if (key == ov::inference_precision) {
} else if (key == ov::hint::inference_precision) {
inference_precision = ov::util::from_string<ov::element::Type>(value);
if ((inference_precision != ov::element::i8) && (inference_precision != ov::element::i16)) {
THROW_GNA_EXCEPTION << "Unsupported precision of GNA hardware, should be I16 or I8, but was: " << value;
@ -187,7 +187,7 @@ void Config::UpdateFromMap(const std::map<std::string, std::string>& config) {
<< value;
}
// Update gnaPrecision basing on execution_mode only if inference_precision is not set
if (config.count(ov::inference_precision.name()) == 0) {
if (config.count(ov::hint::inference_precision.name()) == 0) {
gnaPrecision = execution_mode == ov::hint::ExecutionMode::PERFORMANCE ? InferenceEngine::Precision::I8
: InferenceEngine::Precision::I16;
}
@ -320,7 +320,7 @@ void Config::AdjustKeyMapValues() {
gnaFlags.exclusive_async_requests ? PluginConfigParams::YES : PluginConfigParams::NO;
keyConfigMap[ov::hint::performance_mode.name()] = ov::util::to_string(performance_mode);
if (inference_precision != ov::element::undefined) {
keyConfigMap[ov::inference_precision.name()] = ov::util::to_string(inference_precision);
keyConfigMap[ov::hint::inference_precision.name()] = ov::util::to_string(inference_precision);
} else {
keyConfigMap[GNA_CONFIG_KEY(PRECISION)] = gnaPrecision.name();
}
@ -355,7 +355,7 @@ Parameter Config::GetParameter(const std::string& name) const {
return DeviceToHwGeneration(target->get_user_set_compile_target());
} else if (name == ov::hint::performance_mode) {
return performance_mode;
} else if (name == ov::inference_precision) {
} else if (name == ov::hint::inference_precision) {
return inference_precision;
} else {
auto result = keyConfigMap.find(name);
@ -375,7 +375,7 @@ const Parameter Config::GetImpactingModelCompilationProperties(bool compiled) {
{ov::intel_gna::compile_target.name(), model_mutability},
{ov::intel_gna::pwl_design_algorithm.name(), model_mutability},
{ov::intel_gna::pwl_max_error_percent.name(), model_mutability},
{ov::inference_precision.name(), model_mutability},
{ov::hint::inference_precision.name(), model_mutability},
{ov::hint::execution_mode.name(), model_mutability},
{ov::hint::num_requests.name(), model_mutability},
};

View File

@ -193,7 +193,7 @@ INSTANTIATE_TEST_SUITE_P(
::testing::Combine(
::testing::Values("GNA"),
::testing::Values(ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
ov::inference_precision(ngraph::element::i8),
ov::hint::inference_precision(ngraph::element::i8),
ov::hint::num_requests(2),
ov::intel_gna::pwl_design_algorithm(ov::intel_gna::PWLDesignAlgorithm::UNIFORM_DISTRIBUTION),
ov::intel_gna::pwl_max_error_percent(0.2),
@ -221,8 +221,8 @@ INSTANTIATE_TEST_SUITE_P(
ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_FP32),
ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::AUTO),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"input", 1.0f}}),
ov::inference_precision(ov::element::i8),
ov::inference_precision(ov::element::i16),
ov::hint::inference_precision(ov::element::i8),
ov::hint::inference_precision(ov::element::i16),
ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY),
ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
ov::hint::performance_mode(ov::hint::PerformanceMode::UNDEFINED),

View File

@ -116,11 +116,11 @@ TEST(OVClassBasicTest, smoke_SetConfigAfterCreatedPrecisionHint) {
ov::Core core;
ov::element::Type precision;
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
ASSERT_EQ(ov::element::undefined, precision);
OV_ASSERT_NO_THROW(core.set_property("GNA", ov::inference_precision(ov::element::i8)));
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
OV_ASSERT_NO_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::i8)));
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
ASSERT_EQ(ov::element::i8, precision);
OPENVINO_SUPPRESS_DEPRECATED_START
@ -128,23 +128,23 @@ TEST(OVClassBasicTest, smoke_SetConfigAfterCreatedPrecisionHint) {
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
OPENVINO_SUPPRESS_DEPRECATED_END
OV_ASSERT_NO_THROW(core.set_property("GNA", ov::inference_precision(ov::element::i16)));
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
OV_ASSERT_NO_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::i16)));
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
ASSERT_EQ(ov::element::i16, precision);
OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::inference_precision.name(), "I8"}}));
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::hint::inference_precision.name(), "I8"}}));
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
ASSERT_EQ(ov::element::i8, precision);
OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::inference_precision.name(), "I16"}}));
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::inference_precision));
OV_ASSERT_NO_THROW(core.set_property("GNA", {{ov::hint::inference_precision.name(), "I16"}}));
OV_ASSERT_NO_THROW(precision = core.get_property("GNA", ov::hint::inference_precision));
ASSERT_EQ(ov::element::i16, precision);
OV_ASSERT_NO_THROW(
core.set_property("GNA", {ov::inference_precision(ov::element::i8), {GNA_CONFIG_KEY(PRECISION), "I16"}}));
ASSERT_THROW(core.set_property("GNA", ov::inference_precision(ov::element::i32)), ov::Exception);
ASSERT_THROW(core.set_property("GNA", ov::inference_precision(ov::element::undefined)), ov::Exception);
ASSERT_THROW(core.set_property("GNA", {{ov::inference_precision.name(), "ABC"}}), ov::Exception);
core.set_property("GNA", {ov::hint::inference_precision(ov::element::i8), {GNA_CONFIG_KEY(PRECISION), "I16"}}));
ASSERT_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::i32)), ov::Exception);
ASSERT_THROW(core.set_property("GNA", ov::hint::inference_precision(ov::element::undefined)), ov::Exception);
ASSERT_THROW(core.set_property("GNA", {{ov::hint::inference_precision.name(), "ABC"}}), ov::Exception);
}
TEST(OVClassBasicTest, smoke_SetConfigAfterCreatedPerformanceHint) {

View File

@ -169,7 +169,7 @@ protected:
TEST_F(GNAExportImportTest, ExportImportI16) {
const ov::AnyMap gna_config = {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::inference_precision(ngraph::element::i16)};
ov::hint::inference_precision(ngraph::element::i16)};
exported_file_name = "export_test.bin";
ExportModel(exported_file_name, gna_config);
ImportModel(exported_file_name, gna_config);
@ -177,7 +177,7 @@ TEST_F(GNAExportImportTest, ExportImportI16) {
TEST_F(GNAExportImportTest, ExportImportI8) {
const ov::AnyMap gna_config = {ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::inference_precision(ngraph::element::i8)};
ov::hint::inference_precision(ngraph::element::i8)};
exported_file_name = "export_test.bin";
ExportModel(exported_file_name, gna_config);
ImportModel(exported_file_name, gna_config);

View File

@ -85,13 +85,13 @@ TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestDefault) {
TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI16) {
Run({ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::inference_precision(ngraph::element::i16)});
ov::hint::inference_precision(ngraph::element::i16)});
compare(ngraph::element::i16, ngraph::element::i32, sizeof(int16_t), sizeof(uint32_t));
}
TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI8) {
Run({ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::inference_precision(ngraph::element::i8)});
ov::hint::inference_precision(ngraph::element::i8)});
compare(ngraph::element::i16,
ngraph::element::i32,
sizeof(int8_t),
@ -100,7 +100,7 @@ TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI8) {
TEST_F(GNAHwPrecisionTest, GNAHwPrecisionTestI8LP) {
Run({ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::inference_precision(ngraph::element::i8)},
ov::hint::inference_precision(ngraph::element::i8)},
true);
compare(ngraph::element::i8, ngraph::element::i32, sizeof(int8_t), sizeof(int8_t));
}

View File

@ -122,13 +122,13 @@ INSTANTIATE_TEST_SUITE_P(
// gna config map
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
ov::inference_precision(ngraph::element::i16)},
ov::hint::inference_precision(ngraph::element::i16)},
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 8.0f}}),
ov::inference_precision(ngraph::element::i16)},
ov::hint::inference_precision(ngraph::element::i16)},
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 0.125f}}),
ov::inference_precision(ngraph::element::i16)},
ov::hint::inference_precision(ngraph::element::i16)},
}),
::testing::Values(true), // gna device
::testing::Values(false), // use low precision
@ -148,13 +148,13 @@ INSTANTIATE_TEST_SUITE_P(
// gna config map
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
ov::inference_precision(ngraph::element::i8)},
ov::hint::inference_precision(ngraph::element::i8)},
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 4.0f}}),
ov::inference_precision(ngraph::element::i8)},
ov::hint::inference_precision(ngraph::element::i8)},
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 0.25f}}),
ov::inference_precision(ngraph::element::i8)},
ov::hint::inference_precision(ngraph::element::i8)},
}),
::testing::Values(true), // gna device
::testing::Values(true), // use low precision
@ -200,13 +200,13 @@ INSTANTIATE_TEST_SUITE_P(
// gna config map
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
ov::inference_precision(ngraph::element::i16)},
ov::hint::inference_precision(ngraph::element::i16)},
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 4.0f}}),
ov::inference_precision(ngraph::element::i16)},
ov::hint::inference_precision(ngraph::element::i16)},
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 0.25f}}),
ov::inference_precision(ngraph::element::i16)},
ov::hint::inference_precision(ngraph::element::i16)},
}),
::testing::Values(true), // gna device
::testing::Values(false), // use low precision
@ -227,13 +227,13 @@ INSTANTIATE_TEST_SUITE_P(
// gna config map,
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
ov::inference_precision(ngraph::element::i8)},
ov::hint::inference_precision(ngraph::element::i8)},
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 10.0f}}),
ov::inference_precision(ngraph::element::i8)},
ov::hint::inference_precision(ngraph::element::i8)},
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 20.0f}}),
ov::inference_precision(ngraph::element::i8)},
ov::hint::inference_precision(ngraph::element::i8)},
}),
::testing::Values(true), // gna device
::testing::Values(true), // use low precision
@ -254,10 +254,10 @@ INSTANTIATE_TEST_SUITE_P(
// gna config map
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
ov::inference_precision(ngraph::element::i16)},
ov::hint::inference_precision(ngraph::element::i16)},
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 8.0f}}),
ov::inference_precision(ngraph::element::i16)},
ov::hint::inference_precision(ngraph::element::i16)},
}),
::testing::Values(true), // gna device
::testing::Values(false), // use low precision
@ -278,10 +278,10 @@ INSTANTIATE_TEST_SUITE_P(
// gna config map
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 1.0f}}),
ov::inference_precision(ngraph::element::i8)},
ov::hint::inference_precision(ngraph::element::i8)},
{ov::intel_gna::execution_mode(ov::intel_gna::ExecutionMode::SW_EXACT),
ov::intel_gna::scale_factors_per_input(std::map<std::string, float>{{"0", 4.0f}}),
ov::inference_precision(ngraph::element::i8)},
ov::hint::inference_precision(ngraph::element::i8)},
}),
::testing::Values(true), // gna device
::testing::Values(true), // use low precision

View File

@ -247,9 +247,9 @@ TEST_F(GNAPluginConfigTest, GnaConfigExecutionModeUpdatesGnaPrecision) {
}
TEST_F(GNAPluginConfigTest, GnaConfigInferencePrecisionUpdatesGnaPrecision) {
SetAndCompare(ov::inference_precision.name(), ov::util::to_string<ov::element::Type>(ov::element::i8));
SetAndCompare(ov::hint::inference_precision.name(), ov::util::to_string<ov::element::Type>(ov::element::i8));
EXPECT_EQ(config.gnaPrecision, InferenceEngine::Precision::I8);
SetAndCompare(ov::inference_precision.name(), ov::util::to_string<ov::element::Type>(ov::element::i16));
SetAndCompare(ov::hint::inference_precision.name(), ov::util::to_string<ov::element::Type>(ov::element::i16));
EXPECT_EQ(config.gnaPrecision, InferenceEngine::Precision::I16);
}
@ -257,7 +257,7 @@ TEST_F(GNAPluginConfigTest, GnaConfigInferencePrecisionHasHigherPriorityI16) {
SetAndCompare(GNA_CONFIG_KEY(PRECISION), Precision(Precision::I8).name());
SetAndCompare(ov::hint::execution_mode.name(),
ov::util::to_string<ov::hint::ExecutionMode>(ov::hint::ExecutionMode::PERFORMANCE));
SetAndCompare(ov::inference_precision.name(), ov::util::to_string<ov::element::Type>(ov::element::i16));
SetAndCompare(ov::hint::inference_precision.name(), ov::util::to_string<ov::element::Type>(ov::element::i16));
EXPECT_EQ(config.gnaPrecision, InferenceEngine::Precision::I16);
}
@ -265,6 +265,6 @@ TEST_F(GNAPluginConfigTest, GnaConfigInferencePrecisionHasHigherPriorityI8) {
SetAndCompare(GNA_CONFIG_KEY(PRECISION), Precision(Precision::I16).name());
SetAndCompare(ov::hint::execution_mode.name(),
ov::util::to_string<ov::hint::ExecutionMode>(ov::hint::ExecutionMode::ACCURACY));
SetAndCompare(ov::inference_precision.name(), ov::util::to_string<ov::element::Type>(ov::element::i8));
SetAndCompare(ov::hint::inference_precision.name(), ov::util::to_string<ov::element::Type>(ov::element::i8));
EXPECT_EQ(config.gnaPrecision, InferenceEngine::Precision::I8);
}

View File

@ -325,7 +325,7 @@ InferenceEngine::Parameter CompiledModel::GetMetric(const std::string &name) con
ov::PropertyName{ov::compilation_num_threads.name(), PropertyMutability::RO},
ov::PropertyName{ov::num_streams.name(), PropertyMutability::RO},
ov::PropertyName{ov::hint::num_requests.name(), PropertyMutability::RO},
ov::PropertyName{ov::inference_precision.name(), PropertyMutability::RO},
ov::PropertyName{ov::hint::inference_precision.name(), PropertyMutability::RO},
ov::PropertyName{ov::device::id.name(), PropertyMutability::RO},
ov::PropertyName{ov::execution_devices.name(), PropertyMutability::RO}
};

View File

@ -14,7 +14,7 @@ bool LegacyAPIHelper::is_new_api_property(const std::pair<std::string, ov::Any>&
static const std::vector<std::string> new_properties_list = {
ov::intel_gpu::hint::queue_priority.name(),
ov::intel_gpu::hint::queue_throttle.name(),
ov::inference_precision.name(),
ov::hint::inference_precision.name(),
ov::compilation_num_threads.name(),
ov::num_streams.name(),
};

View File

@ -671,7 +671,7 @@ Parameter Plugin::GetMetric(const std::string& name, const std::map<std::string,
cachingProperties.push_back(ov::PropertyName(ov::device::architecture.name(), PropertyMutability::RO));
cachingProperties.push_back(ov::PropertyName(ov::intel_gpu::execution_units_count.name(), PropertyMutability::RO));
cachingProperties.push_back(ov::PropertyName(ov::intel_gpu::driver_version.name(), PropertyMutability::RO));
cachingProperties.push_back(ov::PropertyName(ov::inference_precision.name(), PropertyMutability::RW));
cachingProperties.push_back(ov::PropertyName(ov::hint::inference_precision.name(), PropertyMutability::RW));
cachingProperties.push_back(ov::PropertyName(ov::hint::execution_mode.name(), PropertyMutability::RW));
return decltype(ov::caching_properties)::value_type(cachingProperties);
} else if (name == ov::intel_gpu::driver_version) {
@ -730,7 +730,7 @@ std::vector<ov::PropertyName> Plugin::get_supported_properties() const {
ov::PropertyName{ov::compilation_num_threads.name(), PropertyMutability::RW},
ov::PropertyName{ov::num_streams.name(), PropertyMutability::RW},
ov::PropertyName{ov::hint::num_requests.name(), PropertyMutability::RW},
ov::PropertyName{ov::inference_precision.name(), PropertyMutability::RW},
ov::PropertyName{ov::hint::inference_precision.name(), PropertyMutability::RW},
ov::PropertyName{ov::device::id.name(), PropertyMutability::RW},
};

View File

@ -40,7 +40,7 @@ void ExecutionConfig::set_default() {
std::make_tuple(ov::cache_dir, ""),
std::make_tuple(ov::num_streams, 1),
std::make_tuple(ov::compilation_num_threads, std::max(1, static_cast<int>(std::thread::hardware_concurrency()))),
std::make_tuple(ov::inference_precision, ov::element::f16, InferencePrecisionValidator()),
std::make_tuple(ov::hint::inference_precision, ov::element::f16, InferencePrecisionValidator()),
std::make_tuple(ov::hint::model_priority, ov::hint::Priority::MEDIUM),
std::make_tuple(ov::hint::performance_mode, ov::hint::PerformanceMode::LATENCY, PerformanceModeValidator()),
std::make_tuple(ov::hint::execution_mode, ov::hint::ExecutionMode::PERFORMANCE),
@ -123,14 +123,14 @@ Any ExecutionConfig::get_property(const std::string& name) const {
void ExecutionConfig::apply_execution_hints(const cldnn::device_info& info) {
if (is_set_by_user(ov::hint::execution_mode)) {
const auto mode = get_property(ov::hint::execution_mode);
if (!is_set_by_user(ov::inference_precision)) {
if (!is_set_by_user(ov::hint::inference_precision)) {
if (mode == ov::hint::ExecutionMode::ACCURACY) {
set_property(ov::inference_precision(ov::element::f32));
set_property(ov::hint::inference_precision(ov::element::f32));
} else if (mode == ov::hint::ExecutionMode::PERFORMANCE) {
if (info.supports_fp16)
set_property(ov::inference_precision(ov::element::f16));
set_property(ov::hint::inference_precision(ov::element::f16));
else
set_property(ov::inference_precision(ov::element::f32));
set_property(ov::hint::inference_precision(ov::element::f32));
}
}
}

View File

@ -39,7 +39,7 @@ TEST_P(InferencePrecisionTests, smoke_canSetInferencePrecisionAndInfer) {
std::tie(model_precision, inference_precision) = GetParam();
auto function = ov::test::behavior::getDefaultNGraphFunctionForTheDevice(CommonTestUtils::DEVICE_GPU, {1, 1, 32, 32}, model_precision);
ov::CompiledModel compiled_model;
OV_ASSERT_NO_THROW(compiled_model = core->compile_model(function, CommonTestUtils::DEVICE_GPU, ov::inference_precision(inference_precision)));
OV_ASSERT_NO_THROW(compiled_model = core->compile_model(function, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(inference_precision)));
auto req = compiled_model.create_infer_request();
OV_ASSERT_NO_THROW(req.infer());
}
@ -67,7 +67,7 @@ TEST(ExecutionModeTest, SetCompileGetInferPrecisionAndExecMode) {
core.set_property(CommonTestUtils::DEVICE_GPU, ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE));
auto model = ngraph::builder::subgraph::makeConvPoolRelu();
{
auto compiled_model = core.compile_model(model, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
auto compiled_model = core.compile_model(model, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
ASSERT_EQ(ov::hint::ExecutionMode::PERFORMANCE, compiled_model.get_property(ov::hint::execution_mode));
ASSERT_EQ(ov::element::f32, compiled_model.get_property(ov::hint::inference_precision));
}

View File

@ -55,7 +55,7 @@ TEST_P(OVConcurrencyTest, canInferTwoExecNets) {
auto fn = fn_ptrs[i];
auto exec_net = ie.compile_model(fn_ptrs[i], CommonTestUtils::DEVICE_GPU,
ov::num_streams(num_streams), ov::inference_precision(ov::element::f32));
ov::num_streams(num_streams), ov::hint::inference_precision(ov::element::f32));
auto input = fn_ptrs[i]->get_parameters().at(0);
auto output = fn_ptrs[i]->get_results().at(0);
@ -115,7 +115,7 @@ TEST(canSwapTensorsBetweenInferRequests, inputs) {
auto fn = ngraph::builder::subgraph::makeSplitMultiConvConcat();
auto ie = ov::Core();
auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
const int infer_requests_num = 2;
ov::InferRequest infer_request1 = compiled_model.create_infer_request();
@ -193,7 +193,7 @@ TEST(smoke_InferRequestDeviceMemoryAllocation, usmHostIsNotChanged) {
auto fn = ngraph::builder::subgraph::makeDetectionOutput(ngraph::element::Type_t::f32);
auto ie = ov::Core();
auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
ov::InferRequest infer_request1 = compiled_model.create_infer_request();
ov::InferRequest infer_request2 = compiled_model.create_infer_request();
@ -232,7 +232,7 @@ TEST(smoke_InferRequestDeviceMemoryAllocation, canSetSystemHostTensor) {
auto fn = ngraph::builder::subgraph::makeDetectionOutput(ngraph::element::Type_t::f32);
auto ie = ov::Core();
auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
ov::InferRequest infer_request1 = compiled_model.create_infer_request();
ov::InferRequest infer_request2 = compiled_model.create_infer_request();
@ -258,7 +258,7 @@ TEST(canSwapTensorsBetweenInferRequests, outputs) {
auto fn = ngraph::builder::subgraph::makeSplitMultiConvConcat();
auto ie = ov::Core();
auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::inference_precision(ov::element::f32));
auto compiled_model = ie.compile_model(fn, CommonTestUtils::DEVICE_GPU, ov::hint::inference_precision(ov::element::f32));
const int infer_requests_num = 2;
ov::InferRequest infer_request1 = compiled_model.create_infer_request();

View File

@ -40,7 +40,7 @@ public:
{CONFIG_KEY(AUTO_BATCH_TIMEOUT) , "0"},
};
}
config.insert({ov::inference_precision.name(), "f32"});
config.insert({ov::hint::inference_precision.name(), "f32"});
fn_ptr = ov::test::behavior::getDefaultNGraphFunctionForTheDevice(with_auto_batching ? CommonTestUtils::DEVICE_BATCH : deviceName);
}
static std::string getTestCaseName(const testing::TestParamInfo<bool>& obj) {
@ -230,7 +230,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserContext) {
auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
auto ie = PluginCache::get().ie();
auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});
auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
// regular inference
auto inf_req_regular = exec_net_regular.CreateInferRequest();
@ -277,7 +277,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
auto ie = PluginCache::get().ie();
auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});
auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
// regular inference
auto inf_req_regular = exec_net_regular.CreateInferRequest();
@ -305,7 +305,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_out_of_order) {
// In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
// without calling thread blocks
auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::inference_precision.name(), "f32"}});
auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::hint::inference_precision.name(), "f32"}});
auto inf_req_shared = exec_net_shared.CreateInferRequest();
// Allocate shared buffers for input and output data which will be set to infer request
@ -375,7 +375,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
auto ie = PluginCache::get().ie();
auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});
auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
// regular inference
auto inf_req_regular = exec_net_regular.CreateInferRequest();
@ -404,7 +404,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_in_order) {
// In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
// without calling thread blocks
auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::inference_precision.name(), "f32"}});
auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::hint::inference_precision.name(), "f32"}});
auto inf_req_shared = exec_net_shared.CreateInferRequest();
// Allocate shared buffers for input and output data which will be set to infer request
@ -469,7 +469,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_infer_call_many_times) {
auto blob = FuncTestUtils::createAndFillBlob(net.getInputsInfo().begin()->second->getTensorDesc());
auto ie = PluginCache::get().ie();
auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::inference_precision.name(), "f32"}});
auto exec_net_regular = ie->LoadNetwork(net, deviceName, {{ov::hint::inference_precision.name(), "f32"}});
// regular inference
auto inf_req_regular = exec_net_regular.CreateInferRequest();
@ -498,7 +498,7 @@ TEST_P(RemoteBlob_Test, smoke_canInferOnUserQueue_infer_call_many_times) {
// In this scenario we create shared OCL queue and run simple pre-process action and post-process action (buffer copies in both cases)
// without calling thread blocks
auto remote_context = make_shared_context(*ie, deviceName, ocl_instance->_queue.get());
auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::inference_precision.name(), "f32"}});
auto exec_net_shared = ie->LoadNetwork(net, remote_context, {{ov::hint::inference_precision.name(), "f32"}});
auto inf_req_shared = exec_net_shared.CreateInferRequest();
// Allocate shared buffers for input and output data which will be set to infer request
@ -601,7 +601,7 @@ TEST_P(BatchedBlob_Test, canInputNV12) {
/* XXX: is it correct to set KEY_CLDNN_NV12_TWO_INPUTS in case of remote blob? */
auto exec_net_b = ie.LoadNetwork(net_remote, CommonTestUtils::DEVICE_GPU,
{ { GPUConfigParams::KEY_GPU_NV12_TWO_INPUTS, PluginConfigParams::YES}, {ov::inference_precision.name(), "f32"} });
{ { GPUConfigParams::KEY_GPU_NV12_TWO_INPUTS, PluginConfigParams::YES}, {ov::hint::inference_precision.name(), "f32"} });
auto inf_req_remote = exec_net_b.CreateInferRequest();
auto cldnn_context = exec_net_b.GetContext();
cl_context ctx = std::dynamic_pointer_cast<ClContext>(cldnn_context)->get();
@ -670,7 +670,7 @@ TEST_P(BatchedBlob_Test, canInputNV12) {
net_local.getInputsInfo().begin()->second->setPrecision(Precision::U8);
net_local.getInputsInfo().begin()->second->getPreProcess().setColorFormat(ColorFormat::NV12);
auto exec_net_b1 = ie.LoadNetwork(net_local, CommonTestUtils::DEVICE_GPU, {{ov::inference_precision.name(), "f32"}});
auto exec_net_b1 = ie.LoadNetwork(net_local, CommonTestUtils::DEVICE_GPU, {{ov::hint::inference_precision.name(), "f32"}});
auto inf_req_local = exec_net_b1.CreateInferRequest();
@ -742,7 +742,7 @@ TEST_P(TwoNets_Test, canInferTwoExecNets) {
auto exec_net = ie.LoadNetwork(net, CommonTestUtils::DEVICE_GPU,
{{PluginConfigParams::KEY_GPU_THROUGHPUT_STREAMS, std::to_string(num_streams)},
{ov::inference_precision.name(), "f32"}});
{ov::hint::inference_precision.name(), "f32"}});
for (int j = 0; j < num_streams * num_requests; j++) {
outputs.push_back(net.getOutputsInfo().begin()->first);

View File

@ -350,13 +350,13 @@ TEST_P(OVClassGetPropertyTest_GPU, GetAndSetInferencePrecisionNoThrow) {
auto value = ov::element::undefined;
const auto expected_default_precision = ov::element::f16;
OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::inference_precision));
OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::hint::inference_precision));
ASSERT_EQ(expected_default_precision, value);
const auto forced_precision = ov::element::f32;
OV_ASSERT_NO_THROW(ie.set_property(target_device, ov::inference_precision(forced_precision)));
OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::inference_precision));
OV_ASSERT_NO_THROW(ie.set_property(target_device, ov::hint::inference_precision(forced_precision)));
OV_ASSERT_NO_THROW(value = ie.get_property(target_device, ov::hint::inference_precision));
ASSERT_EQ(value, forced_precision);
OPENVINO_SUPPRESS_DEPRECATED_START
@ -728,7 +728,7 @@ auto gpuCorrectConfigsWithSecondaryProperties = []() {
return std::vector<ov::AnyMap>{
{ov::device::properties(CommonTestUtils::DEVICE_GPU,
ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE),
ov::inference_precision(ov::element::f32))},
ov::hint::inference_precision(ov::element::f32))},
{ov::device::properties(CommonTestUtils::DEVICE_GPU,
ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
ov::hint::allow_auto_batching(false))},
@ -821,7 +821,7 @@ TEST_P(OVClassGetMetricTest_CACHING_PROPERTIES, GetMetricAndPrintNoThrow) {
ov::device::architecture.name(),
ov::intel_gpu::execution_units_count.name(),
ov::intel_gpu::driver_version.name(),
ov::inference_precision.name(),
ov::hint::inference_precision.name(),
ov::hint::execution_mode.name(),
};

View File

@ -36,7 +36,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeAppliedForNonContiguousAxes
auto core = ov::Core();
ov::AnyMap config;
if (device_name == CommonTestUtils::DEVICE_GPU)
config.insert(ov::inference_precision(ov::element::f32));
config.insert(ov::hint::inference_precision(ov::element::f32));
const auto compiled_model = core.compile_model(model, device_name, config);
ASSERT_TRUE(model->get_ops().size() < compiled_model.get_runtime_model()->get_ops().size()); // decomposition applied
@ -56,7 +56,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeAppliedForNormalizeOverAllA
auto core = ov::Core();
ov::AnyMap config;
if (device_name == CommonTestUtils::DEVICE_GPU)
config.insert(ov::inference_precision(ov::element::f32));
config.insert(ov::hint::inference_precision(ov::element::f32));
const auto compiled_model = core.compile_model(model, device_name, config);
ASSERT_TRUE(model->get_ops().size() < compiled_model.get_runtime_model()->get_ops().size()); // decomposition applied
@ -76,7 +76,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeNotAppliedForNotSorted) {
auto core = ov::Core();
ov::AnyMap config;
if (device_name == CommonTestUtils::DEVICE_GPU)
config.insert(ov::inference_precision(ov::element::f32));
config.insert(ov::hint::inference_precision(ov::element::f32));
const auto compiled_model = core.compile_model(model, device_name, config);
ASSERT_TRUE(model->get_ops().size() >= compiled_model.get_runtime_model()->get_ops().size()); // decomposition not applied
@ -96,7 +96,7 @@ TEST_P(ExecGrapDecomposeNormalizeL2, CheckIfDecomposeNotAppliedForSingleAxis) {
auto core = ov::Core();
ov::AnyMap config;
if (device_name == CommonTestUtils::DEVICE_GPU)
config.insert(ov::inference_precision(ov::element::f32));
config.insert(ov::hint::inference_precision(ov::element::f32));
const auto compiled_model = core.compile_model(model, device_name, config);
ASSERT_TRUE(model->get_ops().size() >= compiled_model.get_runtime_model()->get_ops().size()); // decomposition not applied

View File

@ -226,7 +226,7 @@ void SubgraphBaseTest::compile_model() {
break;
}
}
configuration.insert({ov::inference_precision.name(), hint});
configuration.insert({ov::hint::inference_precision.name(), hint});
}
compiledModel = core->compile_model(function, targetDevice, configuration);

View File

@ -54,7 +54,7 @@ void SnippetsTestsCommon::validateOriginalLayersNamesByType(const std::string& l
ASSERT_TRUE(false) << "Layer type '" << layerType << "' was not found in compiled model";
}
void SnippetsTestsCommon::setInferenceType(ov::element::Type type) {
configuration.emplace(ov::inference_precision(type));
configuration.emplace(ov::hint::inference_precision(type));
}
} // namespace test