Docs: Model caching feature overview (#6275)

This commit is contained in:
Mikhail Nosov
2021-06-22 01:06:54 +03:00
committed by GitHub
parent 6ab6983778
commit 1aa89edbf3
9 changed files with 142 additions and 0 deletions


@@ -31,6 +31,12 @@ input images to achieve optimal throughput. However, high batch size also comes
latency penalty. So, for more real-time oriented usages, lower batch sizes (as low as a single input) are used.
Refer to the [Benchmark App](../../inference-engine/samples/benchmark_app/README.md) sample, which allows latency vs. throughput measuring.
## Using Caching API for first inference latency optimization
Starting with the 2021.4 release, the Inference Engine provides the ability to enable internal caching of loaded networks.
This can significantly reduce network load latency on some devices at application startup.
Internally, caching uses the plugin's Export/ImportNetwork flow, similar to the [Compile tool](../../inference-engine/tools/compile_tool/README.md), but it is triggered through the regular ReadNetwork/LoadNetwork API.
Refer to the [Model Caching Overview](Model_caching_overview.md) for a more detailed explanation.
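In a minimal sketch (the cache folder name, model path, and device below are illustrative, not prescriptive), enabling the cache is a single `SetConfig` call placed before the usual ReadNetwork/LoadNetwork sequence:
```cpp
#include <ie_core.hpp>

int main() {
    InferenceEngine::Core ie;
    ie.SetConfig({{CONFIG_KEY(CACHE_DIR), "myCacheFolder"}});  // enable caching for devices that support it
    auto network = ie.ReadNetwork("model.xml");                // regular ReadNetwork call
    auto execNetwork = ie.LoadNetwork(network, "GPU");         // first call exports the blob, later calls import it
    return 0;
}
```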
## Using Async API
To gain better performance on accelerators, such as VPU, the Inference Engine uses the asynchronous approach (see
[Integrating Inference Engine in Your Application (current API)](Integrate_with_customer_application_new_API.md)).


@@ -0,0 +1,65 @@
# Model Caching Overview {#openvino_docs_IE_DG_Model_caching_overview}
## Introduction
As described in [Inference Engine Introduction](inference_engine_intro.md), a common application flow consists of the following steps:
1. **Create Inference Engine Core object**
2. **Read the Intermediate Representation** - Read an Intermediate Representation file into an object of the `InferenceEngine::CNNNetwork` class
3. **Prepare inputs and outputs**
4. **Set configuration** - Pass device-specific loading configurations to the device
5. **Compile and Load Network to device** - Use the `InferenceEngine::Core::LoadNetwork()` method with a specific device
6. **Set input data**
7. **Execute**
Step 5 can potentially perform several time-consuming device-specific optimizations and network compilations,
and such delays can lead to a bad user experience on application startup. To avoid this, some devices offer an
Import/Export network capability: it is possible to either use the [Compile tool](../../inference-engine/tools/compile_tool/README.md)
or enable model caching to export the compiled network automatically. Reusing cached networks can significantly reduce network load time.
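For reference, the flow above can be sketched as follows; the model path and device name are placeholders used only for illustration:
```cpp
#include <ie_core.hpp>

int main() {
    InferenceEngine::Core ie;                              // Step 1: create Inference Engine Core object
    auto network = ie.ReadNetwork("model.xml");            // Step 2: read the Intermediate Representation
    // Steps 3-4: prepare inputs/outputs and set device-specific configuration (omitted)
    auto execNetwork = ie.LoadNetwork(network, "CPU");     // Step 5: compile and load the network to the device
    auto inferRequest = execNetwork.CreateInferRequest();
    // Step 6: set input data (omitted)
    inferRequest.Infer();                                  // Step 7: execute
    return 0;
}
```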
## Set "CACHE_DIR" config option to enable model caching
To enable model caching, the application must specify a folder in which to store the cached blobs. It can be done like this:
@snippet snippets/InferenceEngine_Caching0.cpp part0
With this code, if the device supports the Import/Export network capability, a cached blob is automatically created inside the `myCacheFolder` folder when the CACHE_DIR config is set on the Core object. If the device does not support the Import/Export capability, the cache is simply not created and no error is thrown.
Depending on your device, the total time for loading the network on application startup can be significantly reduced.
Note that the very first LoadNetwork call (when the cache is not yet created) takes slightly longer, as it also exports the compiled blob into a cache file.
![caching_enabled]
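To observe the effect in a specific setup, the load time can be measured around the `LoadNetwork()` call. The sketch below is illustrative (model path, device, and cache folder are placeholders) and is not part of the snippets shipped with this page:
```cpp
#include <chrono>
#include <iostream>
#include <ie_core.hpp>

int main() {
    InferenceEngine::Core ie;
    ie.SetConfig({{CONFIG_KEY(CACHE_DIR), "myCacheFolder"}});  // enable caching
    auto network = ie.ReadNetwork("model.xml");

    auto start = std::chrono::steady_clock::now();
    auto execNetwork = ie.LoadNetwork(network, "GPU");         // slower on the first run (exports the blob), faster afterwards
    auto end = std::chrono::steady_clock::now();

    std::cout << "LoadNetwork took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms" << std::endl;
    return 0;
}
```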
## Even faster: use LoadNetwork(modelPath)
In some cases, applications do not need to customize inputs and outputs every time. Such applications always
call `cnnNet = ie.ReadNetwork(...)`, then `ie.LoadNetwork(cnnNet, ..)`, and this flow can be further optimized.
For such cases, the 2021.4 release introduces a more convenient API that loads the network in a single call:
@snippet snippets/InferenceEngine_Caching1.cpp part1
With model caching enabled, the total load time is even smaller, provided that ReadNetwork is optimized as well:
@snippet snippets/InferenceEngine_Caching2.cpp part2
![caching_times]
## Advanced examples
Not every device supports the network Import/Export capability; enabling caching for such devices has no effect.
To check in advance if a particular device supports model caching, your application can use the following code:
@snippet snippets/InferenceEngine_Caching3.cpp part3
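Building on this check, an application can choose to set `CACHE_DIR` only for devices that report the capability. A minimal sketch follows (the cache folder name is illustrative; as noted above, setting it unconditionally is also safe, since unsupported devices simply skip the cache):
```cpp
#include <algorithm>
#include <ie_core.hpp>

int main() {
    std::string deviceName = "GNA";
    InferenceEngine::Core ie;

    // Query the device and look for the IMPORT_EXPORT_SUPPORT metric, as in the snippet above
    std::vector<std::string> keys = ie.GetMetric(deviceName, METRIC_KEY(SUPPORTED_METRICS));
    auto it = std::find(keys.begin(), keys.end(), METRIC_KEY(IMPORT_EXPORT_SUPPORT));
    bool cachingSupported = (it != keys.end()) && ie.GetMetric(deviceName, METRIC_KEY(IMPORT_EXPORT_SUPPORT));

    if (cachingSupported) {
        // Enable caching only when the device can import/export compiled networks
        ie.SetConfig({{CONFIG_KEY(CACHE_DIR), "myCacheFolder"}});
    }
    return 0;
}
```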
[caching_enabled]: ../img/caching_enabled.png
[caching_times]: ../img/caching_times.png


@@ -278,6 +278,7 @@ limitations under the License.
<tab type="user" title="Inference Engine API Changes History" url="@ref openvino_docs_IE_DG_API_Changes"/>
<tab type="user" title="Inference Engine Memory primitives" url="@ref openvino_docs_IE_DG_Memory_primitives"/>
<tab type="user" title="Inference Engine Device Query API" url="@ref openvino_docs_IE_DG_InferenceEngine_QueryAPI"/>
<tab type="user" title="Inference Engine Model Caching" url="@ref openvino_docs_IE_DG_Model_caching_overview"/>
<tab type="usergroup" title="Inference Engine Extensibility Mechanism" url="@ref openvino_docs_IE_DG_Extensibility_DG_Intro">
<tab type="user" title="Extension Library" url="@ref openvino_docs_IE_DG_Extensibility_DG_Extension"/>
<tab type="user" title="Custom Operations" url="@ref openvino_docs_IE_DG_Extensibility_DG_AddingNGraphOps"/>


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:488a7a47e5086a6868c22219bc9d58a3508059e5a1dc470f2653a12552dea82f
size 36207


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2eed189f9cb3d30fe13b4ba4515edd4e6da5d01545660e65fa8a33d945967281
size 28894


@@ -0,0 +1,17 @@
#include <ie_core.hpp>

int main() {
    using namespace InferenceEngine;
    std::string modelPath = "/tmp/myModel.xml";
    std::string device = "GNA";
    std::map<std::string, std::string> deviceConfig;
//! [part0]
    InferenceEngine::Core ie;                                  // Step 1: create Inference engine object
    ie.SetConfig({{CONFIG_KEY(CACHE_DIR), "myCacheFolder"}});  // Step 1b: Enable caching
    auto cnnNet = ie.ReadNetwork(modelPath);                   // Step 2: ReadNetwork
    //...                                                      // Step 3: Prepare inputs/outputs
    //...                                                      // Step 4: Set device configuration
    ie.LoadNetwork(cnnNet, device, deviceConfig);              // Step 5: LoadNetwork
//! [part0]
    return 0;
}


@@ -0,0 +1,13 @@
#include <ie_core.hpp>

int main() {
    using namespace InferenceEngine;
    std::string modelPath = "/tmp/myModel.xml";
    std::string device = "GNA";
    std::map<std::string, std::string> deviceConfig;
//! [part1]
    InferenceEngine::Core ie;                         // Step 1: create Inference engine object
    ie.LoadNetwork(modelPath, device, deviceConfig);  // Step 2: LoadNetwork by model file path
//! [part1]
    return 0;
}


@@ -0,0 +1,14 @@
#include <ie_core.hpp>

int main() {
    using namespace InferenceEngine;
    std::string modelPath = "/tmp/myModel.xml";
    std::string device = "GNA";
    std::map<std::string, std::string> deviceConfig;
//! [part2]
    InferenceEngine::Core ie;                                  // Step 1: create Inference engine object
    ie.SetConfig({{CONFIG_KEY(CACHE_DIR), "myCacheFolder"}});  // Step 1b: Enable caching
    ie.LoadNetwork(modelPath, device, deviceConfig);           // Step 2: LoadNetwork by model file path
//! [part2]
    return 0;
}


@@ -0,0 +1,20 @@
#include <algorithm>
#include <ie_core.hpp>

int main() {
    using namespace InferenceEngine;
    std::string modelPath = "/tmp/myModel.xml";
    std::string deviceName = "GNA";
    std::map<std::string, std::string> deviceConfig;
    InferenceEngine::Core ie;
//! [part3]
    // Get list of supported metrics
    std::vector<std::string> keys = ie.GetMetric(deviceName, METRIC_KEY(SUPPORTED_METRICS));
    // Find 'IMPORT_EXPORT_SUPPORT' metric in supported metrics
    auto it = std::find(keys.begin(), keys.end(), METRIC_KEY(IMPORT_EXPORT_SUPPORT));
    // If metric 'IMPORT_EXPORT_SUPPORT' exists, check its value
    bool cachingSupported = (it != keys.end()) && ie.GetMetric(deviceName, METRIC_KEY(IMPORT_EXPORT_SUPPORT));
//! [part3]
    return 0;
}