Docs: Model caching feature overview (#6275)
@@ -31,6 +31,12 @@ input images to achieve optimal throughput. However, high batch size also comes
latency penalty. So, for more real-time-oriented usages, lower batch sizes (as low as a single input) are used.
Refer to the [Benchmark App](../../inference-engine/samples/benchmark_app/README.md) sample, which allows measuring latency vs. throughput.

## Using Caching API for first inference latency optimization

Starting with the 2021.4 release, the Inference Engine provides the ability to enable internal caching of loaded networks.
This can significantly reduce network load latency for some devices at application startup.
Internally, caching uses the plugin's Export/ImportNetwork flow, similar to the way it is done for the [Compile tool](../../inference-engine/tools/compile_tool/README.md), while keeping the regular ReadNetwork/LoadNetwork API.
Refer to the [Model Caching Overview](Model_caching_overview.md) for a more detailed explanation.

## Using Async API

To gain better performance on accelerators, such as VPU, the Inference Engine uses the asynchronous approach (see
[Integrating Inference Engine in Your Application (current API)](Integrate_with_customer_application_new_API.md)).

docs/IE_DG/Model_caching_overview.md (new file, 65 lines)
@@ -0,0 +1,65 @@
# Model Caching Overview {#openvino_docs_IE_DG_Model_caching_overview}

## Introduction

As described in the [Inference Engine Introduction](inference_engine_intro.md), a common application flow consists of the following steps (a minimal sketch of the full flow is shown after the list):

1. **Create Inference Engine Core object**

2. **Read the Intermediate Representation** - Read an Intermediate Representation file into an `InferenceEngine::CNNNetwork` object

3. **Prepare inputs and outputs**

4. **Set configuration** - Pass device-specific loading configurations to the device

5. **Compile and Load Network to device** - Use the `InferenceEngine::Core::LoadNetwork()` method with a specific device

6. **Set input data**

7. **Execute**

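The sketch below illustrates these steps end to end. It is not part of the original document: the CPU device, the model path, and the elided blob preparation in step 6 are illustrative assumptions only.

```cpp
#include <map>
#include <string>

#include <ie_core.hpp>

int main() {
    InferenceEngine::Core ie;                                    // Step 1: create Inference Engine Core object
    auto cnnNet = ie.ReadNetwork("/tmp/myModel.xml");            // Step 2: read the Intermediate Representation
    auto inputName = cnnNet.getInputsInfo().begin()->first;      // Step 3: prepare inputs and outputs
    auto outputName = cnnNet.getOutputsInfo().begin()->first;
    std::map<std::string, std::string> deviceConfig;             // Step 4: device-specific configuration (empty here)
    auto execNet = ie.LoadNetwork(cnnNet, "CPU", deviceConfig);  // Step 5: compile and load the network to the device
    auto request = execNet.CreateInferRequest();
    // Step 6: set input data, e.g. request.SetBlob(inputName, inputBlob);
    request.Infer();                                             // Step 7: execute
    auto output = request.GetBlob(outputName);                   // read results
    return 0;
}
```
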
Step #5 can potentially perform several time-consuming device-specific optimizations and network compilations,
and such delays can lead to a bad user experience on application startup. To avoid this, some devices offer
an Import/Export network capability, and it is possible to either use the [Compile tool](../../inference-engine/tools/compile_tool/README.md)
or enable model caching to export the compiled network automatically. Reusing cached networks can significantly reduce network load time.

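For comparison, the manual Export/Import flow that model caching automates looks roughly like the sketch below. It is not part of the original document; the MYRIAD device name and the blob path are assumptions for illustration.

```cpp
#include <ie_core.hpp>

int main() {
    InferenceEngine::Core ie;

    // First run: compile the network and export the compiled blob
    // (only possible on devices that support the Import/Export capability).
    auto execNet = ie.LoadNetwork(ie.ReadNetwork("/tmp/myModel.xml"), "MYRIAD");
    execNet.Export("/tmp/myModel.blob");

    // Subsequent runs: import the pre-compiled blob instead of compiling again.
    auto importedNet = ie.ImportNetwork("/tmp/myModel.blob", "MYRIAD");
    return 0;
}
```

Model caching performs the equivalent of these two phases automatically, keyed by the CACHE_DIR folder described below.
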
## Set "CACHE_DIR" config option to enable model caching
|
||||
|
||||
To enable model caching, the application must specify the folder where to store cached blobs. It can be done like this
|
||||
|
||||
|
||||
@snippet snippets/InferenceEngine_Caching0.cpp part0

With this code, if the device supports the Import/Export network capability, a cached blob is automatically created inside the `myCacheFolder` folder
specified by the CACHE_DIR config set on the Core object. If the device does not support the Import/Export capability, the cache is simply not created and no error is thrown.

Depending on your device, the total time for loading a network on application startup can be significantly reduced.
Note also that the very first LoadNetwork call (when the cache is not yet created) takes slightly longer, since it also exports the compiled blob into a cache file:

![caching_enabled]

## Even faster: use LoadNetwork(modelPath)

In some cases, applications do not need to customize inputs and outputs every time. Such applications always
call `cnnNet = ie.ReadNetwork(...)`, then `ie.LoadNetwork(cnnNet, ..)`, and this flow can be further optimized.
For such cases, a more convenient API that loads the network in a single call was introduced in the 2021.4 release:

@snippet snippets/InferenceEngine_Caching1.cpp part1

With model caching enabled, the total load time is even smaller, since ReadNetwork is optimized as well:

@snippet snippets/InferenceEngine_Caching2.cpp part2

![caching_times]

## Advanced examples

Not every device supports the network Import/Export capability; enabling caching for such devices has no effect.
To check in advance whether a particular device supports model caching, your application can use the following code:

@snippet snippets/InferenceEngine_Caching3.cpp part3
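
If needed, the result of this check can be combined with the CACHE_DIR option. The following is a sketch, not part of the original document; it assumes the `ie`, `modelPath`, `deviceName`, `deviceConfig`, and `cachingSupported` variables from the snippets above:

```cpp
if (cachingSupported) {
    // Enable caching only when the device can import/export compiled networks
    ie.SetConfig({{CONFIG_KEY(CACHE_DIR), "myCacheFolder"}});
}
ie.LoadNetwork(modelPath, deviceName, deviceConfig);
```

As noted above, setting CACHE_DIR for a device without Import/Export support is harmless, so this check is optional.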

[caching_enabled]: ../img/caching_enabled.png
[caching_times]: ../img/caching_times.png

@@ -278,6 +278,7 @@ limitations under the License.
<tab type="user" title="Inference Engine API Changes History" url="@ref openvino_docs_IE_DG_API_Changes"/>
<tab type="user" title="Inference Engine Memory primitives" url="@ref openvino_docs_IE_DG_Memory_primitives"/>
<tab type="user" title="Inference Engine Device Query API" url="@ref openvino_docs_IE_DG_InferenceEngine_QueryAPI"/>
<tab type="user" title="Inference Engine Model Caching" url="@ref openvino_docs_IE_DG_Model_caching_overview"/>
<tab type="usergroup" title="Inference Engine Extensibility Mechanism" url="@ref openvino_docs_IE_DG_Extensibility_DG_Intro">
  <tab type="user" title="Extension Library" url="@ref openvino_docs_IE_DG_Extensibility_DG_Extension"/>
  <tab type="user" title="Custom Operations" url="@ref openvino_docs_IE_DG_Extensibility_DG_AddingNGraphOps"/>
docs/img/caching_enabled.png (new file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:488a7a47e5086a6868c22219bc9d58a3508059e5a1dc470f2653a12552dea82f
size 36207
docs/img/caching_times.png (new file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2eed189f9cb3d30fe13b4ba4515edd4e6da5d01545660e65fa8a33d945967281
size 28894
docs/snippets/InferenceEngine_Caching0.cpp (new file, 17 lines)
@@ -0,0 +1,17 @@
#include <ie_core.hpp>

int main() {
    using namespace InferenceEngine;
    std::string modelPath = "/tmp/myModel.xml";
    std::string device = "GNA";
    std::map<std::string, std::string> deviceConfig;
//! [part0]
    InferenceEngine::Core ie;                                  // Step 1: create Inference engine object
    ie.SetConfig({{CONFIG_KEY(CACHE_DIR), "myCacheFolder"}});  // Step 1b: Enable caching
    auto cnnNet = ie.ReadNetwork(modelPath);                   // Step 2: ReadNetwork
    //...                                                      // Step 3: Prepare inputs/outputs
    //...                                                      // Step 4: Set device configuration
    ie.LoadNetwork(cnnNet, device, deviceConfig);              // Step 5: LoadNetwork
//! [part0]
    return 0;
}
docs/snippets/InferenceEngine_Caching1.cpp (new file, 13 lines)
@@ -0,0 +1,13 @@
#include <ie_core.hpp>

int main() {
    using namespace InferenceEngine;
    std::string modelPath = "/tmp/myModel.xml";
    std::string device = "GNA";
    std::map<std::string, std::string> deviceConfig;
//! [part1]
    InferenceEngine::Core ie;                         // Step 1: create Inference engine object
    ie.LoadNetwork(modelPath, device, deviceConfig);  // Step 2: LoadNetwork by model file path
//! [part1]
    return 0;
}
docs/snippets/InferenceEngine_Caching2.cpp (new file, 14 lines)
@@ -0,0 +1,14 @@
#include <ie_core.hpp>

int main() {
    using namespace InferenceEngine;
    std::string modelPath = "/tmp/myModel.xml";
    std::string device = "GNA";
    std::map<std::string, std::string> deviceConfig;
//! [part2]
    InferenceEngine::Core ie;                                  // Step 1: create Inference engine object
    ie.SetConfig({{CONFIG_KEY(CACHE_DIR), "myCacheFolder"}});  // Step 1b: Enable caching
    ie.LoadNetwork(modelPath, device, deviceConfig);           // Step 2: LoadNetwork by model file path
//! [part2]
    return 0;
}
docs/snippets/InferenceEngine_Caching3.cpp (new file, 20 lines)
@@ -0,0 +1,20 @@
#include <ie_core.hpp>

int main() {
    using namespace InferenceEngine;
    std::string modelPath = "/tmp/myModel.xml";
    std::string deviceName = "GNA";
    std::map<std::string, std::string> deviceConfig;
    InferenceEngine::Core ie;
//! [part3]
    // Get list of supported metrics
    std::vector<std::string> keys = ie.GetMetric(deviceName, METRIC_KEY(SUPPORTED_METRICS));

    // Find 'IMPORT_EXPORT_SUPPORT' metric in supported metrics
    auto it = std::find(keys.begin(), keys.end(), METRIC_KEY(IMPORT_EXPORT_SUPPORT));

    // If metric 'IMPORT_EXPORT_SUPPORT' exists, check its value
    bool cachingSupported = (it != keys.end()) && ie.GetMetric(deviceName, METRIC_KEY(IMPORT_EXPORT_SUPPORT));
//! [part3]
    return 0;
}