Prepare "memory_optimization_guide.md" (#17022)

* Prepare "memory_optimizing_guide.md"

* Modify `memory_optimization_guide.md`

* Add memory guide into toctree

* Rephrase memory_optimization_guide.md

Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>

---------

Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
This commit is contained in:
Vitaliy Urusovskij 2023-05-11 14:23:36 +04:00 committed by GitHub
parent c13423e2ca
commit d99a5ab1ba
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 52 additions and 0 deletions

View File

@ -13,6 +13,7 @@
openvino_docs_deployment_optimization_guide_tput_advanced
openvino_docs_OV_UG_Preprocessing_Overview
openvino_docs_deployment_optimization_guide_internals
openvino_docs_memory_optimization_guide
Runtime optimization, or deployment optimization, focuses on tuning inference parameters and execution means (e.g., the optimum number of requests executed simultaneously). Unlike model-level optimizations, they are highly specific to the hardware and case they are used for, and often come at a cost.

View File

@ -0,0 +1,51 @@
# Optimizing memory usage {#openvino_docs_memory_optimization_guide}
@sphinxdirective
.. warning::
Before applying any of the recommendations provided here, note that it may significantly impact first inference latency.
The most RAM-consuming OpenVINO stage is model compilation. It may cause several issues:
* Not enough memory to compile a model. To decrease memory requirement, the following options may be applied:
* Weights mapping - memory mapping (using ``mmap``) has been introduced as the default way to work
with weights. Currently, this feature is supported by the IR frontend.
Mapping may be switched by specifying the ``ov::enable_mmap(BOOL)`` property for the ``ov::Core``.
Because of its "memory-on-demand" nature, there is no need to store all weights
in RAM. Storing just the data that is needed at the moment lowers the amount of memory
required for compilation. Moreover, ``mmap`` provides extensive memory sharing, so the
consecutive compilation of the same model will fetch the information already stored in RAM
instead of reading it one more time from storage.
* Decrease the number of threads for compilation - to change the number of threads, specify
the ``ov::compilation_num_threads(NUMBER)`` property for the ``ov::Core`` or pass it as an additional
argument to ``ov::Core::compile_model()``
* Not enough memory to recompile a model. If model compilation is successful but one of the following recompilations fails due lack of resources, it may be caused by:
* Memory leak - to determine direct leaks, you can use tools like 'address-sanitizer' or
'valgrind'. In case of indirect leaks, which cannot be caught by tools, peak RAM (VMHWM)
may be tracked (you can use tests/stress_tests/memleaks_tests as a tracking tool). If you
experience significant memory usage increase, report it in
`Github "Issues" <https://github.com/openvinotoolkit/openvino/issues>`__
* Memory allocator behavior - each allocator works according to a unique strategy and
balances between performance and memory usage. For example, the GNU allocator aggressively
requests from the OS for more memory for consecutive model compilations than was
required for the first compilation (such behavior may be determined by tracking actual RAM
(VMRSS) after compilation - it will grow until some stable point). To optimize memory
pressure, the following options are available:
* Apply ``malloc_trim(0)``. The function attempts to release free memory even from thread
caches, so it may signifficantly decrease and stabilize VMRSS usage
* Use glibc ``Tunables``. A couple of promising options are:
``glibc.malloc.trim_threshold`` and `glibc.malloc.arena_max`.
More details on the two may be found in the
`GNU Tunables Manual <https://www.gnu.org/software/libc/manual/html_node/Tunables.html>`__
* Try another allocator. One of the allocators that handles memory carefully is ``jemalloc``
@endsphinxdirective