# Dynamic Shapes
@sphinxdirective
.. toctree::
   :maxdepth: 1
   :hidden:

   openvino_docs_OV_UG_NoDynamicShapes
@endsphinxdirective
As demonstrated in the Changing Input Shapes article, there are models that support changing their input shapes before model compilation in Core::compile_model.
Reshaping a model makes it possible to customize the model input shape to exactly the size required by the end application.
This article explains how a model's ability to reshape can be further leveraged in more dynamic scenarios.
## When to Apply Dynamic Shapes
Conventional "static" model reshaping works well when it can be done once per many model inference calls with the same shape.
However, this approach does not perform efficiently if the input tensor shape changes on every inference call: calling reshape() and compile_model() each time a new size arrives is extremely time-consuming.
A popular example is inference of natural language processing models (such as BERT) with arbitrarily sized input sequences that come from the user.
In this case, the sequence length cannot be predicted and may change every time inference is called.
Below, dimensions that can change frequently like this are called *dynamic dimensions*.
Whenever the real shape of an input is not known at compile_model time, dynamic shapes should be considered.
Here are several examples of dimensions that can be naturally dynamic:
- Sequence length dimension for various sequence processing models, like BERT
- Spatial dimensions in segmentation and style transfer models
- Batch dimension
- Arbitrary number of detections in object detection models output
There are various tricks to address dynamic input dimensions, such as combining multiple pre-reshaped models or padding the input data. These tricks are sensitive to model internals, do not always give optimal performance, and are cumbersome. A short overview of these methods can be found here. Apply them only if the native dynamic shape API described in the following sections does not work for you or does not give the desired performance.
The decision to use dynamic shapes should be based on proper benchmarking of the real application with real data, because, unlike statically shaped models, dynamically shaped ones take a different amount of inference time depending on the input data shape or input tensor content. Using dynamic shapes can also add memory and runtime overhead per inference call, depending on the hardware plugin and the model used.
## Dynamic Shapes without Tricks
This section describes how to handle dynamically shaped models natively with OpenVINO Runtime API version 2022.1 and higher. Three main parts of the flow differ from static shapes:
- configure the model
- prepare data for inference
- read resulting data after inference
### Configure the Model
To avoid the tricks mentioned in the previous section, you can directly specify one or more dimensions in the model inputs as dynamic.
This is achieved with the same reshape method that is used for altering the static shapes of inputs.
Dynamic dimensions are specified as -1 or ov::Dimension() instead of the positive number used for static dimensions:
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/ov_dynamic_shapes.cpp ov_dynamic_shapes:reshape_undefined
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/ov_dynamic_shapes.py reshape_undefined
@endsphinxtab
@endsphinxtabset
To simplify the code, the examples assume that the model has a single input and a single output. However, there is no limitation on the number of inputs and outputs to which dynamic shapes can be applied.
### Undefined Dimensions "Out Of the Box"
Dynamic dimensions may appear in the input model without calling reshape. Many DL frameworks support undefined dimensions, and if such a model is converted with Model Optimizer or read directly by Core::read_model, these dimensions are preserved and automatically treated as dynamic. So you don't need to call reshape if undefined dimensions are already configured in the original model or in the IR file.
If the input model has undefined dimensions that you are not going to change during inference, it is recommended to set them to static values using the same reshape method of the model.
From the API perspective, any combination of dynamic and static dimensions can be configured.
Model Optimizer provides the same capability to reshape the model during conversion, including the specification of dynamic dimensions.
Use this capability to save time on calling the reshape method in the end application.
For information about setting input shapes with Model Optimizer, refer to Setting Input Shapes.
### Dimension Bounds
Besides marking a dimension as simply dynamic, you can also specify lower and/or upper bounds that define a range of allowed values for the dimension.
Bounds are passed as arguments to ov::Dimension:
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/ov_dynamic_shapes.cpp ov_dynamic_shapes:reshape_bounds
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/ov_dynamic_shapes.py reshape_bounds
@endsphinxtab
@endsphinxtabset
Information about bounds gives the inference plugin an opportunity to apply additional optimizations. Using dynamic shapes assumes that plugins apply looser optimization techniques during model compilation, which may require more time and memory for both compilation and inference. Providing additional information such as bounds can therefore be beneficial. For the same reason, it is not recommended to leave dimensions undefined without a real need.
When specifying bounds, the lower bound is less important than the upper bound: knowing the upper bound allows inference devices to allocate memory for intermediate tensors more precisely and to use a smaller number of tuned kernels for different sizes. Strictly speaking, the benefit of specifying a lower or upper bound is device dependent, and for some plugins specifying upper bounds is required. For information about dynamic shapes support on different devices, see the [Features Support Matrix](@ref features_support_matrix).
If the lower and upper bounds for a dimension are known, it is recommended to specify them even when a plugin can execute the model without them.
### Setting Input Tensors
Preparing a model with the reshape method was the first step. The second step is passing a tensor with an appropriate shape to an infer request. This is similar to the regular steps, except that tensors with different shapes can now be passed to the same executable model and even to the same inference request:
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/ov_dynamic_shapes.cpp ov_dynamic_shapes:set_input_tensor
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/ov_dynamic_shapes.py set_input_tensor
@endsphinxtab
@endsphinxtabset
In the example above, set_input_tensor is used to specify input tensors.
The real dimensions of the tensor are always static, because it is a concrete tensor that, unlike model inputs, has no dimension variations.
Similar to static shapes, get_input_tensor can be used instead of set_input_tensor.
In contrast to static input shapes, when using get_input_tensor for dynamic inputs, the set_shape method must be called on the returned tensor to define the shape and allocate memory.
Without that, the tensor returned by get_input_tensor is empty: its shape is not initialized and no memory is allocated, because the infer request has no information about the real shape you are going to feed.
Setting the shape of an input tensor is required whenever the corresponding input has at least one dynamic dimension, regardless of bounds information.
The following example performs the same sequence of two inference calls as the previous example, but uses get_input_tensor instead of set_input_tensor:
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/ov_dynamic_shapes.cpp ov_dynamic_shapes:get_input_tensor
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/ov_dynamic_shapes.py get_input_tensor
@endsphinxtab
@endsphinxtabset
### Dynamic Shapes in Outputs
The examples above correctly handle the case when dynamic dimensions in the output are implied by propagation of dynamic dimensions from the inputs. For example, a batch dimension in the input shape is usually propagated through the whole model and appears in the output shape. The same is true for other dimensions, such as sequence length for NLP models or spatial dimensions for segmentation models, that are propagated through the entire network.
Whether the output has dynamic dimensions can be examined by querying the output partial shape after model read or reshape. The same applies to inputs. For example:
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/ov_dynamic_shapes.cpp ov_dynamic_shapes:print_dynamic
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/ov_dynamic_shapes.py print_dynamic
@endsphinxtab
@endsphinxtabset
The appearance of ? or ranges like 1..10 means there are dynamic dimensions in the corresponding inputs or outputs.
Dynamic dimensions can also be detected programmatically:
@sphinxtabset
@sphinxtab{C++}
@snippet docs/snippets/ov_dynamic_shapes.cpp ov_dynamic_shapes:detect_dynamic
@endsphinxtab
@sphinxtab{Python}
@snippet docs/snippets/ov_dynamic_shapes.py detect_dynamic
@endsphinxtab
@endsphinxtabset
If at least one dynamic dimension exists in an output of the model, the shape of the corresponding output tensor is set as a result of the inference call.
Before the first inference, memory for such a tensor is not allocated, and its shape is reported as [0].
If you call set_output_tensor with a pre-allocated tensor, the inference call invokes set_shape internally, replacing the initial shape with the actually calculated one.
Setting the shape of an output tensor in this case is therefore useful only if you want to pre-allocate enough memory for it, because the Tensor's set_shape method reallocates memory only when the new shape requires more storage.