DOCS-menu-recreate-structure-step5 (#14636)
port https://github.com/openvinotoolkit/openvino/pull/14637
* shift to separating the Workflow section, including moving Run and Optimize Inference inside Deploy Locally
* change several article and menu titles
* minor additional restructuring
* minor content tweaks
* remove optimization introduction (may be brought back in parts later)
* several link fixes
* additional link fixes
parent 0cf95d26bf
commit 2f95de3239
@@ -6,29 +6,25 @@
:maxdepth: 1
:hidden:

Run Inference <openvino_docs_OV_UG_OV_Runtime_User_Guide>
Inference Optimization <openvino_docs_deployment_optimization_guide_dldt_optimization_guide>
Deploy via OpenVINO Runtime <openvino_deployment_guide>
Deploy via Model Serving <ovms_what_is_openvino_model_server>

.. toctree::
:maxdepth: 1
:hidden:

Deploy Locally <openvino_deployment_guide>
Deploy Using Model Server <ovms_what_is_openvino_model_server>

@endsphinxdirective

Once you have a model that meets both OpenVINO™ and your requirements, you can choose among several ways of deploying it with your application. The two default options are:
Once you have a model that meets both OpenVINO™ and your requirements, you can choose how to deploy it with your application.

@sphinxdirective
.. panels::

`Deploying locally <openvino_deployment_guide>`_
`Deploy Locally <openvino_deployment_guide>`_
^^^^^^^^^^^^^^

Local deployment simply uses OpenVINO Runtime installed on the device. It utilizes resources available to the system.
Local deployment uses OpenVINO Runtime installed on the device. It utilizes resources available to the system and provides the quickest way of launching inference.

---

`Deploying by Model Serving <ovms_what_is_openvino_model_server>`_
`Deploy by Model Serving <ovms_what_is_openvino_model_server>`_
^^^^^^^^^^^^^^

Deployment via OpenVINO Model Server allows the device to connect to a server set up remotely. This way, inference uses external resources instead of those provided by the device itself.
@@ -36,6 +32,5 @@ Once you have a model that meets both OpenVINO™ and your requirements, you can
@endsphinxdirective

> **NOTE**: [Running inference in OpenVINO Runtime](../OV_Runtime_UG/openvino_intro.md) is the most basic form of deployment. Before moving forward, make sure you know how to create a proper inference configuration. Inference may be additionally optimized, as described in the [Inference Optimization section](../optimization_guide/dldt_deployment_optimization_guide.md).

Apart from the default deployment options, you may also [deploy your application for the TensorFlow framework with OpenVINO Integration](./openvino_ecosystem_ovtf.md).
@@ -6,9 +6,9 @@
:maxdepth: 1
:hidden:

ovsa_get_started
ovtf_integration
ote_documentation
ovsa_get_started
openvino_inference_engine_tools_compile_tool_README
openvino_docs_tuning_utilities
workbench_docs_Workbench_DG_Introduction
docs/Documentation/openvino_workflow.md (Normal file, 53 lines)
@@ -0,0 +1,53 @@
# OPENVINO Workflow {#openvino_workflow}

@sphinxdirective

.. toctree::
:maxdepth: 1
:hidden:

Model Preparation <openvino_docs_model_processing_introduction>
Model Optimization and Compression <openvino_docs_model_optimization_guide>
Deployment <openvino_docs_deployment_guide_introduction>

@endsphinxdirective


THIS IS A PAGE ABOUT THE WORKFLOW

@sphinxdirective

.. raw:: html

<div class="section" id="welcome-to-openvino-toolkit-s-documentation">
<link rel="stylesheet" type="text/css" href="_static/css/homepage_style.css">
<div style="clear:both;"> </div>
<div id="HP_flow-container">
<div class="HP_flow-btn">
<a href="https://docs.openvino.ai/latest/openvino_docs_model_processing_introduction.html">
<img src="_static/images/OV_flow_model_hvr.svg" alt="link to model processing introduction" />
</a>
</div>
<div class="HP_flow-arrow" >
<img src="_static/images/OV_flow_arrow.svg" alt="" />
</div>
<div class="HP_flow-btn">
<a href="https://docs.openvino.ai/latest/openvino_docs_deployment_optimization_guide_dldt_optimization_guide.html">
<img src="_static/images/OV_flow_optimization_hvr.svg" alt="link to an optimization guide" />
</a>
</div>
<div class="HP_flow-arrow" >
<img src="_static/images/OV_flow_arrow.svg" alt="" />
</div>
<div class="HP_flow-btn">
<a href="https://docs.openvino.ai/latest/openvino_docs_deployment_guide_introduction.html">
<img src="_static/images/OV_flow_deployment_hvr.svg" alt="link to deployment introduction" />
</a>
</div>
</div>

@endsphinxdirective
@@ -1,4 +1,4 @@
# Deploying Your Applications with OpenVINO™ {#openvino_deployment_guide}
# Deploy via OpenVINO Runtime {#openvino_deployment_guide}

@sphinxdirective

@@ -6,14 +6,15 @@
:maxdepth: 1
:hidden:

openvino_docs_install_guides_deployment_manager_tool
openvino_docs_deploy_local_distribution
Run Inference <openvino_docs_OV_UG_OV_Runtime_User_Guide>
Optimize Inference <openvino_docs_deployment_optimization_guide_dldt_optimization_guide>
Deploy Application with Deployment Manager <openvino_docs_install_guides_deployment_manager_tool>
Local Distribution Libraries <openvino_docs_deploy_local_distribution>

@endsphinxdirective

Once [OpenVINO™ application development](../integrate_with_your_application.md) is finished, application developers usually need to deploy their applications to end users. There are several ways to do that. This section explains how to deploy locally, using OpenVINO Runtime.
> **NOTE**: Note that [running inference in OpenVINO Runtime](../openvino_intro.md) is the most basic form of deployment. Before moving forward, make sure you know how to create a proper Inference configuration and [develop your application properly](../integrate_with_your_application.md)

> **NOTE**: [Running inference in OpenVINO Runtime](../openvino_intro.md) is the most basic form of deployment. Before moving forward, make sure you know how to create a proper inference configuration.

## Local Deployment Options

@@ -12,9 +12,7 @@
openvino_docs_Runtime_Inference_Modes_Overview
openvino_docs_OV_UG_Working_with_devices
openvino_docs_OV_UG_ShapeInference
openvino_docs_OV_UG_Preprocessing_Overview
openvino_docs_OV_UG_DynamicShapes
openvino_docs_OV_UG_Performance_Hints
openvino_docs_OV_UG_network_state_intro

@endsphinxdirective
@@ -9,7 +9,7 @@ Previously, a certain level of automatic configuration was the result of the *de
The hints, in contrast, respect the actual model, so the parameters for optimal throughput are calculated for each model individually (based on its compute versus memory bandwidth requirements and capabilities of the device).

## Performance Hints: Latency and Throughput
As discussed in the [Optimization Guide](../optimization_guide/dldt_optimization_guide.md), there are a few different metrics associated with inference speed.
As discussed in the [Optimization Guide](../optimization_guide/dldt_deployment_optimization_guide.md), there are a few different metrics associated with inference speed.
Throughput and latency are some of the most widely used metrics that measure the overall performance of an application.

Therefore, in order to ease the configuration of the device, OpenVINO offers two dedicated hints, namely `ov::hint::PerformanceMode::THROUGHPUT` and `ov::hint::PerformanceMode::LATENCY`.
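
For illustration only (this snippet is not part of the original page), such a hint can be passed as a property when the model is compiled; the model path and device name below are placeholders:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder model path

    // Let the device configure itself for maximum throughput; use
    // ov::hint::PerformanceMode::LATENCY instead for latency-critical applications.
    auto compiled_model = core.compile_model(
        model, "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
    return 0;
}
```
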
@@ -298,5 +298,5 @@ To enable denormals optimization in the application, the `denormals_optimization

## Additional Resources
* [Supported Devices](Supported_Devices.md)
* [Optimization guide](@ref openvino_docs_optimization_guide_dldt_optimization_guide)
* [Optimization guide](@ref openvino_docs_deployment_optimization_guide_dldt_optimization_guide)
* [CPU plugin developers documentation](https://github.com/openvinotoolkit/openvino/wiki/CPUPluginDevelopersDocs)
@@ -309,5 +309,5 @@ Since OpenVINO relies on the OpenCL kernels for the GPU implementation, many gen

## Additional Resources
* [Supported Devices](Supported_Devices.md)
* [Optimization guide](@ref openvino_docs_optimization_guide_dldt_optimization_guide)
* [Optimization guide](@ref openvino_docs_deployment_optimization_guide_dldt_optimization_guide)
* [GPU plugin developers documentation](https://github.com/openvinotoolkit/openvino/wiki/GPUPluginDevelopersDocs)
@@ -23,11 +23,11 @@
All of the performance benchmarks are generated using the
open-source tool within the Intel® Distribution of OpenVINO™ toolkit
called `benchmark_app`. This tool is available
`for C++ apps <openvino_inference_engine_samples_benchmark_app_README>`_
`for C++ apps <http://openvino-doc.iotg.sclab.intel.com/2022.3/openvino_inference_engine_samples_benchmark_app_README.html>`_
as well as
`for Python apps<openvino_inference_engine_tools_benchmark_tool_README>`_.
`for Python apps <http://openvino-doc.iotg.sclab.intel.com/2022.3/openvino_inference_engine_tools_benchmark_tool_README.html>`_.

For a simple instruction on testing performance, see the `Getting Performance Numbers Guide<openvino_docs_MO_DG_Getting_Performance_Numbers>`
For a simple instruction on testing performance, see the `Getting Performance Numbers Guide <http://openvino-doc.iotg.sclab.intel.com/2022.3/openvino_docs_MO_DG_Getting_Performance_Numbers.html>`_.

.. dropdown:: What image sizes are used for the classification network models?

@@ -107,16 +107,16 @@
.. dropdown:: Where can I purchase the specific hardware used in the benchmarking?

Intel partners with vendors all over the world. For a list of Hardware Manufacturers, see the
[Intel® AI: In Production Partners & Solutions Catalog](https://www.intel.com/content/www/us/en/internet-of-things/ai-in-production/partners-solutions-catalog.html).
`Intel® AI: In Production Partners & Solutions Catalog <https://www.intel.com/content/www/us/en/internet-of-things/ai-in-production/partners-solutions-catalog.html>`_.
For more details, see the [Supported Devices](../OV_Runtime_UG/supported_plugins/Supported_Devices.md)
documentation. Before purchasing any hardware, you can test and run
models remotely, using [Intel® DevCloud for the Edge](http://devcloud.intel.com/edge/).
models remotely, using `Intel® DevCloud for the Edge <http://devcloud.intel.com/edge/>`_.

.. dropdown:: How can I optimize my models for better performance or accuracy?

A set of guidelines and recommendations to optimize models is available in the
[optimization guide](../optimization_guide/dldt_optimization_guide.md).
Join the conversation in the [Community Forum](https://software.intel.com/en-us/forums/intel-distribution-of-openvino-toolkit)
[optimization guide](../optimization_guide/dldt_deployment_optimization_guide.md).
Join the conversation in the `Community Forum <https://software.intel.com/en-us/forums/intel-distribution-of-openvino-toolkit>`_
for further support.

.. dropdown:: Why are INT8 optimized models used for benchmarking on CPUs with no VNNI support?

@@ -7,21 +7,11 @@
:hidden:

API Reference <api/api_reference>
Model Preparation <openvino_docs_model_processing_introduction>
Model Optimization and Compression <openvino_docs_model_optimization_guide>
Deployment <openvino_docs_deployment_guide_introduction>
Tool Ecosystem <openvino_ecosystem>
OpenVINO Extensibility <openvino_docs_Extensibility_UG_Intro>
Media Processing and CV Libraries <media_processing_cv_libraries>
OpenVINO™ Security <openvino_docs_security_guide_introduction>

.. toctree::
:maxdepth: 1
:hidden:

openvino_docs_optimization_guide_dldt_optimization_guide

@endsphinxdirective

@@ -40,7 +40,7 @@ A typical workflow with OpenVINO is shown below.
<img src="_static/images/OV_flow_arrow.svg" alt="" />
</div>
<div class="HP_flow-btn">
<a href="https://docs.openvino.ai/latest/openvino_docs_optimization_guide_dldt_optimization_guide.html">
<a href="https://docs.openvino.ai/latest/openvino_docs_deployment_optimization_guide_dldt_optimization_guide.html">
<img src="_static/images/OV_flow_optimization_hvr.svg" alt="link to an optimization guide" />
</a>
</div>
@@ -122,6 +122,7 @@ Get Started

GET STARTED <get_started>
LEARN OPENVINO <learn_openvino>
OPENVINO WORKFLOW <openvino_workflow>
DOCUMENTATION <documentation>
MODEL ZOO <model_zoo>
RESOURCES <resources>
@@ -1,4 +1,4 @@
# Runtime Inference Optimization {#openvino_docs_deployment_optimization_guide_dldt_optimization_guide}
# Optimize Inference {#openvino_docs_deployment_optimization_guide_dldt_optimization_guide}

@sphinxdirective

@@ -7,9 +7,11 @@
:hidden:

openvino_docs_deployment_optimization_guide_common
openvino_docs_OV_UG_Performance_Hints
openvino_docs_deployment_optimization_guide_latency
openvino_docs_deployment_optimization_guide_tput
openvino_docs_deployment_optimization_guide_tput_advanced
openvino_docs_OV_UG_Preprocessing_Overview
openvino_docs_deployment_optimization_guide_internals

@endsphinxdirective
@@ -1,4 +1,4 @@
## Optimizing for the Latency {#openvino_docs_deployment_optimization_guide_latency}
## Optimizing for Latency {#openvino_docs_deployment_optimization_guide_latency}

@sphinxdirective

@@ -1,36 +0,0 @@
# Introduction to Performance Optimization {#openvino_docs_optimization_guide_dldt_optimization_guide}
Even though inference performance should be defined as a combination of many factors, including accuracy and efficiency, it is most often described as the speed of execution. As the rate at which the model processes live data, it is based on two fundamentally interconnected metrics: latency and throughput.



**Latency** measures the inference time (in ms) required to process a single input. When it comes to executing multiple inputs simultaneously (for example, via batching), the overall throughput (inferences per second, or frames per second, FPS, in the specific case of visual processing) is usually more of a concern.
**Throughput** is calculated by dividing the number of inputs that were processed by the processing time.

## End-to-End Application Performance
It is important to separate the "pure" inference time of a neural network from the end-to-end application performance. For example, data transfers between the host and a device may unintentionally affect the performance when a host input tensor is processed on an accelerator such as a dGPU.

Similarly, input preprocessing contributes significantly to the inference time. As described in the [getting performance numbers](../MO_DG/prepare_model/Getting_performance_numbers.md) section, when evaluating *inference* performance, one option is to measure all such items separately.
For the **end-to-end scenario**, though, consider image pre-processing with OpenVINO and the asynchronous execution as a way to lessen the communication costs (like data transfers). For more details, see the [general optimizations guide](./dldt_deployment_optimization_common.md).
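
As a rough, editor-added C++ sketch of the asynchronous pattern mentioned above (the model path and device name are placeholders), the host can overlap its own work with device execution:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto compiled_model = core.compile_model("model.xml", "CPU");  // placeholder model path
    auto infer_request = compiled_model.create_infer_request();

    // Start inference without blocking the host thread...
    infer_request.start_async();
    // ...so pre-processing of the next input, I/O, etc. can run here in parallel.
    infer_request.wait();  // block only when the result is actually needed
    return 0;
}
```
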

Another specific case is **first-inference latency** (for example, when a fast application start-up is required), where the resulting performance may well be dominated by the model loading time. [Model caching](../OV_Runtime_UG/Model_caching_overview.md) may be considered as a way to improve model loading/compilation time.
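
A minimal sketch of enabling model caching via the `ov::cache_dir` property (the directory name and model path are arbitrary examples, not taken from the article):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Compiled blobs are stored in this directory and reused on later runs,
    // which shortens subsequent model loading/compilation.
    core.set_property(ov::cache_dir("model_cache"));  // arbitrary example directory
    auto compiled_model = core.compile_model("model.xml", "GPU");  // placeholder model path
    return 0;
}
```
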

Finally, a **memory footprint** restriction is another possible concern when designing an application. While this is a motivation for using *model* optimization techniques, keep in mind that throughput-oriented execution is usually much more memory-consuming. For more details, see the [Runtime Inference Optimizations guide](../optimization_guide/dldt_deployment_optimization_guide.md).


> **NOTE**: To get performance numbers for OpenVINO, along with tips on how to measure and compare them with a native framework, see the [Getting performance numbers article](../MO_DG/prepare_model/Getting_performance_numbers.md).

## Improving Performance: Model vs Runtime Optimizations

> **NOTE**: First, make sure that your model can be successfully inferred with OpenVINO Runtime.

There are two primary optimization approaches to improving inference performance with OpenVINO: model- and runtime-level optimizations. They are **fully compatible** and can be done independently.

- **Model optimizations** include model modifications, such as quantization, pruning, optimization of preprocessing, etc. For more details, refer to this [document](./model_optimization_guide.md).
- The model optimizations directly improve the inference time, even without runtime parameter tuning (described below).

- **Runtime (Deployment) optimizations** include tuning of model *execution* parameters, as sketched below. For more details, see the [Runtime Inference Optimizations guide](../optimization_guide/dldt_deployment_optimization_guide.md).
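
As a small, editor-added sketch of what such runtime-level tuning can look like (the property values are arbitrary examples, not recommendations):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder model path

    // Tune execution parameters only; the model itself stays untouched.
    auto compiled_model = core.compile_model(
        model, "CPU",
        ov::num_streams(4),               // arbitrary example value
        ov::inference_num_threads(8));    // arbitrary example value
    return 0;
}
```
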

## Performance Benchmarks
A wide range of public models for estimating performance and comparing the numbers (measured on various supported devices) is available in the [Performance benchmarks section](../benchmarks/performance_benchmarks.md).
@@ -1,4 +1,4 @@
# Tutorials {#tutorials}
# Interactive Tutorials (Python) {#tutorials}

@sphinxdirective

@@ -53,11 +53,8 @@ Note that the benchmark_app usually produces optimal performance for any device
./benchmark_app -m <model> -i <input> -d CPU
```

But it is still may be sub-optimal for some cases, especially for very small networks. More details can read in [Performance Optimization Guide](../../../docs/optimization_guide/dldt_optimization_guide.md).

As explained in the [Performance Optimization Guide](../../../docs/optimization_guide/dldt_optimization_guide.md) section, for all devices, including new [MULTI device](../../../docs/OV_Runtime_UG/supported_plugins/MULTI.md) it is preferable to use the FP16 IR for the model.
Also if latency of the CPU inference on the multi-socket machines is of concern, please refer to the same
[Performance Optimization Guide](../../../docs/optimization_guide/dldt_optimization_guide.md).
It still may be sub-optimal in some cases, especially for very small networks. For all devices, including the [MULTI device](../../../docs/OV_Runtime_UG/supported_plugins/MULTI.md), it is preferable to use the FP16 IR for the model, and latency of CPU inference on multi-socket machines may also require attention.
These, as well as other topics, are explained in the [Performance Optimization Guide](../../../docs/optimization_guide/dldt_deployment_optimization_guide.md).

Running the application with the `-h` option yields the following usage message:
```