DOCS-menu_recreate_structure_step4 (#14583)

documentation section tweaks
create deployment section for further tweaks
API reference moved
compile tool moved
This commit is contained in:
Karol Blaszczak 2022-12-13 08:15:15 +01:00 committed by GitHub
parent 74f2128b3a
commit a007dcd878
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
5 changed files with 55 additions and 26 deletions

View File

@@ -0,0 +1,41 @@
# OpenVINO™ Deployment {#openvino_docs_deployment_guide_introduction}
@sphinxdirective
.. toctree::
:maxdepth: 1
:hidden:
Run Inference <openvino_docs_OV_UG_OV_Runtime_User_Guide>
Inference Optimization <openvino_docs_deployment_optimization_guide_dldt_optimization_guide>
.. toctree::
:maxdepth: 1
:hidden:
Deploy Locally <openvino_deployment_guide>
Deploy Using Model Server <ovms_what_is_openvino_model_server>
Once you have a model that meets both the OpenVINO™ requirements and your own, you can choose among several ways of deploying it with your application. The two default options are:
.. panels::
`Deploying locally <openvino_deployment_guide>`_
^^^^^^^^^^^^^^
Local deployment simply uses OpenVINO Runtime installed on the device. It utilizes resources available to the system.
---
`Deploying via Model Serving <ovms_what_is_openvino_model_server>`_
^^^^^^^^^^^^^^
Deployment via OpenVINO Model Server allows the device to connect to a server set up remotely. This way, inference uses external resources instead of those available to the device itself.
@endsphinxdirective
> **NOTE**: [Running inference in OpenVINO Runtime](../OV_Runtime_UG/openvino_intro.md) is the most basic form of deployment. Before moving forward, make sure you know how to create a proper inference configuration. Inference may be additionally optimized, as described in the [Inference Optimization section](../optimization_guide/dldt_deployment_optimization_guide.md).
Apart from the default deployment options, you may also [deploy your application for the TensorFlow framework with OpenVINO Integration](./openvino_ecosystem_ovtf.md).
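
For a quick illustration of the local option, here is a minimal sketch (not part of the official samples), assuming OpenVINO Runtime with the C++ API 2.0 is installed on the device, a hypothetical `model.xml` IR file, and a single `f32` input and output:

```cpp
#include <algorithm>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Compile the model for a device available on the local machine.
    ov::CompiledModel compiled = core.compile_model("model.xml", "CPU");
    ov::InferRequest request = compiled.create_infer_request();

    // Fill the input tensor with application data (zeros used as a placeholder).
    ov::Tensor input = request.get_input_tensor();
    std::fill_n(input.data<float>(), input.get_size(), 0.0f);

    // Run inference locally, using only the resources of this system.
    request.infer();
    ov::Tensor output = request.get_output_tensor();
    // Interpret output.data<float>() according to the model's semantics.
    return 0;
}
```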

View File

@@ -6,13 +6,13 @@
:maxdepth: 1
:hidden:
ovms_what_is_openvino_model_server
ovsa_get_started
ovtf_integration
ote_documentation
openvino_inference_engine_tools_compile_tool_README
openvino_docs_tuning_utilities
workbench_docs_Workbench_DG_Introduction
@endsphinxdirective

View File

@@ -6,37 +6,25 @@
:maxdepth: 1
:hidden:
openvino_2_0_transition_guide
API Reference <api/api_reference>
Model Preparation <openvino_docs_model_processing_introduction>
Model Optimization and Compression <openvino_docs_model_optimization_guide>
Run Inference <openvino_docs_OV_UG_OV_Runtime_User_Guide>
Deploy Locally <openvino_deployment_guide>
Deployment <openvino_docs_deployment_guide_introduction>
Tool Ecosystem <openvino_ecosystem>
OpenVINO Extensibility <openvino_docs_Extensibility_UG_Intro>
Media Processing and CV Libraries <media_processing_cv_libraries>
OpenVINO™ Security <openvino_docs_security_guide_introduction>
.. toctree::
:maxdepth: 1
:caption: Running Inference
:hidden:
openvino_inference_engine_tools_compile_tool_README
.. toctree::
:maxdepth: 1
:caption: Optimization and Performance
:hidden:
openvino_docs_optimization_guide_dldt_optimization_guide
openvino_docs_deployment_optimization_guide_dldt_optimization_guide
openvino_docs_tuning_utilities
@endsphinxdirective
This section provides reference documents that guide you through the OpenVINO toolkit workflow, from preparing and optimizing models to deploying them in your own deep learning applications.

View File

@@ -8,10 +8,11 @@
Interactive Tutorials (Python) <tutorials>
Sample Applications (Python & C++) <openvino_docs_OV_UG_Samples_Overview>
OpenVINO API 2.0 Transition <openvino_2_0_transition_guide>
@endsphinxdirective
This section will help you get hands-on experience with OpenVINO even if you are just starting
to learn what OpenVINO is and how it works. It includes various types of learning materials,
to learn what OpenVINO is and how it works. It includes various types of learning materials
accommodating different learning needs, which means you should find it useful if you are a beginner,
as well as an experienced user.

View File

@@ -1,4 +1,4 @@
# Runtime Inference Optimizations {#openvino_docs_deployment_optimization_guide_dldt_optimization_guide}
# Runtime Inference Optimization {#openvino_docs_deployment_optimization_guide_dldt_optimization_guide}
@sphinxdirective
@@ -14,7 +14,7 @@
@endsphinxdirective
Runtime optimizations, or deployment optimizations, focus on tuning inference parameters and execution means (e.g., the optimum number of requests executed simultaneously). Unlike model-level optimizations, they are highly specific to the hardware and case they are used for, and often come at a cost.
Runtime optimization, or deployment optimization, focuses on tuning inference parameters and execution means (e.g., the optimum number of requests executed simultaneously). Unlike model-level optimizations, they are highly specific to the hardware and case they are used for, and often come at a cost.
`ov::hint::inference_precision` is a "typical runtime configuration" which trades accuracy for performance, allowing `fp16/bf16` execution for the layers that remain in `fp32` after quantization of the original `fp32` model.
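
As a hedged sketch (the model path, device, and precision below are assumptions and depend on hardware support), setting this hint through the C++ API could look like the following:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // hypothetical IR file
    // Allow bf16 execution for layers still running in fp32, trading some
    // accuracy for performance (requires bf16-capable hardware).
    auto compiled = core.compile_model(model, "CPU",
        ov::hint::inference_precision(ov::element::bf16));
    return 0;
}
```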
Therefore, optimization should start with defining the use case. For example, if it is about processing millions of samples by overnight jobs in data centers, throughput could be prioritized over latency. On the other hand, real-time usages would likely trade off throughput to deliver the results at minimal latency. A combined scenario is also possible, targeting the highest possible throughput, while maintaining a specific latency threshold.
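
For example, such use-case targets can be expressed with the high-level performance hints; the snippet below is only a sketch, assuming a hypothetical `model.xml` and the C++ API:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Throughput-oriented configuration, e.g. for overnight batch processing.
    auto throughput_model = core.compile_model("model.xml", "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
    // Latency-oriented configuration, e.g. for real-time usages.
    auto latency_model = core.compile_model("model.xml", "CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
    return 0;
}
```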
@@ -22,12 +22,11 @@ Therefore, optimization should start with defining the use case. For example, if
It is also important to understand how the full-stack application would use the inference component "end-to-end." For example, to know what stages need to be orchestrated to save workload devoted to fetching and preparing input data.
For more information on this topic, see the following articles:
* [feature support by device](@ref features_support_matrix),
* [Inputs Pre-processing with the OpenVINO](@ref inputs_pre_processing).
* [Async API](@ref async_api).
* [The 'get_tensor' Idiom](@ref tensor_idiom).
* For variably-sized inputs, consider [dynamic shapes](../OV_Runtime_UG/ov_dynamic_shapes.md).
* [feature support by device](@ref features_support_matrix)
* [Inputs Pre-processing with OpenVINO](@ref inputs_pre_processing)
* [Async API](@ref async_api)
* [The 'get_tensor' Idiom](@ref tensor_idiom)
* For variably-sized inputs, consider [dynamic shapes](../OV_Runtime_UG/ov_dynamic_shapes.md)
See the [latency](./dldt_deployment_optimization_latency.md) and [throughput](./dldt_deployment_optimization_tput.md) optimization guides for **use-case-specific optimizations**.
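
As a brief, non-authoritative illustration of the Async API and the 'get_tensor' idiom listed above (the model path, device, and input handling are assumptions):

```cpp
#include <algorithm>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto compiled = core.compile_model("model.xml", "CPU");
    auto request = compiled.create_infer_request();

    // The 'get_tensor' idiom: write input data directly into the request's
    // tensor instead of copying it from a separately allocated buffer.
    ov::Tensor input = request.get_input_tensor();
    std::fill_n(input.data<float>(), input.get_size(), 0.0f);

    // Async API: start inference and overlap it with other application work,
    // such as fetching and preparing the next input.
    request.start_async();
    request.wait();

    ov::Tensor output = request.get_output_tensor();
    return 0;
}
```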