[DOCS] benchmarks new page (#16620)

This commit is contained in:
Karol Blaszczak
2023-04-17 16:43:57 +02:00
committed by GitHub
parent 25826bfe7d
commit 1471a6e8de
6 changed files with 154 additions and 187 deletions


@@ -6,22 +6,151 @@
:maxdepth: 1
:hidden:
openvino_docs_performance_benchmarks_openvino
openvino_docs_performance_benchmarks_faq
openvino_docs_performance_int8_vs_fp32
Performance Data Spreadsheet (download xlsx) <https://docs.openvino.ai/2022.3/_static/benchmarks_files/OV-2022.3-Performance-Data.xlsx>
openvino_docs_MO_DG_Getting_Performance_Numbers
This page presents benchmark results for `Intel® Distribution of OpenVINO™ toolkit <https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html>`__
and :doc:`OpenVINO Model Server <ovms_what_is_openvino_model_server>`, for a representative selection of public neural networks
run on multiple Intel® CPUs, GPUs, and GNAs covering a broad performance range.
The results may help you decide which hardware to use in your applications or plan AI workload for the hardware you have already implemented in your solutions.
Click the buttons below to see the chosen benchmark data.
.. grid:: 1 1 2 2
:gutter: 4
.. grid-item::
.. button-link:: #
:class: ov-toolkit-benchmark-results
:color: primary
:outline:
:expand:
:material-regular:`bar_chart;1.4em` OpenVINO Benchmark Graphs
.. grid-item::
.. button-link:: #
:class: ovms-toolkit-benchmark-results
:color: primary
:outline:
:expand:
:material-regular:`bar_chart;1.4em` OVMS Benchmark Graphs
For a successful deep learning inference application, the following four key metrics need to be considered:
.. tab:: :material-regular:`keyboard_double_arrow_right;1.4em` Throughput
Measures the number of inferences delivered within a latency threshold
(for example, number of Frames Per Second - FPS). When deploying a system with
deep learning inference, select the throughput that delivers the best trade-off
between latency and power for the price and performance that meets your requirements.
.. tab:: :material-regular:`attach_money;1.4em` Value
While throughput is important, what is more critical in edge AI deployments is
the performance efficiency or performance-per-cost. Application performance in
throughput per dollar of system cost is the best measure of value. The value KPI is
calculated as “Throughput measured as inferences per second / price of inference engine”.
This means that for a 2-socket system, 2x the price of a CPU is used. Prices are as per the
date of benchmarking; sources can be found as links in the Hardware Platforms (PDF) description below.
.. tab:: :material-regular:`flash_on;1.4em` Efficiency
System power is a key consideration from the edge to the data center. When selecting
deep learning solutions, power efficiency (throughput/watt) is a critical factor to consider.
Intel designs provide excellent power efficiency for running deep learning workloads.
The efficiency KPI is calculated as “Throughput measured as inferences per second / TDP of
inference engine”. This means that for a 2-socket system, 2x the power dissipation (TDP) of a CPU is used.
TDP values are as per the date of benchmarking; sources can be found as links in the Hardware Platforms (PDF) description below.
.. tab:: :material-regular:`hourglass_empty;1.4em` Latency
This measures the synchronous execution of inference requests and is reported in milliseconds.
Each inference request (for example: preprocess, infer, postprocess) is allowed to complete before
the next is started. This performance metric is relevant in usage scenarios where a single image
input needs to be acted upon as soon as possible. Examples include the healthcare sector, where
medical personnel request analysis of a single ultrasound image, and real-time or near-real-time
applications, such as an industrial robot's response to actions in its environment
or obstacle avoidance for autonomous vehicles.
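The Value and Efficiency KPIs defined above are simple ratios of measured throughput to price and to TDP. As a rough sketch (all numbers below are illustrative placeholders, not benchmark data):

```python
def value_kpi(fps, unit_price, sockets=1):
    """Value = throughput (inferences per second) / price of the inference engine.

    For a 2-socket system, 2x the price of a CPU is used (sockets=2).
    """
    return fps / (unit_price * sockets)

def efficiency_kpi(fps, unit_tdp_watts, sockets=1):
    """Efficiency = throughput (inferences per second) / TDP of the inference engine.

    For a 2-socket system, 2x the TDP of a CPU is used (sockets=2).
    """
    return fps / (unit_tdp_watts * sockets)

# Illustrative placeholder numbers, not benchmark results:
fps = 1200.0    # measured throughput in inferences per second
price = 500.0   # per-CPU price (USD) as of the benchmarking date
tdp = 150.0     # per-CPU thermal design power in watts

print(value_kpi(fps, price, sockets=2))       # inferences per second per dollar
print(efficiency_kpi(fps, tdp, sockets=2))    # inferences per second per watt
```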
Platforms, Configurations, Methodology
###########################################################
For a listing of all platforms and configurations used for testing, refer to the following:
.. grid:: 1 1 2 2
:gutter: 4
.. grid-item::
.. button-link:: _static/benchmarks_files/platform_list_22.3.pdf
:color: primary
:outline:
:expand:
:material-regular:`download;1.5em` Click for Hardware Platforms [PDF]
.. button-link:: _static/benchmarks_files/OV-2022.3-system-info-detailed.xlsx
:color: primary
:outline:
:expand:
:material-regular:`download;1.5em` Click for Configuration Details [XLSX]
.. the files above need to be updated with OVMS !!!
The OpenVINO benchmark setup consists of a single system with both OpenVINO™ and the benchmark application installed.
It measures the time spent on actual inference (excluding any pre- or post-processing) and then reports the number of inferences
per second (or Frames Per Second).
OpenVINO™ Model Server (OVMS) employs the Intel® Distribution of OpenVINO™ toolkit runtime libraries and exposes a set of
models via a convenient inference API over gRPC or HTTP/REST. Its benchmark results are measured in a
multiple-clients-single-server configuration, using two hardware platforms connected by Ethernet. Network bandwidth depends on both the platforms
and the models under investigation, and is set so as not to be a bottleneck for workload intensity. The connection is dedicated
only to measuring performance.
.. dropdown:: See more details about OVMS benchmark setup
The benchmark setup for OVMS consists of four main parts:
.. image:: _static/images/performance_benchmarks_ovms_02.png
:alt: OVMS Benchmark Setup Diagram
* **OpenVINO™ Model Server** is launched as a docker container on the server platform and listens for (and answers)
requests from clients. In the corresponding benchmarks, OpenVINO™ Model Server runs on the same machine as the
OpenVINO™ toolkit benchmark application. Models served by OpenVINO™ Model Server are located in a local file system mounted into
the docker container. The OpenVINO™ Model Server instance communicates with the other components via ports over a dedicated docker network.
* **Clients** run on a separate physical machine, referred to as the client platform. Clients are implemented in Python 3,
based on the TensorFlow* API, and work as parallel processes. Each client waits for a response from OpenVINO™
Model Server before sending the next request. The clients are also responsible for verifying the responses.
* **Load balancer** works on the client platform in a docker container. HAProxy is used for this purpose. Its main role is
counting the requests forwarded from clients to OpenVINO™ Model Server, estimating latency, and exposing this information via the
Prometheus service. The load balancer is located on the client side to simulate a real-life scenario that includes the
impact of the physical network on reported metrics.
* **Execution Controller** is launched on the client platform. It is responsible for synchronizing the whole measurement process,
downloading metrics from the load balancer, and presenting the final report of the execution.
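The client behavior described above (parallel synchronous clients, each waiting for a response before sending the next request) can be sketched as follows. This is a simplified illustration with a stubbed request function, not the actual benchmark harness:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def send_request(_):
    """Stand-in for one synchronous inference request to OVMS.

    A real client would send a gRPC or HTTP/REST request and verify the
    response; here the round trip is simulated with a short sleep.
    """
    start = time.perf_counter()
    time.sleep(0.001)  # placeholder for network round trip + inference
    return time.perf_counter() - start

def run_clients(num_clients=4, requests_per_client=50):
    """Run parallel clients; each waits for a response before sending the next."""
    total = num_clients * requests_per_client
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        latencies = sorted(pool.map(send_request, range(total)))
    return {
        # With num_clients requests in flight, wall time is roughly
        # sum(latencies) / num_clients, so:
        "throughput_rps": total / sum(latencies) * num_clients,
        "p50_ms": latencies[total // 2] * 1000,
        "p99_ms": latencies[int(total * 0.99)] * 1000,
    }

print(run_clients(num_clients=2, requests_per_client=10))
```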
Test performance yourself
####################################
You can also test performance for your system yourself, following the guide on :doc:`getting performance numbers <openvino_docs_MO_DG_Getting_Performance_Numbers>`.
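For instance, the benchmark_app tool that ships with OpenVINO can produce such numbers. A hedged sketch (the model path is a placeholder, and the exact flag set may vary between releases):

```shell
# Throughput-oriented run on CPU for 15 seconds using the async API.
# model.xml is a placeholder path to an OpenVINO IR model.
benchmark_app -m model.xml -d CPU -api async -t 15 -hint throughput

# Latency-oriented run on GPU:
benchmark_app -m model.xml -d GPU -api sync -hint latency
```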
Performance of a particular application can also be evaluated virtually using `Intel® DevCloud for the Edge <https://devcloud.intel.com/edge/>`__.
It is a remote development environment with access to Intel® hardware and the latest versions of the Intel® Distribution of the OpenVINO™ Toolkit.
To learn more about it, visit `the website <https://www.intel.com/content/www/us/en/developer/tools/devcloud/edge/overview.html>`__
@@ -29,6 +158,21 @@ or `create an account <https://www.intel.com/content/www/us/en/secure/forms/devc
Disclaimers
####################################
* Intel® Distribution of OpenVINO™ toolkit performance results are based on release 2022.3, as of December 13, 2022.
* OpenVINO Model Server performance results are based on release 2022.3, as of December 13, 2022.
The results may not reflect all publicly available updates. Intel technologies' features and benefits depend on system configuration
and may require enabled hardware, software, or service activation. Learn more at intel.com, or from the OEM or retailer.
See configuration disclosure for details. No product can be absolutely secure.
Performance varies by use, configuration and other factors. Learn more at `www.intel.com/PerformanceIndex <https://www.intel.com/PerformanceIndex>`__.
Your costs and results may vary.
Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.
@endsphinxdirective


@@ -1,88 +0,0 @@
# Intel® Distribution of OpenVINO™ toolkit Benchmark Results {#openvino_docs_performance_benchmarks_openvino}
@sphinxdirective
.. toctree::
:maxdepth: 1
:hidden:
openvino_docs_performance_benchmarks_faq
openvino_docs_performance_int8_vs_fp32
Performance Data Spreadsheet (download xlsx) <https://docs.openvino.ai/2022.3/_static/benchmarks_files/OV-2022.3-Performance-Data.xlsx>
Click the "Benchmark Graphs" button to see the OpenVINO™ benchmark graphs. Select the models, the hardware platforms (CPU SKUs),
precision and performance index from the lists and click the “Build Graphs” button.
.. button-link:: #
:class: ov-toolkit-benchmark-results
:color: primary
:outline:
:material-regular:`bar_chart;1.4em` Benchmark Graphs
Measuring inference performance involves many variables and is extremely use-case and application dependent.
Below are four parameters for measurements, which are key elements to consider for a successful deep learning inference application:
.. tab:: :material-regular:`keyboard_double_arrow_right;1.4em` Throughput
Measures the number of inferences delivered within a latency threshold (for example, number of Frames Per Second - FPS). When deploying a system with deep learning inference, select the throughput that delivers the best trade-off between latency and power for the price and performance that meets your requirements.
.. tab:: :material-regular:`attach_money;1.4em` Value
While throughput is important, what is more critical in edge AI deployments is the performance efficiency or performance-per-cost. Application performance in throughput per dollar of system cost is the best measure of value. The value KPI is calculated as “Throughput measured as inferences per second / price of inference engine”. This means for a 2 socket system 2x the price of a CPU is used. Prices are as per date of benchmarking and sources can be found as links in the Hardware Platforms (PDF) description below.
.. tab:: :material-regular:`flash_on;1.4em` Efficiency
System power is a key consideration from the edge to the data center. When selecting deep learning solutions, power efficiency (throughput/watt) is a critical factor to consider. Intel designs provide excellent power efficiency for running deep learning workloads. The efficiency KPI is calculated as “Throughput measured as inferences per second / TDP of inference engine”. This means for a 2 socket system 2x the power dissipation (TDP) of a CPU is used. TDP-values are as per date of benchmarking and sources can be found as links in the Hardware Platforms (PDF) description below.
.. tab:: :material-regular:`hourglass_empty;1.4em` Latency
This measures the synchronous execution of inference requests and is reported in milliseconds. Each inference request (for example: preprocess, infer, postprocess) is allowed to complete before the next is started. This performance metric is relevant in usage scenarios where a single image input needs to be acted upon as soon as possible. Examples include the healthcare sector, where medical personnel request analysis of a single ultrasound image, and real-time or near-real-time applications, such as an industrial robot's response to actions in its environment or obstacle avoidance for autonomous vehicles.
Platform & Configurations
####################################
For a listing of all platforms and configurations used for testing, refer to the following:
.. button-link:: _static/benchmarks_files/platform_list_22.3.pdf
:color: primary
:outline:
:material-regular:`download;1.5em` Click for Hardware Platforms [PDF]
.. button-link:: _static/benchmarks_files/OV-2022.3-system-info-detailed.xlsx
:color: primary
:outline:
:material-regular:`download;1.5em` Click for Configuration Details [XLSX]
This benchmark setup includes a single machine on which both the benchmark application and the OpenVINO™ installation reside. The presented performance benchmark numbers are based on the release 2022.3 of the Intel® Distribution of OpenVINO™ toolkit.
The benchmark application loads the OpenVINO™ Runtime and executes inferences on the specified hardware (CPU, GPU or GNA).
It measures the time spent on actual inference (excluding any pre or post processing) and then reports on the inferences per second (or Frames Per Second).
Disclaimers
####################################
Intel® Distribution of OpenVINO™ toolkit performance benchmark numbers are based on release 2022.3.
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Learn more at intel.com, or from the OEM or retailer. Performance results are based on testing as of December 13, 2022 and may not reflect all publicly available updates. See configuration disclosure for details. No product can be absolutely secure.
Performance varies by use, configuration and other factors. Learn more at `www.intel.com/PerformanceIndex <https://www.intel.com/PerformanceIndex>`__.
Your costs and results may vary.
Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
@endsphinxdirective


@@ -1,86 +0,0 @@
@sphinxdirective
:orphan:
@endsphinxdirective
# OpenVINO™ Model Server Benchmark Results {#openvino_docs_performance_benchmarks_ovms}
@sphinxdirective
Click the "Benchmark Graphs" button to see the OpenVINO™ benchmark graphs. Select the models, the hardware platforms (CPU SKUs),
precision and performance index from the lists and click the “Build Graphs” button.
.. button-link:: #
:class: ov-toolkit-benchmark-results
:color: primary
:outline:
:material-regular:`bar_chart;1.4em` Benchmark Graphs
OpenVINO™ Model Server is an open-source, production-grade inference platform that exposes a set of models via a convenient inference API
over gRPC or HTTP/REST. It employs the OpenVINO™ Runtime libraries from the Intel® Distribution of OpenVINO™ toolkit to extend workloads
across Intel® hardware including CPU, GPU and others.
@endsphinxdirective
![OpenVINO™ Model Server](../img/performance_benchmarks_ovms_01.png)
## Measurement Methodology
OpenVINO™ Model Server is measured in a multiple-clients-single-server configuration using two hardware platforms connected by an Ethernet network. The network bandwidth depends on the platforms as well as the models under investigation, and it is set so as not to be a bottleneck for workload intensity. This connection is dedicated only to the performance measurements. The benchmark setup consists of four main parts:
![OVMS Benchmark Setup Diagram](../img/performance_benchmarks_ovms_02.png)
* **OpenVINO™ Model Server** is launched as a docker container on the server platform and listens for (and answers) requests from clients. In the corresponding benchmarks, OpenVINO™ Model Server runs on the same machine as the OpenVINO™ toolkit benchmark application. Models served by OpenVINO™ Model Server are located in a local file system mounted into the docker container. The OpenVINO™ Model Server instance communicates with the other components via ports over a dedicated docker network.
* **Clients** run on a separate physical machine, referred to as the client platform. Clients are implemented in Python 3, based on the TensorFlow* API, and work as parallel processes. Each client waits for a response from OpenVINO™ Model Server before sending the next request. The clients are also responsible for verifying the responses.
* **Load balancer** works on the client platform in a docker container. HAProxy is used for this purpose. Its main role is counting the requests forwarded from clients to OpenVINO™ Model Server, estimating latency, and exposing this information via the Prometheus service. The load balancer is located on the client side to simulate a real-life scenario that includes the impact of the physical network on reported metrics.
* **Execution Controller** is launched on the client platform. It is responsible for synchronizing the whole measurement process, downloading metrics from the load balancer, and presenting the final report of the execution.
@sphinxdirective
Platform & Configurations
####################################
For a listing of all platforms and configurations used for testing, refer to the following:
.. button-link:: _static/benchmarks_files/platform_list_22.3.pdf
:color: primary
:outline:
:material-regular:`download;1.5em` Click for Hardware Platforms [PDF]
.. button-link:: _static/benchmarks_files/OV-2022.3-system-info-detailed.xlsx
:color: primary
:outline:
:material-regular:`download;1.5em` Click for Configuration Details [XLSX]
.. the files above need to be changed to the proper ones!!!
The presented performance benchmark numbers are based on the release 2022.2 of the Intel® Distribution of OpenVINO™ toolkit.
The benchmark application loads the OpenVINO™ Runtime and executes inferences on the specified hardware (CPU, GPU or GNA).
It measures the time spent on actual inference (excluding any pre or post processing) and then reports on the inferences per second (or Frames Per Second).
Disclaimers
####################################
Intel® Distribution of OpenVINO™ toolkit performance benchmark numbers are based on release 2022.3.
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Learn more at intel.com, or from the OEM or retailer. Performance results are based on testing as of November 16, 2022 and may not reflect all publicly available updates. See configuration disclosure for details. No product can be absolutely secure.
Performance varies by use, configuration and other factors. Learn more at `www.intel.com/PerformanceIndex <https://www.intel.com/PerformanceIndex>`__.
Your costs and results may vary.
Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
@endsphinxdirective


@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d86125db1e295334c04e92d0645c773f679d21bf52e25dce7c887fdf972b7a28
size 19154


@@ -55,5 +55,5 @@ OpenVINO provides several examples to demonstrate the POT optimization workflow:
## See Also
* [Performance Benchmarks](https://docs.openvino.ai/latest/openvino_docs_performance_benchmarks.html)
* [INT8 Quantization by Using Web-Based Interface of the DL Workbench](https://docs.openvino.ai/latest/workbench_docs_Workbench_DG_Int_8_Quantization.html)