As described in the section on the :doc:`latency-specific considerations <openvino_docs_deployment_optimization_guide_latency>`, one of the possible use cases is *delivering every single request at the minimal delay*.
Throughput, on the other hand, covers inference scenarios in which a potentially **large number of inference requests are served simultaneously to improve device utilization**.
A trade-off between overall throughput and serial performance of individual requests can be achieved with the right performance configuration of OpenVINO.
* **Basic (high-level)** flow with :doc:`OpenVINO performance hints <openvino_docs_OV_UG_Performance_Hints>` which is inherently **portable and future-proof**.
* **Advanced (low-level)** approach of explicit **batching** and **streams**. For more details, see the :doc:`runtime inference optimizations <openvino_docs_deployment_optimization_guide_tput_advanced>`.
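The two flows can be contrasted with a minimal sketch. The device name (``CPU``), the model path, and the stream count below are illustrative placeholders; OpenVINO is imported lazily so the sketch can be read without the package installed, and running it requires ``pip install openvino``.

```python
# Basic (high-level): a portable performance hint. The runtime derives
# the number of streams, threads, etc. for the actual target device.
hint_config = {"PERFORMANCE_HINT": "THROUGHPUT"}

# Advanced (low-level): explicit streams. Device-specific and therefore
# less portable; the value 4 is an arbitrary example, not a recommendation.
explicit_config = {"NUM_STREAMS": "4"}

def compile_for_throughput(model_path: str, device: str = "CPU"):
    """Compile a model with the high-level throughput hint."""
    import openvino as ov  # lazy import: requires the `openvino` package
    core = ov.Core()
    return core.compile_model(model_path, device, hint_config)
```

The hint-based configuration stays valid when the application moves to a new device or OpenVINO release, which is why it is the preferred entry point.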
* Set up the configuration for the *device* (for example, as parameters of ``ov::Core::compile_model``) via either the previously introduced :doc:`low-level explicit options <openvino_docs_deployment_optimization_guide_tput_advanced>` or :doc:`OpenVINO performance hints <openvino_docs_OV_UG_Performance_Hints>` (**preferable**):
* Query the ``ov::optimal_number_of_infer_requests`` property from the ``ov::CompiledModel`` (resulting from compiling the model for the device) to create the number of requests required to saturate the device.
* Use the Async API with callbacks, to avoid any dependency on the completion order of the requests and possible device starvation, as explained in the :doc:`common-optimizations section <openvino_docs_deployment_optimization_guide_common>`.
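The saturation pattern described in the steps above can be sketched as follows. A real application would query the compiled model for ``OPTIMAL_NUMBER_OF_INFER_REQUESTS`` and use ``ov.AsyncInferQueue``; here a thread pool and a mock ``infer`` function stand in for the device, so the sketch is self-contained and runnable anywhere.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

OPTIMAL_NUM_REQUESTS = 4  # would come from the compiled model's property

def mock_infer(x):
    # Stand-in for one inference request on the device.
    return x * x

results = []
lock = threading.Lock()

def on_done(future):
    # Callback: fires on completion, in any order, so no request
    # ever waits on another one (avoids device starvation).
    with lock:
        results.append(future.result())

# Keep OPTIMAL_NUM_REQUESTS requests in flight at all times.
with ThreadPoolExecutor(max_workers=OPTIMAL_NUM_REQUESTS) as pool:
    for item in range(10):
        pool.submit(mock_infer, item).add_done_callback(on_done)
# The `with` exit waits for all requests, similar to wait_all().

print(sorted(results))  # squares of 0..9
```

The key property of the pattern is that completion order is irrelevant: each callback collects its own result independently, which is exactly what keeps the device busy.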
OpenVINO offers the automatic, scalable :doc:`multi-device inference mode <openvino_docs_OV_UG_Running_on_multiple_devices>`, which is a simple *application-transparent* way to improve throughput. There is no need to re-architect existing applications for any explicit multi-device support: no explicit network loading to each device, no separate per-device queues, no additional logic to balance inference requests between devices, etc. For the application using it, multi-device is like any other device, as it manages all processes internally.
* Using the :ref:`Asynchronous API <async_api>` and :doc:`callbacks <openvino_docs_OV_UG_Infer_request>` in particular.
* Providing the multi-device (and hence the underlying devices) with enough data to crunch. As the inference requests are naturally independent data pieces, the multi-device performs load-balancing at the "requests" (outermost) level to minimize the scheduling overhead.
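Both points above can be sketched with the ``ov.AsyncInferQueue`` helper, which sizes itself to the device's optimal number of requests by default. The device list, model path, and inputs are illustrative placeholders; running this requires the ``openvino`` package and the listed devices.

```python
def run_on_multi(model_path, inputs):
    import openvino as ov  # lazy import: requires the `openvino` package
    core = ov.Core()
    # MULTI is addressed like any other device; load balancing between
    # GPU and CPU happens internally, at the per-request level.
    compiled = core.compile_model(
        model_path, "MULTI:GPU,CPU", {"PERFORMANCE_HINT": "THROUGHPUT"}
    )
    results = {}
    # With no explicit size, the queue uses the optimal number of
    # requests for the compiled model, keeping all devices fed.
    queue = ov.AsyncInferQueue(compiled)
    queue.set_callback(
        lambda req, idx: results.update(
            {idx: req.get_output_tensor().data.copy()}
        )
    )
    for i, x in enumerate(inputs):
        queue.start_async({0: x}, userdata=i)
    queue.wait_all()
    return results
```

Because each request is an independent piece of data, no application-side scheduling is needed beyond submitting enough of them.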
Keep in mind that the resulting performance is usually a fraction of the "ideal" (plain sum) value when the devices compete for certain resources, such as the memory bandwidth shared between the CPU and the iGPU.
While the legacy approach of optimizing the parameters of each device separately works, the :doc:`OpenVINO performance hints <openvino_docs_OV_UG_Performance_Hints>` allow configuring all devices (that are part of the specific multi-device configuration) at once.
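The difference between the two approaches can be shown as configuration fragments. The per-device stream counts below are illustrative only, and the helper function is a hypothetical sketch; running it requires the ``openvino`` package.

```python
# Legacy: tune each device behind MULTI separately with
# device-specific keys (values here are arbitrary examples).
legacy_per_device_config = {
    "CPU": {"NUM_STREAMS": "4"},
    "GPU": {"NUM_STREAMS": "2"},
}

# Hints: one portable property configures every device at once.
hint_config = {"PERFORMANCE_HINT": "THROUGHPUT"}

def compile_on_multi(model_path: str):
    import openvino as ov  # lazy import: requires the `openvino` package
    core = ov.Core()
    # The single hint is propagated to each underlying device,
    # replacing per-device stream/thread tuning.
    compiled = core.compile_model(model_path, "MULTI:GPU,CPU", hint_config)
    # How many requests saturate all the devices combined:
    n = compiled.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS")
    return compiled, n
```

With hints, adding or removing a device from the MULTI list requires no configuration changes in the application.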