openvino/docs/IE_DG/supported_plugins/CPU.md
Andrey Zaytsev 322c874113 Feature/azaytsev/cherry picks from 2021 4 (#7389)
2021-09-07 19:21:41 +03:00

CPU Plugin

Introducing CPU Plugin

The CPU plugin provides high-performance scoring (inference) of neural networks on CPUs, using the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN).

Currently, the CPU plugin uses Intel® Threading Building Blocks (Intel® TBB) to parallelize calculations. Refer to the Optimization Guide for associated performance considerations.

The set of supported layers can be expanded with the Extensibility mechanism.

Supported Platforms

OpenVINO™ toolkit is officially supported and validated on the following platforms:

| Host | OS (64-bit) |
| --- | --- |
| Development | Ubuntu* 18.04, CentOS* 7.5, MS Windows* 10 |
| Target | Ubuntu* 18.04, CentOS* 7.5, MS Windows* 10 |

The CPU plugin supports inference on Intel® Xeon® processors with Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and AVX512_BF16; Intel® Core™ processors with Intel® AVX2; and Intel Atom® processors with Intel® Streaming SIMD Extensions (Intel® SSE).

You can use the -pc flag with the samples to find out which configuration a given layer uses. This flag shows execution statistics that include the layer name, execution status, layer type, execution time, and the type of the execution primitive.

Internal CPU Plugin Optimizations

The CPU plugin supports several graph optimization algorithms, such as fusing or removing layers. Refer to the sections below for details.

Note: For layer descriptions, see the IR Notation Reference.

Lowering Inference Precision

The CPU plugin follows the default optimization approach: inference runs in lower precision when the given platform can reach better performance that way while keeping accuracy within an acceptable range.

Note: For details, see Using Bfloat16 Inference.
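To make the accuracy trade-off of lower precision concrete, the sketch below emulates bfloat16 storage in plain Python by keeping only the upper 16 bits of an IEEE 754 float32 value. The helper name `to_bfloat16` is hypothetical, and plain truncation is an assumption made for simplicity; the rounding that the plugin or the hardware actually applies is not described in this document.

```python
import struct


def to_bfloat16(value: float) -> float:
    """Emulate bfloat16 by truncating a float32 to its upper 16 bits.

    Assumption: truncation is used here only for illustration.
    """
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Keep the sign bit, the 8 exponent bits, and the top 7 mantissa bits.
    truncated = bits & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", truncated))[0]


if __name__ == "__main__":
    for x in (1.0, 0.1, 3.14159, 1000.5):
        approx = to_bfloat16(x)
        rel_err = abs(approx - x) / abs(x)
        print(f"{x!r} -> {approx!r} (relative error {rel_err:.2e})")
```

Values that need more than 8 significant bits lose their low-order bits, which is why accuracy should be validated separately when lower precision is enabled.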

Fusing Convolution and Simple Layers

The plugin merges a Convolution layer with any of the simple layers listed below:

  • Activation: ReLU, ELU, Sigmoid, Clamp
  • Depthwise: ScaleShift, PReLU
  • FakeQuantize

Note: You can have any number and order of simple layers.

A combination of a Convolution layer and simple layers results in a single fused layer called Convolution:
conv_simple_01
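The fusion rule in this section can be sketched as a pass over a linear chain of layer types. This is only an illustration under stated assumptions: the function `fuse_conv_with_simple` and the list-of-strings graph representation are hypothetical and do not reflect how the plugin stores or rewrites graphs internally.

```python
from typing import List, Tuple

# Simple layer types named in this section.
SIMPLE_LAYERS = {"ReLU", "ELU", "Sigmoid", "Clamp", "ScaleShift", "PReLU", "FakeQuantize"}


def fuse_conv_with_simple(chain: List[str]) -> List[Tuple[str, List[str]]]:
    """Collapse each Convolution and the simple layers that directly follow it.

    Returns (layer_type, fused_simple_layers) pairs in execution order.
    """
    fused: List[Tuple[str, List[str]]] = []
    for layer in chain:
        previous_is_conv = bool(fused) and fused[-1][0] == "Convolution"
        if layer in SIMPLE_LAYERS and previous_is_conv:
            # Absorb the simple layer into the preceding Convolution.
            fused[-1][1].append(layer)
        else:
            fused.append((layer, []))
    return fused


if __name__ == "__main__":
    chain = ["Convolution", "ScaleShift", "ReLU", "Pooling", "ReLU"]
    for layer_type, absorbed in fuse_conv_with_simple(chain):
        print(layer_type, absorbed)
```

In this sketch any number of simple layers, in any order, is absorbed, while a simple layer that follows a different layer type (the last ReLU in the example) stays separate.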

Fusing Pooling and FakeQuantize Layers

A combination of Pooling and FakeQuantize layers results in a single fused layer called Pooling:
pooling_fakequant_01

Fusing FullyConnected and Activation Layers

A combination of FullyConnected and Activation layers results in a single fused layer called FullyConnected:
fullyconnected_activation_01

Fusing Convolution and Depthwise Convolution Layers Grouped with Simple Layers

Note: This pattern is possible only on CPUs that support Streaming SIMD Extensions 4.2 (SSE 4.2) and the Intel AVX2 Instruction Set Architecture (ISA).

A Convolution (or Binary Convolution) layer with its simple layers, combined with a Depthwise Convolution layer with its simple layers, results in a single layer called Convolution (or Binary Convolution):

Note: Depthwise convolution layers should have the same values for the group, input channels, and output channels parameters.

conv_depth_01

Fusing Convolution and Sum Layers

A combination of Convolution, Simple, and Eltwise layers with the sum operation results in a single layer called Convolution:
conv_sum_relu_01

Fusing a Group of Convolutions

If a topology contains the following pipeline, the CPU plugin merges Split, Convolution, and Concatenation layers into a single Convolution layer with the group parameter:

Note: Parameters of the Convolution layers must coincide.

group_convolutions_01
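The equivalence behind this fusion can be checked numerically. The sketch below is a simplification under stated assumptions: it uses 1x1 kernels evaluated at a single pixel (so each convolution reduces to dot products), an even channel split, and hypothetical helper names. It compares the Split, Convolution, Concatenation pipeline with one Convolution that takes a group parameter.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # [out_channels][in_channels] weights of a 1x1 kernel


def conv1x1(x: Vector, weights: Matrix) -> Vector:
    """1x1 convolution at a single pixel: one dot product per output channel."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def split_conv_concat(x: Vector, branch_weights: List[Matrix]) -> Vector:
    """Reference pipeline: split channels evenly, convolve each part, concatenate."""
    part = len(x) // len(branch_weights)
    out: Vector = []
    for i, weights in enumerate(branch_weights):
        out += conv1x1(x[i * part:(i + 1) * part], weights)
    return out


def grouped_conv1x1(x: Vector, weights: Matrix, group: int) -> Vector:
    """Single convolution with a group parameter.

    Each output channel reads only the input channels of its own group.
    """
    in_per_group = len(x) // group
    out_per_group = len(weights) // group
    out: Vector = []
    for o, row in enumerate(weights):
        g = o // out_per_group
        inputs = x[g * in_per_group:(g + 1) * in_per_group]
        out.append(sum(w * v for w, v in zip(row, inputs)))
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    branches = [[[1.0, 0.0], [0.5, 0.5]], [[2.0, -1.0], [0.0, 1.0]]]
    # Fused weights: branch weights stacked along the output-channel axis.
    fused = [row for weights in branches for row in weights]
    assert split_conv_concat(x, branches) == grouped_conv1x1(x, fused, group=2)
```

The stacking step only makes sense when every branch has the same kernel shape and attributes, which matches the requirement that the parameters of the Convolution layers coincide.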

Removing a Power Layer

The CPU plugin removes a Power layer from a topology if it has the following parameters:

  • power = 1
  • scale = 1
  • offset = 0
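With these values the layer is an identity operation, which is why it can be dropped. A minimal sketch, assuming the Power layer computes `(scale * x + offset) ** power` (this formula is an assumption not stated in this document) and using hypothetical helper names:

```python
def power_layer(x: float, power: float = 1.0, scale: float = 1.0, offset: float = 0.0) -> float:
    """Assumed Power layer formula: (scale * x + offset) ** power."""
    return (scale * x + offset) ** power


def is_removable(power: float, scale: float, offset: float) -> bool:
    """True when the parameters turn the Power layer into an identity."""
    return power == 1 and scale == 1 and offset == 0


if __name__ == "__main__":
    samples = [-3.5, 0.0, 2.0, 10.25]
    # With the default parameters every input is returned unchanged.
    assert all(power_layer(x) == x for x in samples)
    print(is_removable(1, 1, 0), is_removable(2, 1, 0))
```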

Supported Configuration Parameters

The plugin supports the configuration parameters listed below. All parameters must be set with the InferenceEngine::Core::LoadNetwork() method. When specifying key values as raw strings (that is, when using the Python API), omit the KEY_ prefix. For usage examples, refer to the OpenVINO samples: Benchmark App.

These are general options, also supported by other plugins:

| Parameter name | Parameter values | Default | Description |
| --- | --- | --- | --- |
| KEY_EXCLUSIVE_ASYNC_REQUESTS | YES/NO | NO | Forces async requests (also from different executable networks) to execute serially. This prevents potential oversubscription. |
| KEY_PERF_COUNT | YES/NO | NO | Enables gathering performance counters. |

CPU-specific settings:

| Parameter name | Parameter values | Default | Description |
| --- | --- | --- | --- |
| KEY_CPU_THREADS_NUM | positive integer values | 0 | Specifies the number of threads that the CPU plugin should use for inference. Zero (default) means using all (logical) cores. |
| KEY_CPU_BIND_THREAD | YES/NUMA/NO | YES | Binds inference threads to CPU cores. The YES (default) option maps threads to cores, which works best for static/synthetic scenarios like benchmarks. The NUMA option is more relaxed: it binds inference threads only to NUMA nodes and leaves further scheduling to specific cores to the OS. This option might perform better in real-life/contended scenarios. Note that for latency-oriented cases (the number of streams is less than or equal to the number of NUMA nodes, see below), both the YES and NUMA options limit the number of inference threads to the number of hardware cores (ignoring hyper-threading) on multi-socket machines. |
| KEY_CPU_THROUGHPUT_STREAMS | KEY_CPU_THROUGHPUT_NUMA, KEY_CPU_THROUGHPUT_AUTO, or non-negative integer values | 1 | Specifies the number of CPU "execution" streams for the throughput mode. This is the upper bound for the number of inference requests that can be executed simultaneously. All available CPU cores are evenly distributed between the streams. The default value is 1, which implies latency-oriented behavior on a single NUMA-node machine, with all available cores processing requests one by one. On a multi-socket (multiple NUMA nodes) machine, the best latency numbers are usually achieved with a number of streams matching the number of NUMA nodes.<br>KEY_CPU_THROUGHPUT_NUMA creates as many streams as needed to accommodate NUMA and avoid associated penalties.<br>KEY_CPU_THROUGHPUT_AUTO creates the bare minimum of streams to improve the performance; this is the most portable option if you don't know how many cores your target machine has (and what the optimal number of streams would be). Note that your application should provide enough parallel slack (for example, run many inference requests) to leverage the throughput mode.<br>A non-negative integer value creates the requested number of streams. If the number of streams is 0, no internal streams are created and user threads are interpreted as stream master threads. |
| KEY_ENFORCE_BF16 | YES/NO | YES | Executes in bfloat16 precision whenever possible. This option tells the plugin to downscale the precision where it sees performance benefits from bfloat16 execution. The option does not guarantee the accuracy of the network: verify the accuracy in this mode separately, and decide whether to use the option based on the performance and accuracy results. |

Note: To disable all internal threading, use the following set of configuration parameters: KEY_CPU_THROUGHPUT_STREAMS=0, KEY_CPU_THREADS_NUM=1, KEY_CPU_BIND_THREAD=NO.
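As a sketch of how these parameters might be passed from Python, the example below builds a configuration dictionary with the KEY_ prefix removed, using the values from the note above. The helpers `strip_key_prefix`, `make_config`, and `load_on_cpu` are hypothetical names, and the model path is a placeholder. The loading step assumes the 2021.x `openvino.inference_engine` Python API (`IECore.read_network` and `IECore.load_network`) and imports it lazily, so the dictionary helpers run without OpenVINO installed.

```python
from typing import Dict


def strip_key_prefix(name: str) -> str:
    """Turn a name such as KEY_PERF_COUNT into its raw string form PERF_COUNT."""
    prefix = "KEY_"
    return name[len(prefix):] if name.startswith(prefix) else name


def make_config(settings: Dict[str, object]) -> Dict[str, str]:
    """Build a string-to-string config dictionary without the KEY_ prefix."""
    return {strip_key_prefix(key): str(value) for key, value in settings.items()}


# Values from the note above: disable all internal threading.
NO_INTERNAL_THREADING = make_config({
    "KEY_CPU_THROUGHPUT_STREAMS": 0,
    "KEY_CPU_THREADS_NUM": 1,
    "KEY_CPU_BIND_THREAD": "NO",
})


def load_on_cpu(model_xml: str, config: Dict[str, str]):
    """Load a network on the CPU device with the given configuration.

    Assumption: the OpenVINO 2021.x Python package is installed.
    """
    from openvino.inference_engine import IECore  # imported lazily on purpose

    core = IECore()
    network = core.read_network(model=model_xml)
    return core.load_network(network=network, device_name="CPU", config=config)


if __name__ == "__main__":
    print(NO_INTERNAL_THREADING)
    # exec_net = load_on_cpu("model.xml", NO_INTERNAL_THREADING)  # placeholder path
```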

See Also