Quote: The Skylake microarchitecture implements a different state
machine than prior generations to manage the YMM state transition
associated with mixing SSE and AVX instructions.
It no longer saves the entire upper YMM state when executing
an SSE instruction when in “Modified and Unsaved” state,
but saves the upper bits of individual register.
As a result, mixing SSE and AVX instructions will experience
a penalty associated with partial register dependency of
the destination registers being used and additional blend
operation on the upper bits of the destination registers.
Penalties of this kind have a huge impact on OpenVINO's and oneDNN's kernels.
In short, mixing VEX-encoded (AVX) and non-VEX (legacy SSE) instructions should be avoided.
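
For illustration only, here is a minimal sketch of one common way to keep the two encodings apart in hand-written intrinsics code. It is not taken from OpenVINO or oneDNN; the names `add_f32` and `add_f32_avx` are made up for this example. It assumes GCC or Clang on x86: the `target("avx")` attribute makes the compiler emit VEX encodings inside that one function, and `_mm256_zeroupper()` clears the upper YMM halves before control returns to code that may have been compiled as legacy SSE.

```cpp
#include <cstddef>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX_PATH 1

// Hypothetical helper, compiled with AVX enabled for this function only,
// so the intrinsics below are emitted with VEX encodings.
__attribute__((target("avx")))
static void add_f32_avx(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   // unaligned 256-bit loads
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar tail
    // Zero the upper 128 bits of all YMM registers so that SSE code
    // executed by the caller starts from a clean upper state.
    _mm256_zeroupper();
}
#endif

// Hypothetical entry point: uses the AVX path when the CPU supports it,
// otherwise falls back to a plain scalar loop.
void add_f32(const float* a, const float* b, float* out, std::size_t n) {
#ifdef HAVE_AVX_PATH
    if (__builtin_cpu_supports("avx")) {
        add_f32_avx(a, b, out, n);
        return;
    }
#endif
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}
```

Compilers usually insert `vzeroupper` on their own at the end of AVX functions, so the explicit call is mostly a safeguard. In JIT-generated kernels, the equivalent is to emit either only VEX-encoded forms or only legacy SSE forms within a kernel.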