Fixes the swapped FP32 and FP16 results. authored-by: Michael Frank Hansen <michael.f.hansen@intel.com>