* selectdevice returns MULTI:device in cumulative_throughput
* load MULTI with throughput and disable CPU helper in cumulative
* disable CPU helper in cumulative_throughput
* add cumulative to benchmark_app help message
* modify benchmark_app.hpp to satisfy clang-format