* Initial new CumSum implementation
* Unified ndim approach
* Move transpose to separate function
* Move transpose-to-original to separate function
* Move slice_count calculation to function
* Negative axes support
* Refactor redundant copy
* Changed copy to move
* Temporarily add more backend tests
* Add const to shape arg
* Use span for slices calculation
* Remove unused headers
* CumSum new ref tests
* Add more ref tests
* Add all cumsum modes ref tests
* New optimized cum_sum reference
* Add reverse mode
* Optimized cumsum ref
* Remove deprecated cumsum backend tests
* Add more CumSum reference tests
* Simplify CumSum shared layer tests SetUp
* Replace auto with size_t in loop
* Change static_cast to T{}
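
For reviewers, a minimal sketch of the behavior the commits above cover: negative axes, reverse mode, exclusive mode, a `size_t` loop index, and `T{}` as the zero accumulator. This is not the code from this change. The names `cumsum_sketch` and `normalize_axis` are hypothetical, and the outer/axis/inner index decomposition is an assumption used here for brevity instead of the transpose and slice helpers mentioned in the log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: maps a possibly negative axis into [0, rank).
inline std::size_t normalize_axis(std::int64_t axis, std::size_t rank) {
    const auto r = static_cast<std::int64_t>(rank);
    assert(axis >= -r && axis < r);
    return static_cast<std::size_t>(axis < 0 ? axis + r : axis);
}

// Illustrative cumulative sum over a row-major tensor (assumed layout).
template <typename T>
std::vector<T> cumsum_sketch(const std::vector<T>& in,
                             const std::vector<std::size_t>& shape,
                             std::int64_t axis,
                             bool exclusive = false,
                             bool reverse = false) {
    const std::size_t ax = normalize_axis(axis, shape.size());

    // View the tensor as [outer, len, inner] around the chosen axis.
    std::size_t outer = 1, inner = 1;
    for (std::size_t i = 0; i < ax; ++i) outer *= shape[i];
    for (std::size_t i = ax + 1; i < shape.size(); ++i) inner *= shape[i];
    const std::size_t len = shape[ax];

    std::vector<T> out(in.size());
    for (std::size_t o = 0; o < outer; ++o) {
        for (std::size_t n = 0; n < inner; ++n) {
            T acc{};  // value-initialized zero instead of static_cast<T>(0)
            for (std::size_t k = 0; k < len; ++k) {
                // Reverse mode walks the axis from its last element.
                const std::size_t pos = reverse ? len - 1 - k : k;
                const std::size_t idx = (o * len + pos) * inner + n;
                if (exclusive) {
                    out[idx] = acc;   // sum of elements before this one
                    acc += in[idx];
                } else {
                    acc += in[idx];
                    out[idx] = acc;   // sum including this element
                }
            }
        }
    }
    return out;
}
```

With input `{1, 2, 3, 4, 5, 6}` and shape `{2, 3}`, calling the sketch with `axis = -1` sums along each row, the same as `axis = 1`.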