This commits adds cmake functionality that can
hipify the cuistl framework to support AMD GPUs.
Some tests have been written as HIP does not mirror
CUDA exactly.
CONVERT_CUDA_TO_HIP is the new CMAKE argument.
CMAKE version is increased to include HIP
as a language (3.21 required).
A macro is added to create a layer of indirection
that will make only cuistl files that have been
changed rehipified.
Some BDA stuff is extracted to make sure CUDA
is not accidentally included.
Make some changes to Georgs original code:
dynamically allocated arrays with std::vectors instead
Implement new class structure handling what
MPI communication implementation to use
create extra scopes to avoid reuse of index variable i
Update related tests:
Update test_cuowneroverlapcopy to account for new
class strucutre
Also remove line that invalidates the MPI tests for multiple processes
preconditioner. Uses graph coloring to exploit
parallelism in upper and triangular solves when
computing a diagonal approximate inverse of a
sparse matrix. Supports blocksizes up to 3.
Implement calls to cuBlas, cuSparse and implement necessary
CUDA kernels to perform a single iteration of the jacobi preconditioner.
Add tests that verify new kernels and the preconditioner in its totality.
The preconditioner is verified on 2x2 and 3x3 blocks, which as of now
are the only supported sizes. 1x1 are not supported because cuSparse
does not support it.