strictly lower and strictly upper parts. Add tests checking that the result matches the CPU DILU implementation.
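The split into strictly lower, diagonal, and strictly upper parts can be sketched as follows. This is a minimal illustration only. It assumes scalar entries (block size 1) and a plain-list CSR layout (`row_ptr`, `col_idx`, `vals`). `split_csr` is a hypothetical name and is not the function used in the actual change.

```python
def split_csr(row_ptr, col_idx, vals):
    """Split a square CSR matrix into strictly lower (L), diagonal (D) and
    strictly upper (U) parts. L and U are returned as (row_ptr, col_idx, vals)
    triples; D is returned as a dense list (0.0 where the diagonal is absent).
    Hypothetical helper for illustration, scalar entries only."""
    n = len(row_ptr) - 1
    l_ptr, l_col, l_val = [0], [], []
    u_ptr, u_col, u_val = [0], [], []
    diag = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j, v = col_idx[k], vals[k]
            if j < i:  # strictly below the diagonal
                l_col.append(j)
                l_val.append(v)
            elif j > i:  # strictly above the diagonal
                u_col.append(j)
                u_val.append(v)
            else:  # diagonal entry
                diag[i] = v
        # Close row i in both output matrices.
        l_ptr.append(len(l_col))
        u_ptr.append(len(u_col))
    return (l_ptr, l_col, l_val), diag, (u_ptr, u_col, u_val)
```

A test in the spirit of this change would compare these pieces against what the CPU code produces for the same matrix, or check that L + D + U reproduces the input.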
preconditioner. Uses graph coloring to exploit parallelism in the lower and upper triangular solves when computing a diagonal approximate inverse of a sparse matrix. Supports block sizes up to 3.
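Below is a minimal scalar (block size 1) sketch of a color-ordered DILU. It assumes a structurally symmetric sparsity pattern stored as one `dict` per row. The names `greedy_coloring`, `dilu_factorize` and `dilu_apply` are hypothetical and are not taken from the change. In this sketch, "lower" and "upper" are defined by color rather than by row index. That is equivalent to running ordinary DILU on the matrix permuted so that colors are contiguous. Rows that share a color are never coupled to each other, so each inner loop over one color could run in parallel on a GPU.

```python
def greedy_coloring(rows):
    """Greedy coloring of the matrix graph. Assumes a structurally symmetric
    pattern (a_ij != 0 iff a_ji != 0), so row i's keys are its neighbours."""
    colors = [-1] * len(rows)
    for i, row in enumerate(rows):
        used = {colors[j] for j in row if j != i and colors[j] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[i] = c
    return colors


def dilu_factorize(rows, colors):
    """Compute the DILU diagonal d so that diag((D+L) D^-1 (D+U)) == diag(A),
    where L/U hold neighbours of smaller/larger color."""
    n = len(rows)
    d = [0.0] * n
    for c in range(max(colors) + 1):
        for i in (k for k in range(n) if colors[k] == c):  # parallel per color
            s = rows[i][i]
            for j, a_ij in rows[i].items():
                if colors[j] < c:  # d[j] was computed in an earlier color
                    s -= a_ij * rows[j].get(i, 0.0) / d[j]
            d[i] = s
    return d


def dilu_apply(rows, colors, d, r):
    """Return z = M^-1 r with M = (D+L) D^-1 (D+U)."""
    n = len(rows)
    ncolors = max(colors) + 1
    # Lower solve (D + L) y = r, colors in increasing order.
    y = [0.0] * n
    for c in range(ncolors):
        for i in (k for k in range(n) if colors[k] == c):  # parallel per color
            s = r[i]
            for j, a_ij in rows[i].items():
                if colors[j] < c:
                    s -= a_ij * y[j]
            y[i] = s / d[i]
    # Upper solve (I + D^-1 U) z = y, colors in decreasing order.
    z = [0.0] * n
    for c in reversed(range(ncolors)):
        for i in (k for k in range(n) if colors[k] == c):  # parallel per color
            s = 0.0
            for j, a_ij in rows[i].items():
                if colors[j] > c:
                    s += a_ij * z[j]
            z[i] = y[i] - s / d[i]
    return z
```

For block sizes 2 and 3, each `d[i]` would presumably become a small dense block and each division by `d[i]` a multiplication by its inverse. That extension is an assumption here and is not shown.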