From fac681dffefd1e786df6b36d46439eac8f20d534 Mon Sep 17 00:00:00 2001
From: "James E. McClure"
Date: Wed, 7 Sep 2022 15:44:16 -0400
Subject: [PATCH] Rc 08.15.2022 (#70)

* summit configure
* add spock scripts to FOM
* get new models to build with hip
* add hip slipping bc
* testing communication on spock
* update spock build based on olcf docs
* update configure & test scripts for spock
* Fixing potential bugs with communication
* Adding simple test of GPU aware MPI
* some changes to configure for spock
* Modifying GPU aware MPI test to send multiple messages
* playing with spock Gpu test
* added gpu wrapper test
* Cleaning up some compiler warnings
* add barrier between pack / MPI send
* Updating build to support HIP as a language
* fixing gpu mpi sync
* Adding script
* local spock changes
* add membrane class
* update membrane structure
* membrane communications
* working on new comm data structures
* add membrane unit test
* membrane compiles
* membrane test
* Updating hip port to match cuda
* update summit config
* update summit config
* add configure script for crusher
* update membrane test
* update membrane test
* convention for inside / outside membrane link direction
* working on membrane comm
* try to fix time conversion factor for Poisson solver; to be built and tested
* fix dumb typo
* update summit config
* tune launch for crusher
* tune launch for mrt on crusher
* update color
* summit script with specific module versions
* update crusher config
* add crusher examples
* add dense case for crusher
* Fixing some quick annoying compile warnings
* fix binding in example
* working on fix
* Adding simple crusher test
* Adding new crusher MPI test
* disable MPI thread multiple for crusher
* updates to crusher configure
* cpu test for crusher
* Working on standalone reproducer for MPI bug
* More work on creating standalone test
* More work on creating standalone test
* More work on creating standalone test
* Reverting TestCrusher2, standalone version passes (TestCrusher3.cpp), need to figure out why
* Working on standalone MPI test on crusher
* Working on standalone MPI test on crusher
* Getting closer to stand alone test
* Still trying to create standalone reproducer
* hang fix / workaround
* Created standalone MPI failure test
* Removing TestCrusher tests, the bug deals with the StackTrace which we disable the multistack trace for now. Moving the test out of LBPM
* fix sendcount / recvcount
* Testing persistent communication
* Updating calculation of bandwidth
* crusher hackathon final version
* working on membrane communication structures
* add cell simulator
* added cell simulator
* make sure halo is filled when measuring object
* add membrane transport function for d3q7
* add membrane unpack function
* poking at MF issue
* update crusher build
* membrane data structures compiling
* update to membrane capability
* update comments in ion model
* fix dumb print bug
* clean up relabel
* adding membrane functions
* move membrane to common folder
* membrane structure in IonModel
* membrane structure in IonModel
* try at membrane simulator
* add python script to generate bubble
* add python script to generate bubble
* cell simulator runs
* read input files
* add single cell example
* refining cell example
* start on cuda function
* werkin
* start on cuda function
* start on hip function
* updates and fix for user input reader
* update cell example
* add sigmoid to ion equilibrium dist
* cuda build succeeds
* update crusher script
* getting ready to merge gpu
* refactor compact AA routines for testing
* add testing functions to ScaLBL
* testing membrane ion transport
* membrane transport test passing
* membrane starts working ok...
* original wang poisson solver (broken)
* rex d3q19 (broken)
* tau from wang paper
* still broken wang
* d319 poisson works good
* Poisson working pretty good now
* initialize nernst-planck simulator; to be built subject to debugging
* fix a few syntax bugs and build passed
* Poisson solver; enable specifying initial values
* update cell example
* add GPU functions for d3q19 poisson
* fix dumb bugs
* fix bugs in initializing electric potential; the Psi on solid was accidentally overwritten before.
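(Note on the item above: the fix preserves previously assigned solid/boundary values of the electric potential during initialization. The sketch below only illustrates that kind of guard and is not the LBPM implementation; the names id, Psi, Psi0, and initialize_potential are hypothetical stand-ins for the domain labels and potential array.)

    #include <cstddef>
    #include <vector>

    // Sketch: initialize the bulk electric potential without touching solid
    // sites, so values already assigned to the solid (id <= 0) are kept.
    void initialize_potential(const std::vector<signed char> &id,
                              std::vector<double> &Psi, double Psi0) {
        for (std::size_t n = 0; n < id.size(); n++) {
            if (id[n] > 0)      // electrolyte / fluid site
                Psi[n] = Psi0;  // bulk initial value
            // solid sites keep whatever value was set before
        }
    }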
* small change * fix bugs in importing ion model's dummy velocity * add membrane concentration init * remove bad warnings * remove print staetements * add barriers to poisson solver * update print * print membrane input concentration * read Membrane ion concentration list * fix bad ref to D3Q7 * update error analysis for Poisson solver * fix typo * update hip poisson solver * deprecate old error methods * a bunch of summit debug things to roll back later * fix poisson typo * update hip * debug crusher * debug charge density problems on crusher * fix charge density (i think) * remove Stokes solver from cell simulator; need to test build * update cpu ion valence * added membrane properties to input db * update cell db * update executable list and NP_cell simulator * correct use_membrane functionality * add functionality for user to choose either D3Q7 or D3Q19 lattice for Poisson;to be built and tested * build passed * make further corrections * correct D3Q7 Poisson LB algorithm * correct ion LB collison * udpate output precision * add more tweaks for cell simulator * update print-out * this makes mpi hang error explicit;to be debugged * cleanup with help from valgrind * update to cell vis routine * add hip for ion update * fix missing bracket * add new ion code for cuda * add barrier to membrane transport * debug gpu launch issue for ion * debug gpu * add functions to copy send / recv list from ScaLBL * updating membrane communication structure * membrane test works with new comm * communication seems to work * add sample files for plane membrane * update gpu routines first try * update hip * multiple nvidia gpu working with membrane * added membrane analysis capability * added support for swc file * support for SWC input format * swc reader works with MPI * shift swc data * SWC reader update * SWC reader update 2 * add offset to Domain for swc * add input files for simple bacteria * add performance counters to ion / poisson solvers * fix bug with SWC * add BC to poisson solver * fix compiler warnings * fix memory leaks * fix zlib download path * Fixing memory leak * Fixing memory leaks * restart for Poisson model * fix bug in ion model restart * trying to fix yaml * fix workflow indentation * porosity factor in effperm * porosity factor in effperm * porosity factor in effperm * porosity factor in effperm Co-authored-by: James E McClure Co-authored-by: Mark Berrill Co-authored-by: Zhe Rex Li Co-authored-by: Zhe Li Co-authored-by: Zhe Li Co-authored-by: Zhe Li Co-authored-by: Zhe Li Co-authored-by: Zhe Li --- .github/workflows/c-cpp.yml | 3 +- CMakeLists.txt | 20 +- IO/PackData.cpp | 1 - IO/PackData.h | 2 +- StackTrace/StackTrace.cpp | 2 +- ValgrindSuppresionFile | 227 ++- analysis/ElectroChemistry.cpp | 645 ++++++- analysis/ElectroChemistry.h | 11 +- analysis/Minkowski.cpp | 6 +- analysis/SubPhase.cpp | 17 +- cmake/macros.cmake | 12 +- common/Array.h | 2 + common/ArraySize.h | 2 + common/Communication.h | 122 +- common/Domain.cpp | 945 ++++++---- common/Domain.h | 6 + common/FunctionTable.cpp | 21 +- common/FunctionTable.hpp | 8 +- common/MPI.I | 1597 ++++++++--------- common/MPI.cpp | 284 +-- common/MPI.h | 115 +- common/Membrane.cpp | 850 +++++++++ common/Membrane.h | 178 ++ common/ScaLBL.cpp | 947 ++++++---- common/ScaLBL.h | 100 +- common/Utilities.cpp | 4 +- common/UtilityMacros.h | 3 +- cpu/D3Q19.cpp | 5 +- cpu/Ion.cpp | 417 ++++- cpu/MembraneHelper.cpp | 42 + cpu/Poisson.cpp | 532 +++++- cuda/BGK.cu | 2 +- cuda/D3Q19.cu | 12 +- cuda/D3Q7BC.cu | 140 +- cuda/Ion.cu | 807 +++++++-- 
cuda/MRT.cu | 4 +- cuda/Poisson.cu | 407 +++++ example/Bubble/CreateBubble.py | 18 + example/Bubble/CreateCell.py | 77 + example/Bubble/cell.db | 75 + example/PlaneMembrane/plane_membrane.db | 117 ++ example/PlaneMembrane/plane_membrane.py | 90 + example/SingleCell/Bacterium.db | 86 + example/SingleCell/Bacterium.swc | 8 + example/SingleCell/NaCl-cell.py | 41 + example/SingleCell/NaCl.db | 80 + example/systems/crusher/A2-8GPU/Color.sh | 36 + example/systems/crusher/A2-8GPU/MRT.sh | 36 + example/systems/crusher/A2-8GPU/input.db | 69 + .../crusher/Dense-8GPU/MPI-multinode.sh | 36 + .../crusher/Dense-8GPU/MPI-singlenode.sh | 36 + .../crusher/Dense-8GPU/RandomSystem.py | 9 + .../systems/crusher/Dense-8GPU/multinode.db | 69 + example/systems/spock/GenerateSphereTest.sh | 14 + example/systems/spock/Spock.sh | 17 + example/systems/spock/Test.sh | 32 + example/systems/spock/input.db | 54 + example/systems/spock/sphere322.db | 0 example/systems/spock/test.db | 54 + hip/{BGK.cu => BGK.hip} | 9 +- hip/CMakeLists.txt | 10 - hip/{Color.cu => Color.hip} | 126 +- hip/{CudaExtras.cu => CudaExtras.hip} | 0 hip/{D3Q19.cu => D3Q19.hip} | 30 +- hip/{D3Q7.cu => D3Q7.hip} | 0 hip/D3Q7BC.cu | 567 ------ hip/D3Q7BC.hip | 917 ++++++++++ hip/{Extras.cu => Extras.hip} | 0 hip/{FreeLee.cu => FreeLee.hip} | 37 +- hip/{Greyscale.cu => Greyscale.hip} | 0 hip/{GreyscaleColor.cu => GreyscaleColor.hip} | 9 +- hip/Ion.cu | 422 ----- hip/Ion.hip | 969 ++++++++++ hip/{MRT.cu => MRT.hip} | 7 +- hip/{MixedGradient.cu => MixedGradient.hip} | 2 - hip/Poisson.cu | 332 ---- hip/Poisson.hip | 705 ++++++++ hip/{Stokes.cu => Stokes.hip} | 0 hip/{dfh.cu => dfh.hip} | 0 models/BGKModel.cpp | 538 ++++++ models/BGKModel.h | 94 + models/ColorModel.cpp | 65 +- models/IonModel.cpp | 512 +++++- models/IonModel.h | 33 +- models/MRTModel.cpp | 6 +- models/MRTModel.h | 1 + models/MultiPhysController.cpp | 54 +- models/MultiPhysController.h | 4 +- models/PoissonSolver.cpp | 767 +++++--- models/PoissonSolver.h | 7 +- sample_scripts/configure_crusher_cpu | 47 + sample_scripts/configure_crusher_hip | 52 + sample_scripts/configure_spock_TPLs | 9 +- sample_scripts/configure_spock_hip | 42 +- sample_scripts/configure_spock_hip_mark | 39 + sample_scripts/configure_summit | 27 +- tests/CMakeLists.txt | 7 +- tests/TestCommD3Q19.cpp | 37 +- tests/TestMembrane.cpp | 348 ++++ tests/TestTorus.cpp | 1 + tests/lbpm_BGK_simulator.cpp | 425 +---- tests/lbpm_cell_simulator.cpp | 159 ++ tests/lbpm_color_simulator.cpp | 4 +- ...m_electrokinetic_SingleFluid_simulator.cpp | 19 +- tests/lbpm_nernst_planck_cell_simulator.cpp | 174 ++ tests/lbpm_nernst_planck_simulator.cpp | 136 ++ tests/lbpm_permeability_simulator.cpp | 3 +- tests/test_MPI.cpp | 143 +- 108 files changed, 13083 insertions(+), 4364 deletions(-) create mode 100644 common/Membrane.cpp create mode 100644 common/Membrane.h create mode 100644 cpu/MembraneHelper.cpp create mode 100644 example/Bubble/CreateBubble.py create mode 100644 example/Bubble/CreateCell.py create mode 100644 example/Bubble/cell.db create mode 100644 example/PlaneMembrane/plane_membrane.db create mode 100755 example/PlaneMembrane/plane_membrane.py create mode 100644 example/SingleCell/Bacterium.db create mode 100644 example/SingleCell/Bacterium.swc create mode 100644 example/SingleCell/NaCl-cell.py create mode 100644 example/SingleCell/NaCl.db create mode 100644 example/systems/crusher/A2-8GPU/Color.sh create mode 100644 example/systems/crusher/A2-8GPU/MRT.sh create mode 100644 example/systems/crusher/A2-8GPU/input.db create mode 100644 
example/systems/crusher/Dense-8GPU/MPI-multinode.sh create mode 100644 example/systems/crusher/Dense-8GPU/MPI-singlenode.sh create mode 100644 example/systems/crusher/Dense-8GPU/RandomSystem.py create mode 100644 example/systems/crusher/Dense-8GPU/multinode.db create mode 100644 example/systems/spock/GenerateSphereTest.sh create mode 100644 example/systems/spock/Spock.sh create mode 100644 example/systems/spock/Test.sh create mode 100644 example/systems/spock/input.db create mode 100644 example/systems/spock/sphere322.db create mode 100644 example/systems/spock/test.db rename hip/{BGK.cu => BGK.hip} (96%) delete mode 100644 hip/CMakeLists.txt rename hip/{Color.cu => Color.hip} (97%) rename hip/{CudaExtras.cu => CudaExtras.hip} (100%) rename hip/{D3Q19.cu => D3Q19.hip} (98%) rename hip/{D3Q7.cu => D3Q7.hip} (100%) delete mode 100644 hip/D3Q7BC.cu create mode 100644 hip/D3Q7BC.hip rename hip/{Extras.cu => Extras.hip} (100%) rename hip/{FreeLee.cu => FreeLee.hip} (99%) rename hip/{Greyscale.cu => Greyscale.hip} (100%) rename hip/{GreyscaleColor.cu => GreyscaleColor.hip} (99%) delete mode 100644 hip/Ion.cu create mode 100644 hip/Ion.hip rename hip/{MRT.cu => MRT.hip} (98%) rename hip/{MixedGradient.cu => MixedGradient.hip} (98%) delete mode 100644 hip/Poisson.cu create mode 100644 hip/Poisson.hip rename hip/{Stokes.cu => Stokes.hip} (100%) rename hip/{dfh.cu => dfh.hip} (100%) create mode 100644 models/BGKModel.cpp create mode 100644 models/BGKModel.h create mode 100755 sample_scripts/configure_crusher_cpu create mode 100755 sample_scripts/configure_crusher_hip create mode 100755 sample_scripts/configure_spock_hip_mark create mode 100644 tests/TestMembrane.cpp create mode 100644 tests/lbpm_cell_simulator.cpp create mode 100644 tests/lbpm_nernst_planck_cell_simulator.cpp create mode 100644 tests/lbpm_nernst_planck_simulator.cpp diff --git a/.github/workflows/c-cpp.yml b/.github/workflows/c-cpp.yml index 67708501..90f3ee54 100644 --- a/.github/workflows/c-cpp.yml +++ b/.github/workflows/c-cpp.yml @@ -16,8 +16,7 @@ jobs: LBPM_SILO_DIR: /home/runner/extlib/silo MPI_DIR: /home/runner/.openmpi - steps: - + steps: - name: download dependencies run: | echo $LBPM_ZLIB_DIR diff --git a/CMakeLists.txt b/CMakeLists.txt index 507b10b5..a060b5f8 100755 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -124,21 +124,7 @@ IF ( USE_CUDA ) ADD_DEFINITIONS( -DUSE_CUDA ) ENABLE_LANGUAGE( CUDA ) ELSEIF ( USE_HIP ) - IF ( NOT DEFINED HIP_PATH ) - IF ( NOT DEFINED ENV{HIP_PATH} ) - SET( HIP_PATH "/opt/rocm/hip" CACHE PATH "Path to which HIP has been installed" ) - ELSE() - SET( HIP_PATH $ENV{HIP_PATH} CACHE PATH "Path to which HIP has been installed" ) - ENDIF() - ENDIF() - SET( CMAKE_MODULE_PATH "${HIP_PATH}/cmake" ${CMAKE_MODULE_PATH} ) - FIND_PACKAGE( HIP REQUIRED ) - FIND_PACKAGE( CUDA QUIET ) - MESSAGE( "HIP Found") - MESSAGE( " HIP version: ${HIP_VERSION_STRING}") - MESSAGE( " HIP platform: ${HIP_PLATFORM}") - MESSAGE( " HIP Include Path: ${HIP_INCLUDE_DIRS}") - MESSAGE( " HIP Libraries: ${HIP_LIBRARIES}") + ENABLE_LANGUAGE( HIP ) ADD_DEFINITIONS( -DUSE_HIP ) ENDIF() @@ -180,8 +166,7 @@ IF ( NOT ONLY_BUILD_DOCS ) IF ( USE_CUDA ) ADD_PACKAGE_SUBDIRECTORY( cuda ) ELSEIF ( USE_HIP ) - ADD_SUBDIRECTORY( hip ) - SET( LBPM_LIBRARIES lbpm-hip lbpm-wia ) + ADD_PACKAGE_SUBDIRECTORY( hip ) ELSE() ADD_PACKAGE_SUBDIRECTORY( cpu ) ENDIF() @@ -190,5 +175,6 @@ IF ( NOT ONLY_BUILD_DOCS ) ADD_SUBDIRECTORY( example ) #ADD_SUBDIRECTORY( workflows ) INSTALL_PROJ_LIB() + CONFIGURE_FILE( ${CMAKE_CURRENT_SOURCE_DIR}/ValgrindSuppresionFile 
${CMAKE_CURRENT_BINARY_DIR}/test/ValgrindSuppresionFile COPYONLY ) ENDIF() diff --git a/IO/PackData.cpp b/IO/PackData.cpp index 4a50dad8..7b8bfcad 100644 --- a/IO/PackData.cpp +++ b/IO/PackData.cpp @@ -1,5 +1,4 @@ #include "IO/PackData.h" - #include diff --git a/IO/PackData.h b/IO/PackData.h index f7c1d748..5c58ce0b 100644 --- a/IO/PackData.h +++ b/IO/PackData.h @@ -5,7 +5,7 @@ #include #include #include - +#include //! Template function to return the buffer size required to pack a class template diff --git a/StackTrace/StackTrace.cpp b/StackTrace/StackTrace.cpp index b288ed75..bcae7a66 100644 --- a/StackTrace/StackTrace.cpp +++ b/StackTrace/StackTrace.cpp @@ -1263,7 +1263,7 @@ static int backtrace_thread( if ( tid == pthread_self() ) { count = ::backtrace( buffer, size ); } else { - // Note: this will get the backtrace, but terminates the thread in the process!!! + // Send a signal to the desired thread to get the call stack StackTrace_mutex.lock(); struct sigaction sa; sigfillset( &sa.sa_mask ); diff --git a/ValgrindSuppresionFile b/ValgrindSuppresionFile index 34bb2527..85fc1b08 100644 --- a/ValgrindSuppresionFile +++ b/ValgrindSuppresionFile @@ -1,52 +1,225 @@ -# ACML suppressions +# To run valgrind: +# mpirun -np 2 valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --verbose --suppressions=ValgrindSuppresionFile --log-file=valgrind-out.txt ./lbpm_nernst_planck_cell_simulator test.db + + +# MPI supressions { - IdentifyCPUCond + MPI_init_cond Memcheck:Cond ... - fun:acmlcpuid2 + fun:PMPI_Init ... } { - IdentifyCPUValue + MPI_init_value Memcheck:Value8 ... - fun:acmlcpuid_once - fun:acmlcpuid2 + fun:PMPI_Init ... } - - -# MPI suppressions { - HYD_pmci_wait_for_completion - Memcheck:Leak - ... - fun:HYD_pmci_wait_for_completion - fun:main -} -{ - HYDT_dmxu_poll_wait_for_event - Memcheck:Leak - ... - fun:HYDT_dmxu_poll_wait_for_event - fun:main -} -{ - PMPI_Init - Memcheck:Leak + MPI_init_addr16 + Memcheck:Addr16 ... fun:PMPI_Init - fun:main + ... +} +{ + MPI_init_addr8 + Memcheck:Addr8 + ... + fun:PMPI_Init + ... +} +{ + MPI_init_addr4 + Memcheck:Addr4 + ... + fun:PMPI_Init + ... +} +{ + MPI_init_addr1 + Memcheck:Addr1 + ... + fun:PMPI_Init + ... +} +{ + gethostname_cond + Memcheck:Cond + ... + fun:gethostbyname_r + fun:gethostbyname + ... +} +{ + gethostname_value + Memcheck:Value8 + ... + fun:gethostbyname_r + fun:gethostbyname + ... } -# System suppressions +# System errors +{ + map_doit_memory + Memcheck:Cond + fun:index + fun:expand_dynamic_string_token + fun:_dl_map_object + fun:map_doit + fun:_dl_catch_error + ... +} { expand_dynamic_string_token Memcheck:Cond fun:index fun:expand_dynamic_string_token ... + fun:dl_main + fun:_dl_sysdep_start + fun:_dl_start + ... +} +{ + call_init + Memcheck:Leak + match-leak-kinds: reachable + ... + fun:call_init + fun:_dl_init + ... +} + + +# pthread errors +{ + pthread_initialize_param + Memcheck:Param + set_robust_list(head) + fun:__pthread_initialize_minimal + fun:(below main) +} +{ + pthread_initialize_cond + Memcheck:Cond + fun:__register_atfork + fun:__libc_pthread_init + fun:__pthread_initialize_minimal + fun:(below main) +} + + +# gfortran +{ + gfortran_leak + Memcheck:Leak + match-leak-kinds: reachable + fun:malloc + obj:/usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0 + ... +} + + +# std +{ + libc_cond + Memcheck:Cond + ... + fun:_dl_init_paths + fun:_dl_non_dynamic_init + fun:__libc_init_first + fun:(below main) +} +{ + libc_val8 + Memcheck:Value8 + ... 
+ fun:_dl_init_paths + fun:_dl_non_dynamic_init + fun:__libc_init_first + fun:(below main) +} +{ + mallinfo_cond + Memcheck:Cond + fun:int_mallinfo + fun:mallinfo + ... +} +{ + mallinfo_value + Memcheck:Value8 + fun:int_mallinfo + fun:mallinfo + ... +} +{ + int_free_cond + Memcheck:Cond + fun:_int_free + ... +} +{ + string_len_cond + Memcheck:Cond + fun:strlen + ... +} +{ + int_malloc_cond + Memcheck:Cond + fun:_int_malloc + fun:malloc + ... +} +{ + malloc_consolidate_malloc + Memcheck:Cond + fun:malloc_consolidate + fun:_int_malloc + ... +} +{ + malloc_consolidate_free + Memcheck:Cond + fun:malloc_consolidate + fun:_int_free + ... +} +{ + catch_cond + Memcheck:Cond + fun:__cxa_begin_catch + ... +} +{ + popen + Memcheck:Param + set_robust_list(head) + fun:__nptl_set_robust + fun:__libc_fork + fun:_IO_proc_open + fun:popen + ... +} +{ + exit + Memcheck:Value8 + fun:__run_exit_handlers + ... + fun:exit + ... +} +{ + sse42 + Memcheck:Cond + fun:__strstr_sse42 + ... } diff --git a/analysis/ElectroChemistry.cpp b/analysis/ElectroChemistry.cpp index fcd4d2a2..4f23f00f 100644 --- a/analysis/ElectroChemistry.cpp +++ b/analysis/ElectroChemistry.cpp @@ -49,7 +49,7 @@ ElectroChemistryAnalyzer::ElectroChemistryAnalyzer(std::shared_ptr dm) IonFluxElectrical_y.fill(0); IonFluxElectrical_z.resize(Nx, Ny, Nz); IonFluxElectrical_z.fill(0); - + if (Dm->rank() == 0) { bool WriteHeader = false; TIMELOG = fopen("electrokinetic.csv", "r"); @@ -67,6 +67,87 @@ ElectroChemistryAnalyzer::ElectroChemistryAnalyzer(std::shared_ptr dm) } } +ElectroChemistryAnalyzer::ElectroChemistryAnalyzer(ScaLBL_IonModel &IonModel) + : Dm(IonModel.Dm) { + + Nx = Dm->Nx; + Ny = Dm->Ny; + Nz = Dm->Nz; + Volume = (Nx - 2) * (Ny - 2) * (Nz - 2) * Dm->nprocx() * Dm->nprocy() * + Dm->nprocz() * 1.0; + + if (Dm->rank()==0) printf("Analyze system with sub-domain size = %i x %i x %i \n",Nx,Ny,Nz); + + USE_MEMBRANE = IonModel.USE_MEMBRANE; + + ChemicalPotential.resize(Nx, Ny, Nz); + ChemicalPotential.fill(0); + ElectricalPotential.resize(Nx, Ny, Nz); + ElectricalPotential.fill(0); + ElectricalField_x.resize(Nx, Ny, Nz); + ElectricalField_x.fill(0); + ElectricalField_y.resize(Nx, Ny, Nz); + ElectricalField_y.fill(0); + ElectricalField_z.resize(Nx, Ny, Nz); + ElectricalField_z.fill(0); + Pressure.resize(Nx, Ny, Nz); + Pressure.fill(0); + Rho.resize(Nx, Ny, Nz); + Rho.fill(0); + Vel_x.resize(Nx, Ny, Nz); + Vel_x.fill(0); // Gradient of the phase indicator field + Vel_y.resize(Nx, Ny, Nz); + Vel_y.fill(0); + Vel_z.resize(Nx, Ny, Nz); + Vel_z.fill(0); + SDs.resize(Nx, Ny, Nz); + SDs.fill(0); + IonFluxDiffusive_x.resize(Nx, Ny, Nz); + IonFluxDiffusive_x.fill(0); + IonFluxDiffusive_y.resize(Nx, Ny, Nz); + IonFluxDiffusive_y.fill(0); + IonFluxDiffusive_z.resize(Nx, Ny, Nz); + IonFluxDiffusive_z.fill(0); + IonFluxAdvective_x.resize(Nx, Ny, Nz); + IonFluxAdvective_x.fill(0); + IonFluxAdvective_y.resize(Nx, Ny, Nz); + IonFluxAdvective_y.fill(0); + IonFluxAdvective_z.resize(Nx, Ny, Nz); + IonFluxAdvective_z.fill(0); + IonFluxElectrical_x.resize(Nx, Ny, Nz); + IonFluxElectrical_x.fill(0); + IonFluxElectrical_y.resize(Nx, Ny, Nz); + IonFluxElectrical_y.fill(0); + IonFluxElectrical_z.resize(Nx, Ny, Nz); + IonFluxElectrical_z.fill(0); + + + if (Dm->rank() == 0) { + printf("Set up analysis routines for %lu ions \n",IonModel.number_ion_species); + + bool WriteHeader = false; + TIMELOG = fopen("electrokinetic.csv", "r"); + if (TIMELOG != NULL) + fclose(TIMELOG); + else + WriteHeader = true; + + TIMELOG = fopen("electrokinetic.csv", "a+"); + if (WriteHeader) { + 
// If timelog is empty, write a short header to list the averages + //fprintf(TIMELOG,"--------------------------------------------------------------------------------------\n"); + fprintf(TIMELOG, "timestep voltage_out voltage_in "); + fprintf(TIMELOG, "voltage_out_membrane voltage_in_membrane "); + for (size_t i=0; irank() == 0) { fclose(TIMELOG); @@ -75,6 +156,163 @@ ElectroChemistryAnalyzer::~ElectroChemistryAnalyzer() { void ElectroChemistryAnalyzer::SetParams() {} +void ElectroChemistryAnalyzer::Membrane(ScaLBL_IonModel &Ion, + ScaLBL_Poisson &Poisson, + int timestep) { + + int i, j, k; + + Poisson.getElectricPotential(ElectricalPotential); + + if (Dm->rank() == 0) + fprintf(TIMELOG, "%i ", timestep); + + + /* int iq, ip, nq, np, nqm, npm; + Ion.MembraneDistance(i,j,k); // inside (-) or outside (+) the ion + for (int link; linkmembraneLinkCount; link++){ + int iq = Ion.IonMembrane->membraneLinks[2*link]; + int ip = Ion.IonMembrane->membraneLinks[2*link+1]; + + iq = membrane[2*link]; ip = membrane[2*link+1]; + nq = iq%Np; np = ip%Np; + nqm = Map[nq]; npm = Map[np]; + } +*/ + unsigned long int in_local_count, out_local_count; + unsigned long int in_global_count, out_global_count; + + double value_in_local, value_out_local; + double value_in_global, value_out_global; + + double value_membrane_in_local, value_membrane_out_local; + double value_membrane_in_global, value_membrane_out_global; + + unsigned long int membrane_in_local_count, membrane_out_local_count; + unsigned long int membrane_in_global_count, membrane_out_global_count; + + double memdist,value; + in_local_count = 0; + out_local_count = 0; + membrane_in_local_count = 0; + membrane_out_local_count = 0; + + value_membrane_in_local = 0.0; + value_membrane_out_local = 0.0; + value_in_local = 0.0; + value_out_local = 0.0; + for (k = 1; k < Nz; k++) { + for (j = 1; j < Ny; j++) { + for (i = 1; i < Nx; i++) { + /* electric potential */ + memdist = Ion.MembraneDistance(i,j,k); + value = ElectricalPotential(i,j,k); + if (memdist < 0.0){ + // inside the membrane + if (fabs(memdist) < 1.0){ + value_membrane_in_local += value; + membrane_in_local_count++; + } + value_in_local += value; + in_local_count++; + + } + else { + // outside the membrane + if (fabs(memdist) < 1.0){ + value_membrane_out_local += value; + membrane_out_local_count++; + } + value_out_local += value; + out_local_count++; + } + } + } + } + /* these only need to be computed the first time through */ + out_global_count = Dm->Comm.sumReduce(out_local_count); + in_global_count = Dm->Comm.sumReduce(in_local_count); + membrane_out_global_count = Dm->Comm.sumReduce(membrane_out_local_count); + membrane_in_global_count = Dm->Comm.sumReduce(membrane_in_local_count); + + value_out_global = Dm->Comm.sumReduce(value_out_local); + value_in_global = Dm->Comm.sumReduce(value_in_local); + value_membrane_out_global = Dm->Comm.sumReduce(value_membrane_out_local); + value_membrane_in_global = Dm->Comm.sumReduce(value_membrane_in_local); + + value_out_global /= out_global_count; + value_in_global /= in_global_count; + value_membrane_out_global /= membrane_out_global_count; + value_membrane_in_global /= membrane_in_global_count; + + if (Dm->rank() == 0) { + fprintf(TIMELOG, "%.8g ", value_out_global); + fprintf(TIMELOG, "%.8g ", value_in_global); + fprintf(TIMELOG, "%.8g ", value_membrane_out_global); + fprintf(TIMELOG, "%.8g ", value_membrane_in_global); + } + + value_membrane_in_local = 0.0; + value_membrane_out_local = 0.0; + value_in_local = 0.0; + value_out_local = 0.0; + for (size_t 
ion = 0; ion < Ion.number_ion_species; ion++) { + Ion.getIonConcentration(Rho, ion); + value_membrane_in_local = 0.0; + value_membrane_out_local = 0.0; + value_in_local = 0.0; + value_out_local = 0.0; + for (k = 1; k < Nz; k++) { + for (j = 1; j < Ny; j++) { + for (i = 1; i < Nx; i++) { + /* electric potential */ + memdist = Ion.MembraneDistance(i,j,k); + value = Rho(i,j,k); + if (memdist < 0.0){ + // inside the membrane + if (fabs(memdist) < 1.0){ + value_membrane_in_local += value; + } + value_in_local += value; + } + else { + // outside the membrane + if (fabs(memdist) < 1.0){ + value_membrane_out_local += value; + } + value_out_local += value; + } + } + } + } + value_out_global = Dm->Comm.sumReduce(value_out_local); + value_in_global = Dm->Comm.sumReduce(value_in_local); + value_membrane_out_global = Dm->Comm.sumReduce(value_membrane_out_local); + value_membrane_in_global = Dm->Comm.sumReduce(value_membrane_in_local); + + value_out_global /= out_global_count; + value_in_global /= in_global_count; + value_membrane_out_global /= membrane_out_global_count; + value_membrane_in_global /= membrane_in_global_count; + + if (Dm->rank() == 0) { + fprintf(TIMELOG, "%.8g ", value_out_global); + fprintf(TIMELOG, "%.8g ", value_in_global); + fprintf(TIMELOG, "%.8g ", value_membrane_out_global); + fprintf(TIMELOG, "%.8g ", value_membrane_in_global); + } + } + + if (Dm->rank() == 0) { + fprintf(TIMELOG, "%lu ", out_global_count); + fprintf(TIMELOG, "%lu ", in_global_count); + fprintf(TIMELOG, "%lu ", membrane_out_global_count); + fprintf(TIMELOG, "%lu\n", membrane_in_global_count); + fflush(TIMELOG); + } +} + + void ElectroChemistryAnalyzer::Basic(ScaLBL_IonModel &Ion, ScaLBL_Poisson &Poisson, ScaLBL_StokesModel &Stokes, int timestep) { @@ -595,3 +833,408 @@ void ElectroChemistryAnalyzer::WriteVis(ScaLBL_IonModel &Ion, } */ } + +void ElectroChemistryAnalyzer::Basic(ScaLBL_IonModel &Ion, + ScaLBL_Poisson &Poisson, + int timestep) { + + int i, j, k; + double Vin = 0.0; + double Vout = 0.0; + Poisson.getElectricPotential(ElectricalPotential); + + /* local sub-domain averages */ + double *rho_avg_local; + double *rho_mu_avg_local; + double *rho_mu_fluctuation_local; + double *rho_psi_avg_local; + double *rho_psi_fluctuation_local; + /* global averages */ + double *rho_avg_global; + double *rho_mu_avg_global; + double *rho_mu_fluctuation_global; + double *rho_psi_avg_global; + double *rho_psi_fluctuation_global; + + /* Get the distance to the membrane */ + if (Ion.USE_MEMBRANE){ + //Ion.MembraneDistance; + } + + /* local sub-domain averages */ + rho_avg_local = new double[Ion.number_ion_species]; + rho_mu_avg_local = new double[Ion.number_ion_species]; + rho_mu_fluctuation_local = new double[Ion.number_ion_species]; + rho_psi_avg_local = new double[Ion.number_ion_species]; + rho_psi_fluctuation_local = new double[Ion.number_ion_species]; + /* global averages */ + rho_avg_global = new double[Ion.number_ion_species]; + rho_mu_avg_global = new double[Ion.number_ion_species]; + rho_mu_fluctuation_global = new double[Ion.number_ion_species]; + rho_psi_avg_global = new double[Ion.number_ion_species]; + rho_psi_fluctuation_global = new double[Ion.number_ion_species]; + + for (size_t ion = 0; ion < Ion.number_ion_species; ion++) { + rho_avg_local[ion] = 0.0; + rho_mu_avg_local[ion] = 0.0; + rho_psi_avg_local[ion] = 0.0; + Ion.getIonConcentration(Rho, ion); + /* Compute averages for each ion */ + for (k = 1; k < Nz; k++) { + for (j = 1; j < Ny; j++) { + for (i = 1; i < Nx; i++) { + rho_avg_local[ion] += Rho(i, j, 
k); + rho_mu_avg_local[ion] += Rho(i, j, k) * Rho(i, j, k); + rho_psi_avg_local[ion] += + Rho(i, j, k) * ElectricalPotential(i, j, k); + } + } + } + rho_avg_global[ion] = Dm->Comm.sumReduce(rho_avg_local[ion]) / Volume; + rho_mu_avg_global[ion] = + Dm->Comm.sumReduce(rho_mu_avg_local[ion]) / Volume; + rho_psi_avg_global[ion] = + Dm->Comm.sumReduce(rho_psi_avg_local[ion]) / Volume; + + if (rho_avg_global[ion] > 0.0) { + rho_mu_avg_global[ion] /= rho_avg_global[ion]; + rho_psi_avg_global[ion] /= rho_avg_global[ion]; + } + } + + for (size_t ion = 0; ion < Ion.number_ion_species; ion++) { + rho_mu_fluctuation_local[ion] = 0.0; + rho_psi_fluctuation_local[ion] = 0.0; + /* Compute averages for each ion */ + for (k = 1; k < Nz; k++) { + for (j = 1; j < Ny; j++) { + for (i = 1; i < Nx; i++) { + rho_mu_fluctuation_local[ion] += + (Rho(i, j, k) * Rho(i, j, k) - rho_mu_avg_global[ion]); + rho_psi_fluctuation_local[ion] += + (Rho(i, j, k) * ElectricalPotential(i, j, k) - + rho_psi_avg_global[ion]); + } + } + } + rho_mu_fluctuation_global[ion] = + Dm->Comm.sumReduce(rho_mu_fluctuation_local[ion]); + rho_psi_fluctuation_global[ion] = + Dm->Comm.sumReduce(rho_psi_fluctuation_local[ion]); + } + + if (Dm->rank() == 0) { + fprintf(TIMELOG, "%i ", timestep); + for (size_t ion = 0; ion < Ion.number_ion_species; ion++) { + fprintf(TIMELOG, "%.8g ", rho_avg_global[ion]); + fprintf(TIMELOG, "%.8g ", rho_mu_avg_global[ion]); + fprintf(TIMELOG, "%.8g ", rho_psi_avg_global[ion]); + fprintf(TIMELOG, "%.8g ", rho_mu_fluctuation_global[ion]); + fprintf(TIMELOG, "%.8g ", rho_psi_fluctuation_global[ion]); + } + fprintf(TIMELOG, "%.8g %.8g\n", Vin, Vout); + fflush(TIMELOG); + } + /* else{ + fprintf(TIMELOG,"%i ",timestep); + for (int ion=0; ion input_db, + int timestep) { + + auto vis_db = input_db->getDatabase("Visualization"); + char VisName[40]; + auto format = vis_db->getWithDefault( "format", "hdf5" ); + + std::vector visData; + fillHalo fillData(Dm->Comm, Dm->rank_info, + {Dm->Nx - 2, Dm->Ny - 2, Dm->Nz - 2}, {1, 1, 1}, + 0, 1); + + IO::initialize("",format,"false"); + // Create the MeshDataStruct + visData.resize(1); + + visData[0].meshName = "domain"; + visData[0].mesh = + std::make_shared(Dm->rank_info, Dm->Nx - 2, Dm->Ny - 2, + Dm->Nz - 2, Dm->Lx, Dm->Ly, Dm->Lz); + //electric potential + auto ElectricPotentialVar = std::make_shared(); + //electric field + auto ElectricFieldVar_x = std::make_shared(); + auto ElectricFieldVar_y = std::make_shared(); + auto ElectricFieldVar_z = std::make_shared(); + + //ion concentration + std::vector> IonConcentration; + for (size_t ion = 0; ion < Ion.number_ion_species; ion++) { + IonConcentration.push_back(std::make_shared()); + } + + // diffusive ion flux + std::vector> IonFluxDiffusive; + for (size_t ion = 0; ion < Ion.number_ion_species; ion++) { + //push in x-,y-, and z-component for each ion species + IonFluxDiffusive.push_back(std::make_shared()); + IonFluxDiffusive.push_back(std::make_shared()); + IonFluxDiffusive.push_back(std::make_shared()); + } + + // electro-migrational ion flux + std::vector> IonFluxElectrical; + for (size_t ion = 0; ion < Ion.number_ion_species; ion++) { + //push in x-,y-, and z-component for each ion species + IonFluxElectrical.push_back(std::make_shared()); + IonFluxElectrical.push_back(std::make_shared()); + IonFluxElectrical.push_back(std::make_shared()); + } + //-------------------------------------------------------------------------------------------------------------------- + + //-------------------------------------Create Names for 
Variables------------------------------------------------------ + if (vis_db->getWithDefault("save_electric_potential", true)) { + ElectricPotentialVar->name = "ElectricPotential"; + ElectricPotentialVar->type = IO::VariableType::VolumeVariable; + ElectricPotentialVar->dim = 1; + ElectricPotentialVar->data.resize(Dm->Nx - 2, Dm->Ny - 2, Dm->Nz - 2); + visData[0].vars.push_back(ElectricPotentialVar); + } + + if (vis_db->getWithDefault("save_concentration", true)) { + for (size_t ion = 0; ion < Ion.number_ion_species; ion++) { + sprintf(VisName, "IonConcentration_%zu", ion + 1); + IonConcentration[ion]->name = VisName; + IonConcentration[ion]->type = IO::VariableType::VolumeVariable; + IonConcentration[ion]->dim = 1; + IonConcentration[ion]->data.resize(Dm->Nx - 2, Dm->Ny - 2, + Dm->Nz - 2); + visData[0].vars.push_back(IonConcentration[ion]); + } + } + + if (vis_db->getWithDefault("save_ion_flux_diffusive", false)) { + for (size_t ion = 0; ion < Ion.number_ion_species; ion++) { + // x-component of diffusive flux + sprintf(VisName, "Ion%zu_FluxDiffusive_x", ion + 1); + IonFluxDiffusive[3 * ion + 0]->name = VisName; + IonFluxDiffusive[3 * ion + 0]->type = + IO::VariableType::VolumeVariable; + IonFluxDiffusive[3 * ion + 0]->dim = 1; + IonFluxDiffusive[3 * ion + 0]->data.resize(Dm->Nx - 2, Dm->Ny - 2, + Dm->Nz - 2); + visData[0].vars.push_back(IonFluxDiffusive[3 * ion + 0]); + // y-component of diffusive flux + sprintf(VisName, "Ion%zu_FluxDiffusive_y", ion + 1); + IonFluxDiffusive[3 * ion + 1]->name = VisName; + IonFluxDiffusive[3 * ion + 1]->type = + IO::VariableType::VolumeVariable; + IonFluxDiffusive[3 * ion + 1]->dim = 1; + IonFluxDiffusive[3 * ion + 1]->data.resize(Dm->Nx - 2, Dm->Ny - 2, + Dm->Nz - 2); + visData[0].vars.push_back(IonFluxDiffusive[3 * ion + 1]); + // z-component of diffusive flux + sprintf(VisName, "Ion%zu_FluxDiffusive_z", ion + 1); + IonFluxDiffusive[3 * ion + 2]->name = VisName; + IonFluxDiffusive[3 * ion + 2]->type = + IO::VariableType::VolumeVariable; + IonFluxDiffusive[3 * ion + 2]->dim = 1; + IonFluxDiffusive[3 * ion + 2]->data.resize(Dm->Nx - 2, Dm->Ny - 2, + Dm->Nz - 2); + visData[0].vars.push_back(IonFluxDiffusive[3 * ion + 2]); + } + } + + + if (vis_db->getWithDefault("save_ion_flux_electrical", false)) { + for (size_t ion = 0; ion < Ion.number_ion_species; ion++) { + // x-component of electro-migrational flux + sprintf(VisName, "Ion%zu_FluxElectrical_x", ion + 1); + IonFluxElectrical[3 * ion + 0]->name = VisName; + IonFluxElectrical[3 * ion + 0]->type = + IO::VariableType::VolumeVariable; + IonFluxElectrical[3 * ion + 0]->dim = 1; + IonFluxElectrical[3 * ion + 0]->data.resize(Dm->Nx - 2, Dm->Ny - 2, + Dm->Nz - 2); + visData[0].vars.push_back(IonFluxElectrical[3 * ion + 0]); + // y-component of electro-migrational flux + sprintf(VisName, "Ion%zu_FluxElectrical_y", ion + 1); + IonFluxElectrical[3 * ion + 1]->name = VisName; + IonFluxElectrical[3 * ion + 1]->type = + IO::VariableType::VolumeVariable; + IonFluxElectrical[3 * ion + 1]->dim = 1; + IonFluxElectrical[3 * ion + 1]->data.resize(Dm->Nx - 2, Dm->Ny - 2, + Dm->Nz - 2); + visData[0].vars.push_back(IonFluxElectrical[3 * ion + 1]); + // z-component of electro-migrational flux + sprintf(VisName, "Ion%zu_FluxElectrical_z", ion + 1); + IonFluxElectrical[3 * ion + 2]->name = VisName; + IonFluxElectrical[3 * ion + 2]->type = + IO::VariableType::VolumeVariable; + IonFluxElectrical[3 * ion + 2]->dim = 1; + IonFluxElectrical[3 * ion + 2]->data.resize(Dm->Nx - 2, Dm->Ny - 2, + Dm->Nz - 2); + 
visData[0].vars.push_back(IonFluxElectrical[3 * ion + 2]); + } + } + + if (vis_db->getWithDefault("save_electric_field", false)) { + ElectricFieldVar_x->name = "ElectricField_x"; + ElectricFieldVar_x->type = IO::VariableType::VolumeVariable; + ElectricFieldVar_x->dim = 1; + ElectricFieldVar_x->data.resize(Dm->Nx - 2, Dm->Ny - 2, Dm->Nz - 2); + visData[0].vars.push_back(ElectricFieldVar_x); + ElectricFieldVar_y->name = "ElectricField_y"; + ElectricFieldVar_y->type = IO::VariableType::VolumeVariable; + ElectricFieldVar_y->dim = 1; + ElectricFieldVar_y->data.resize(Dm->Nx - 2, Dm->Ny - 2, Dm->Nz - 2); + visData[0].vars.push_back(ElectricFieldVar_y); + ElectricFieldVar_z->name = "ElectricField_z"; + ElectricFieldVar_z->type = IO::VariableType::VolumeVariable; + ElectricFieldVar_z->dim = 1; + ElectricFieldVar_z->data.resize(Dm->Nx - 2, Dm->Ny - 2, Dm->Nz - 2); + visData[0].vars.push_back(ElectricFieldVar_z); + } + //-------------------------------------------------------------------------------------------------------------------- + + //------------------------------------Save All Variables-------------------------------------------------------------- + if (vis_db->getWithDefault("save_electric_potential", true)) { + ASSERT(visData[0].vars[0]->name == "ElectricPotential"); + Poisson.getElectricPotential(ElectricalPotential); + Array &ElectricPotentialData = visData[0].vars[0]->data; + fillData.copy(ElectricalPotential, ElectricPotentialData); + } + + if (vis_db->getWithDefault("save_concentration", true)) { + for (size_t ion = 0; ion < Ion.number_ion_species; ion++) { + sprintf(VisName, "IonConcentration_%zu", ion + 1); + //IonConcentration[ion]->name = VisName; + ASSERT(visData[0].vars[1 + ion]->name == VisName); + Array &IonConcentrationData = + visData[0].vars[1 + ion]->data; + Ion.getIonConcentration(Rho, ion); + fillData.copy(Rho, IonConcentrationData); + } + } + + if (vis_db->getWithDefault("save_ion_flux_diffusive", false)) { + for (size_t ion = 0; ion < Ion.number_ion_species; ion++) { + + // x-component of diffusive flux + sprintf(VisName, "Ion%zu_FluxDiffusive_x", ion + 1); + //IonFluxDiffusive[3*ion+0]->name = VisName; + ASSERT(visData[0] + .vars[4 + Ion.number_ion_species + 3 * ion + 0] + ->name == VisName); + // y-component of diffusive flux + sprintf(VisName, "Ion%zu_FluxDiffusive_y", ion + 1); + //IonFluxDiffusive[3*ion+1]->name = VisName; + ASSERT(visData[0] + .vars[4 + Ion.number_ion_species + 3 * ion + 1] + ->name == VisName); + // z-component of diffusive flux + sprintf(VisName, "Ion%zu_FluxDiffusive_z", ion + 1); + //IonFluxDiffusive[3*ion+2]->name = VisName; + ASSERT(visData[0] + .vars[4 + Ion.number_ion_species + 3 * ion + 2] + ->name == VisName); + + Array &IonFluxData_x = + visData[0].vars[4 + Ion.number_ion_species + 3 * ion + 0]->data; + Array &IonFluxData_y = + visData[0].vars[4 + Ion.number_ion_species + 3 * ion + 1]->data; + Array &IonFluxData_z = + visData[0].vars[4 + Ion.number_ion_species + 3 * ion + 2]->data; + Ion.getIonFluxDiffusive(IonFluxDiffusive_x, IonFluxDiffusive_y, + IonFluxDiffusive_z, ion); + fillData.copy(IonFluxDiffusive_x, IonFluxData_x); + fillData.copy(IonFluxDiffusive_y, IonFluxData_y); + fillData.copy(IonFluxDiffusive_z, IonFluxData_z); + } + } + + if (vis_db->getWithDefault("save_ion_flux_electrical", false)) { + for (size_t ion = 0; ion < Ion.number_ion_species; ion++) { + + // x-component of diffusive flux + sprintf(VisName, "Ion%zu_FluxElectrical_x", ion + 1); + //IonFluxDiffusive[3*ion+0]->name = VisName; + ASSERT(visData[0] + .vars[4 + 
Ion.number_ion_species * (1 + 6) + 3 * ion + 0] + ->name == VisName); + // y-component of diffusive flux + sprintf(VisName, "Ion%zu_FluxElectrical_y", ion + 1); + //IonFluxDiffusive[3*ion+1]->name = VisName; + ASSERT(visData[0] + .vars[4 + Ion.number_ion_species * (1 + 6) + 3 * ion + 1] + ->name == VisName); + // z-component of diffusive flux + sprintf(VisName, "Ion%zu_FluxElectrical_z", ion + 1); + //IonFluxDiffusive[3*ion+2]->name = VisName; + ASSERT(visData[0] + .vars[4 + Ion.number_ion_species * (1 + 6) + 3 * ion + 2] + ->name == VisName); + + Array &IonFluxData_x = + visData[0] + .vars[4 + Ion.number_ion_species * (1 + 6) + 3 * ion + 0] + ->data; + Array &IonFluxData_y = + visData[0] + .vars[4 + Ion.number_ion_species * (1 + 6) + 3 * ion + 1] + ->data; + Array &IonFluxData_z = + visData[0] + .vars[4 + Ion.number_ion_species * (1 + 6) + 3 * ion + 2] + ->data; + Ion.getIonFluxElectrical(IonFluxElectrical_x, IonFluxElectrical_y, + IonFluxElectrical_z, ion); + fillData.copy(IonFluxElectrical_x, IonFluxData_x); + fillData.copy(IonFluxElectrical_y, IonFluxData_y); + fillData.copy(IonFluxElectrical_z, IonFluxData_z); + } + } + + if (vis_db->getWithDefault("save_electric_field", false)) { + ASSERT( + visData[0].vars[4 + Ion.number_ion_species * (1 + 9) + 0]->name == + "ElectricField_x"); + ASSERT( + visData[0].vars[4 + Ion.number_ion_species * (1 + 9) + 1]->name == + "ElectricField_y"); + ASSERT( + visData[0].vars[4 + Ion.number_ion_species * (1 + 9) + 2]->name == + "ElectricField_z"); + Poisson.getElectricField(ElectricalField_x, ElectricalField_y, + ElectricalField_z); + Array &ElectricalFieldxData = + visData[0].vars[4 + Ion.number_ion_species * (1 + 9) + 0]->data; + Array &ElectricalFieldyData = + visData[0].vars[4 + Ion.number_ion_species * (1 + 9) + 1]->data; + Array &ElectricalFieldzData = + visData[0].vars[4 + Ion.number_ion_species * (1 + 9) + 2]->data; + fillData.copy(ElectricalField_x, ElectricalFieldxData); + fillData.copy(ElectricalField_y, ElectricalFieldyData); + fillData.copy(ElectricalField_z, ElectricalFieldzData); + } + + if (vis_db->getWithDefault("write_silo", true)) + IO::writeData(timestep, visData, Dm->Comm); + //-------------------------------------------------------------------------------------------------------------------- + /* if (vis_db->getWithDefault( "save_8bit_raw", true )){ + char CurrentIDFilename[40]; + sprintf(CurrentIDFilename,"id_t%d.raw",timestep); + Averages.AggregateLabels(CurrentIDFilename); + } +*/ +} diff --git a/analysis/ElectroChemistry.h b/analysis/ElectroChemistry.h index 1b739611..f7d3031a 100644 --- a/analysis/ElectroChemistry.h +++ b/analysis/ElectroChemistry.h @@ -29,6 +29,8 @@ public: double nu_n, nu_w; double gamma_wn, beta; double Fx, Fy, Fz; + + bool USE_MEMBRANE; //........................................................................... 
int Nx, Ny, Nz; @@ -54,13 +56,16 @@ public: DoubleArray IonFluxElectrical_z; ElectroChemistryAnalyzer(std::shared_ptr Dm); + ElectroChemistryAnalyzer( ScaLBL_IonModel &IonModel); ~ElectroChemistryAnalyzer(); void SetParams(); - void Basic(ScaLBL_IonModel &Ion, ScaLBL_Poisson &Poisson, - ScaLBL_StokesModel &Stokes, int timestep); + void Basic(ScaLBL_IonModel &Ion, ScaLBL_Poisson &Poisson, ScaLBL_StokesModel &Stokes, int timestep); + void Membrane(ScaLBL_IonModel &Ion, ScaLBL_Poisson &Poisson, int timestep); + void WriteVis(ScaLBL_IonModel &Ion, ScaLBL_Poisson &Poisson, + ScaLBL_StokesModel &Stokes,std::shared_ptr input_db, int timestep); + void Basic(ScaLBL_IonModel &Ion, ScaLBL_Poisson &Poisson, int timestep); void WriteVis(ScaLBL_IonModel &Ion, ScaLBL_Poisson &Poisson, - ScaLBL_StokesModel &Stokes, std::shared_ptr input_db, int timestep); private: diff --git a/analysis/Minkowski.cpp b/analysis/Minkowski.cpp index 938828f8..e6bfb55d 100644 --- a/analysis/Minkowski.cpp +++ b/analysis/Minkowski.cpp @@ -67,6 +67,7 @@ Minkowski::~Minkowski() { void Minkowski::ComputeScalar(const DoubleArray &Field, const double isovalue) { PROFILE_START("ComputeScalar"); + Xi = Ji = Ai = 0.0; DCEL object; int e1, e2, e3; @@ -160,6 +161,7 @@ void Minkowski::MeasureObject() { * 1 - labels the rest of the */ //DoubleArray smooth_distance(Nx,Ny,Nz); + for (int k = 0; k < Nz; k++) { for (int j = 0; j < Ny; j++) { for (int i = 0; i < Nx; i++) { @@ -168,6 +170,8 @@ void Minkowski::MeasureObject() { } } CalcDist(distance, id, *Dm); + Dm->CommunicateMeshHalo(distance); + //Mean3D(distance,smooth_distance); //Eikonal(distance, id, *Dm, 20, {true, true, true}); ComputeScalar(distance, 0.0); @@ -179,7 +183,7 @@ void Minkowski::MeasureObject(double factor, const DoubleArray &Phi) { * * THIS ALGORITHM ASSUMES THAT id() is populated with phase id to distinguish objects * 0 - labels the object - * 1 - labels the rest of the + * 1 - labels the rest */ for (int k = 0; k < Nz; k++) { for (int j = 0; j < Ny; j++) { diff --git a/analysis/SubPhase.cpp b/analysis/SubPhase.cpp index b176750b..dafd31fd 100644 --- a/analysis/SubPhase.cpp +++ b/analysis/SubPhase.cpp @@ -411,6 +411,7 @@ void SubPhase::Basic() { dir_z = 1.0; force_mag = 1.0; } + double Porosity = (gwb.V + gnb.V)/Dm->Volume; double saturation = gwb.V / (gwb.V + gnb.V); double water_flow_rate = gwb.V * (gwb.Px * dir_x + gwb.Py * dir_y + gwb.Pz * dir_z) / gwb.M / @@ -429,11 +430,11 @@ void SubPhase::Basic() { //double total_flow_rate = water_flow_rate + not_water_flow_rate; //double fractional_flow = water_flow_rate / total_flow_rate; double h = Dm->voxel_length; - double krn = h * h * nu_n * not_water_flow_rate / force_mag; - double krw = h * h * nu_w * water_flow_rate / force_mag; + double krn = h * h * nu_n * Porosity * not_water_flow_rate / force_mag; + double krw = h * h * nu_w * Porosity* water_flow_rate / force_mag; /* not counting films */ - double krnf = krn - h * h * nu_n * not_water_film_flow_rate / force_mag; - double krwf = krw - h * h * nu_w * water_film_flow_rate / force_mag; + double krnf = krn - h * h * nu_n * Porosity * not_water_film_flow_rate / force_mag; + double krwf = krw - h * h * nu_w * Porosity * water_film_flow_rate / force_mag; double eff_pressure = 1.0 / (krn + krw); // effective pressure drop fprintf(TIMELOG, @@ -595,7 +596,7 @@ void SubPhase::Full() { for (j = 0; j < Ny; j++) { for (i = 0; i < Nx; i++) { n = k * Nx * Ny + j * Nx + i; - if (!(Dm->id[n] > 0)) { + if (SDs(n) <= 0.0) { // Solid phase morph_n->id(i, j, k) = 1; @@ -642,7 +643,7 @@ 
void SubPhase::Full() { for (j = 0; j < Ny; j++) { for (i = 0; i < Nx; i++) { n = k * Nx * Ny + j * Nx + i; - if (!(Dm->id[n] > 0)) { + if (SDs(n) <= 0.0) { // Solid phase morph_w->id(i, j, k) = 1; @@ -688,7 +689,7 @@ void SubPhase::Full() { for (j = 0; j < Ny; j++) { for (i = 0; i < Nx; i++) { n = k * Nx * Ny + j * Nx + i; - if (!(Dm->id[n] > 0)) { + if (SDs(n) <= 0.0) { // Solid phase morph_i->id(i, j, k) = 1; } else if (DelPhi(n) > 1e-4) { @@ -731,7 +732,7 @@ void SubPhase::Full() { for (i = imin; i < Nx - 1; i++) { n = k * Nx * Ny + j * Nx + i; // Compute volume averages - if (Dm->id[n] > 0) { + if (SDs(n) > 0.0) { // compute density double nA = Rho_n(n); double nB = Rho_w(n); diff --git a/cmake/macros.cmake b/cmake/macros.cmake index 47150cf0..56e9a684 100644 --- a/cmake/macros.cmake +++ b/cmake/macros.cmake @@ -193,6 +193,9 @@ MACRO( FIND_FILES ) # Find the CUDA sources SET( T_CUDASOURCES "" ) FILE( GLOB T_CUDASOURCES "*.cu" ) + # Find the HIP sources + SET( T_HIPSOURCES "" ) + FILE( GLOB T_HIPSOURCES "*.hip" ) # Find the C sources SET( T_CSOURCES "" ) FILE( GLOB T_CSOURCES "*.c" ) @@ -212,10 +215,11 @@ MACRO( FIND_FILES ) SET( HEADERS ${HEADERS} ${T_HEADERS} ) SET( CXXSOURCES ${CXXSOURCES} ${T_CXXSOURCES} ) SET( CUDASOURCES ${CUDASOURCES} ${T_CUDASOURCES} ) + SET( HIPSOURCES ${HIPSOURCES} ${T_HIPSOURCES} ) SET( CSOURCES ${CSOURCES} ${T_CSOURCES} ) SET( FSOURCES ${FSOURCES} ${T_FSOURCES} ) SET( M4FSOURCES ${M4FSOURCES} ${T_M4FSOURCES} ) - SET( SOURCES ${SOURCES} ${T_CXXSOURCES} ${T_CSOURCES} ${T_FSOURCES} ${T_M4FSOURCES} ${CUDASOURCES} ) + SET( SOURCES ${SOURCES} ${T_CXXSOURCES} ${T_CSOURCES} ${T_FSOURCES} ${T_M4FSOURCES} ${CUDASOURCES} ${HIPSOURCES} ) ENDMACRO() @@ -227,6 +231,9 @@ MACRO( FIND_FILES_PATH IN_PATH ) # Find the CUDA sources SET( T_CUDASOURCES "" ) FILE( GLOB T_CUDASOURCES "${IN_PATH}/*.cu" ) + # Find the HIP sources + SET( T_HIPSOURCES "" ) + FILE( GLOB T_HIPSOURCES "${IN_PATH}/*.hip" ) # Find the C sources SET( T_CSOURCES "" ) FILE( GLOB T_CSOURCES "${IN_PATH}/*.c" ) @@ -246,9 +253,10 @@ MACRO( FIND_FILES_PATH IN_PATH ) SET( HEADERS ${HEADERS} ${T_HEADERS} ) SET( CXXSOURCES ${CXXSOURCES} ${T_CXXSOURCES} ) SET( CUDASOURCES ${CUDASOURCES} ${T_CUDASOURCES} ) + SET( HIPSOURCES ${HIPSOURCES} ${T_HIPSOURCES} ) SET( CSOURCES ${CSOURCES} ${T_CSOURCES} ) SET( FSOURCES ${FSOURCES} ${T_FSOURCES} ) - SET( SOURCES ${SOURCES} ${T_CXXSOURCES} ${T_CSOURCES} ${T_FSOURCES} ${CUDASOURCES} ) + SET( SOURCES ${SOURCES} ${T_CXXSOURCES} ${T_CSOURCES} ${T_FSOURCES} ${CUDASOURCES} ${HIPSOURCES} ) ENDMACRO() diff --git a/common/Array.h b/common/Array.h index 8f76f518..423a33f6 100644 --- a/common/Array.h +++ b/common/Array.h @@ -20,10 +20,12 @@ #include "common/ArraySize.h" #include +#include #include #include #include #include +#include #include #include diff --git a/common/ArraySize.h b/common/ArraySize.h index a4df4191..5661bcc8 100644 --- a/common/ArraySize.h +++ b/common/ArraySize.h @@ -4,11 +4,13 @@ #include "common/Utilities.h" #include +#include #include #include #include #include #include +#include #include #if defined(__CUDA_ARCH__) diff --git a/common/Communication.h b/common/Communication.h index 6df1e1d4..045c20c7 100644 --- a/common/Communication.h +++ b/common/Communication.h @@ -208,72 +208,68 @@ inline void CommunicateSendRecvCounts( } //*************************************************************************************** -inline void CommunicateRecvLists( - const Utilities::MPI &comm, int sendtag, int recvtag, int *sendList_x, - int *sendList_y, int *sendList_z, int 
*sendList_X, int *sendList_Y, - int *sendList_Z, int *sendList_xy, int *sendList_XY, int *sendList_xY, - int *sendList_Xy, int *sendList_xz, int *sendList_XZ, int *sendList_xZ, - int *sendList_Xz, int *sendList_yz, int *sendList_YZ, int *sendList_yZ, - int *sendList_Yz, int sendCount_x, int sendCount_y, int sendCount_z, - int sendCount_X, int sendCount_Y, int sendCount_Z, int sendCount_xy, - int sendCount_XY, int sendCount_xY, int sendCount_Xy, int sendCount_xz, - int sendCount_XZ, int sendCount_xZ, int sendCount_Xz, int sendCount_yz, - int sendCount_YZ, int sendCount_yZ, int sendCount_Yz, int *recvList_x, - int *recvList_y, int *recvList_z, int *recvList_X, int *recvList_Y, - int *recvList_Z, int *recvList_xy, int *recvList_XY, int *recvList_xY, - int *recvList_Xy, int *recvList_xz, int *recvList_XZ, int *recvList_xZ, - int *recvList_Xz, int *recvList_yz, int *recvList_YZ, int *recvList_yZ, - int *recvList_Yz, int recvCount_x, int recvCount_y, int recvCount_z, - int recvCount_X, int recvCount_Y, int recvCount_Z, int recvCount_xy, - int recvCount_XY, int recvCount_xY, int recvCount_Xy, int recvCount_xz, - int recvCount_XZ, int recvCount_xZ, int recvCount_Xz, int recvCount_yz, - int recvCount_YZ, int recvCount_yZ, int recvCount_Yz, int rank_x, - int rank_y, int rank_z, int rank_X, int rank_Y, int rank_Z, int rank_xy, - int rank_XY, int rank_xY, int rank_Xy, int rank_xz, int rank_XZ, - int rank_xZ, int rank_Xz, int rank_yz, int rank_YZ, int rank_yZ, - int rank_Yz) { - MPI_Request req1[18], req2[18]; - req1[0] = comm.Isend(sendList_x, sendCount_x, rank_x, sendtag); - req2[0] = comm.Irecv(recvList_X, recvCount_X, rank_X, recvtag); - req1[1] = comm.Isend(sendList_X, sendCount_X, rank_X, sendtag); - req2[1] = comm.Irecv(recvList_x, recvCount_x, rank_x, recvtag); - req1[2] = comm.Isend(sendList_y, sendCount_y, rank_y, sendtag); - req2[2] = comm.Irecv(recvList_Y, recvCount_Y, rank_Y, recvtag); - req1[3] = comm.Isend(sendList_Y, sendCount_Y, rank_Y, sendtag); - req2[3] = comm.Irecv(recvList_y, recvCount_y, rank_y, recvtag); - req1[4] = comm.Isend(sendList_z, sendCount_z, rank_z, sendtag); - req2[4] = comm.Irecv(recvList_Z, recvCount_Z, rank_Z, recvtag); - req1[5] = comm.Isend(sendList_Z, sendCount_Z, rank_Z, sendtag); - req2[5] = comm.Irecv(recvList_z, recvCount_z, rank_z, recvtag); +inline void CommunicateRecvLists( const Utilities::MPI& comm, int sendtag, int recvtag, + int *sendList_x, int *sendList_y, int *sendList_z, int *sendList_X, int *sendList_Y, int *sendList_Z, + int *sendList_xy, int *sendList_XY, int *sendList_xY, int *sendList_Xy, + int *sendList_xz, int *sendList_XZ, int *sendList_xZ, int *sendList_Xz, + int *sendList_yz, int *sendList_YZ, int *sendList_yZ, int *sendList_Yz, + int sendCount_x, int sendCount_y, int sendCount_z, int sendCount_X, int sendCount_Y, int sendCount_Z, + int sendCount_xy, int sendCount_XY, int sendCount_xY, int sendCount_Xy, + int sendCount_xz, int sendCount_XZ, int sendCount_xZ, int sendCount_Xz, + int sendCount_yz, int sendCount_YZ, int sendCount_yZ, int sendCount_Yz, + int *recvList_x, int *recvList_y, int *recvList_z, int *recvList_X, int *recvList_Y, int *recvList_Z, + int *recvList_xy, int *recvList_XY, int *recvList_xY, int *recvList_Xy, + int *recvList_xz, int *recvList_XZ, int *recvList_xZ, int *recvList_Xz, + int *recvList_yz, int *recvList_YZ, int *recvList_yZ, int *recvList_Yz, + int recvCount_x, int recvCount_y, int recvCount_z, int recvCount_X, int recvCount_Y, int recvCount_Z, + int recvCount_xy, int recvCount_XY, int recvCount_xY, int 
recvCount_Xy, + int recvCount_xz, int recvCount_XZ, int recvCount_xZ, int recvCount_Xz, + int recvCount_yz, int recvCount_YZ, int recvCount_yZ, int recvCount_Yz, + int rank_x, int rank_y, int rank_z, int rank_X, int rank_Y, int rank_Z, int rank_xy, int rank_XY, int rank_xY, + int rank_Xy, int rank_xz, int rank_XZ, int rank_xZ, int rank_Xz, int rank_yz, int rank_YZ, int rank_yZ, int rank_Yz) +{ + MPI_Request req1[18], req2[18]; + req1[0] = comm.Isend(sendList_x,sendCount_x,rank_x,sendtag+0); + req2[0] = comm.Irecv(recvList_X,recvCount_X,rank_X,recvtag+0); + req1[1] = comm.Isend(sendList_X,sendCount_X,rank_X,sendtag+1); + req2[1] = comm.Irecv(recvList_x,recvCount_x,rank_x,recvtag+1); + req1[2] = comm.Isend(sendList_y,sendCount_y,rank_y,sendtag+2); + req2[2] = comm.Irecv(recvList_Y,recvCount_Y,rank_Y,recvtag+2); + req1[3] = comm.Isend(sendList_Y,sendCount_Y,rank_Y,sendtag+3); + req2[3] = comm.Irecv(recvList_y,recvCount_y,rank_y,recvtag+3); + req1[4] = comm.Isend(sendList_z,sendCount_z,rank_z,sendtag+4); + req2[4] = comm.Irecv(recvList_Z,recvCount_Z,rank_Z,recvtag+4); + req1[5] = comm.Isend(sendList_Z,sendCount_Z,rank_Z,sendtag+5); + req2[5] = comm.Irecv(recvList_z,recvCount_z,rank_z,recvtag+5); - req1[6] = comm.Isend(sendList_xy, sendCount_xy, rank_xy, sendtag); - req2[6] = comm.Irecv(recvList_XY, recvCount_XY, rank_XY, recvtag); - req1[7] = comm.Isend(sendList_XY, sendCount_XY, rank_XY, sendtag); - req2[7] = comm.Irecv(recvList_xy, recvCount_xy, rank_xy, recvtag); - req1[8] = comm.Isend(sendList_Xy, sendCount_Xy, rank_Xy, sendtag); - req2[8] = comm.Irecv(recvList_xY, recvCount_xY, rank_xY, recvtag); - req1[9] = comm.Isend(sendList_xY, sendCount_xY, rank_xY, sendtag); - req2[9] = comm.Irecv(recvList_Xy, recvCount_Xy, rank_Xy, recvtag); + req1[6] = comm.Isend(sendList_xy,sendCount_xy,rank_xy,sendtag+6); + req2[6] = comm.Irecv(recvList_XY,recvCount_XY,rank_XY,recvtag+6); + req1[7] = comm.Isend(sendList_XY,sendCount_XY,rank_XY,sendtag+7); + req2[7] = comm.Irecv(recvList_xy,recvCount_xy,rank_xy,recvtag+7); + req1[8] = comm.Isend(sendList_Xy,sendCount_Xy,rank_Xy,sendtag+8); + req2[8] = comm.Irecv(recvList_xY,recvCount_xY,rank_xY,recvtag+8); + req1[9] = comm.Isend(sendList_xY,sendCount_xY,rank_xY,sendtag+9); + req2[9] = comm.Irecv(recvList_Xy,recvCount_Xy,rank_Xy,recvtag+9); - req1[10] = comm.Isend(sendList_xz, sendCount_xz, rank_xz, sendtag); - req2[10] = comm.Irecv(recvList_XZ, recvCount_XZ, rank_XZ, recvtag); - req1[11] = comm.Isend(sendList_XZ, sendCount_XZ, rank_XZ, sendtag); - req2[11] = comm.Irecv(recvList_xz, recvCount_xz, rank_xz, recvtag); - req1[12] = comm.Isend(sendList_Xz, sendCount_Xz, rank_Xz, sendtag); - req2[12] = comm.Irecv(recvList_xZ, recvCount_xZ, rank_xZ, recvtag); - req1[13] = comm.Isend(sendList_xZ, sendCount_xZ, rank_xZ, sendtag); - req2[13] = comm.Irecv(recvList_Xz, recvCount_Xz, rank_Xz, recvtag); + req1[10] = comm.Isend(sendList_xz,sendCount_xz,rank_xz,sendtag+10); + req2[10] = comm.Irecv(recvList_XZ,recvCount_XZ,rank_XZ,recvtag+10); + req1[11] = comm.Isend(sendList_XZ,sendCount_XZ,rank_XZ,sendtag+11); + req2[11] = comm.Irecv(recvList_xz,recvCount_xz,rank_xz,recvtag+11); + req1[12] = comm.Isend(sendList_Xz,sendCount_Xz,rank_Xz,sendtag+12); + req2[12] = comm.Irecv(recvList_xZ,recvCount_xZ,rank_xZ,recvtag+12); + req1[13] = comm.Isend(sendList_xZ,sendCount_xZ,rank_xZ,sendtag+13); + req2[13] = comm.Irecv(recvList_Xz,recvCount_Xz,rank_Xz,recvtag+13); - req1[14] = comm.Isend(sendList_yz, sendCount_yz, rank_yz, sendtag); - req2[14] = comm.Irecv(recvList_YZ, recvCount_YZ, 
rank_YZ, recvtag); - req1[15] = comm.Isend(sendList_YZ, sendCount_YZ, rank_YZ, sendtag); - req2[15] = comm.Irecv(recvList_yz, recvCount_yz, rank_yz, recvtag); - req1[16] = comm.Isend(sendList_Yz, sendCount_Yz, rank_Yz, sendtag); - req2[16] = comm.Irecv(recvList_yZ, recvCount_yZ, rank_yZ, recvtag); - req1[17] = comm.Isend(sendList_yZ, sendCount_yZ, rank_yZ, sendtag); - req2[17] = comm.Irecv(recvList_Yz, recvCount_Yz, rank_Yz, recvtag); - comm.waitAll(18, req1); - comm.waitAll(18, req2); + req1[14] = comm.Isend(sendList_yz,sendCount_yz,rank_yz,sendtag+14); + req2[14] = comm.Irecv(recvList_YZ,recvCount_YZ,rank_YZ,recvtag+14); + req1[15] = comm.Isend(sendList_YZ,sendCount_YZ,rank_YZ,sendtag+15); + req2[15] = comm.Irecv(recvList_yz,recvCount_yz,rank_yz,recvtag+15); + req1[16] = comm.Isend(sendList_Yz,sendCount_Yz,rank_Yz,sendtag+16); + req2[16] = comm.Irecv(recvList_yZ,recvCount_yZ,rank_yZ,recvtag+16); + req1[17] = comm.Isend(sendList_yZ,sendCount_yZ,rank_yZ,sendtag+17); + req2[17] = comm.Irecv(recvList_Yz,recvCount_Yz,rank_Yz,recvtag+17); + comm.waitAll( 18, req1 ); + comm.waitAll( 18, req2 ); } //*************************************************************************************** diff --git a/common/Domain.cpp b/common/Domain.cpp index 2e808411..945a6c6a 100644 --- a/common/Domain.cpp +++ b/common/Domain.cpp @@ -40,6 +40,214 @@ static inline void fgetl(char *str, int num, FILE *stream) { } } +void Domain::read_swc(const std::string &Filename) { + //...... READ IN SWC FILE................................... + int count = 0; + int number_of_lines = 0; + if (rank() == 0){ + cout << "Reading SWC file..." << endl; + { + std::string line; + std::ifstream myfile(Filename); + while (std::getline(myfile, line)) + ++number_of_lines; + number_of_lines -= 1; + } + std::cout << " Number of lines in SWC file: " << number_of_lines << endl; + } + count = Comm.sumReduce(number_of_lines); // nonzero only for rank=0 + number_of_lines = count; + + // set up structures to read + double *List_cx = new double [number_of_lines]; + double *List_cy = new double [number_of_lines]; + double *List_cz = new double [number_of_lines]; + double *List_rad = new double [number_of_lines]; + int *List_index = new int [number_of_lines]; + int *List_parent = new int [number_of_lines]; + int *List_type = new int [number_of_lines]; + + if (rank()==0){ + FILE *fid = fopen(Filename.c_str(), "rb"); + INSIST(fid != NULL, "Error opening SWC file"); + //.........Trash the header lines (x 1).......... + char line[100]; + fgetl(line, 100, fid); + //........read the spheres.................. 
+ // We will read until a blank like or end-of-file is reached + count = 0; + while (!feof(fid) && fgets(line, 100, fid) != NULL) { + char *line2 = line; + List_index[count] = int(strtod(line2, &line2)); + List_type[count] = int(strtod(line2, &line2)); + List_cx[count] = strtod(line2, &line2); + List_cy[count] = strtod(line2, &line2); + List_cz[count] = strtod(line2, &line2); + List_rad[count] = strtod(line2, &line2); + List_parent[count] = int(strtod(line2, &line2)); + count++; + } + fclose( fid ); + cout << " Number of lines extracted is: " << count << endl; + INSIST(count == number_of_lines, "Problem reading swc file!"); + + double min_cx = List_cx[0]-List_rad[0]; + double min_cy = List_cy[0]-List_rad[0]; + double min_cz = List_cz[0]-List_rad[0]; + for (count=1; count Nx-1 ) start_idx = Nx; + if (start_idy > Ny-1 ) start_idy = Ny; + if (start_idz > Nz-1 ) start_idz = Nz; + if (finish_idx < 0 ) finish_idx = 0; + if (finish_idy < 0 ) finish_idy = 0; + if (finish_idz < 0 ) finish_idz = 0; + if (finish_idx > Nx-1 ) finish_idx = Nx; + if (finish_idy > Ny-1 ) finish_idy = Ny; + if (finish_idz > Nz-1 ) finish_idz = Nz; + + /* if (rank()==1) printf(" alpha = %f, beta = %f, gamma= %f\n",alpha, beta,gamma); + if (rank()==1) printf(" xi = %f, yi = %f, zi= %f, ri = %f \n",xi, yi, zi, ri); + if (rank()==1) printf(" xp = %f, yp = %f, zp= %f, rp = %f \n",xp, yp, zp, rp); + + if (rank()==1) printf( "start: %i, %i, %i \n",start_idx,start_idy,start_idz); + if (rank()==1) printf( "finish: %i, %i, %i \n",finish_idx,finish_idy,finish_idz); + */ + + for (int k = start_idz; k length){ + distance = ri - sqrt((x-xi)*(x-xi) + (y-yi)*(y-yi) + (z-zi)*(z-zi)); + } + else if (s < 0.0){ + distance = rp - sqrt((x-xp)*(x-xp) + (y-yp)*(y-yp) + (z-zp)*(z-zp)); + } + else { + // linear variation for radius + double radius = rp + (ri - rp)*s/length; + distance = radius - sqrt((x-xp-alpha*s)*(x-xp-alpha*s) + (y-yp-beta*s)*(y-yp-beta*s) + (z-zp-gamma*s)*(z-zp-gamma*s)); + } + if ( distance > 0.0 ){ + /* label the voxel */ + //id[k*Nx*Ny + j*Nx + i] = label; + id[k*Nx*Ny + j*Nx + i] = 2; + } + } + } + } + //if (rank()==0) printf( "next line..\n"); + } + delete[] List_cx; + delete[] List_cy; + delete[] List_cz; + delete[] List_rad; + delete[] List_index; + delete[] List_type; + delete[] List_parent; +} + /******************************************************** * Constructors * ********************************************************/ @@ -101,6 +309,7 @@ void Domain::initialize(std::shared_ptr db) { int nx = n[0]; int ny = n[1]; int nz = n[2]; + offset_x = offset_y = offset_z = 0; if (d_db->keyExists("InletLayers")) { auto InletCount = d_db->getVector("InletLayers"); @@ -302,6 +511,9 @@ void Domain::Decomp(const std::string &Filename) { xStart = offset[0]; yStart = offset[1]; zStart = offset[2]; + offset_x = xStart; + offset_y = yStart; + offset_z = zStart; } if (database->keyExists("InletLayers")) { auto InletCount = database->getVector("InletLayers"); @@ -333,380 +545,391 @@ void Domain::Decomp(const std::string &Filename) { if (ReadType == "8bit") { } else if (ReadType == "16bit") { + } else if (ReadType == "swc") { } else { //printf("INPUT ERROR: Valid ReadType are 8bit, 16bit \n"); ReadType = "8bit"; } - nx = size[0]; - ny = size[1]; - nz = size[2]; - nprocx = nproc[0]; - nprocy = nproc[1]; - nprocz = nproc[2]; - global_Nx = SIZE[0]; - global_Ny = SIZE[1]; - global_Nz = SIZE[2]; - nprocs = nprocx * nprocy * nprocz; - char *SegData = NULL; - - if (RANK == 0) { - printf("Input media: %s\n", Filename.c_str()); - 
printf("Relabeling %lu values\n", ReadValues.size()); - for (size_t idx = 0; idx < ReadValues.size(); idx++) { - int oldvalue = ReadValues[idx]; - int newvalue = WriteValues[idx]; - printf("oldvalue=%d, newvalue =%d \n", oldvalue, newvalue); - } - - // Rank=0 reads the entire segmented data and distributes to worker processes - printf("Dimensions of segmented image: %ld x %ld x %ld \n", global_Nx, - global_Ny, global_Nz); - int64_t SIZE = global_Nx * global_Ny * global_Nz; - SegData = new char[SIZE]; - if (ReadType == "8bit") { - printf("Reading 8-bit input data \n"); - FILE *SEGDAT = fopen(Filename.c_str(), "rb"); - if (SEGDAT == NULL) - ERROR("Domain.cpp: Error reading segmented data"); - size_t ReadSeg; - ReadSeg = fread(SegData, 1, SIZE, SEGDAT); - if (ReadSeg != size_t(SIZE)) - printf("Domain.cpp: Error reading segmented data \n"); - fclose(SEGDAT); - } else if (ReadType == "16bit") { - printf("Reading 16-bit input data \n"); - short int *InputData; - InputData = new short int[SIZE]; - FILE *SEGDAT = fopen(Filename.c_str(), "rb"); - if (SEGDAT == NULL) - ERROR("Domain.cpp: Error reading segmented data"); - size_t ReadSeg; - ReadSeg = fread(InputData, 2, SIZE, SEGDAT); - if (ReadSeg != size_t(SIZE)) - printf("Domain.cpp: Error reading segmented data \n"); - fclose(SEGDAT); - for (int n = 0; n < SIZE; n++) { - SegData[n] = char(InputData[n]); - } - } - printf("Read segmented data from %s \n", Filename.c_str()); - - // relabel the data - std::vector LabelCount(ReadValues.size(), 0); - for (int k = 0; k < global_Nz; k++) { - for (int j = 0; j < global_Ny; j++) { - for (int i = 0; i < global_Nx; i++) { - n = k * global_Nx * global_Ny + j * global_Nx + i; - //char locval = loc_id[n]; - signed char locval = SegData[n]; - for (size_t idx = 0; idx < ReadValues.size(); idx++) { - signed char oldvalue = ReadValues[idx]; - signed char newvalue = WriteValues[idx]; - if (locval == oldvalue) { - SegData[n] = newvalue; - LabelCount[idx]++; - idx = ReadValues.size(); - } - } - } - } - } - for (size_t idx = 0; idx < ReadValues.size(); idx++) { - long int label = ReadValues[idx]; - long int count = LabelCount[idx]; - printf("Label=%ld, Count=%ld \n", label, count); - } - if (USE_CHECKER) { - if (inlet_layers_x > 0) { - // use checkerboard pattern - printf("Checkerboard pattern at x inlet for %i layers \n", - inlet_layers_x); - for (int k = 0; k < global_Nz; k++) { - for (int j = 0; j < global_Ny; j++) { - for (int i = xStart; i < xStart + inlet_layers_x; i++) { - if ((j / checkerSize + k / checkerSize) % 2 == 0) { - // void checkers - SegData[k * global_Nx * global_Ny + - j * global_Nx + i] = 2; - } else { - // solid checkers - SegData[k * global_Nx * global_Ny + - j * global_Nx + i] = 0; - } - } - } - } - } - - if (inlet_layers_y > 0) { - printf("Checkerboard pattern at y inlet for %i layers \n", - inlet_layers_y); - // use checkerboard pattern - for (int k = 0; k < global_Nz; k++) { - for (int j = yStart; j < yStart + inlet_layers_y; j++) { - for (int i = 0; i < global_Nx; i++) { - if ((i / checkerSize + k / checkerSize) % 2 == 0) { - // void checkers - SegData[k * global_Nx * global_Ny + - j * global_Nx + i] = 2; - } else { - // solid checkers - SegData[k * global_Nx * global_Ny + - j * global_Nx + i] = 0; - } - } - } - } - } - - if (inlet_layers_z > 0) { - printf("Checkerboard pattern at z inlet for %i layers, " - "saturated with phase label=%i \n", - inlet_layers_z, inlet_layers_phase); - // use checkerboard pattern - for (int k = zStart; k < zStart + inlet_layers_z; k++) { - for (int j = 0; j < 
global_Ny; j++) { - for (int i = 0; i < global_Nx; i++) { - if ((i / checkerSize + j / checkerSize) % 2 == 0) { - // void checkers - //SegData[k*global_Nx*global_Ny+j*global_Nx+i] = 2; - SegData[k * global_Nx * global_Ny + - j * global_Nx + i] = inlet_layers_phase; - } else { - // solid checkers - SegData[k * global_Nx * global_Ny + - j * global_Nx + i] = 0; - } - } - } - } - } - - if (outlet_layers_x > 0) { - // use checkerboard pattern - printf("Checkerboard pattern at x outlet for %i layers \n", - outlet_layers_x); - for (int k = 0; k < global_Nz; k++) { - for (int j = 0; j < global_Ny; j++) { - for (int i = xStart + nx * nprocx - outlet_layers_x; - i < xStart + nx * nprocx; i++) { - if ((j / checkerSize + k / checkerSize) % 2 == 0) { - // void checkers - SegData[k * global_Nx * global_Ny + - j * global_Nx + i] = 2; - } else { - // solid checkers - SegData[k * global_Nx * global_Ny + - j * global_Nx + i] = 0; - } - } - } - } - } - - if (outlet_layers_y > 0) { - printf("Checkerboard pattern at y outlet for %i layers \n", - outlet_layers_y); - // use checkerboard pattern - for (int k = 0; k < global_Nz; k++) { - for (int j = yStart + ny * nprocy - outlet_layers_y; - j < yStart + ny * nprocy; j++) { - for (int i = 0; i < global_Nx; i++) { - if ((i / checkerSize + k / checkerSize) % 2 == 0) { - // void checkers - SegData[k * global_Nx * global_Ny + - j * global_Nx + i] = 2; - } else { - // solid checkers - SegData[k * global_Nx * global_Ny + - j * global_Nx + i] = 0; - } - } - } - } - } - - if (outlet_layers_z > 0) { - printf("Checkerboard pattern at z outlet for %i layers, " - "saturated with phase label=%i \n", - outlet_layers_z, outlet_layers_phase); - // use checkerboard pattern - for (int k = zStart + nz * nprocz - outlet_layers_z; - k < zStart + nz * nprocz; k++) { - for (int j = 0; j < global_Ny; j++) { - for (int i = 0; i < global_Nx; i++) { - if ((i / checkerSize + j / checkerSize) % 2 == 0) { - // void checkers - //SegData[k*global_Nx*global_Ny+j*global_Nx+i] = 2; - SegData[k * global_Nx * global_Ny + - j * global_Nx + i] = - outlet_layers_phase; - } else { - // solid checkers - SegData[k * global_Nx * global_Ny + - j * global_Nx + i] = 0; - } - } - } - } - } - } else { - if (inlet_layers_z > 0) { - printf("Mixed reflection pattern at z inlet for %i layers, " - "saturated with phase label=%i \n", - inlet_layers_z, inlet_layers_phase); - for (int k = zStart; k < zStart + inlet_layers_z; k++) { - for (int j = 0; j < global_Ny; j++) { - for (int i = 0; i < global_Nx; i++) { - signed char local_id = - SegData[k * global_Nx * global_Ny + - j * global_Nx + i]; - signed char reflection_id = - SegData[(zStart + nz * nprocz - 1) * global_Nx * - global_Ny + - j * global_Nx + i]; - if (local_id < 1 && reflection_id > 0) { - SegData[k * global_Nx * global_Ny + - j * global_Nx + i] = reflection_id; - } - } - } - } - } - if (outlet_layers_z > 0) { - printf("Mixed reflection pattern at z outlet for %i layers, " - "saturated with phase label=%i \n", - outlet_layers_z, outlet_layers_phase); - for (int k = zStart + nz * nprocz - outlet_layers_z; - k < zStart + nz * nprocz; k++) { - for (int j = 0; j < global_Ny; j++) { - for (int i = 0; i < global_Nx; i++) { - signed char local_id = - SegData[k * global_Nx * global_Ny + - j * global_Nx + i]; - signed char reflection_id = - SegData[zStart * global_Nx * global_Ny + - j * global_Nx + i]; - if (local_id < 1 && reflection_id > 0) { - SegData[k * global_Nx * global_Ny + - j * global_Nx + i] = reflection_id; - } - } - } - } - } - } + + /* swc format for 
neurons */ + if (ReadType == "swc") { + read_swc(Filename); } + else { + nx = size[0]; + ny = size[1]; + nz = size[2]; + nprocx = nproc[0]; + nprocy = nproc[1]; + nprocz = nproc[2]; + global_Nx = SIZE[0]; + global_Ny = SIZE[1]; + global_Nz = SIZE[2]; + nprocs = nprocx * nprocy * nprocz; + char *SegData = NULL; - // Get the rank info - int64_t N = (nx + 2) * (ny + 2) * (nz + 2); + if (RANK == 0) { + printf("Input media: %s\n", Filename.c_str()); + printf("Relabeling %lu values\n", ReadValues.size()); + for (size_t idx = 0; idx < ReadValues.size(); idx++) { + int oldvalue = ReadValues[idx]; + int newvalue = WriteValues[idx]; + printf("oldvalue=%d, newvalue =%d \n", oldvalue, newvalue); + } - // number of sites to use for periodic boundary condition transition zone - int64_t z_transition_size = (nprocz * nz - (global_Nz - zStart)) / 2; - if (z_transition_size < 0) - z_transition_size = 0; + // Rank=0 reads the entire segmented data and distributes to worker processes + printf("Dimensions of segmented image: %ld x %ld x %ld \n", global_Nx, + global_Ny, global_Nz); + int64_t SIZE = global_Nx * global_Ny * global_Nz; + SegData = new char[SIZE]; + if (ReadType == "8bit") { + printf("Reading 8-bit input data \n"); + FILE *SEGDAT = fopen(Filename.c_str(), "rb"); + if (SEGDAT == NULL) + ERROR("Domain.cpp: Error reading segmented data"); + size_t ReadSeg; + ReadSeg = fread(SegData, 1, SIZE, SEGDAT); + if (ReadSeg != size_t(SIZE)) + printf("Domain.cpp: Error reading segmented data \n"); + fclose(SEGDAT); + } else if (ReadType == "16bit") { + printf("Reading 16-bit input data \n"); + short int *InputData; + InputData = new short int[SIZE]; + FILE *SEGDAT = fopen(Filename.c_str(), "rb"); + if (SEGDAT == NULL) + ERROR("Domain.cpp: Error reading segmented data"); + size_t ReadSeg; + ReadSeg = fread(InputData, 2, SIZE, SEGDAT); + if (ReadSeg != size_t(SIZE)) + printf("Domain.cpp: Error reading segmented data \n"); + fclose(SEGDAT); + for (int n = 0; n < SIZE; n++) { + SegData[n] = char(InputData[n]); + } + } + else if (ReadType == "SWC"){ - // Set up the sub-domains - if (RANK == 0) { - printf("Distributing subdomains across %i processors \n", nprocs); - printf("Process grid: %i x %i x %i \n", nprocx, nprocy, nprocz); - printf("Subdomain size: %i x %i x %i \n", nx, ny, nz); - printf("Size of transition region: %ld \n", z_transition_size); - auto loc_id = new char[(nx + 2) * (ny + 2) * (nz + 2)]; - for (int kp = 0; kp < nprocz; kp++) { - for (int jp = 0; jp < nprocy; jp++) { - for (int ip = 0; ip < nprocx; ip++) { - // rank of the process that gets this subdomain - int rnk = kp * nprocx * nprocy + jp * nprocx + ip; - // Pack and send the subdomain for rnk - for (k = 0; k < nz + 2; k++) { - for (j = 0; j < ny + 2; j++) { - for (i = 0; i < nx + 2; i++) { - int64_t x = xStart + ip * nx + i - 1; - int64_t y = yStart + jp * ny + j - 1; - // int64_t z = zStart + kp*nz + k-1; - int64_t z = zStart + kp * nz + k - 1 - - z_transition_size; - if (x < xStart) - x = xStart; - if (!(x < global_Nx)) - x = global_Nx - 1; - if (y < yStart) - y = yStart; - if (!(y < global_Ny)) - y = global_Ny - 1; - if (z < zStart) - z = zStart; - if (!(z < global_Nz)) - z = global_Nz - 1; - int64_t nlocal = - k * (nx + 2) * (ny + 2) + j * (nx + 2) + i; - int64_t nglobal = z * global_Nx * global_Ny + - y * global_Nx + x; - loc_id[nlocal] = SegData[nglobal]; - } - } - } - if (rnk == 0) { - for (k = 0; k < nz + 2; k++) { - for (j = 0; j < ny + 2; j++) { - for (i = 0; i < nx + 2; i++) { - int nlocal = k * (nx + 2) * (ny + 2) + - j * (nx + 2) + 
i; - id[nlocal] = loc_id[nlocal]; - } - } - } - } else { - //printf("Sending data to process %i \n", rnk); - Comm.send(loc_id, N, rnk, 15); - } - // Write the data for this rank data - char LocalRankFilename[40]; - sprintf(LocalRankFilename, "ID.%05i", rnk + rank_offset); - FILE *ID = fopen(LocalRankFilename, "wb"); - fwrite(loc_id, 1, (nx + 2) * (ny + 2) * (nz + 2), ID); - fclose(ID); - } - } - } - delete[] loc_id; - } else { - // Recieve the subdomain from rank = 0 - //printf("Ready to recieve data %i at process %i \n", N,rank); - Comm.recv(id.data(), N, 0, 15); + } + printf("Read segmented data from %s \n", Filename.c_str()); + + // relabel the data + std::vector LabelCount(ReadValues.size(), 0); + for (int k = 0; k < global_Nz; k++) { + for (int j = 0; j < global_Ny; j++) { + for (int i = 0; i < global_Nx; i++) { + n = k * global_Nx * global_Ny + j * global_Nx + i; + //char locval = loc_id[n]; + signed char locval = SegData[n]; + for (size_t idx = 0; idx < ReadValues.size(); idx++) { + signed char oldvalue = ReadValues[idx]; + signed char newvalue = WriteValues[idx]; + if (locval == oldvalue) { + SegData[n] = newvalue; + LabelCount[idx]++; + idx = ReadValues.size(); + } + } + } + } + } + for (size_t idx = 0; idx < ReadValues.size(); idx++) { + long int label = ReadValues[idx]; + long int count = LabelCount[idx]; + printf("Label=%ld, Count=%ld \n", label, count); + } + if (USE_CHECKER) { + if (inlet_layers_x > 0) { + // use checkerboard pattern + printf("Checkerboard pattern at x inlet for %i layers \n", + inlet_layers_x); + for (int k = 0; k < global_Nz; k++) { + for (int j = 0; j < global_Ny; j++) { + for (int i = xStart; i < xStart + inlet_layers_x; i++) { + if ((j / checkerSize + k / checkerSize) % 2 == 0) { + // void checkers + SegData[k * global_Nx * global_Ny + + j * global_Nx + i] = 2; + } else { + // solid checkers + SegData[k * global_Nx * global_Ny + + j * global_Nx + i] = 0; + } + } + } + } + } + + if (inlet_layers_y > 0) { + printf("Checkerboard pattern at y inlet for %i layers \n", + inlet_layers_y); + // use checkerboard pattern + for (int k = 0; k < global_Nz; k++) { + for (int j = yStart; j < yStart + inlet_layers_y; j++) { + for (int i = 0; i < global_Nx; i++) { + if ((i / checkerSize + k / checkerSize) % 2 == 0) { + // void checkers + SegData[k * global_Nx * global_Ny + + j * global_Nx + i] = 2; + } else { + // solid checkers + SegData[k * global_Nx * global_Ny + + j * global_Nx + i] = 0; + } + } + } + } + } + + if (inlet_layers_z > 0) { + printf("Checkerboard pattern at z inlet for %i layers, " + "saturated with phase label=%i \n", + inlet_layers_z, inlet_layers_phase); + // use checkerboard pattern + for (int k = zStart; k < zStart + inlet_layers_z; k++) { + for (int j = 0; j < global_Ny; j++) { + for (int i = 0; i < global_Nx; i++) { + if ((i / checkerSize + j / checkerSize) % 2 == 0) { + // void checkers + //SegData[k*global_Nx*global_Ny+j*global_Nx+i] = 2; + SegData[k * global_Nx * global_Ny + + j * global_Nx + i] = inlet_layers_phase; + } else { + // solid checkers + SegData[k * global_Nx * global_Ny + + j * global_Nx + i] = 0; + } + } + } + } + } + + if (outlet_layers_x > 0) { + // use checkerboard pattern + printf("Checkerboard pattern at x outlet for %i layers \n", + outlet_layers_x); + for (int k = 0; k < global_Nz; k++) { + for (int j = 0; j < global_Ny; j++) { + for (int i = xStart + nx * nprocx - outlet_layers_x; + i < xStart + nx * nprocx; i++) { + if ((j / checkerSize + k / checkerSize) % 2 == 0) { + // void checkers + SegData[k * global_Nx * global_Ny 
+ + j * global_Nx + i] = 2; + } else { + // solid checkers + SegData[k * global_Nx * global_Ny + + j * global_Nx + i] = 0; + } + } + } + } + } + + if (outlet_layers_y > 0) { + printf("Checkerboard pattern at y outlet for %i layers \n", + outlet_layers_y); + // use checkerboard pattern + for (int k = 0; k < global_Nz; k++) { + for (int j = yStart + ny * nprocy - outlet_layers_y; + j < yStart + ny * nprocy; j++) { + for (int i = 0; i < global_Nx; i++) { + if ((i / checkerSize + k / checkerSize) % 2 == 0) { + // void checkers + SegData[k * global_Nx * global_Ny + + j * global_Nx + i] = 2; + } else { + // solid checkers + SegData[k * global_Nx * global_Ny + + j * global_Nx + i] = 0; + } + } + } + } + } + + if (outlet_layers_z > 0) { + printf("Checkerboard pattern at z outlet for %i layers, " + "saturated with phase label=%i \n", + outlet_layers_z, outlet_layers_phase); + // use checkerboard pattern + for (int k = zStart + nz * nprocz - outlet_layers_z; + k < zStart + nz * nprocz; k++) { + for (int j = 0; j < global_Ny; j++) { + for (int i = 0; i < global_Nx; i++) { + if ((i / checkerSize + j / checkerSize) % 2 == 0) { + // void checkers + //SegData[k*global_Nx*global_Ny+j*global_Nx+i] = 2; + SegData[k * global_Nx * global_Ny + + j * global_Nx + i] = + outlet_layers_phase; + } else { + // solid checkers + SegData[k * global_Nx * global_Ny + + j * global_Nx + i] = 0; + } + } + } + } + } + } else { + if (inlet_layers_z > 0) { + printf("Mixed reflection pattern at z inlet for %i layers, " + "saturated with phase label=%i \n", + inlet_layers_z, inlet_layers_phase); + for (int k = zStart; k < zStart + inlet_layers_z; k++) { + for (int j = 0; j < global_Ny; j++) { + for (int i = 0; i < global_Nx; i++) { + signed char local_id = + SegData[k * global_Nx * global_Ny + + j * global_Nx + i]; + signed char reflection_id = + SegData[(zStart + nz * nprocz - 1) * global_Nx * + global_Ny + + j * global_Nx + i]; + if (local_id < 1 && reflection_id > 0) { + SegData[k * global_Nx * global_Ny + + j * global_Nx + i] = reflection_id; + } + } + } + } + } + if (outlet_layers_z > 0) { + printf("Mixed reflection pattern at z outlet for %i layers, " + "saturated with phase label=%i \n", + outlet_layers_z, outlet_layers_phase); + for (int k = zStart + nz * nprocz - outlet_layers_z; + k < zStart + nz * nprocz; k++) { + for (int j = 0; j < global_Ny; j++) { + for (int i = 0; i < global_Nx; i++) { + signed char local_id = + SegData[k * global_Nx * global_Ny + + j * global_Nx + i]; + signed char reflection_id = + SegData[zStart * global_Nx * global_Ny + + j * global_Nx + i]; + if (local_id < 1 && reflection_id > 0) { + SegData[k * global_Nx * global_Ny + + j * global_Nx + i] = reflection_id; + } + } + } + } + } + } + } + + // Get the rank info + int64_t N = (nx + 2) * (ny + 2) * (nz + 2); + + // number of sites to use for periodic boundary condition transition zone + int64_t z_transition_size = (nprocz * nz - (global_Nz - zStart)) / 2; + if (z_transition_size < 0) + z_transition_size = 0; + + // Set up the sub-domains + if (RANK == 0) { + printf("Distributing subdomains across %i processors \n", nprocs); + printf("Process grid: %i x %i x %i \n", nprocx, nprocy, nprocz); + printf("Subdomain size: %i x %i x %i \n", nx, ny, nz); + printf("Size of transition region: %ld \n", z_transition_size); + auto loc_id = new char[(nx + 2) * (ny + 2) * (nz + 2)]; + for (int kp = 0; kp < nprocz; kp++) { + for (int jp = 0; jp < nprocy; jp++) { + for (int ip = 0; ip < nprocx; ip++) { + // rank of the process that gets this subdomain + int rnk = 
kp * nprocx * nprocy + jp * nprocx + ip; + // Pack and send the subdomain for rnk + for (k = 0; k < nz + 2; k++) { + for (j = 0; j < ny + 2; j++) { + for (i = 0; i < nx + 2; i++) { + int64_t x = xStart + ip * nx + i - 1; + int64_t y = yStart + jp * ny + j - 1; + // int64_t z = zStart + kp*nz + k-1; + int64_t z = zStart + kp * nz + k - 1 - + z_transition_size; + if (x < xStart) + x = xStart; + if (!(x < global_Nx)) + x = global_Nx - 1; + if (y < yStart) + y = yStart; + if (!(y < global_Ny)) + y = global_Ny - 1; + if (z < zStart) + z = zStart; + if (!(z < global_Nz)) + z = global_Nz - 1; + int64_t nlocal = + k * (nx + 2) * (ny + 2) + j * (nx + 2) + i; + int64_t nglobal = z * global_Nx * global_Ny + + y * global_Nx + x; + loc_id[nlocal] = SegData[nglobal]; + } + } + } + if (rnk == 0) { + for (k = 0; k < nz + 2; k++) { + for (j = 0; j < ny + 2; j++) { + for (i = 0; i < nx + 2; i++) { + int nlocal = k * (nx + 2) * (ny + 2) + + j * (nx + 2) + i; + id[nlocal] = loc_id[nlocal]; + } + } + } + } else { + //printf("Sending data to process %i \n", rnk); + Comm.send(loc_id, N, rnk, 15); + } + // Write the data for this rank data + char LocalRankFilename[40]; + sprintf(LocalRankFilename, "ID.%05i", rnk + rank_offset); + FILE *ID = fopen(LocalRankFilename, "wb"); + fwrite(loc_id, 1, (nx + 2) * (ny + 2) * (nz + 2), ID); + fclose(ID); + } + } + } + delete[] loc_id; + } else { + // Recieve the subdomain from rank = 0 + //printf("Ready to recieve data %i at process %i \n", N,rank); + Comm.recv(id.data(), N, 0, 15); + } + delete[] SegData; } Comm.barrier(); ComputePorosity(); - delete[] SegData; } void Domain::ComputePorosity() { - // Compute the porosity - double sum; - double sum_local = 0.0; - double iVol_global = 1.0 / (1.0 * (Nx - 2) * (Ny - 2) * (Nz - 2) * - nprocx() * nprocy() * nprocz()); - if (BoundaryCondition > 0 && BoundaryCondition != 5) - iVol_global = - 1.0 / (1.0 * (Nx - 2) * nprocx() * (Ny - 2) * nprocy() * - ((Nz - 2) * nprocz() - inlet_layers_z - outlet_layers_z)); - //......................................................... - for (int k = inlet_layers_z + 1; k < Nz - outlet_layers_z - 1; k++) { - for (int j = 1; j < Ny - 1; j++) { - for (int i = 1; i < Nx - 1; i++) { - int n = k * Nx * Ny + j * Nx + i; - if (id[n] > 0) { - sum_local += 1.0; - } - } - } - } - sum = Comm.sumReduce(sum_local); - porosity = sum * iVol_global; - if (rank() == 0) - printf("Media porosity = %f \n", porosity); - //......................................................... + // Compute the porosity + double sum; + double sum_local = 0.0; + double iVol_global = 1.0 / (1.0 * (Nx - 2) * (Ny - 2) * (Nz - 2) * + nprocx() * nprocy() * nprocz()); + if (BoundaryCondition > 0 && BoundaryCondition != 5) + iVol_global = + 1.0 / (1.0 * (Nx - 2) * nprocx() * (Ny - 2) * nprocy() * + ((Nz - 2) * nprocz() - inlet_layers_z - outlet_layers_z)); + //......................................................... + for (int k = inlet_layers_z + 1; k < Nz - outlet_layers_z - 1; k++) { + for (int j = 1; j < Ny - 1; j++) { + for (int i = 1; i < Nx - 1; i++) { + int n = k * Nx * Ny + j * Nx + i; + if (id[n] > 0) { + sum_local += 1.0; + } + } + } + } + sum = Comm.sumReduce(sum_local); + porosity = sum * iVol_global; + if (rank() == 0) + printf("Media porosity = %f \n", porosity); + //......................................................... 
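// What the quantities above compute: sum_local counts the interior voxels on
// this rank with id > 0 (pore space), skipping any inlet/outlet layers in z;
// Comm.sumReduce totals that count across ranks, and iVol_global is the
// reciprocal of the global interior voxel count, so
//     porosity = (global # of voxels with id > 0) / (global interior volume),
// where the interior volume also excludes the inlet/outlet z-layers whenever
// the boundary condition adds reservoir layers (BoundaryCondition > 0 and != 5).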
} void Domain::AggregateLabels(const std::string &filename) { @@ -1543,7 +1766,7 @@ void Domain::ReadFromFile(const std::string &Filename, } else { // Recieve the subdomain from rank = 0 //printf("Ready to recieve data %i at process %i \n", N,rank); - Comm.recv(id.data(), N, 0, 15); + Comm.recv(UserData, N, 0, 15); } Comm.barrier(); } diff --git a/common/Domain.h b/common/Domain.h index b12535ba..dfdc4fd8 100644 --- a/common/Domain.h +++ b/common/Domain.h @@ -134,6 +134,7 @@ public: // Public variables (need to create accessors instead) int Nx, Ny, Nz, N; int inlet_layers_x, inlet_layers_y, inlet_layers_z; int outlet_layers_x, outlet_layers_y, outlet_layers_z; + int offset_x, offset_y, offset_z; int inlet_layers_phase; //as usual: 1->n, 2->w int outlet_layers_phase; double porosity; @@ -202,6 +203,11 @@ public: // Public variables (need to create accessors instead) * \brief Read domain IDs from file */ void ReadIDs(); + + /** + * \brief Read domain IDs from SWC file + */ + void read_swc(const std::string &Filename); /** * \brief Compute the porosity diff --git a/common/FunctionTable.cpp b/common/FunctionTable.cpp index 927b0ee4..ad5c37cc 100644 --- a/common/FunctionTable.cpp +++ b/common/FunctionTable.cpp @@ -93,12 +93,11 @@ template<> long double genRand() * axpy * ********************************************************/ template <> -void call_axpy(size_t N, const float alpha, const float *x, float *y) { +void call_axpy(size_t, const float, const float*, float*) { ERROR("Not finished"); } template <> -void call_axpy(size_t N, const double alpha, const double *x, - double *y) { +void call_axpy(size_t, const double, const double*, double*) { ERROR("Not finished"); } @@ -106,22 +105,22 @@ void call_axpy(size_t N, const double alpha, const double *x, * Multiply two arrays * ********************************************************/ template <> -void call_gemv(size_t M, size_t N, double alpha, double beta, - const double *A, const double *x, double *y) { +void call_gemv(size_t, size_t, double, double, + const double*, const double*, double*) { ERROR("Not finished"); } template <> -void call_gemv(size_t M, size_t N, float alpha, float beta, - const float *A, const float *x, float *y) { +void call_gemv(size_t, size_t, float, float, + const float*, const float*, float*) { ERROR("Not finished"); } template <> -void call_gemm(size_t M, size_t N, size_t K, double alpha, double beta, - const double *A, const double *B, double *C) { +void call_gemm(size_t, size_t, size_t, double, double, + const double*, const double*, double*) { ERROR("Not finished"); } template <> -void call_gemm(size_t M, size_t N, size_t K, float alpha, float beta, - const float *A, const float *B, float *C) { +void call_gemm(size_t, size_t, size_t, float, float, + const float*, const float*, float*) { ERROR("Not finished"); } diff --git a/common/FunctionTable.hpp b/common/FunctionTable.hpp index 469df732..e8d6dd33 100644 --- a/common/FunctionTable.hpp +++ b/common/FunctionTable.hpp @@ -297,10 +297,10 @@ TYPE FunctionTable::sum(const Array &A) { } template -inline void FunctionTable::gemmWrapper(char TRANSA, char TRANSB, int M, int N, - int K, TYPE alpha, const TYPE *A, - int LDA, const TYPE *B, int LDB, - TYPE beta, TYPE *C, int LDC) { +inline void FunctionTable::gemmWrapper(char, char, int, int, + int, TYPE, const TYPE*, + int, const TYPE*, int, + TYPE, TYPE*, int) { ERROR("Not finished"); } diff --git a/common/MPI.I b/common/MPI.I index 6d44858a..47e7789c 100644 --- a/common/MPI.I +++ b/common/MPI.I @@ -7,1178 +7,1133 @@ 
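// Most of the MPI.I changes in this hunk are a formatting pass (clang-format
// style line breaks and dropping top-level const from value parameters in
// declarations); the reduce/broadcast/send/recv semantics are unchanged. A
// minimal usage sketch of the sumReduce path defined below, assuming
// (hypothetically) that the wrapper can be constructed from a raw MPI
// communicator:
//     Utilities::MPI comm( MPI_COMM_WORLD );   // assumed constructor
//     double local = 1.0;                      // per-rank contribution
//     double total = comm.sumReduce( local );  // returns local unchanged when
//                                              // comm_size == 1, reduces otherwise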
#include - #define MPI_CLASS MPI #define MPI_CLASS_ERROR ERROR #define MPI_CLASS_ASSERT ASSERT #undef NULL_USE -#define NULL_USE( variable ) \ - do { \ - if ( 0 ) { \ - auto static t = (char *) &variable; \ - t++; \ - } \ - } while ( 0 ) - +#define NULL_USE(variable) \ + do { \ + if (0) { \ + auto static t = (char *)&variable; \ + t++; \ + } \ + } while (0) namespace Utilities { - // Function to test if a type is a std::pair -template -struct is_pair : std::false_type { -}; -template -struct is_pair> : std::true_type { -}; - +template struct is_pair : std::false_type {}; +template +struct is_pair> : std::true_type {}; // Function to test if a type can be passed by MPI -template -constexpr typename std::enable_if::value,bool>::type - is_mpi_copyable() -{ +template +constexpr + typename std::enable_if::value, bool>::type + is_mpi_copyable() { return true; } -template -constexpr typename std::enable_if::value&&is_pair::value,bool>::type - is_mpi_copyable() -{ - return is_mpi_copyable() && is_mpi_copyable(); +template +constexpr typename std::enable_if::value && + is_pair::value, + bool>::type +is_mpi_copyable() { + return is_mpi_copyable() && + is_mpi_copyable(); } -template -constexpr typename std::enable_if::value&&!is_pair::value,bool>::type - is_mpi_copyable() -{ +template +constexpr typename std::enable_if::value && + !is_pair::value, + bool>::type +is_mpi_copyable() { return false; } - /************************************************************************ * sumReduce * ************************************************************************/ -template -inline TYPE MPI_CLASS::sumReduce( const TYPE value ) const -{ - if ( comm_size > 1 ) { +template inline TYPE MPI_CLASS::sumReduce(const TYPE value) const { + if (comm_size > 1) { TYPE tmp = value; - call_sumReduce( &tmp, 1 ); + call_sumReduce(&tmp, 1); return tmp; } else { return value; } } -template -inline void MPI_CLASS::sumReduce( TYPE *x, const int n ) const -{ - if ( comm_size > 1 ) - call_sumReduce( x, n ); +template inline void MPI_CLASS::sumReduce(TYPE *x, int n) const { + if (comm_size > 1) + call_sumReduce(x, n); } -template -inline void MPI_CLASS::sumReduce( const TYPE *x, TYPE *y, const int n ) const -{ - if ( comm_size > 1 ) { - call_sumReduce( x, y, n ); +template +inline void MPI_CLASS::sumReduce(const TYPE *x, TYPE *y, int n) const { + if (comm_size > 1) { + call_sumReduce(x, y, n); } else { - for ( int i = 0; i < n; i++ ) + for (int i = 0; i < n; i++) y[i] = x[i]; } } -// Define specializations of call_sumReduce(TYPE*, const int) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::call_sumReduce( unsigned char *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce( char *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce( unsigned int *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce( int *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce( unsigned long int *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce( long int *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce( size_t *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce( float *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce( double *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce>( std::complex *, const int ) const; +// Define specializations of call_sumReduce(TYPE*, int) +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +void MPI_CLASS::call_sumReduce(unsigned char *, int) const; +template <> 
void MPI_CLASS::call_sumReduce(char *, int) const; +template <> +void MPI_CLASS::call_sumReduce(unsigned int *, int) const; +template <> void MPI_CLASS::call_sumReduce(int *, int) const; +template <> +void MPI_CLASS::call_sumReduce(unsigned long int *, + int) const; +template <> void MPI_CLASS::call_sumReduce(long int *, int) const; +template <> void MPI_CLASS::call_sumReduce(size_t *, int) const; +template <> void MPI_CLASS::call_sumReduce(float *, int) const; +template <> void MPI_CLASS::call_sumReduce(double *, int) const; +template <> +void MPI_CLASS::call_sumReduce>(std::complex *, + int) const; #endif -// Default instantiations of call_sumReduce(TYPE*, const int) -template -void MPI_CLASS::call_sumReduce( TYPE *, const int ) const -{ +// Default instantiations of call_sumReduce(TYPE*, int) +template void MPI_CLASS::call_sumReduce(TYPE *, int) const { char message[200]; - sprintf( message, "Default instantion of sumReduce in parallel is not supported (%s)", - typeid( TYPE ).name() ); - MPI_CLASS_ERROR( message ); + sprintf(message, + "Default instantion of sumReduce in parallel is not supported (%s)", + typeid(TYPE).name()); + MPI_CLASS_ERROR(message); } -// Define specializations of call_sumReduce(const TYPE*, TYPE*, const int) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::call_sumReduce( - const unsigned char *, unsigned char *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce( const char *, char *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce( - const unsigned int *, unsigned int *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce( const int *, int *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce( - const unsigned long int *, unsigned long int *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce( const long int *, long int *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce( const size_t *, size_t *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce( const float *, float *, const int ) const; -template<> -void MPI_CLASS::call_sumReduce( const double *, double *, const int ) const; -template<> +// Define specializations of call_sumReduce(const TYPE*, TYPE*, int) +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +void MPI_CLASS::call_sumReduce(const unsigned char *, + unsigned char *, int) const; +template <> +void MPI_CLASS::call_sumReduce(const char *, char *, int) const; +template <> +void MPI_CLASS::call_sumReduce(const unsigned int *, + unsigned int *, int) const; +template <> void MPI_CLASS::call_sumReduce(const int *, int *, int) const; +template <> +void MPI_CLASS::call_sumReduce(const unsigned long int *, + unsigned long int *, + int) const; +template <> +void MPI_CLASS::call_sumReduce(const long int *, long int *, + int) const; +template <> +void MPI_CLASS::call_sumReduce(const size_t *, size_t *, int) const; +template <> +void MPI_CLASS::call_sumReduce(const float *, float *, int) const; +template <> +void MPI_CLASS::call_sumReduce(const double *, double *, int) const; +template <> void MPI_CLASS::call_sumReduce>( - const std::complex *, std::complex *, const int ) const; + const std::complex *, std::complex *, int) const; #endif -// Default instantiations of call_sumReduce(const TYPE*, TYPE*, const int) -template -void MPI_CLASS::call_sumReduce( const TYPE *x, TYPE *y, const int n ) const -{ - NULL_USE( x ); - NULL_USE( y ); - NULL_USE( n ); +// Default instantiations of call_sumReduce(const TYPE*, TYPE*, int) +template +void 
MPI_CLASS::call_sumReduce(const TYPE *x, TYPE *y, int n) const { + NULL_USE(x); + NULL_USE(y); + NULL_USE(n); char message[200]; - sprintf( message, "Default instantion of sumReduce in parallel is not supported (%s)", - typeid( TYPE ).name() ); - MPI_CLASS_ERROR( message ); + sprintf(message, + "Default instantion of sumReduce in parallel is not supported (%s)", + typeid(TYPE).name()); + MPI_CLASS_ERROR(message); } - /************************************************************************ * minReduce * ************************************************************************/ -template -inline TYPE MPI_CLASS::minReduce( const TYPE value ) const -{ - if ( comm_size > 1 ) { +template inline TYPE MPI_CLASS::minReduce(const TYPE value) const { + if (comm_size > 1) { TYPE tmp = value; - call_minReduce( &tmp, 1, nullptr ); + call_minReduce(&tmp, 1, nullptr); return tmp; } else { return value; } } -template -inline void MPI_CLASS::minReduce( TYPE *x, const int n, int *rank_of_min ) const -{ - if ( comm_size > 1 ) { - call_minReduce( x, n, rank_of_min ); +template +inline void MPI_CLASS::minReduce(TYPE *x, int n, int *rank_of_min) const { + if (comm_size > 1) { + call_minReduce(x, n, rank_of_min); } else { - if ( rank_of_min != nullptr ) { - for ( int i = 0; i < n; i++ ) + if (rank_of_min != nullptr) { + for (int i = 0; i < n; i++) rank_of_min[i] = 0; } } } -template -inline void MPI_CLASS::minReduce( const TYPE *x, TYPE *y, const int n, int *rank_of_min ) const -{ - if ( comm_size > 1 ) { - call_minReduce( x, y, n, rank_of_min ); +template +inline void MPI_CLASS::minReduce(const TYPE *x, TYPE *y, int n, + int *rank_of_min) const { + if (comm_size > 1) { + call_minReduce(x, y, n, rank_of_min); } else { - for ( int i = 0; i < n; i++ ) { + for (int i = 0; i < n; i++) { y[i] = x[i]; - if ( rank_of_min != nullptr ) + if (rank_of_min != nullptr) rank_of_min[i] = 0; } } } -// Define specializations of call_minReduce(TYPE*, const int, int*) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::call_minReduce( unsigned char *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( char *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( unsigned int *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( int *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( unsigned long int *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( long int *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( - unsigned long long int *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( long long int *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( size_t *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( float *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( double *, const int, int * ) const; +// Define specializations of call_minReduce(TYPE*, int, int*) +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +void MPI_CLASS::call_minReduce(unsigned char *, int, + int *) const; +template <> void MPI_CLASS::call_minReduce(char *, int, int *) const; +template <> +void MPI_CLASS::call_minReduce(unsigned int *, int, int *) const; +template <> void MPI_CLASS::call_minReduce(int *, int, int *) const; +template <> +void MPI_CLASS::call_minReduce(unsigned long int *, int, + int *) const; +template <> +void MPI_CLASS::call_minReduce(long int *, int, int *) const; +template <> +void 
MPI_CLASS::call_minReduce(unsigned long long int *, + int, int *) const; +template <> +void MPI_CLASS::call_minReduce(long long int *, int, + int *) const; +template <> void MPI_CLASS::call_minReduce(size_t *, int, int *) const; +template <> void MPI_CLASS::call_minReduce(float *, int, int *) const; +template <> void MPI_CLASS::call_minReduce(double *, int, int *) const; #endif -// Default instantiations of call_minReduce(TYPE*, const int, int*) -template -void MPI_CLASS::call_minReduce( TYPE *, const int, int * ) const -{ +// Default instantiations of call_minReduce(TYPE*, int, int*) +template void MPI_CLASS::call_minReduce(TYPE *, int, int *) const { char message[200]; - sprintf( message, "Default instantion of minReduce in parallel is not supported (%s)", - typeid( TYPE ).name() ); - MPI_CLASS_ERROR( message ); + sprintf(message, + "Default instantion of minReduce in parallel is not supported (%s)", + typeid(TYPE).name()); + MPI_CLASS_ERROR(message); } -// Define specializations of call_minReduce(const TYPE*, TYPE*, const int, int*) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::call_minReduce( - const unsigned char *, unsigned char *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( const char *, char *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( - const unsigned int *, unsigned int *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( const int *, int *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( - const unsigned long int *, unsigned long int *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( const long int *, long int *, const int, int * ) const; -template<> +// Define specializations of call_minReduce(const TYPE*, TYPE*, int, int*) +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +void MPI_CLASS::call_minReduce(const unsigned char *, + unsigned char *, int, + int *) const; +template <> +void MPI_CLASS::call_minReduce(const char *, char *, int, int *) const; +template <> +void MPI_CLASS::call_minReduce(const unsigned int *, + unsigned int *, int, int *) const; +template <> +void MPI_CLASS::call_minReduce(const int *, int *, int, int *) const; +template <> +void MPI_CLASS::call_minReduce(const unsigned long int *, + unsigned long int *, int, + int *) const; +template <> +void MPI_CLASS::call_minReduce(const long int *, long int *, int, + int *) const; +template <> void MPI_CLASS::call_minReduce( - const unsigned long long int *, unsigned long long int *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( - const long long int *, long long int *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( const size_t *, size_t *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( const float *, float *, const int, int * ) const; -template<> -void MPI_CLASS::call_minReduce( const double *, double *, const int, int * ) const; + const unsigned long long int *, unsigned long long int *, int, int *) const; +template <> +void MPI_CLASS::call_minReduce(const long long int *, + long long int *, int, + int *) const; +template <> +void MPI_CLASS::call_minReduce(const size_t *, size_t *, int, + int *) const; +template <> +void MPI_CLASS::call_minReduce(const float *, float *, int, int *) const; +template <> +void MPI_CLASS::call_minReduce(const double *, double *, int, + int *) const; #endif -// Default instantiations of call_minReduce(const TYPE*, TYPE*, const int, int*) -template 
-void MPI_CLASS::call_minReduce( const TYPE *, TYPE *, const int, int * ) const -{ +// Default instantiations of call_minReduce(const TYPE*, TYPE*, int, int*) +template +void MPI_CLASS::call_minReduce(const TYPE *, TYPE *, int, int *) const { char message[200]; - sprintf( message, "Default instantion of minReduce in parallel is not supported (%s)", - typeid( TYPE ).name() ); - MPI_CLASS_ERROR( message ); + sprintf(message, + "Default instantion of minReduce in parallel is not supported (%s)", + typeid(TYPE).name()); + MPI_CLASS_ERROR(message); } - /************************************************************************ * maxReduce * ************************************************************************/ -template -inline TYPE MPI_CLASS::maxReduce( const TYPE value ) const -{ - if ( comm_size > 1 ) { +template inline TYPE MPI_CLASS::maxReduce(const TYPE value) const { + if (comm_size > 1) { TYPE tmp = value; - call_maxReduce( &tmp, 1, nullptr ); + call_maxReduce(&tmp, 1, nullptr); return tmp; } else { return value; } } -template -inline void MPI_CLASS::maxReduce( TYPE *x, const int n, int *rank_of_max ) const -{ - if ( comm_size > 1 ) { - call_maxReduce( x, n, rank_of_max ); +template +inline void MPI_CLASS::maxReduce(TYPE *x, int n, int *rank_of_max) const { + if (comm_size > 1) { + call_maxReduce(x, n, rank_of_max); } else { - if ( rank_of_max != nullptr ) { - for ( int i = 0; i < n; i++ ) + if (rank_of_max != nullptr) { + for (int i = 0; i < n; i++) rank_of_max[i] = 0; } } } -template -inline void MPI_CLASS::maxReduce( const TYPE *x, TYPE *y, const int n, int *rank_of_max ) const -{ - if ( comm_size > 1 ) { - call_maxReduce( x, y, n, rank_of_max ); +template +inline void MPI_CLASS::maxReduce(const TYPE *x, TYPE *y, int n, + int *rank_of_max) const { + if (comm_size > 1) { + call_maxReduce(x, y, n, rank_of_max); } else { - for ( int i = 0; i < n; i++ ) { + for (int i = 0; i < n; i++) { y[i] = x[i]; - if ( rank_of_max != nullptr ) + if (rank_of_max != nullptr) rank_of_max[i] = 0; } } } -// Define specializations of call_maxReduce(TYPE*, const int, int*) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::call_maxReduce( unsigned char *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( char *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( unsigned int *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( int *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( unsigned long int *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( long int *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( - unsigned long long int *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( long long int *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( size_t *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( float *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( double *, const int, int * ) const; +// Define specializations of call_maxReduce(TYPE*, int, int*) +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +void MPI_CLASS::call_maxReduce(unsigned char *, int, + int *) const; +template <> void MPI_CLASS::call_maxReduce(char *, int, int *) const; +template <> +void MPI_CLASS::call_maxReduce(unsigned int *, int, int *) const; +template <> void MPI_CLASS::call_maxReduce(int *, int, int *) const; +template <> +void MPI_CLASS::call_maxReduce(unsigned long int 
*, int, + int *) const; +template <> +void MPI_CLASS::call_maxReduce(long int *, int, int *) const; +template <> +void MPI_CLASS::call_maxReduce(unsigned long long int *, + int, int *) const; +template <> +void MPI_CLASS::call_maxReduce(long long int *, int, + int *) const; +template <> void MPI_CLASS::call_maxReduce(size_t *, int, int *) const; +template <> void MPI_CLASS::call_maxReduce(float *, int, int *) const; +template <> void MPI_CLASS::call_maxReduce(double *, int, int *) const; #endif -// Default instantiations of call_maxReduce(TYPE*, const int, int*) -template -void MPI_CLASS::call_maxReduce( TYPE *, const int, int * ) const -{ +// Default instantiations of call_maxReduce(TYPE*, int, int*) +template void MPI_CLASS::call_maxReduce(TYPE *, int, int *) const { char message[200]; - sprintf( message, "Default instantion of maxReduce in parallel is not supported (%s)", - typeid( TYPE ).name() ); - MPI_CLASS_ERROR( message ); + sprintf(message, + "Default instantion of maxReduce in parallel is not supported (%s)", + typeid(TYPE).name()); + MPI_CLASS_ERROR(message); } -// Define specializations of call_maxReduce(const TYPE*, TYPE*, const int, int*) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::call_maxReduce( - const unsigned char *, unsigned char *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( const char *, char *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( - const unsigned int *, unsigned int *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( const int *, int *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( - const unsigned long int *, unsigned long int *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( const long int *, long int *, const int, int * ) const; -template<> +// Define specializations of call_maxReduce(const TYPE*, TYPE*, int, int*) +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +void MPI_CLASS::call_maxReduce(const unsigned char *, + unsigned char *, int, + int *) const; +template <> +void MPI_CLASS::call_maxReduce(const char *, char *, int, int *) const; +template <> +void MPI_CLASS::call_maxReduce(const unsigned int *, + unsigned int *, int, int *) const; +template <> +void MPI_CLASS::call_maxReduce(const int *, int *, int, int *) const; +template <> +void MPI_CLASS::call_maxReduce(const unsigned long int *, + unsigned long int *, int, + int *) const; +template <> +void MPI_CLASS::call_maxReduce(const long int *, long int *, int, + int *) const; +template <> void MPI_CLASS::call_maxReduce( - const unsigned long long int *, unsigned long long int *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( - const long long int *, long long int *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( const size_t *, size_t *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( const float *, float *, const int, int * ) const; -template<> -void MPI_CLASS::call_maxReduce( const double *, double *, const int, int * ) const; + const unsigned long long int *, unsigned long long int *, int, int *) const; +template <> +void MPI_CLASS::call_maxReduce(const long long int *, + long long int *, int, + int *) const; +template <> +void MPI_CLASS::call_maxReduce(const size_t *, size_t *, int, + int *) const; +template <> +void MPI_CLASS::call_maxReduce(const float *, float *, int, int *) const; +template <> +void MPI_CLASS::call_maxReduce(const double *, double *, int, 
+ int *) const; #endif -// Default instantiations of call_maxReduce(const TYPE*, TYPE*, const int, int*) -template -void MPI_CLASS::call_maxReduce( const TYPE *, TYPE *, const int, int * ) const -{ +// Default instantiations of call_maxReduce(const TYPE*, TYPE*, int, int*) +template +void MPI_CLASS::call_maxReduce(const TYPE *, TYPE *, int, int *) const { char message[200]; - sprintf( message, "Default instantion of maxReduce in parallel is not supported (%s)", - typeid( TYPE ).name() ); - MPI_CLASS_ERROR( message ); + sprintf(message, + "Default instantion of maxReduce in parallel is not supported (%s)", + typeid(TYPE).name()); + MPI_CLASS_ERROR(message); } - /************************************************************************ * bcast * ************************************************************************/ -// Define specializations of bcast(TYPE*, const int, const int) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::call_bcast( unsigned char *, const int, const int ) const; -template<> -void MPI_CLASS::call_bcast( char *, const int, const int ) const; -template<> -void MPI_CLASS::call_bcast( unsigned int *, const int, const int ) const; -template<> -void MPI_CLASS::call_bcast( int *, const int, const int ) const; -template<> -void MPI_CLASS::call_bcast( float *, const int, const int ) const; -template<> -void MPI_CLASS::call_bcast( double *, const int, const int ) const; +// Define specializations of bcast(TYPE*, int, int) +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +void MPI_CLASS::call_bcast(unsigned char *, int, int) const; +template <> void MPI_CLASS::call_bcast(char *, int, int) const; +template <> +void MPI_CLASS::call_bcast(unsigned int *, int, int) const; +template <> void MPI_CLASS::call_bcast(int *, int, int) const; +template <> void MPI_CLASS::call_bcast(float *, int, int) const; +template <> void MPI_CLASS::call_bcast(double *, int, int) const; #else -template<> -void MPI_CLASS::call_bcast( char *, const int, const int ) const; +template <> void MPI_CLASS::call_bcast(char *, int, int) const; #endif -// Default instantiations of bcast(TYPE*, const int, const int) -template -void MPI_CLASS::call_bcast( TYPE *x, const int n, const int root ) const -{ - static_assert( is_mpi_copyable(), "Object is not trivially copyable" ); - call_bcast( (char *) x, (int) n * sizeof( TYPE ), root ); +// Default instantiations of bcast(TYPE*, int, int) +template +void MPI_CLASS::call_bcast(TYPE *x, int n, int root) const { + static_assert(is_mpi_copyable(), "Object is not trivially copyable"); + call_bcast((char *)x, (int)n * sizeof(TYPE), root); } // Specialization of bcast for std::string -template<> -inline std::string MPI_CLASS::bcast( const std::string &value, const int root ) const -{ - if ( comm_size == 1 ) +template <> +inline std::string MPI_CLASS::bcast(const std::string &value, + int root) const { + if (comm_size == 1) return value; - int length = static_cast( value.size() ); - call_bcast( &length, 1, root ); - if ( length == 0 ) + int length = static_cast(value.size()); + call_bcast(&length, 1, root); + if (length == 0) return std::string(); char *str = new char[length + 1]; - if ( root == comm_rank ) { - for ( int i = 0; i < length; i++ ) + if (root == comm_rank) { + for (int i = 0; i < length; i++) str[i] = value[i]; } - call_bcast( str, length, root ); + call_bcast(str, length, root); str[length] = 0; - std::string result( str ); + std::string result(str); delete[] str; return result; } -template<> -inline void MPI_CLASS::bcast( 
std::string *, const int, const int ) const -{ - MPI_CLASS_ERROR( "Cannot bcast an array of strings" ); +template <> +inline void MPI_CLASS::bcast(std::string *, int, int) const { + MPI_CLASS_ERROR("Cannot bcast an array of strings"); } // Default implimentation of bcast -template -inline TYPE MPI_CLASS::bcast( const TYPE &value, const int root ) const -{ - if ( root >= comm_size ) - MPI_CLASS_ERROR( "root cannot be >= size in bcast" ); - if ( comm_size > 1 ) { +template +inline TYPE MPI_CLASS::bcast(const TYPE &value, int root) const { + if (root >= comm_size) + MPI_CLASS_ERROR("root cannot be >= size in bcast"); + if (comm_size > 1) { TYPE tmp = value; - call_bcast( &tmp, 1, root ); + call_bcast(&tmp, 1, root); return tmp; } else { return value; } } -template -inline void MPI_CLASS::bcast( TYPE *x, const int n, const int root ) const -{ - if ( root >= comm_size ) - MPI_CLASS_ERROR( "root cannot be >= size in bcast" ); - if ( comm_size > 1 ) - call_bcast( x, n, root ); +template +inline void MPI_CLASS::bcast(TYPE *x, int n, int root) const { + if (root >= comm_size) + MPI_CLASS_ERROR("root cannot be >= size in bcast"); + if (comm_size > 1) + call_bcast(x, n, root); } - /************************************************************************ * send * ************************************************************************/ -// Define specializations of send(const TYPE*, const int, const int, int) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::send( const char *, const int, const int, int ) const; -template<> -void MPI_CLASS::send( const int *, int, const int, int ) const; -template<> -void MPI_CLASS::send( const float *, const int, const int, int ) const; -template<> -void MPI_CLASS::send( const double *, const int, const int, int ) const; +// Define specializations of send(const TYPE*, int, int, int) +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> void MPI_CLASS::send(const char *, int, int, int) const; +template <> void MPI_CLASS::send(const int *, int, int, int) const; +template <> void MPI_CLASS::send(const float *, int, int, int) const; +template <> void MPI_CLASS::send(const double *, int, int, int) const; #else -template<> -void MPI_CLASS::send( const char *, const int, const int, int ) const; +template <> void MPI_CLASS::send(const char *, int, int, int) const; #endif -// Default instantiations of send(const TYPE*, const int, const int, int) -template -inline void MPI_CLASS::send( - const TYPE *buf, const int length, const int recv_proc_number, int tag ) const -{ - static_assert( is_mpi_copyable(), "Object is not trivially copyable" ); - send( (const char *) buf, length * sizeof( TYPE ), recv_proc_number, tag ); +// Default instantiations of send(const TYPE*, int, int, int) +template +inline void MPI_CLASS::send(const TYPE *buf, int length, int recv_proc_number, + int tag) const { + static_assert(is_mpi_copyable(), "Object is not trivially copyable"); + send((const char *)buf, length * sizeof(TYPE), recv_proc_number, tag); } - /************************************************************************ * Isend * ************************************************************************/ -// Define specializations of Isend(const TYPE*, const int, const int, const int) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -MPI_Request MPI_CLASS::Isend( const char *, const int, const int, const int ) const; -template<> -MPI_Request MPI_CLASS::Isend( const int *, int, const int, const int ) const; -template<> -MPI_Request MPI_CLASS::Isend( 
const float *, const int, const int, const int ) const; -template<> -MPI_Request MPI_CLASS::Isend( const double *, const int, const int, const int ) const; +// Define specializations of Isend(const TYPE*, int, int, int) +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +MPI_Request MPI_CLASS::Isend(const char *, int, int, int) const; +template <> MPI_Request MPI_CLASS::Isend(const int *, int, int, int) const; +template <> +MPI_Request MPI_CLASS::Isend(const float *, int, int, int) const; +template <> +MPI_Request MPI_CLASS::Isend(const double *, int, int, int) const; #else -template<> -MPI_Request MPI_CLASS::Isend( const char *, const int, const int, const int ) const; +template <> +MPI_Request MPI_CLASS::Isend(const char *, int, int, int) const; #endif -// Default instantiations of Isend(const TYPE*, const int, const int, const int) -template -inline MPI_Request MPI_CLASS::Isend( - const TYPE *buf, const int length, const int recv_proc_number, const int tag ) const -{ - static_assert( is_mpi_copyable(), "Object is not trivially copyable" ); - return Isend( (const char *) buf, length * sizeof( TYPE ), recv_proc_number, tag ); +// Default instantiations of Isend(const TYPE*, int, int, int) +template +inline MPI_Request MPI_CLASS::Isend(const TYPE *buf, int length, + int recv_proc_number, int tag) const { + static_assert(is_mpi_copyable(), "Object is not trivially copyable"); + return Isend((const char *)buf, length * sizeof(TYPE), + recv_proc_number, tag); } - /************************************************************************ * recv * ************************************************************************/ -// Define specializations of recv(TYPE*, int&, const int, const bool, int) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::recv( char *, int &, const int, const bool, int ) const; -template<> -void MPI_CLASS::recv( int *, int &, const int, const bool, int ) const; -template<> -void MPI_CLASS::recv( float *, int &, const int, const bool, int ) const; -template<> -void MPI_CLASS::recv( double *, int &, const int, const bool, int ) const; +// Define specializations of recv(TYPE*, int&, int, const bool, int) +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +void MPI_CLASS::recv(char *, int &, int, const bool, int) const; +template <> void MPI_CLASS::recv(int *, int &, int, const bool, int) const; +template <> +void MPI_CLASS::recv(float *, int &, int, const bool, int) const; +template <> +void MPI_CLASS::recv(double *, int &, int, const bool, int) const; #else -template<> -void MPI_CLASS::recv( char *, int &, const int, const bool, int ) const; +template <> +void MPI_CLASS::recv(char *, int &, int, const bool, int) const; #endif -// Default instantiations of recv(TYPE*, int&, const int, const bool, int) -template -inline void MPI_CLASS::recv( - TYPE *buf, int &length, const int send_proc_number, const bool get_length, int tag ) const -{ - static_assert( is_mpi_copyable(), "Object is not trivially copyable" ); - int size = length * sizeof( TYPE ); - recv( (char *) buf, size, send_proc_number, get_length, tag ); - if ( get_length ) { - MPI_CLASS_ASSERT( size % sizeof( TYPE ) == 0 ); - length = size / sizeof( TYPE ); +// Default instantiations of recv(TYPE*, int&, int, const bool, int) +template +inline void MPI_CLASS::recv(TYPE *buf, int &length, int send_proc_number, + const bool get_length, int tag) const { + static_assert(is_mpi_copyable(), "Object is not trivially copyable"); + int size = length * sizeof(TYPE); + recv((char *)buf, 
size, send_proc_number, get_length, tag); + if (get_length) { + MPI_CLASS_ASSERT(size % sizeof(TYPE) == 0); + length = size / sizeof(TYPE); } } - /************************************************************************ * Irecv * ************************************************************************/ -// Define specializations of recv(TYPE*, int&, const int, const bool, int) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -MPI_Request MPI_CLASS::Irecv( char *, const int, const int, const int ) const; -template<> -MPI_Request MPI_CLASS::Irecv( int *, const int, const int, const int ) const; -template<> -MPI_Request MPI_CLASS::Irecv( float *, const int, const int, const int ) const; -template<> -MPI_Request MPI_CLASS::Irecv( double *, const int, const int, const int ) const; +// Define specializations of recv(TYPE*, int&, int, const bool, int) +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> MPI_Request MPI_CLASS::Irecv(char *, int, int, int) const; +template <> MPI_Request MPI_CLASS::Irecv(int *, int, int, int) const; +template <> MPI_Request MPI_CLASS::Irecv(float *, int, int, int) const; +template <> MPI_Request MPI_CLASS::Irecv(double *, int, int, int) const; #else -template<> -MPI_Request MPI_CLASS::Irecv( char *, const int, const int, const int ) const; +template <> MPI_Request MPI_CLASS::Irecv(char *, int, int, int) const; #endif -// Default instantiations of recv(TYPE*, int&, const int, const bool, int) -template -inline MPI_Request MPI_CLASS::Irecv( - TYPE *buf, const int length, const int send_proc, const int tag ) const -{ - static_assert( is_mpi_copyable(), "Object is not trivially copyable" ); - return Irecv( (char *) buf, length * sizeof( TYPE ), send_proc, tag ); +// Default instantiations of recv(TYPE*, int&, int, const bool, int) +template +inline MPI_Request MPI_CLASS::Irecv(TYPE *buf, int length, int send_proc, + int tag) const { + static_assert(is_mpi_copyable(), "Object is not trivially copyable"); + return Irecv((char *)buf, length * sizeof(TYPE), send_proc, tag); } - /************************************************************************ * sendrecv * ************************************************************************/ -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::sendrecv( const char*, int, int, int, char*, int, int, int ) const; -template<> -void MPI_CLASS::sendrecv( const int*, int, int, int, int*, int, int, int ) const; -template<> -void MPI_CLASS::sendrecv( const float*, int, int, int, float*, int, int, int ) const; -template<> -void MPI_CLASS::sendrecv( const double*, int, int, int, double*, int, int, int ) const; -template -void MPI_CLASS::sendrecv( const TYPE *sendbuf, int sendcount, int dest, int sendtag, - TYPE *recvbuf, int recvcount, int source, int recvtag ) const -{ - if ( getSize() == 1 ) { - ASSERT( dest == 0 ); - ASSERT( source == 0 ); - ASSERT( sendcount == recvcount ); - ASSERT( sendtag == recvtag ); - memcpy( recvbuf, sendbuf, sendcount * sizeof( TYPE ) ); +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +void MPI_CLASS::sendrecv(const char *, int, int, int, char *, int, int, + int) const; +template <> +void MPI_CLASS::sendrecv(const int *, int, int, int, int *, int, int, + int) const; +template <> +void MPI_CLASS::sendrecv(const float *, int, int, int, float *, int, int, + int) const; +template <> +void MPI_CLASS::sendrecv(const double *, int, int, int, double *, int, + int, int) const; +template +void MPI_CLASS::sendrecv(const TYPE *sendbuf, int sendcount, int dest, + int 
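// --- Illustrative usage sketch (not part of this patch) --------------------
// Non-blocking counterpart of the previous sketch, using the Isend()/Irecv()
// fronts declared above. Assumptions: same wrapper/header as before, an MPI
// build (so the returned MPI_Request is the real MPI type and can be waited on
// with plain MPI_Wait), and tags satisfying 0 <= tag <= d_maxTag as the
// implementations insist.
#include <mpi.h>
#include <vector>
#include "common/MPI.h"

void isend_irecv_sketch(const Utilities::MPI &comm) {
    if (comm.getSize() < 2)
        return; // nothing to exchange
    std::vector<double> buf(256, 0.0);
    const int tag = 7;
    if (comm.getRank() == 0) {
        // post the send; the templated front forwards to the char specialization
        MPI_Request req = comm.Isend(buf.data(), (int)buf.size(), 1 /*dest*/, tag);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (comm.getRank() == 1) {
        MPI_Request req = comm.Irecv(buf.data(), (int)buf.size(), 0 /*source*/, tag);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }
}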
sendtag, TYPE *recvbuf, int recvcount, int source, + int recvtag) const { + if (getSize() == 1) { + ASSERT(dest == 0); + ASSERT(source == 0); + ASSERT(sendcount == recvcount); + ASSERT(sendtag == recvtag); + memcpy(recvbuf, sendbuf, sendcount * sizeof(TYPE)); } else { - ERROR( "Not implimented for " + std::string( typeid( TYPE ).name() ) ); + ERROR("Not implimented for " + std::string(typeid(TYPE).name())); } } #else -template -void MPI_CLASS::sendrecv( const TYPE *sendbuf, int sendcount, int dest, int sendtag, - TYPE *recvbuf, int recvcount, int source, int recvtag ) const -{ - ASSERT( dest == 0 ); - ASSERT( source == 0 ); - ASSERT( sendcount == recvcount ); - ASSERT( sendtag == recvtag ); - memcpy( recvbuf, sendbuf, sendcount * sizeof( TYPE ) ); +template +void MPI_CLASS::sendrecv(const TYPE *sendbuf, int sendcount, int dest, + int sendtag, TYPE *recvbuf, int recvcount, int source, + int recvtag) const { + ASSERT(dest == 0); + ASSERT(source == 0); + ASSERT(sendcount == recvcount); + ASSERT(sendtag == recvtag); + memcpy(recvbuf, sendbuf, sendcount * sizeof(TYPE)); } #endif - - /************************************************************************ * allGather * ************************************************************************/ -template -std::vector MPI_CLASS::allGather( const TYPE &x ) const -{ - static_assert( is_mpi_copyable(), "Object is not trivially copyable" ); - if ( getSize() <= 1 ) - return std::vector( 1, x ); - std::vector data( getSize() ); - allGather( x, data.data() ); +template +std::vector MPI_CLASS::allGather(const TYPE &x) const { + static_assert(is_mpi_copyable(), "Object is not trivially copyable"); + if (getSize() <= 1) + return std::vector(1, x); + std::vector data(getSize()); + allGather(x, data.data()); return data; } -template -std::vector MPI_CLASS::allGather( const std::vector &x ) const -{ - static_assert( is_mpi_copyable(), "Object is not trivially copyable" ); - if ( getSize() <= 1 ) +template +std::vector MPI_CLASS::allGather(const std::vector &x) const { + static_assert(is_mpi_copyable(), "Object is not trivially copyable"); + if (getSize() <= 1) return x; - std::vector count = allGather( x.size() ); - std::vector disp( getSize(), 0 ); + std::vector count = allGather(x.size()); + std::vector disp(getSize(), 0); size_t N = count[0]; - for ( size_t i = 1; i < count.size(); i++ ) { + for (size_t i = 1; i < count.size(); i++) { disp[i] = disp[i - 1] + count[i - 1]; N += count[i]; } - std::vector data( N ); - allGather( x.data(), x.size(), data.data(), count.data(), disp.data(), true ); + std::vector data(N); + allGather(x.data(), x.size(), data.data(), count.data(), disp.data(), + true); return data; } // Specialization of MPI_CLASS::allGather for std::string -template<> -inline void MPI_CLASS::allGather( const std::string &x_in, std::string *x_out ) const -{ +template <> +inline void MPI_CLASS::allGather(const std::string &x_in, + std::string *x_out) const { // Get the bytes recvied per processor - std::vector recv_cnt( comm_size, 0 ); - allGather( (int) x_in.size() + 1, &recv_cnt[0] ); - std::vector recv_disp( comm_size, 0 ); - for ( int i = 1; i < comm_size; i++ ) + std::vector recv_cnt(comm_size, 0); + allGather((int)x_in.size() + 1, &recv_cnt[0]); + std::vector recv_disp(comm_size, 0); + for (int i = 1; i < comm_size; i++) recv_disp[i] = recv_disp[i - 1] + recv_cnt[i - 1]; // Call the vector form of allGather for the char arrays - char *recv_data = new char[recv_disp[comm_size - 1] + recv_cnt[comm_size - 1]]; - allGather( - x_in.c_str(), (int) 
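// --- Illustrative usage sketch (not part of this patch) --------------------
// The two convenience overloads of allGather() above return a std::vector:
// one entry per rank for the scalar form, and the rank-ordered concatenation
// of every rank's vector for the vector form (counts and displacements are
// exchanged internally). Same wrapper/header/getRank() assumptions as the
// earlier sketches.
#include <vector>
#include "common/MPI.h"

void all_gather_sketch(const Utilities::MPI &comm) {
    // scalar form: gather one value from every rank
    std::vector<int> ranks = comm.allGather(comm.getRank());
    // ranks.size() == comm.getSize() and ranks[i] == i

    // vector form: each rank contributes a differently sized vector
    std::vector<double> local(10 + comm.getRank(), 1.0);
    std::vector<double> global = comm.allGather(local);
    // global holds rank 0's entries first, then rank 1's, and so on
}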
x_in.size() + 1, recv_data, &recv_cnt[0], &recv_disp[0], true ); - for ( int i = 0; i < comm_size; i++ ) - x_out[i] = std::string( &recv_data[recv_disp[i]] ); + char *recv_data = + new char[recv_disp[comm_size - 1] + recv_cnt[comm_size - 1]]; + allGather(x_in.c_str(), (int)x_in.size() + 1, recv_data, &recv_cnt[0], + &recv_disp[0], true); + for (int i = 0; i < comm_size; i++) + x_out[i] = std::string(&recv_data[recv_disp[i]]); delete[] recv_data; } // Default instantiation of MPI_CLASS::allGather -template -inline void MPI_CLASS::allGather( const TYPE &x_in, TYPE *x_out ) const -{ - static_assert( is_mpi_copyable(), "Object is not trivially copyable" ); - if ( comm_size > 1 ) { +template +inline void MPI_CLASS::allGather(const TYPE &x_in, TYPE *x_out) const { + static_assert(is_mpi_copyable(), "Object is not trivially copyable"); + if (comm_size > 1) { // We can use the vector form of allGather with a char array to ge the data we want - call_allGather( x_in, x_out ); + call_allGather(x_in, x_out); } else { // Single processor case x_out[0] = x_in; } } // Specialization of MPI_CLASS::allGather for std::string -template<> -inline int MPI_CLASS::allGather( - const std::string *, const int, std::string *, int *, int *, bool ) const -{ - MPI_CLASS_ERROR( "Cannot allGather an array of strings" ); +template <> +inline int MPI_CLASS::allGather(const std::string *, int, + std::string *, int *, int *, + bool) const { + MPI_CLASS_ERROR("Cannot allGather an array of strings"); return 0; } // Define specializations of call_allGather(const TYPE, TYPE*) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::call_allGather( const unsigned char &, unsigned char * ) const; -template<> -void MPI_CLASS::call_allGather( const char &, char * ) const; -template<> -void MPI_CLASS::call_allGather( const unsigned int &, unsigned int * ) const; -template<> -void MPI_CLASS::call_allGather( const int &, int * ) const; -template<> -void MPI_CLASS::call_allGather( - const unsigned long int &, unsigned long int * ) const; -template<> -void MPI_CLASS::call_allGather( const long int &, long int * ) const; -template<> -void MPI_CLASS::call_allGather( const float &, float * ) const; -template<> -void MPI_CLASS::call_allGather( const double &, double * ) const; +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +void MPI_CLASS::call_allGather(const unsigned char &, + unsigned char *) const; +template <> void MPI_CLASS::call_allGather(const char &, char *) const; +template <> +void MPI_CLASS::call_allGather(const unsigned int &, + unsigned int *) const; +template <> void MPI_CLASS::call_allGather(const int &, int *) const; +template <> +void MPI_CLASS::call_allGather(const unsigned long int &, + unsigned long int *) const; +template <> +void MPI_CLASS::call_allGather(const long int &, long int *) const; +template <> void MPI_CLASS::call_allGather(const float &, float *) const; +template <> +void MPI_CLASS::call_allGather(const double &, double *) const; #endif // Default instantiation of MPI_CLASS::allGather -template -int MPI_CLASS::allGather( const TYPE *send_data, const int send_cnt, TYPE *recv_data, int *recv_cnt, - int *recv_disp, bool known_recv ) const -{ +template +int MPI_CLASS::allGather(const TYPE *send_data, int send_cnt, TYPE *recv_data, + int *recv_cnt, int *recv_disp, bool known_recv) const { // Check the inputs - if ( known_recv && ( recv_cnt == nullptr || recv_disp == nullptr ) ) - MPI_CLASS_ERROR( "Error calling allGather" ); + if (known_recv && (recv_cnt == nullptr || recv_disp 
== nullptr)) + MPI_CLASS_ERROR("Error calling allGather"); // Check if we are dealing with a single processor - if ( comm_size == 1 ) { - if ( send_data == nullptr && send_cnt > 0 ) { - MPI_CLASS_ERROR( "send_data is null" ); - } else if ( !known_recv ) { + if (comm_size == 1) { + if (send_data == nullptr && send_cnt > 0) { + MPI_CLASS_ERROR("send_data is null"); + } else if (!known_recv) { // We do not know the recieved sizes - for ( int i = 0; i < send_cnt; i++ ) + for (int i = 0; i < send_cnt; i++) recv_data[i] = send_data[i]; - if ( recv_cnt != nullptr ) + if (recv_cnt != nullptr) recv_cnt[0] = send_cnt; - if ( recv_disp != nullptr ) + if (recv_disp != nullptr) recv_disp[0] = 0; } else { // We know the recieved sizes - for ( int i = 0; i < send_cnt; i++ ) + for (int i = 0; i < send_cnt; i++) recv_data[i + recv_disp[0]] = send_data[i]; } return send_cnt; } // Get the sizes of the recieved data (if necessary) - int *recv_cnt2 = recv_cnt; + int *recv_cnt2 = recv_cnt; int *recv_disp2 = recv_disp; - if ( !known_recv ) { - if ( recv_cnt == nullptr ) + if (!known_recv) { + if (recv_cnt == nullptr) recv_cnt2 = new int[comm_size]; - if ( recv_disp == nullptr ) + if (recv_disp == nullptr) recv_disp2 = new int[comm_size]; - call_allGather( send_cnt, recv_cnt2 ); + call_allGather(send_cnt, recv_cnt2); recv_disp2[0] = 0; - for ( int i = 1; i < comm_size; i++ ) + for (int i = 1; i < comm_size; i++) recv_disp2[i] = recv_disp2[i - 1] + recv_cnt2[i - 1]; } int N_recv = 0; - for ( int i = 0; i < comm_size; i++ ) + for (int i = 0; i < comm_size; i++) N_recv += recv_cnt2[i]; // Send/recv the data - call_allGather( send_data, send_cnt, recv_data, recv_cnt2, recv_disp2 ); + call_allGather(send_data, send_cnt, recv_data, recv_cnt2, recv_disp2); // Delete any temporary memory - if ( recv_cnt == nullptr ) + if (recv_cnt == nullptr) delete[] recv_cnt2; - if ( recv_disp == nullptr ) + if (recv_disp == nullptr) delete[] recv_disp2; return N_recv; } // Default instantiations of call_allGather(const TYPE, TYPE*) -template -void MPI_CLASS::call_allGather( const TYPE &x_in, TYPE *x_out ) const -{ - static_assert( is_mpi_copyable(), "Object is not trivially copyable" ); - allGather( (const char *) &x_in, (int) sizeof( TYPE ), (char *) x_out ); +template +void MPI_CLASS::call_allGather(const TYPE &x_in, TYPE *x_out) const { + static_assert(is_mpi_copyable(), "Object is not trivially copyable"); + allGather((const char *)&x_in, (int)sizeof(TYPE), (char *)x_out); } // Define specializations of call_allGather(const TYPE*, int, TYPE*, int*, int*) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::call_allGather( - const unsigned char *, int, unsigned char *, int *, int * ) const; -template<> -void MPI_CLASS::call_allGather( const char *, int, char *, int *, int * ) const; -template<> -void MPI_CLASS::call_allGather( - const unsigned int *, int, unsigned int *, int *, int * ) const; -template<> -void MPI_CLASS::call_allGather( const int *, int, int *, int *, int * ) const; -template<> -void MPI_CLASS::call_allGather( - const unsigned long int *, int, unsigned long int *, int *, int * ) const; -template<> -void MPI_CLASS::call_allGather( const long int *, int, long int *, int *, int * ) const; -template<> -void MPI_CLASS::call_allGather( const float *, int, float *, int *, int * ) const; -template<> -void MPI_CLASS::call_allGather( const double *, int, double *, int *, int * ) const; +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +void MPI_CLASS::call_allGather(const unsigned char *, 
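// --- Illustrative usage sketch (not part of this patch) --------------------
// Low-level allGather() with explicit counts and displacements, matching the
// implementation above: with known_recv == false the per-rank receive counts
// are gathered first, the displacements are computed, and the total number of
// received elements is returned. Same wrapper/header assumptions as before.
#include <vector>
#include "common/MPI.h"

void all_gather_counts_sketch(const Utilities::MPI &comm) {
    const int P = comm.getSize();
    // each rank contributes 3*(rank+1) ints, so the total is 3*P*(P+1)/2
    std::vector<int> send(3 * (comm.getRank() + 1), comm.getRank());
    std::vector<int> recv(3 * P * (P + 1) / 2);
    std::vector<int> recv_cnt(P, 0), recv_disp(P, 0);
    int n_recv = comm.allGather(send.data(), (int)send.size(), recv.data(),
                                recv_cnt.data(), recv_disp.data(),
                                false /*known_recv*/);
    // n_recv == (int)recv.size(); recv_cnt[i]/recv_disp[i] describe rank i's slice
    (void)n_recv;
}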
int, + unsigned char *, int *, + int *) const; +template <> +void MPI_CLASS::call_allGather(const char *, int, char *, int *, + int *) const; +template <> +void MPI_CLASS::call_allGather(const unsigned int *, int, + unsigned int *, int *, + int *) const; +template <> +void MPI_CLASS::call_allGather(const int *, int, int *, int *, + int *) const; +template <> +void MPI_CLASS::call_allGather(const unsigned long int *, + int, unsigned long int *, + int *, int *) const; +template <> +void MPI_CLASS::call_allGather(const long int *, int, long int *, + int *, int *) const; +template <> +void MPI_CLASS::call_allGather(const float *, int, float *, int *, + int *) const; +template <> +void MPI_CLASS::call_allGather(const double *, int, double *, int *, + int *) const; #else -template<> -void MPI_CLASS::call_allGather( const char *, int, char *, int *, int * ) const; +template <> +void MPI_CLASS::call_allGather(const char *, int, char *, int *, + int *) const; #endif // Default instantiations of int call_allGather(const TYPE*, int, TYPE*, int*) -template -void MPI_CLASS::call_allGather( - const TYPE *x_in, int size_in, TYPE *x_out, int *size_out, int *disp_out ) const -{ +template +void MPI_CLASS::call_allGather(const TYPE *x_in, int size_in, TYPE *x_out, + int *size_out, int *disp_out) const { int *size2 = new int[comm_size]; int *disp2 = new int[comm_size]; - for ( int i = 0; i < comm_size; i++ ) { - size2[i] = size_out[i] * sizeof( TYPE ); - disp2[i] = disp_out[i] * sizeof( TYPE ); + for (int i = 0; i < comm_size; i++) { + size2[i] = size_out[i] * sizeof(TYPE); + disp2[i] = disp_out[i] * sizeof(TYPE); } - static_assert( is_mpi_copyable(), "Object is not trivially copyable" ); - call_allGather( - (const char *) x_in, (int) size_in * sizeof( TYPE ), (char *) x_out, size2, disp2 ); + static_assert(is_mpi_copyable(), "Object is not trivially copyable"); + call_allGather((const char *)x_in, (int)size_in * sizeof(TYPE), + (char *)x_out, size2, disp2); delete[] size2; delete[] disp2; } - /************************************************************************ * setGather * ************************************************************************/ -template -inline void MPI_CLASS::setGather( std::set &set ) const -{ - std::vector send_buf( set.begin(), set.end() ); - std::vector recv_cnt( this->comm_size, 0 ); - this->allGather( (int) send_buf.size(), &recv_cnt[0] ); - std::vector recv_disp( this->comm_size, 0 ); - for ( int i = 1; i < this->comm_size; i++ ) +template +inline void MPI_CLASS::setGather(std::set &set) const { + std::vector send_buf(set.begin(), set.end()); + std::vector recv_cnt(this->comm_size, 0); + this->allGather((int)send_buf.size(), &recv_cnt[0]); + std::vector recv_disp(this->comm_size, 0); + for (int i = 1; i < this->comm_size; i++) recv_disp[i] = recv_disp[i - 1] + recv_cnt[i - 1]; size_t N_recv_tot = 0; - for ( int i = 0; i < this->comm_size; i++ ) + for (int i = 0; i < this->comm_size; i++) N_recv_tot += recv_cnt[i]; - if ( N_recv_tot == 0 ) + if (N_recv_tot == 0) return; - std::vector recv_buf( N_recv_tot ); + std::vector recv_buf(N_recv_tot); TYPE *send_data = nullptr; - if ( send_buf.size() > 0 ) { + if (send_buf.size() > 0) { send_data = &send_buf[0]; } TYPE *recv_data = &recv_buf[0]; - static_assert( is_mpi_copyable(), "Object is not trivially copyable" ); - this->allGather( - send_data, (int) send_buf.size(), recv_data, &recv_cnt[0], &recv_disp[0], true ); - for ( size_t i = 0; i < recv_buf.size(); i++ ) - set.insert( recv_buf[i] ); + static_assert(is_mpi_copyable(), 
"Object is not trivially copyable"); + this->allGather(send_data, (int)send_buf.size(), recv_data, + &recv_cnt[0], &recv_disp[0], true); + for (size_t i = 0; i < recv_buf.size(); i++) + set.insert(recv_buf[i]); } - /************************************************************************ * mapGather * ************************************************************************/ -template -inline void MPI_CLASS::mapGather( std::map &map ) const -{ +template +inline void MPI_CLASS::mapGather(std::map &map) const { std::vector send_id; std::vector send_data; - send_id.reserve( map.size() ); - send_data.reserve( map.size() ); - for ( auto it = map.begin(); it != map.end(); ++it ) { - send_id.push_back( it->first ); - send_data.push_back( it->second ); + send_id.reserve(map.size()); + send_data.reserve(map.size()); + for (auto it = map.begin(); it != map.end(); ++it) { + send_id.push_back(it->first); + send_data.push_back(it->second); } - int send_size = (int) send_id.size(); - std::vector recv_cnt( this->comm_size, 0 ); - this->allGather( send_size, &recv_cnt[0] ); - std::vector recv_disp( this->comm_size, 0 ); - for ( int i = 1; i < this->comm_size; i++ ) + int send_size = (int)send_id.size(); + std::vector recv_cnt(this->comm_size, 0); + this->allGather(send_size, &recv_cnt[0]); + std::vector recv_disp(this->comm_size, 0); + for (int i = 1; i < this->comm_size; i++) recv_disp[i] = recv_disp[i - 1] + recv_cnt[i - 1]; size_t N_recv_tot = 0; - for ( int i = 0; i < this->comm_size; i++ ) + for (int i = 0; i < this->comm_size; i++) N_recv_tot += recv_cnt[i]; - if ( N_recv_tot == 0 ) + if (N_recv_tot == 0) return; - std::vector recv_id( N_recv_tot ); - std::vector recv_data( N_recv_tot ); - KEY *send_data1 = nullptr; + std::vector recv_id(N_recv_tot); + std::vector recv_data(N_recv_tot); + KEY *send_data1 = nullptr; DATA *send_data2 = nullptr; - if ( send_id.size() > 0 ) { + if (send_id.size() > 0) { send_data1 = &send_id[0]; send_data2 = &send_data[0]; } - static_assert( is_mpi_copyable(), "Object is not trivially copyable" ); - this->allGather( send_data1, send_size, &recv_id[0], &recv_cnt[0], &recv_disp[0], true ); - this->allGather( - send_data2, send_size, &recv_data[0], &recv_cnt[0], &recv_disp[0], true ); + static_assert(is_mpi_copyable(), "Object is not trivially copyable"); + this->allGather(send_data1, send_size, &recv_id[0], &recv_cnt[0], + &recv_disp[0], true); + this->allGather(send_data2, send_size, &recv_data[0], &recv_cnt[0], + &recv_disp[0], true); map = std::map(); - for ( size_t i = 0; i < N_recv_tot; i++ ) - map.insert( std::pair( recv_id[i], recv_data[i] ) ); + for (size_t i = 0; i < N_recv_tot; i++) + map.insert(std::pair(recv_id[i], recv_data[i])); } - /************************************************************************ * sumScan * ************************************************************************/ -template -inline void MPI_CLASS::sumScan( const TYPE *x, TYPE *y, const int n ) const -{ - if ( comm_size > 1 ) { - call_sumScan( x, y, n ); +template +inline void MPI_CLASS::sumScan(const TYPE *x, TYPE *y, int n) const { + if (comm_size > 1) { + call_sumScan(x, y, n); } else { - for ( int i = 0; i < n; i++ ) + for (int i = 0; i < n; i++) y[i] = x[i]; } } // Define specializations of call_sumScan(const TYPE*, TYPE*, int) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::call_sumScan( const unsigned char *, unsigned char *, int ) const; -template<> -void MPI_CLASS::call_sumScan( const char *, char *, int ) const; -template<> -void 
MPI_CLASS::call_sumScan( const unsigned int *, unsigned int *, int ) const; -template<> -void MPI_CLASS::call_sumScan( const int *, int *, int ) const; -template<> -void MPI_CLASS::call_sumScan( - const unsigned long int *, unsigned long int *, int ) const; -template<> -void MPI_CLASS::call_sumScan( const long int *, long int *, int ) const; -template<> -void MPI_CLASS::call_sumScan( const size_t *, size_t *, int ) const; -template<> -void MPI_CLASS::call_sumScan( const float *, float *, int ) const; -template<> -void MPI_CLASS::call_sumScan( const double *, double *, int ) const; -template<> -void MPI_CLASS::call_sumScan>( - const std::complex *, std::complex *, int ) const; +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +void MPI_CLASS::call_sumScan(const unsigned char *, + unsigned char *, int) const; +template <> void MPI_CLASS::call_sumScan(const char *, char *, int) const; +template <> +void MPI_CLASS::call_sumScan(const unsigned int *, unsigned int *, + int) const; +template <> void MPI_CLASS::call_sumScan(const int *, int *, int) const; +template <> +void MPI_CLASS::call_sumScan(const unsigned long int *, + unsigned long int *, int) const; +template <> +void MPI_CLASS::call_sumScan(const long int *, long int *, int) const; +template <> +void MPI_CLASS::call_sumScan(const size_t *, size_t *, int) const; +template <> +void MPI_CLASS::call_sumScan(const float *, float *, int) const; +template <> +void MPI_CLASS::call_sumScan(const double *, double *, int) const; +template <> +void MPI_CLASS::call_sumScan>(const std::complex *, + std::complex *, + int) const; #endif // Default instantiations of call_sumScan(const TYPE*, TYPE*, int) -template -void MPI_CLASS::call_sumScan( const TYPE *, TYPE *, int ) const -{ +template +void MPI_CLASS::call_sumScan(const TYPE *, TYPE *, int) const { char message[200]; - sprintf( message, "Default instantion of sumScan in parallel is not supported (%s)", - typeid( TYPE ).name() ); - MPI_CLASS_ERROR( message ); + sprintf(message, + "Default instantion of sumScan in parallel is not supported (%s)", + typeid(TYPE).name()); + MPI_CLASS_ERROR(message); } - /************************************************************************ * minScan * ************************************************************************/ -template -inline void MPI_CLASS::minScan( const TYPE *x, TYPE *y, const int n ) const -{ - if ( comm_size > 1 ) { - call_minScan( x, y, n ); +template +inline void MPI_CLASS::minScan(const TYPE *x, TYPE *y, int n) const { + if (comm_size > 1) { + call_minScan(x, y, n); } else { - for ( int i = 0; i < n; i++ ) + for (int i = 0; i < n; i++) y[i] = x[i]; } } // Define specializations of call_minScan(const TYPE*, TYPE*, int) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::call_minScan( const unsigned char *, unsigned char *, int ) const; -template<> -void MPI_CLASS::call_minScan( const char *, char *, int ) const; -template<> -void MPI_CLASS::call_minScan( const unsigned int *, unsigned int *, int ) const; -template<> -void MPI_CLASS::call_minScan( const int *, int *, int ) const; -template<> -void MPI_CLASS::call_minScan( - const unsigned long int *, unsigned long int *, int ) const; -template<> -void MPI_CLASS::call_minScan( const long int *, long int *, int ) const; -template<> -void MPI_CLASS::call_minScan( const size_t *, size_t *, int ) const; -template<> -void MPI_CLASS::call_minScan( const float *, float *, int ) const; -template<> -void MPI_CLASS::call_minScan( const double *, double *, int ) const; 
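// --- Illustrative usage sketch (not part of this patch) --------------------
// sumScan()/minScan()/maxScan() perform an element-wise scan (prefix
// reduction) across the ranks; the single-rank fallback above simply copies x
// into y. The arithmetic below assumes the usual inclusive-scan (MPI_Scan)
// semantics for the parallel path, which is not shown in this hunk. Same
// wrapper/header assumptions as the earlier sketches.
#include "common/MPI.h"

void scan_sketch(const Utilities::MPI &comm) {
    // classic use: turn per-rank counts into global offsets
    int local_count = 10 + comm.getRank();
    int inclusive_end = 0;
    comm.sumScan(&local_count, &inclusive_end, 1);
    int start = inclusive_end - local_count; // first global index owned by this rank
    (void)start;
}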
+#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +void MPI_CLASS::call_minScan(const unsigned char *, + unsigned char *, int) const; +template <> void MPI_CLASS::call_minScan(const char *, char *, int) const; +template <> +void MPI_CLASS::call_minScan(const unsigned int *, unsigned int *, + int) const; +template <> void MPI_CLASS::call_minScan(const int *, int *, int) const; +template <> +void MPI_CLASS::call_minScan(const unsigned long int *, + unsigned long int *, int) const; +template <> +void MPI_CLASS::call_minScan(const long int *, long int *, int) const; +template <> +void MPI_CLASS::call_minScan(const size_t *, size_t *, int) const; +template <> +void MPI_CLASS::call_minScan(const float *, float *, int) const; +template <> +void MPI_CLASS::call_minScan(const double *, double *, int) const; #endif // Default instantiations of call_minScan(const TYPE*, TYPE*, int) -template -void MPI_CLASS::call_minScan( const TYPE *, TYPE *, int ) const -{ +template +void MPI_CLASS::call_minScan(const TYPE *, TYPE *, int) const { char message[200]; - sprintf( message, "Default instantion of minScan in parallel is not supported (%s)", - typeid( TYPE ).name() ); - MPI_CLASS_ERROR( message ); + sprintf(message, + "Default instantion of minScan in parallel is not supported (%s)", + typeid(TYPE).name()); + MPI_CLASS_ERROR(message); } - /************************************************************************ * maxScan * ************************************************************************/ -template -inline void MPI_CLASS::maxScan( const TYPE *x, TYPE *y, const int n ) const -{ - if ( comm_size > 1 ) { - call_maxScan( x, y, n ); +template +inline void MPI_CLASS::maxScan(const TYPE *x, TYPE *y, int n) const { + if (comm_size > 1) { + call_maxScan(x, y, n); } else { - for ( int i = 0; i < n; i++ ) + for (int i = 0; i < n; i++) y[i] = x[i]; } } // Define specializations of call_maxScan(const TYPE*, TYPE*, int) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::call_maxScan( const unsigned char *, unsigned char *, int ) const; -template<> -void MPI_CLASS::call_maxScan( const char *, char *, int ) const; -template<> -void MPI_CLASS::call_maxScan( const unsigned int *, unsigned int *, int ) const; -template<> -void MPI_CLASS::call_maxScan( const int *, int *, int ) const; -template<> -void MPI_CLASS::call_maxScan( - const unsigned long int *, unsigned long int *, int ) const; -template<> -void MPI_CLASS::call_maxScan( const long int *, long int *, int ) const; -template<> -void MPI_CLASS::call_maxScan( const size_t *, size_t *, int ) const; -template<> -void MPI_CLASS::call_maxScan( const float *, float *, int ) const; -template<> -void MPI_CLASS::call_maxScan( const double *, double *, int ) const; +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +void MPI_CLASS::call_maxScan(const unsigned char *, + unsigned char *, int) const; +template <> void MPI_CLASS::call_maxScan(const char *, char *, int) const; +template <> +void MPI_CLASS::call_maxScan(const unsigned int *, unsigned int *, + int) const; +template <> void MPI_CLASS::call_maxScan(const int *, int *, int) const; +template <> +void MPI_CLASS::call_maxScan(const unsigned long int *, + unsigned long int *, int) const; +template <> +void MPI_CLASS::call_maxScan(const long int *, long int *, int) const; +template <> +void MPI_CLASS::call_maxScan(const size_t *, size_t *, int) const; +template <> +void MPI_CLASS::call_maxScan(const float *, float *, int) const; +template <> +void 
MPI_CLASS::call_maxScan(const double *, double *, int) const; #endif // Default instantiations of call_maxScan(const TYPE*, TYPE*, int) -template -void MPI_CLASS::call_maxScan( const TYPE *, TYPE *, int ) const -{ +template +void MPI_CLASS::call_maxScan(const TYPE *, TYPE *, int) const { char message[200]; - sprintf( message, "Default instantion of maxReduce in parallel is not supported (%s)", - typeid( TYPE ).name() ); - MPI_CLASS_ERROR( message ); + sprintf(message, + "Default instantiation of maxScan in parallel is not supported (%s)", + typeid(TYPE).name()); + MPI_CLASS_ERROR(message); } - /************************************************************************ * allToAll * ************************************************************************/ -// Define specializations of allToAll(const int n, const char*, char* ) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::allToAll( - const int n, const unsigned char *, unsigned char * ) const; -template<> -void MPI_CLASS::allToAll( const int n, const char *, char * ) const; -template<> -void MPI_CLASS::allToAll( const int n, const unsigned int *, unsigned int * ) const; -template<> -void MPI_CLASS::allToAll( const int n, const int *, int * ) const; -template<> -void MPI_CLASS::allToAll( - const int n, const unsigned long int *, unsigned long int * ) const; -template<> -void MPI_CLASS::allToAll( const int n, const long int *, long int * ) const; -template<> -void MPI_CLASS::allToAll( const int n, const float *, float * ) const; -template<> -void MPI_CLASS::allToAll( const int n, const double *, double * ) const; +// Define specializations of allToAll( int n, const char*, char* ) +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +void MPI_CLASS::allToAll(int n, const unsigned char *, + unsigned char *) const; +template <> void MPI_CLASS::allToAll(int n, const char *, char *) const; +template <> +void MPI_CLASS::allToAll(int n, const unsigned int *, + unsigned int *) const; +template <> void MPI_CLASS::allToAll(int n, const int *, int *) const; +template <> +void MPI_CLASS::allToAll(int n, const unsigned long int *, + unsigned long int *) const; +template <> +void MPI_CLASS::allToAll(int n, const long int *, long int *) const; +template <> +void MPI_CLASS::allToAll(int n, const float *, float *) const; +template <> +void MPI_CLASS::allToAll(int n, const double *, double *) const; #endif -// Default instantiations of allToAll(const int n, const char*, char* ) -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template -void MPI_CLASS::allToAll( const int n, const TYPE *send_data, TYPE *recv_data ) const -{ - static_assert( is_mpi_copyable(), "Object is not trivially copyable" ); - allToAll( n * sizeof( TYPE ), (char *) send_data, (char *) recv_data ); +// Default instantiations of allToAll( int n, const char*, char* ) +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template +void MPI_CLASS::allToAll(int n, const TYPE *send_data, TYPE *recv_data) const { + static_assert(is_mpi_copyable(), "Object is not trivially copyable"); + allToAll(n * sizeof(TYPE), (char *)send_data, (char *)recv_data); } #else -template -void MPI_CLASS::allToAll( const int n, const TYPE *send_data, TYPE *recv_data ) const -{ - if ( comm_size != 1 ) - MPI_CLASS_ERROR( "Invalid size for allToAll" ); - for ( int i = 0; i < n; i++ ) +template +void MPI_CLASS::allToAll(int n, const TYPE *send_data, TYPE *recv_data) const { + if (comm_size != 1) + MPI_CLASS_ERROR("Invalid size for allToAll"); + for (int i = 0; i < n; i++) recv_data[i] =
send_data[i]; } #endif - /************************************************************************ * allToAll * ************************************************************************/ -template -int MPI_CLASS::allToAll( const TYPE *send_data, const int send_cnt[], const int send_disp[], - TYPE *recv_data, int *recv_cnt, int *recv_disp, bool known_recv ) const -{ +template +int MPI_CLASS::allToAll(const TYPE *send_data, const int send_cnt[], + const int send_disp[], TYPE *recv_data, int *recv_cnt, + int *recv_disp, bool known_recv) const { int N_recieved = 0; - if ( comm_size == 1 ) { + if (comm_size == 1) { // Special case for single-processor communicators - if ( known_recv ) { - if ( recv_cnt[0] != send_cnt[0] && send_cnt[0] > 0 ) - MPI_CLASS_ERROR( "Single processor send/recv are different sizes" ); + if (known_recv) { + if (recv_cnt[0] != send_cnt[0] && send_cnt[0] > 0) + MPI_CLASS_ERROR( + "Single processor send/recv are different sizes"); } else { - if ( recv_cnt != nullptr ) + if (recv_cnt != nullptr) recv_cnt[0] = send_cnt[0]; - if ( recv_disp != nullptr ) + if (recv_disp != nullptr) recv_disp[0] = send_disp[0]; } - for ( int i = 0; i < send_cnt[0]; i++ ) + for (int i = 0; i < send_cnt[0]; i++) recv_data[i + recv_disp[0]] = send_data[i + send_disp[0]]; N_recieved = send_cnt[0]; - } else if ( known_recv ) { + } else if (known_recv) { // The recieve sizes are known - MPI_CLASS_ASSERT( recv_cnt != nullptr && recv_disp != nullptr ); - call_allToAll( send_data, send_cnt, send_disp, recv_data, recv_cnt, recv_disp ); - for ( int i = 0; i < comm_size; i++ ) + MPI_CLASS_ASSERT(recv_cnt != nullptr && recv_disp != nullptr); + call_allToAll(send_data, send_cnt, send_disp, recv_data, recv_cnt, + recv_disp); + for (int i = 0; i < comm_size; i++) N_recieved += recv_cnt[i]; } else { // The recieve sizes are not known, we need to communicate that information first - int *recv_cnt2 = recv_cnt; + int *recv_cnt2 = recv_cnt; int *recv_disp2 = recv_disp; - if ( recv_cnt == nullptr ) + if (recv_cnt == nullptr) recv_cnt2 = new int[comm_size]; - if ( recv_disp == nullptr ) + if (recv_disp == nullptr) recv_disp2 = new int[comm_size]; // Communicate the size we will be recieving from each processor - allToAll( 1, send_cnt, recv_cnt2 ); + allToAll(1, send_cnt, recv_cnt2); recv_disp2[0] = 0; - for ( int i = 1; i < comm_size; i++ ) + for (int i = 1; i < comm_size; i++) recv_disp2[i] = recv_disp2[i - 1] + recv_cnt2[i - 1]; // Send the data - call_allToAll( send_data, send_cnt, send_disp, recv_data, recv_cnt2, recv_disp2 ); - for ( int i = 0; i < comm_size; i++ ) + call_allToAll(send_data, send_cnt, send_disp, recv_data, recv_cnt2, + recv_disp2); + for (int i = 0; i < comm_size; i++) N_recieved += recv_cnt2[i]; - if ( recv_cnt == nullptr ) + if (recv_cnt == nullptr) delete[] recv_cnt2; - if ( recv_disp == nullptr ) + if (recv_disp == nullptr) delete[] recv_disp2; } return N_recieved; } // Define specializations of call_allToAll -#if defined( USE_MPI ) || defined( USE_EXT_MPI ) -template<> -void MPI_CLASS::call_allToAll( const unsigned char *, const int *, const int *, - unsigned char *, const int *, const int * ) const; -template<> -void MPI_CLASS::call_allToAll( - const char *, const int *, const int *, char *, const int *, const int * ) const; -template<> -void MPI_CLASS::call_allToAll( const unsigned int *, const int *, const int *, - unsigned int *, const int *, const int * ) const; -template<> -void MPI_CLASS::call_allToAll( - const int *, const int *, const int *, int *, const int *, const int * ) const; 
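// --- Illustrative usage sketch (not part of this patch) --------------------
// Variable-count allToAll(), following the implementation above: with
// known_recv == false the receive counts are first exchanged with the
// fixed-count allToAll, the displacements are built, and the total number of
// received elements is returned. Same wrapper/header assumptions as before.
#include <vector>
#include "common/MPI.h"

void all_to_all_sketch(const Utilities::MPI &comm) {
    const int P = comm.getSize();
    // send exactly one element to every rank
    std::vector<double> send(P), recv(P);
    std::vector<int> send_cnt(P, 1), send_disp(P), recv_cnt(P), recv_disp(P);
    for (int r = 0; r < P; r++) {
        send[r] = comm.getRank() + 0.01 * r; // payload destined for rank r
        send_disp[r] = r;
    }
    int n_recv = comm.allToAll(send.data(), send_cnt.data(), send_disp.data(),
                               recv.data(), recv_cnt.data(), recv_disp.data(),
                               false /*known_recv*/);
    // n_recv == P; recv[q] holds the element contributed by rank q
    (void)n_recv;
}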
-template<> -void MPI_CLASS::call_allToAll( const unsigned long int *, const int *, - const int *, unsigned long int *, const int *, const int * ) const; -template<> -void MPI_CLASS::call_allToAll( - const long int *, const int *, const int *, long int *, const int *, const int * ) const; -template<> -void MPI_CLASS::call_allToAll( - const float *, const int *, const int *, float *, const int *, const int * ) const; -template<> -void MPI_CLASS::call_allToAll( - const double *, const int *, const int *, double *, const int *, const int * ) const; +#if defined(USE_MPI) || defined(USE_EXT_MPI) +template <> +void MPI_CLASS::call_allToAll(const unsigned char *, const int *, + const int *, unsigned char *, + const int *, const int *) const; +template <> +void MPI_CLASS::call_allToAll(const char *, const int *, const int *, + char *, const int *, const int *) const; +template <> +void MPI_CLASS::call_allToAll(const unsigned int *, const int *, + const int *, unsigned int *, + const int *, const int *) const; +template <> +void MPI_CLASS::call_allToAll(const int *, const int *, const int *, int *, + const int *, const int *) const; +template <> +void MPI_CLASS::call_allToAll(const unsigned long int *, + const int *, const int *, + unsigned long int *, + const int *, + const int *) const; +template <> +void MPI_CLASS::call_allToAll(const long int *, const int *, + const int *, long int *, const int *, + const int *) const; +template <> +void MPI_CLASS::call_allToAll(const float *, const int *, const int *, + float *, const int *, const int *) const; +template <> +void MPI_CLASS::call_allToAll(const double *, const int *, const int *, + double *, const int *, const int *) const; #else -template<> -void MPI_CLASS::call_allToAll( - const char *, const int *, const int *, char *, const int *, const int * ) const; +template <> +void MPI_CLASS::call_allToAll(const char *, const int *, const int *, + char *, const int *, const int *) const; #endif // Default instantiations of call_allToAll -template -void MPI_CLASS::call_allToAll( const TYPE *send_data, const int send_cnt[], const int send_disp[], - TYPE *recv_data, const int *recv_cnt, const int *recv_disp ) const -{ - int *send_cnt2 = new int[comm_size]; - int *recv_cnt2 = new int[comm_size]; +template +void MPI_CLASS::call_allToAll(const TYPE *send_data, const int send_cnt[], + const int send_disp[], TYPE *recv_data, + const int *recv_cnt, const int *recv_disp) const { + int *send_cnt2 = new int[comm_size]; + int *recv_cnt2 = new int[comm_size]; int *send_disp2 = new int[comm_size]; int *recv_disp2 = new int[comm_size]; - for ( int i = 0; i < comm_size; i++ ) { - send_cnt2[i] = send_cnt[i] * sizeof( TYPE ); - send_disp2[i] = send_disp[i] * sizeof( TYPE ); - recv_cnt2[i] = recv_cnt[i] * sizeof( TYPE ); - recv_disp2[i] = recv_disp[i] * sizeof( TYPE ); + for (int i = 0; i < comm_size; i++) { + send_cnt2[i] = send_cnt[i] * sizeof(TYPE); + send_disp2[i] = send_disp[i] * sizeof(TYPE); + recv_cnt2[i] = recv_cnt[i] * sizeof(TYPE); + recv_disp2[i] = recv_disp[i] * sizeof(TYPE); } - static_assert( is_mpi_copyable(), "Object is not trivially copyable" ); - call_allToAll( - (char *) send_data, send_cnt2, send_disp2, (char *) recv_data, recv_cnt2, recv_disp2 ); + static_assert(is_mpi_copyable(), "Object is not trivially copyable"); + call_allToAll((char *)send_data, send_cnt2, send_disp2, + (char *)recv_data, recv_cnt2, recv_disp2); delete[] send_cnt2; delete[] recv_cnt2; delete[] send_disp2; delete[] recv_disp2; } - } // namespace Utilities #endif diff --git 
a/common/MPI.cpp b/common/MPI.cpp index e75e242e..72f5b462 100644 --- a/common/MPI.cpp +++ b/common/MPI.cpp @@ -1115,15 +1115,14 @@ bool MPI_CLASS::anyReduce(const bool value) const { template <> void MPI_CLASS::call_sumReduce(const unsigned char *send, unsigned char *recv, - const int n) const { + int n) const { PROFILE_START("sumReduce1", profile_level); MPI_Allreduce((void *)send, (void *)recv, n, MPI_UNSIGNED_CHAR, MPI_SUM, communicator); PROFILE_STOP("sumReduce1", profile_level); } template <> -void MPI_CLASS::call_sumReduce(unsigned char *x, - const int n) const { +void MPI_CLASS::call_sumReduce(unsigned char *x, int n) const { PROFILE_START("sumReduce2", profile_level); auto send = x; auto recv = new unsigned char[n]; @@ -1136,13 +1135,13 @@ void MPI_CLASS::call_sumReduce(unsigned char *x, // char template <> void MPI_CLASS::call_sumReduce(const char *send, char *recv, - const int n) const { + int n) const { PROFILE_START("sumReduce1", profile_level); MPI_Allreduce((void *)send, (void *)recv, n, MPI_SIGNED_CHAR, MPI_SUM, communicator); PROFILE_STOP("sumReduce1", profile_level); } -template <> void MPI_CLASS::call_sumReduce(char *x, const int n) const { +template <> void MPI_CLASS::call_sumReduce(char *x, int n) const { PROFILE_START("sumReduce2", profile_level); auto send = x; auto recv = new char[n]; @@ -1155,16 +1154,14 @@ template <> void MPI_CLASS::call_sumReduce(char *x, const int n) const { // unsigned int template <> void MPI_CLASS::call_sumReduce(const unsigned int *send, - unsigned int *recv, - const int n) const { + unsigned int *recv, int n) const { PROFILE_START("sumReduce1", profile_level); MPI_Allreduce((void *)send, (void *)recv, n, MPI_UNSIGNED, MPI_SUM, communicator); PROFILE_STOP("sumReduce1", profile_level); } template <> -void MPI_CLASS::call_sumReduce(unsigned int *x, - const int n) const { +void MPI_CLASS::call_sumReduce(unsigned int *x, int n) const { PROFILE_START("sumReduce2", profile_level); auto send = x; auto recv = new unsigned int[n]; @@ -1176,14 +1173,13 @@ void MPI_CLASS::call_sumReduce(unsigned int *x, } // int template <> -void MPI_CLASS::call_sumReduce(const int *send, int *recv, - const int n) const { +void MPI_CLASS::call_sumReduce(const int *send, int *recv, int n) const { PROFILE_START("sumReduce1", profile_level); MPI_Allreduce((void *)send, (void *)recv, n, MPI_INT, MPI_SUM, communicator); PROFILE_STOP("sumReduce1", profile_level); } -template <> void MPI_CLASS::call_sumReduce(int *x, const int n) const { +template <> void MPI_CLASS::call_sumReduce(int *x, int n) const { PROFILE_START("sumReduce2", profile_level); auto send = x; auto recv = new int[n]; @@ -1196,14 +1192,13 @@ template <> void MPI_CLASS::call_sumReduce(int *x, const int n) const { // long int template <> void MPI_CLASS::call_sumReduce(const long int *send, long int *recv, - const int n) const { + int n) const { PROFILE_START("sumReduce1", profile_level); MPI_Allreduce((void *)send, (void *)recv, n, MPI_LONG, MPI_SUM, communicator); PROFILE_STOP("sumReduce1", profile_level); } -template <> -void MPI_CLASS::call_sumReduce(long int *x, const int n) const { +template <> void MPI_CLASS::call_sumReduce(long int *x, int n) const { PROFILE_START("sumReduce2", profile_level); auto send = x; auto recv = new long int[n]; @@ -1217,15 +1212,14 @@ void MPI_CLASS::call_sumReduce(long int *x, const int n) const { template <> void MPI_CLASS::call_sumReduce(const unsigned long *send, unsigned long *recv, - const int n) const { + int n) const { PROFILE_START("sumReduce1", profile_level); 
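// --- Illustration (not part of this patch) ---------------------------------
// The call_sumReduce specializations in this hunk are thin typed wrappers
// around MPI_Allreduce with MPI_SUM plus profiling timers. The double case is
// equivalent to the plain MPI call below; `communicator` stands in for the
// wrapper's stored MPI_Comm, and an MPI build is assumed.
#include <mpi.h>

void sum_reduce_equivalent(const double *send, double *recv, int n,
                           MPI_Comm communicator) {
    // out-of-place form, as in the double specialization of call_sumReduce
    MPI_Allreduce((void *)send, (void *)recv, n, MPI_DOUBLE, MPI_SUM,
                  communicator);
    // the in-place overloads above allocate a scratch buffer, reduce into it,
    // and copy the result back into x
}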
MPI_Allreduce((void *)send, (void *)recv, n, MPI_UNSIGNED_LONG, MPI_SUM, communicator); PROFILE_STOP("sumReduce1", profile_level); } template <> -void MPI_CLASS::call_sumReduce(unsigned long *x, - const int n) const { +void MPI_CLASS::call_sumReduce(unsigned long *x, int n) const { PROFILE_START("sumReduce2", profile_level); auto send = x; auto recv = new unsigned long int[n]; @@ -1239,15 +1233,14 @@ void MPI_CLASS::call_sumReduce(unsigned long *x, #ifdef USE_WINDOWS template <> void MPI_CLASS::call_sumReduce(const size_t *send, size_t *recv, - const int n) const { + int n) const { MPI_ASSERT(MPI_SIZE_T != 0); PROFILE_START("sumReduce1", profile_level); MPI_Allreduce((void *)send, (void *)recv, n, MPI_SIZE_T, MPI_SUM, communicator); PROFILE_STOP("sumReduce1", profile_level); } -template <> -void MPI_CLASS::call_sumReduce(size_t *x, const int n) const { +template <> void MPI_CLASS::call_sumReduce(size_t *x, int n) const { MPI_ASSERT(MPI_SIZE_T != 0); PROFILE_START("sumReduce2", profile_level); auto send = x; @@ -1263,13 +1256,13 @@ void MPI_CLASS::call_sumReduce(size_t *x, const int n) const { // float template <> void MPI_CLASS::call_sumReduce(const float *send, float *recv, - const int n) const { + int n) const { PROFILE_START("sumReduce1", profile_level); MPI_Allreduce((void *)send, (void *)recv, n, MPI_FLOAT, MPI_SUM, communicator); PROFILE_STOP("sumReduce1", profile_level); } -template <> void MPI_CLASS::call_sumReduce(float *x, const int n) const { +template <> void MPI_CLASS::call_sumReduce(float *x, int n) const { PROFILE_START("sumReduce2", profile_level); auto send = x; auto recv = new float[n]; @@ -1282,14 +1275,13 @@ template <> void MPI_CLASS::call_sumReduce(float *x, const int n) const { // double template <> void MPI_CLASS::call_sumReduce(const double *send, double *recv, - const int n) const { + int n) const { PROFILE_START("sumReduce1", profile_level); MPI_Allreduce((void *)send, (void *)recv, n, MPI_DOUBLE, MPI_SUM, communicator); PROFILE_STOP("sumReduce1", profile_level); } -template <> -void MPI_CLASS::call_sumReduce(double *x, const int n) const { +template <> void MPI_CLASS::call_sumReduce(double *x, int n) const { PROFILE_START("sumReduce2", profile_level); auto send = x; auto recv = new double[n]; @@ -1302,7 +1294,7 @@ void MPI_CLASS::call_sumReduce(double *x, const int n) const { // std::complex template <> void MPI_CLASS::call_sumReduce>( - const std::complex *x, std::complex *y, const int n) const { + const std::complex *x, std::complex *y, int n) const { PROFILE_START("sumReduce1", profile_level); auto send = new double[2 * n]; auto recv = new double[2 * n]; @@ -1320,7 +1312,7 @@ void MPI_CLASS::call_sumReduce>( } template <> void MPI_CLASS::call_sumReduce>(std::complex *x, - const int n) const { + int n) const { PROFILE_START("sumReduce2", profile_level); auto send = new double[2 * n]; auto recv = new double[2 * n]; @@ -1345,7 +1337,7 @@ void MPI_CLASS::call_sumReduce>(std::complex *x, // unsigned char template <> void MPI_CLASS::call_minReduce(const unsigned char *send, - unsigned char *recv, const int n, + unsigned char *recv, int n, int *comm_rank_of_min) const { if (comm_rank_of_min == nullptr) { PROFILE_START("minReduce1", profile_level); @@ -1363,7 +1355,7 @@ void MPI_CLASS::call_minReduce(const unsigned char *send, } } template <> -void MPI_CLASS::call_minReduce(unsigned char *x, const int n, +void MPI_CLASS::call_minReduce(unsigned char *x, int n, int *comm_rank_of_min) const { if (comm_rank_of_min == nullptr) { PROFILE_START("minReduce2", 
profile_level); @@ -1386,7 +1378,7 @@ void MPI_CLASS::call_minReduce(unsigned char *x, const int n, } // char template <> -void MPI_CLASS::call_minReduce(const char *send, char *recv, const int n, +void MPI_CLASS::call_minReduce(const char *send, char *recv, int n, int *comm_rank_of_min) const { if (comm_rank_of_min == nullptr) { PROFILE_START("minReduce1", profile_level); @@ -1404,7 +1396,7 @@ void MPI_CLASS::call_minReduce(const char *send, char *recv, const int n, } } template <> -void MPI_CLASS::call_minReduce(char *x, const int n, +void MPI_CLASS::call_minReduce(char *x, int n, int *comm_rank_of_min) const { if (comm_rank_of_min == nullptr) { PROFILE_START("minReduce2", profile_level); @@ -1428,7 +1420,7 @@ void MPI_CLASS::call_minReduce(char *x, const int n, // unsigned int template <> void MPI_CLASS::call_minReduce(const unsigned int *send, - unsigned int *recv, const int n, + unsigned int *recv, int n, int *comm_rank_of_min) const { if (comm_rank_of_min == nullptr) { PROFILE_START("minReduce1", profile_level); @@ -1446,7 +1438,7 @@ void MPI_CLASS::call_minReduce(const unsigned int *send, } } template <> -void MPI_CLASS::call_minReduce(unsigned int *x, const int n, +void MPI_CLASS::call_minReduce(unsigned int *x, int n, int *comm_rank_of_min) const { if (comm_rank_of_min == nullptr) { PROFILE_START("minReduce2", profile_level); @@ -1469,7 +1461,7 @@ void MPI_CLASS::call_minReduce(unsigned int *x, const int n, } // int template <> -void MPI_CLASS::call_minReduce(const int *x, int *y, const int n, +void MPI_CLASS::call_minReduce(const int *x, int *y, int n, int *comm_rank_of_min) const { PROFILE_START("minReduce1", profile_level); if (comm_rank_of_min == nullptr) { @@ -1492,7 +1484,7 @@ void MPI_CLASS::call_minReduce(const int *x, int *y, const int n, PROFILE_STOP("minReduce1", profile_level); } template <> -void MPI_CLASS::call_minReduce(int *x, const int n, +void MPI_CLASS::call_minReduce(int *x, int n, int *comm_rank_of_min) const { PROFILE_START("minReduce2", profile_level); if (comm_rank_of_min == nullptr) { @@ -1523,7 +1515,7 @@ void MPI_CLASS::call_minReduce(int *x, const int n, template <> void MPI_CLASS::call_minReduce(const unsigned long int *send, unsigned long int *recv, - const int n, + int n, int *comm_rank_of_min) const { if (comm_rank_of_min == nullptr) { PROFILE_START("minReduce1", profile_level); @@ -1541,8 +1533,7 @@ void MPI_CLASS::call_minReduce(const unsigned long int *send, } } template <> -void MPI_CLASS::call_minReduce(unsigned long int *x, - const int n, +void MPI_CLASS::call_minReduce(unsigned long int *x, int n, int *comm_rank_of_min) const { if (comm_rank_of_min == nullptr) { PROFILE_START("minReduce2", profile_level); @@ -1565,8 +1556,7 @@ void MPI_CLASS::call_minReduce(unsigned long int *x, } // long int template <> -void MPI_CLASS::call_minReduce(const long int *x, long int *y, - const int n, +void MPI_CLASS::call_minReduce(const long int *x, long int *y, int n, int *comm_rank_of_min) const { PROFILE_START("minReduce1", profile_level); if (comm_rank_of_min == nullptr) { @@ -1589,7 +1579,7 @@ void MPI_CLASS::call_minReduce(const long int *x, long int *y, PROFILE_STOP("minReduce1", profile_level); } template <> -void MPI_CLASS::call_minReduce(long int *x, const int n, +void MPI_CLASS::call_minReduce(long int *x, int n, int *comm_rank_of_min) const { PROFILE_START("minReduce2", profile_level); if (comm_rank_of_min == nullptr) { @@ -1619,8 +1609,8 @@ void MPI_CLASS::call_minReduce(long int *x, const int n, // unsigned long long int template <> void 
MPI_CLASS::call_minReduce( - const unsigned long long int *send, unsigned long long int *recv, - const int n, int *comm_rank_of_min) const { + const unsigned long long int *send, unsigned long long int *recv, int n, + int *comm_rank_of_min) const { PROFILE_START("minReduce1", profile_level); if (comm_rank_of_min == nullptr) { auto x = new long long int[n]; @@ -1647,7 +1637,7 @@ void MPI_CLASS::call_minReduce( } template <> void MPI_CLASS::call_minReduce( - unsigned long long int *x, const int n, int *comm_rank_of_min) const { + unsigned long long int *x, int n, int *comm_rank_of_min) const { auto recv = new unsigned long long int[n]; call_minReduce(x, recv, n, comm_rank_of_min); for (int i = 0; i < n; i++) @@ -1657,7 +1647,7 @@ void MPI_CLASS::call_minReduce( // long long int template <> void MPI_CLASS::call_minReduce(const long long int *x, - long long int *y, const int n, + long long int *y, int n, int *comm_rank_of_min) const { PROFILE_START("minReduce1", profile_level); if (comm_rank_of_min == nullptr) { @@ -1676,7 +1666,7 @@ void MPI_CLASS::call_minReduce(const long long int *x, PROFILE_STOP("minReduce1", profile_level); } template <> -void MPI_CLASS::call_minReduce(long long int *x, const int n, +void MPI_CLASS::call_minReduce(long long int *x, int n, int *comm_rank_of_min) const { auto recv = new long long int[n]; call_minReduce(x, recv, n, comm_rank_of_min); @@ -1686,7 +1676,7 @@ void MPI_CLASS::call_minReduce(long long int *x, const int n, } // float template <> -void MPI_CLASS::call_minReduce(const float *x, float *y, const int n, +void MPI_CLASS::call_minReduce(const float *x, float *y, int n, int *comm_rank_of_min) const { PROFILE_START("minReduce1", profile_level); if (comm_rank_of_min == nullptr) { @@ -1709,7 +1699,7 @@ void MPI_CLASS::call_minReduce(const float *x, float *y, const int n, PROFILE_STOP("minReduce1", profile_level); } template <> -void MPI_CLASS::call_minReduce(float *x, const int n, +void MPI_CLASS::call_minReduce(float *x, int n, int *comm_rank_of_min) const { PROFILE_START("minReduce2", profile_level); if (comm_rank_of_min == nullptr) { @@ -1738,7 +1728,7 @@ void MPI_CLASS::call_minReduce(float *x, const int n, } // double template <> -void MPI_CLASS::call_minReduce(const double *x, double *y, const int n, +void MPI_CLASS::call_minReduce(const double *x, double *y, int n, int *comm_rank_of_min) const { PROFILE_START("minReduce1", profile_level); if (comm_rank_of_min == nullptr) { @@ -1762,7 +1752,7 @@ void MPI_CLASS::call_minReduce(const double *x, double *y, const int n, PROFILE_STOP("minReduce1", profile_level); } template <> -void MPI_CLASS::call_minReduce(double *x, const int n, +void MPI_CLASS::call_minReduce(double *x, int n, int *comm_rank_of_min) const { PROFILE_START("minReduce2", profile_level); if (comm_rank_of_min == nullptr) { @@ -1799,7 +1789,7 @@ void MPI_CLASS::call_minReduce(double *x, const int n, // unsigned char template <> void MPI_CLASS::call_maxReduce(const unsigned char *send, - unsigned char *recv, const int n, + unsigned char *recv, int n, int *comm_rank_of_max) const { if (comm_rank_of_max == nullptr) { PROFILE_START("maxReduce1", profile_level); @@ -1817,7 +1807,7 @@ void MPI_CLASS::call_maxReduce(const unsigned char *send, } } template <> -void MPI_CLASS::call_maxReduce(unsigned char *x, const int n, +void MPI_CLASS::call_maxReduce(unsigned char *x, int n, int *comm_rank_of_max) const { if (comm_rank_of_max == nullptr) { PROFILE_START("maxReduce2", profile_level); @@ -1840,7 +1830,7 @@ void MPI_CLASS::call_maxReduce(unsigned char 
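// --- Illustration (not part of this patch) ---------------------------------
// The minReduce/maxReduce specializations take an optional comm_rank_of_min /
// comm_rank_of_max output. The rank-tracking branch is not visible in this
// hunk, so the sketch below only shows the standard plain-MPI idiom for
// obtaining a minimum together with the rank that owns it (MPI_MINLOC); it is
// not a claim about how this class implements that branch.
#include <mpi.h>

void min_with_rank_sketch(double local_value, MPI_Comm communicator) {
    int rank = 0;
    MPI_Comm_rank(communicator, &rank);
    struct { double value; int rank; } in = {local_value, rank}, out = {0.0, 0};
    MPI_Allreduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MINLOC, communicator);
    // out.value is the global minimum; out.rank is the rank that holds it
    (void)out;
}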
*x, const int n, } // char template <> -void MPI_CLASS::call_maxReduce(const char *send, char *recv, const int n, +void MPI_CLASS::call_maxReduce(const char *send, char *recv, int n, int *comm_rank_of_max) const { if (comm_rank_of_max == nullptr) { PROFILE_START("maxReduce1", profile_level); @@ -1858,7 +1848,7 @@ void MPI_CLASS::call_maxReduce(const char *send, char *recv, const int n, } } template <> -void MPI_CLASS::call_maxReduce(char *x, const int n, +void MPI_CLASS::call_maxReduce(char *x, int n, int *comm_rank_of_max) const { if (comm_rank_of_max == nullptr) { PROFILE_START("maxReduce2", profile_level); @@ -1882,7 +1872,7 @@ void MPI_CLASS::call_maxReduce(char *x, const int n, // unsigned int template <> void MPI_CLASS::call_maxReduce(const unsigned int *send, - unsigned int *recv, const int n, + unsigned int *recv, int n, int *comm_rank_of_max) const { if (comm_rank_of_max == nullptr) { PROFILE_START("maxReduce1", profile_level); @@ -1900,7 +1890,7 @@ void MPI_CLASS::call_maxReduce(const unsigned int *send, } } template <> -void MPI_CLASS::call_maxReduce(unsigned int *x, const int n, +void MPI_CLASS::call_maxReduce(unsigned int *x, int n, int *comm_rank_of_max) const { if (comm_rank_of_max == nullptr) { PROFILE_START("maxReduce2", profile_level); @@ -1923,7 +1913,7 @@ void MPI_CLASS::call_maxReduce(unsigned int *x, const int n, } // int template <> -void MPI_CLASS::call_maxReduce(const int *x, int *y, const int n, +void MPI_CLASS::call_maxReduce(const int *x, int *y, int n, int *comm_rank_of_max) const { PROFILE_START("maxReduce1", profile_level); if (comm_rank_of_max == nullptr) { @@ -1946,7 +1936,7 @@ void MPI_CLASS::call_maxReduce(const int *x, int *y, const int n, PROFILE_STOP("maxReduce1", profile_level); } template <> -void MPI_CLASS::call_maxReduce(int *x, const int n, +void MPI_CLASS::call_maxReduce(int *x, int n, int *comm_rank_of_max) const { PROFILE_START("maxReduce2", profile_level); if (comm_rank_of_max == nullptr) { @@ -1975,8 +1965,7 @@ void MPI_CLASS::call_maxReduce(int *x, const int n, } // long int template <> -void MPI_CLASS::call_maxReduce(const long int *x, long int *y, - const int n, +void MPI_CLASS::call_maxReduce(const long int *x, long int *y, int n, int *comm_rank_of_max) const { PROFILE_START("maxReduce1", profile_level); if (comm_rank_of_max == nullptr) { @@ -1999,7 +1988,7 @@ void MPI_CLASS::call_maxReduce(const long int *x, long int *y, PROFILE_STOP("maxReduce1", profile_level); } template <> -void MPI_CLASS::call_maxReduce(long int *x, const int n, +void MPI_CLASS::call_maxReduce(long int *x, int n, int *comm_rank_of_max) const { PROFILE_START("maxReduce2", profile_level); if (comm_rank_of_max == nullptr) { @@ -2030,7 +2019,7 @@ void MPI_CLASS::call_maxReduce(long int *x, const int n, template <> void MPI_CLASS::call_maxReduce(const unsigned long int *send, unsigned long int *recv, - const int n, + int n, int *comm_rank_of_max) const { if (comm_rank_of_max == nullptr) { PROFILE_START("maxReduce1", profile_level); @@ -2048,8 +2037,7 @@ void MPI_CLASS::call_maxReduce(const unsigned long int *send, } } template <> -void MPI_CLASS::call_maxReduce(unsigned long int *x, - const int n, +void MPI_CLASS::call_maxReduce(unsigned long int *x, int n, int *comm_rank_of_max) const { if (comm_rank_of_max == nullptr) { PROFILE_START("maxReduce2", profile_level); @@ -2073,8 +2061,8 @@ void MPI_CLASS::call_maxReduce(unsigned long int *x, // unsigned long long int template <> void MPI_CLASS::call_maxReduce( - const unsigned long long int *send, unsigned long long int 
*recv, - const int n, int *comm_rank_of_max) const { + const unsigned long long int *send, unsigned long long int *recv, int n, + int *comm_rank_of_max) const { PROFILE_START("maxReduce1", profile_level); if (comm_rank_of_max == nullptr) { auto x = new long long int[n]; @@ -2101,7 +2089,7 @@ void MPI_CLASS::call_maxReduce( } template <> void MPI_CLASS::call_maxReduce( - unsigned long long int *x, const int n, int *comm_rank_of_max) const { + unsigned long long int *x, int n, int *comm_rank_of_max) const { auto recv = new unsigned long long int[n]; call_maxReduce(x, recv, n, comm_rank_of_max); for (int i = 0; i < n; i++) @@ -2111,7 +2099,7 @@ void MPI_CLASS::call_maxReduce( // long long int template <> void MPI_CLASS::call_maxReduce(const long long int *x, - long long int *y, const int n, + long long int *y, int n, int *comm_rank_of_max) const { PROFILE_START("maxReduce1", profile_level); if (comm_rank_of_max == nullptr) { @@ -2130,7 +2118,7 @@ void MPI_CLASS::call_maxReduce(const long long int *x, PROFILE_STOP("maxReduce1", profile_level); } template <> -void MPI_CLASS::call_maxReduce(long long int *x, const int n, +void MPI_CLASS::call_maxReduce(long long int *x, int n, int *comm_rank_of_max) const { auto recv = new long long int[n]; call_maxReduce(x, recv, n, comm_rank_of_max); @@ -2140,7 +2128,7 @@ void MPI_CLASS::call_maxReduce(long long int *x, const int n, } // float template <> -void MPI_CLASS::call_maxReduce(const float *x, float *y, const int n, +void MPI_CLASS::call_maxReduce(const float *x, float *y, int n, int *comm_rank_of_max) const { PROFILE_START("maxReduce1", profile_level); if (comm_rank_of_max == nullptr) { @@ -2164,7 +2152,7 @@ void MPI_CLASS::call_maxReduce(const float *x, float *y, const int n, PROFILE_STOP("maxReduce1", profile_level); } template <> -void MPI_CLASS::call_maxReduce(float *x, const int n, +void MPI_CLASS::call_maxReduce(float *x, int n, int *comm_rank_of_max) const { PROFILE_START("maxReduce2", profile_level); if (comm_rank_of_max == nullptr) { @@ -2193,7 +2181,7 @@ void MPI_CLASS::call_maxReduce(float *x, const int n, } // double template <> -void MPI_CLASS::call_maxReduce(const double *x, double *y, const int n, +void MPI_CLASS::call_maxReduce(const double *x, double *y, int n, int *comm_rank_of_max) const { PROFILE_START("maxReduce1", profile_level); if (comm_rank_of_max == nullptr) { @@ -2217,7 +2205,7 @@ void MPI_CLASS::call_maxReduce(const double *x, double *y, const int n, PROFILE_STOP("maxReduce1", profile_level); } template <> -void MPI_CLASS::call_maxReduce(double *x, const int n, +void MPI_CLASS::call_maxReduce(double *x, int n, int *comm_rank_of_max) const { PROFILE_START("maxReduce2", profile_level); if (comm_rank_of_max == nullptr) { @@ -2253,51 +2241,46 @@ void MPI_CLASS::call_maxReduce(double *x, const int n, #ifdef USE_MPI // char template <> -void MPI_CLASS::call_bcast(unsigned char *x, const int n, - const int root) const { +void MPI_CLASS::call_bcast(unsigned char *x, int n, + int root) const { PROFILE_START("bcast", profile_level); MPI_Bcast(x, n, MPI_UNSIGNED_CHAR, root, communicator); PROFILE_STOP("bcast", profile_level); } -template <> -void MPI_CLASS::call_bcast(char *x, const int n, const int root) const { +template <> void MPI_CLASS::call_bcast(char *x, int n, int root) const { PROFILE_START("bcast", profile_level); MPI_Bcast(x, n, MPI_CHAR, root, communicator); PROFILE_STOP("bcast", profile_level); } // int template <> -void MPI_CLASS::call_bcast(unsigned int *x, const int n, - const int root) const { +void 
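// --- Illustration (not part of this patch) ---------------------------------
// The call_bcast specializations in this hunk forward to MPI_Bcast with the
// matching MPI datatype (MPI_UNSIGNED_CHAR, MPI_CHAR, MPI_UNSIGNED, ...),
// wrapped in profiling timers. The char case is equivalent to the plain MPI
// call below; an MPI build is assumed and `communicator` stands in for the
// wrapper's stored MPI_Comm.
#include <mpi.h>

void bcast_equivalent(char *x, int n, int root, MPI_Comm communicator) {
    // what the char specialization of call_bcast does, minus the timers
    MPI_Bcast(x, n, MPI_CHAR, root, communicator);
}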
MPI_CLASS::call_bcast(unsigned int *x, int n, + int root) const { PROFILE_START("bcast", profile_level); MPI_Bcast(x, n, MPI_UNSIGNED, root, communicator); PROFILE_STOP("bcast", profile_level); } -template <> -void MPI_CLASS::call_bcast(int *x, const int n, const int root) const { +template <> void MPI_CLASS::call_bcast(int *x, int n, int root) const { PROFILE_START("bcast", profile_level); MPI_Bcast(x, n, MPI_INT, root, communicator); PROFILE_STOP("bcast", profile_level); } // float -template <> -void MPI_CLASS::call_bcast(float *x, const int n, const int root) const { +template <> void MPI_CLASS::call_bcast(float *x, int n, int root) const { PROFILE_START("bcast", profile_level); MPI_Bcast(x, n, MPI_FLOAT, root, communicator); PROFILE_STOP("bcast", profile_level); } // double template <> -void MPI_CLASS::call_bcast(double *x, const int n, - const int root) const { +void MPI_CLASS::call_bcast(double *x, int n, int root) const { PROFILE_START("bcast", profile_level); MPI_Bcast(x, n, MPI_DOUBLE, root, communicator); PROFILE_STOP("bcast", profile_level); } #else // We need a concrete instantiation of bcast(x,n,root); -template <> -void MPI_CLASS::call_bcast(char *, const int, const int) const {} +template <> void MPI_CLASS::call_bcast(char *, int, int) const {} #endif /************************************************************************ @@ -2316,8 +2299,8 @@ void MPI_CLASS::barrier() const { #ifdef USE_MPI // char template <> -void MPI_CLASS::send(const char *buf, const int length, - const int recv_proc_number, int tag) const { +void MPI_CLASS::send(const char *buf, int length, int recv_proc_number, + int tag) const { // Set the tag to 0 if it is < 0 tag = (tag >= 0) ? tag : 0; MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); @@ -2329,8 +2312,8 @@ void MPI_CLASS::send(const char *buf, const int length, } // int template <> -void MPI_CLASS::send(const int *buf, const int length, - const int recv_proc_number, int tag) const { +void MPI_CLASS::send(const int *buf, int length, int recv_proc_number, + int tag) const { // Set the tag to 0 if it is < 0 tag = (tag >= 0) ? tag : 0; MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); @@ -2341,8 +2324,8 @@ void MPI_CLASS::send(const int *buf, const int length, } // float template <> -void MPI_CLASS::send(const float *buf, const int length, - const int recv_proc_number, int tag) const { +void MPI_CLASS::send(const float *buf, int length, int recv_proc_number, + int tag) const { // Set the tag to 0 if it is < 0 tag = (tag >= 0) ? tag : 0; MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); @@ -2354,8 +2337,8 @@ void MPI_CLASS::send(const float *buf, const int length, } // double template <> -void MPI_CLASS::send(const double *buf, const int length, - const int recv_proc_number, int tag) const { +void MPI_CLASS::send(const double *buf, int length, + int recv_proc_number, int tag) const { // Set the tag to 0 if it is < 0 tag = (tag >= 0) ? 
tag : 0; MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); @@ -2368,8 +2351,7 @@ void MPI_CLASS::send(const double *buf, const int length, #else // We need a concrete instantiation of send for use without MPI template <> -void MPI_CLASS::send(const char *buf, const int length, const int, - int tag) const { +void MPI_CLASS::send(const char *buf, int length, int, int tag) const { MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); MPI_INSIST(tag >= 0, "tag must be >= 0"); PROFILE_START("send", profile_level); @@ -2391,8 +2373,8 @@ void MPI_CLASS::send(const char *buf, const int length, const int, #ifdef USE_MPI // char template <> -MPI_Request MPI_CLASS::Isend(const char *buf, const int length, - const int recv_proc, const int tag) const { +MPI_Request MPI_CLASS::Isend(const char *buf, int length, int recv_proc, + int tag) const { MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); MPI_INSIST(tag >= 0, "tag must be >= 0"); MPI_Request request; @@ -2404,8 +2386,8 @@ MPI_Request MPI_CLASS::Isend(const char *buf, const int length, } // int template <> -MPI_Request MPI_CLASS::Isend(const int *buf, const int length, - const int recv_proc, const int tag) const { +MPI_Request MPI_CLASS::Isend(const int *buf, int length, int recv_proc, + int tag) const { MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); MPI_INSIST(tag >= 0, "tag must be >= 0"); MPI_Request request; @@ -2417,8 +2399,8 @@ MPI_Request MPI_CLASS::Isend(const int *buf, const int length, } // float template <> -MPI_Request MPI_CLASS::Isend(const float *buf, const int length, - const int recv_proc, const int tag) const { +MPI_Request MPI_CLASS::Isend(const float *buf, int length, int recv_proc, + int tag) const { MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); MPI_INSIST(tag >= 0, "tag must be >= 0"); MPI_Request request; @@ -2430,8 +2412,8 @@ MPI_Request MPI_CLASS::Isend(const float *buf, const int length, } // double template <> -MPI_Request MPI_CLASS::Isend(const double *buf, const int length, - const int recv_proc, const int tag) const { +MPI_Request MPI_CLASS::Isend(const double *buf, int length, + int recv_proc, int tag) const { MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); MPI_INSIST(tag >= 0, "tag must be >= 0"); MPI_Request request; @@ -2444,8 +2426,8 @@ MPI_Request MPI_CLASS::Isend(const double *buf, const int length, #else // We need a concrete instantiation of send for use without mpi template <> -MPI_Request MPI_CLASS::Isend(const char *buf, const int length, const int, - const int tag) const { +MPI_Request MPI_CLASS::Isend(const char *buf, int length, int, + int tag) const { MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); MPI_INSIST(tag >= 0, "tag must be >= 0"); PROFILE_START("Isend", profile_level); @@ -2472,8 +2454,8 @@ MPI_Request MPI_CLASS::Isend(const char *buf, const int length, const int, /************************************************************************ * Send byte array to another processor. 
* ************************************************************************/ -void MPI_CLASS::sendBytes(const void *buf, const int number_bytes, - const int recv_proc_number, int tag) const { +void MPI_CLASS::sendBytes(const void *buf, int number_bytes, + int recv_proc_number, int tag) const { MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); MPI_INSIST(tag >= 0, "tag must be >= 0"); send((const char *)buf, number_bytes, recv_proc_number, tag); @@ -2482,7 +2464,7 @@ void MPI_CLASS::sendBytes(const void *buf, const int number_bytes, /************************************************************************ * Non-blocking send byte array to another processor. * ************************************************************************/ -MPI_Request MPI_CLASS::IsendBytes(const void *buf, const int number_bytes, +MPI_Request MPI_CLASS::IsendBytes(const void *buf, int number_bytes, const int recv_proc, const int tag) const { MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); MPI_INSIST(tag >= 0, "tag must be >= 0"); @@ -2496,7 +2478,7 @@ MPI_Request MPI_CLASS::IsendBytes(const void *buf, const int number_bytes, #ifdef USE_MPI // char template <> -void MPI_CLASS::recv(char *buf, int &length, const int send_proc_number, +void MPI_CLASS::recv(char *buf, int &length, int send_proc_number, const bool get_length, int tag) const { // Set the tag to 0 if it is < 0 tag = (tag >= 0) ? tag : 0; @@ -2518,7 +2500,7 @@ void MPI_CLASS::recv(char *buf, int &length, const int send_proc_number, } // int template <> -void MPI_CLASS::recv(int *buf, int &length, const int send_proc_number, +void MPI_CLASS::recv(int *buf, int &length, int send_proc_number, const bool get_length, int tag) const { // Set the tag to 0 if it is < 0 tag = (tag >= 0) ? tag : 0; @@ -2540,7 +2522,7 @@ void MPI_CLASS::recv(int *buf, int &length, const int send_proc_number, } // float template <> -void MPI_CLASS::recv(float *buf, int &length, const int send_proc_number, +void MPI_CLASS::recv(float *buf, int &length, int send_proc_number, const bool get_length, int tag) const { // Set the tag to 0 if it is < 0 tag = (tag >= 0) ? tag : 0; @@ -2562,9 +2544,8 @@ void MPI_CLASS::recv(float *buf, int &length, const int send_proc_number, } // double template <> -void MPI_CLASS::recv(double *buf, int &length, - const int send_proc_number, const bool get_length, - int tag) const { +void MPI_CLASS::recv(double *buf, int &length, int send_proc_number, + const bool get_length, int tag) const { // Set the tag to 0 if it is < 0 tag = (tag >= 0) ? 
tag : 0; MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); @@ -2586,7 +2567,7 @@ void MPI_CLASS::recv(double *buf, int &length, #else // We need a concrete instantiation of recv for use without mpi template <> -void MPI_CLASS::recv(char *buf, int &length, const int, const bool, +void MPI_CLASS::recv(char *buf, int &length, int, const bool, int tag) const { MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); MPI_INSIST(tag >= 0, "tag must be >= 0"); @@ -2609,8 +2590,8 @@ void MPI_CLASS::recv(char *buf, int &length, const int, const bool, #ifdef USE_MPI // char template <> -MPI_Request MPI_CLASS::Irecv(char *buf, const int length, - const int send_proc, const int tag) const { +MPI_Request MPI_CLASS::Irecv(char *buf, int length, int send_proc, + int tag) const { MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); MPI_INSIST(tag >= 0, "tag must be >= 0"); MPI_Request request; @@ -2622,8 +2603,8 @@ MPI_Request MPI_CLASS::Irecv(char *buf, const int length, } // int template <> -MPI_Request MPI_CLASS::Irecv(int *buf, const int length, - const int send_proc, const int tag) const { +MPI_Request MPI_CLASS::Irecv(int *buf, int length, int send_proc, + int tag) const { MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); MPI_INSIST(tag >= 0, "tag must be >= 0"); MPI_Request request; @@ -2635,8 +2616,8 @@ MPI_Request MPI_CLASS::Irecv(int *buf, const int length, } // float template <> -MPI_Request MPI_CLASS::Irecv(float *buf, const int length, - const int send_proc, const int tag) const { +MPI_Request MPI_CLASS::Irecv(float *buf, int length, int send_proc, + int tag) const { MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); MPI_INSIST(tag >= 0, "tag must be >= 0"); MPI_Request request; @@ -2648,8 +2629,8 @@ MPI_Request MPI_CLASS::Irecv(float *buf, const int length, } // double template <> -MPI_Request MPI_CLASS::Irecv(double *buf, const int length, - const int send_proc, const int tag) const { +MPI_Request MPI_CLASS::Irecv(double *buf, int length, int send_proc, + int tag) const { MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); MPI_INSIST(tag >= 0, "tag must be >= 0"); MPI_Request request; @@ -2662,8 +2643,7 @@ MPI_Request MPI_CLASS::Irecv(double *buf, const int length, #else // We need a concrete instantiation of irecv for use without mpi template <> -MPI_Request MPI_CLASS::Irecv(char *buf, const int length, const int, - const int tag) const { +MPI_Request MPI_CLASS::Irecv(char *buf, int length, int, int tag) const { MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); MPI_INSIST(tag >= 0, "tag must be >= 0"); PROFILE_START("Irecv", profile_level); @@ -2690,7 +2670,7 @@ MPI_Request MPI_CLASS::Irecv(char *buf, const int length, const int, /************************************************************************ * Recieve byte array to another processor. * ************************************************************************/ -void MPI_CLASS::recvBytes(void *buf, int &number_bytes, const int send_proc, +void MPI_CLASS::recvBytes(void *buf, int &number_bytes, int send_proc, int tag) const { recv((char *)buf, number_bytes, send_proc, false, tag); } @@ -2698,8 +2678,8 @@ void MPI_CLASS::recvBytes(void *buf, int &number_bytes, const int send_proc, /************************************************************************ * Recieve byte array to another processor. 
* ************************************************************************/ -MPI_Request MPI_CLASS::IrecvBytes(void *buf, const int number_bytes, - const int send_proc, const int tag) const { +MPI_Request MPI_CLASS::IrecvBytes(void *buf, int number_bytes, int send_proc, + int tag) const { MPI_INSIST(tag <= d_maxTag, "Maximum tag value exceeded"); MPI_INSIST(tag >= 0, "tag must be >= 0"); return Irecv((char *)buf, number_bytes, send_proc, tag); @@ -2913,7 +2893,7 @@ void MPI_CLASS::call_allGather(const char *, int, char *, int *, ************************************************************************/ #ifdef USE_MPI template <> -void MPI_CLASS::allToAll(const int n, const unsigned char *send, +void MPI_CLASS::allToAll(int n, const unsigned char *send, unsigned char *recv) const { PROFILE_START("allToAll", profile_level); MPI_Alltoall((void *)send, n, MPI_UNSIGNED_CHAR, (void *)recv, n, @@ -2921,15 +2901,14 @@ void MPI_CLASS::allToAll(const int n, const unsigned char *send, PROFILE_STOP("allToAll", profile_level); } template <> -void MPI_CLASS::allToAll(const int n, const char *send, - char *recv) const { +void MPI_CLASS::allToAll(int n, const char *send, char *recv) const { PROFILE_START("allToAll", profile_level); MPI_Alltoall((void *)send, n, MPI_CHAR, (void *)recv, n, MPI_CHAR, communicator); PROFILE_STOP("allToAll", profile_level); } template <> -void MPI_CLASS::allToAll(const int n, const unsigned int *send, +void MPI_CLASS::allToAll(int n, const unsigned int *send, unsigned int *recv) const { PROFILE_START("allToAll", profile_level); MPI_Alltoall((void *)send, n, MPI_UNSIGNED, (void *)recv, n, MPI_UNSIGNED, @@ -2937,14 +2916,14 @@ void MPI_CLASS::allToAll(const int n, const unsigned int *send, PROFILE_STOP("allToAll", profile_level); } template <> -void MPI_CLASS::allToAll(const int n, const int *send, int *recv) const { +void MPI_CLASS::allToAll(int n, const int *send, int *recv) const { PROFILE_START("allToAll", profile_level); MPI_Alltoall((void *)send, n, MPI_INT, (void *)recv, n, MPI_INT, communicator); PROFILE_STOP("allToAll", profile_level); } template <> -void MPI_CLASS::allToAll(const int n, +void MPI_CLASS::allToAll(int n, const unsigned long int *send, unsigned long int *recv) const { PROFILE_START("allToAll", profile_level); @@ -2953,7 +2932,7 @@ void MPI_CLASS::allToAll(const int n, PROFILE_STOP("allToAll", profile_level); } template <> -void MPI_CLASS::allToAll(const int n, const long int *send, +void MPI_CLASS::allToAll(int n, const long int *send, long int *recv) const { PROFILE_START("allToAll", profile_level); MPI_Alltoall((void *)send, n, MPI_LONG, (void *)recv, n, MPI_LONG, @@ -2961,15 +2940,14 @@ void MPI_CLASS::allToAll(const int n, const long int *send, PROFILE_STOP("allToAll", profile_level); } template <> -void MPI_CLASS::allToAll(const int n, const float *send, - float *recv) const { +void MPI_CLASS::allToAll(int n, const float *send, float *recv) const { PROFILE_START("allToAll", profile_level); MPI_Alltoall((void *)send, n, MPI_FLOAT, (void *)recv, n, MPI_FLOAT, communicator); PROFILE_STOP("allToAll", profile_level); } template <> -void MPI_CLASS::allToAll(const int n, const double *send, +void MPI_CLASS::allToAll(int n, const double *send, double *recv) const { PROFILE_START("allToAll", profile_level); MPI_Alltoall((void *)send, n, MPI_DOUBLE, (void *)recv, n, MPI_DOUBLE, @@ -3713,4 +3691,28 @@ MPI MPI::loadBalance(double local, std::vector work) { return split(0, key[getRank()]); } + + 
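The hunks above only drop redundant top-level `const` from value parameters, so the wrapper interface itself is unchanged. For orientation, a minimal usage sketch of that interface; the `comm` object, the `neighbor` rank, and the `waitAll(count, requests)` completion call (used elsewhere in this patch by `ScaLBL_Communicator::wait`) are assumptions, not part of this change:

    #include "common/MPI.h"

    // Sketch only: exercise the reduce / broadcast / non-blocking wrappers above.
    void exampleExchange(Utilities::MPI &comm, int neighbor)
    {
        // In-place max reduction that also reports which rank held the maximum
        double value = static_cast<double>(comm.getRank());
        int rank_of_max = -1;
        comm.maxReduce(&value, 1, &rank_of_max);

        // Broadcast a run parameter from rank 0 to everyone
        int timesteps = (comm.getRank() == 0) ? 1000 : 0;
        comm.bcast(&timesteps, 1, 0);

        // Non-blocking exchange with a neighboring rank, completed together
        double sendval = value, recvval = 0.0;
        MPI_Request req[2];
        req[0] = comm.Irecv(&recvval, 1, neighbor, 10 /*tag*/);
        req[1] = comm.Isend(&sendval, 1, neighbor, 10 /*tag*/);
        comm.waitAll(2, req);
    }
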
+/**************************************************************************** + * Function Persistent Communication * + ****************************************************************************/ +template <> +std::shared_ptr MPI::Isend_init(const double *buf, int N, int proc, int tag) const +{ + std::shared_ptr obj( new MPI_Request, []( MPI_Request *req ) { MPI_Request_free( req ); delete req; } ); + MPI_Send_init( buf, N, MPI_DOUBLE, proc, tag, communicator, obj.get() ); + return obj; +} +template<> +std::shared_ptr MPI::Irecv_init(double *buf, int N, int proc, int tag) const +{ + std::shared_ptr obj( new MPI_Request, []( MPI_Request *req ) { MPI_Request_free( req ); delete req; } ); + MPI_Recv_init( buf, N, MPI_DOUBLE, proc, tag, communicator, obj.get() ); + return obj; +} +void MPI::Start( MPI_Request &request ) +{ + MPI_Start( &request ); +} + } // namespace Utilities diff --git a/common/MPI.h b/common/MPI.h index f8849aaf..d249d661 100644 --- a/common/MPI.h +++ b/common/MPI.h @@ -26,6 +26,7 @@ redistribution is prohibited. #include #include #include +#include #include #include #include @@ -173,10 +174,9 @@ public: // Member functions * */ static void - balanceProcesses(const MPI &comm = MPI(MPI_COMM_WORLD), - const int method = 1, + balanceProcesses(const MPI &comm = MPI(MPI_COMM_WORLD), int method = 1, const std::vector &procs = std::vector(), - const int N_min = 1, const int N_max = -1); + int N_min = 1, int N_max = -1); //! Query the level of thread support static ThreadSupport queryThreadSupport(); @@ -420,7 +420,7 @@ public: // Member functions * \param x The input/output array for the reduce * \param n The number of values in the array (must match on all nodes) */ - template void sumReduce(type *x, const int n = 1) const; + template void sumReduce(type *x, int n = 1) const; /** * \brief Sum Reduce @@ -432,7 +432,7 @@ public: // Member functions * \param n The number of values in the array (must match on all nodes) */ template - void sumReduce(const type *x, type *y, const int n = 1) const; + void sumReduce(const type *x, type *y, int n = 1) const; /** * \brief Min Reduce @@ -457,7 +457,7 @@ public: // Member functions * minimum value */ template - void minReduce(type *x, const int n = 1, int *rank_of_min = nullptr) const; + void minReduce(type *x, int n = 1, int *rank_of_min = nullptr) const; /** * \brief Sum Reduce @@ -475,7 +475,7 @@ public: // Member functions * minimum value */ template - void minReduce(const type *x, type *y, const int n = 1, + void minReduce(const type *x, type *y, int n = 1, int *rank_of_min = nullptr) const; /** @@ -501,7 +501,7 @@ public: // Member functions * minimum value */ template - void maxReduce(type *x, const int n = 1, int *rank_of_max = nullptr) const; + void maxReduce(type *x, int n = 1, int *rank_of_max = nullptr) const; /** * \brief Sum Reduce @@ -519,7 +519,7 @@ public: // Member functions * minimum value */ template - void maxReduce(const type *x, type *y, const int n = 1, + void maxReduce(const type *x, type *y, int n = 1, int *rank_of_max = nullptr) const; /** @@ -530,8 +530,7 @@ public: // Member functions * \param y The output array for the scan * \param n The number of values in the array (must match on all nodes) */ - template - void sumScan(const type *x, type *y, const int n = 1) const; + template void sumScan(const type *x, type *y, int n = 1) const; /** * \brief Scan Min Reduce @@ -541,8 +540,7 @@ public: // Member functions * \param y The output array for the scan * \param n The number of values in the array (must match on 
all nodes) */ - template - void minScan(const type *x, type *y, const int n = 1) const; + template void minScan(const type *x, type *y, int n = 1) const; /** * \brief Scan Max Reduce @@ -552,8 +550,7 @@ public: // Member functions * \param y The output array for the scan * \param n The number of values in the array (must match on all nodes) */ - template - void maxScan(const type *x, type *y, const int n = 1) const; + template void maxScan(const type *x, type *y, int n = 1) const; /** * \brief Broadcast @@ -561,7 +558,7 @@ public: // Member functions * \param value The input value for the broadcast. * \param root The processor performing the broadcast */ - template type bcast(const type &value, const int root) const; + template type bcast(const type &value, int root) const; /** * \brief Broadcast @@ -570,8 +567,7 @@ public: // Member functions * \param n The number of values in the array (must match on all nodes) * \param root The processor performing the broadcast */ - template - void bcast(type *value, const int n, const int root) const; + template void bcast(type *value, int n, int root) const; /** * Perform a global barrier across all processors. @@ -595,8 +591,7 @@ public: // Member functions * The matching recv must share this tag. */ template - void send(const type *buf, const int length, const int recv, - int tag = 0) const; + void send(const type *buf, int length, int recv, int tag = 0) const; /*! * @brief This function sends an MPI message with an array of bytes @@ -611,8 +606,7 @@ public: // Member functions * to be sent with this message. Default tag is 0. * The matching recv must share this tag. */ - void sendBytes(const void *buf, const int N_bytes, const int recv, - int tag = 0) const; + void sendBytes(const void *buf, int N_bytes, int recv, int tag = 0) const; /*! * @brief This function sends an MPI message with an array @@ -627,8 +621,8 @@ public: // Member functions * to be sent with this message. */ template - MPI_Request Isend(const type *buf, const int length, const int recv_proc, - const int tag) const; + MPI_Request Isend(const type *buf, int length, int recv_proc, + int tag) const; /*! * @brief This function sends an MPI message with an array of bytes @@ -642,8 +636,8 @@ public: // Member functions * @param tag Integer argument specifying an integer tag * to be sent with this message. */ - MPI_Request IsendBytes(const void *buf, const int N_bytes, - const int recv_proc, const int tag) const; + MPI_Request IsendBytes(const void *buf, int N_bytes, int recv_proc, + int tag) const; /*! * @brief This function receives an MPI message with a data @@ -662,7 +656,7 @@ public: // Member functions * by the tag of the incoming message. Default tag is 0. */ template - inline void recv(type *buf, int length, const int send, int tag) const { + inline void recv(type *buf, int length, int send, int tag) const { int length2 = length; recv(buf, length2, send, false, tag); } @@ -687,7 +681,7 @@ public: // Member functions * by the tag of the incoming message. Default tag is 0. */ template - void recv(type *buf, int &length, const int send, const bool get_length, + void recv(type *buf, int &length, int send, const bool get_length, int tag) const; /*! @@ -703,7 +697,7 @@ public: // Member functions * must be matched by the tag of the incoming message. Default * tag is 0. */ - void recvBytes(void *buf, int &N_bytes, const int send, int tag = 0) const; + void recvBytes(void *buf, int &N_bytes, int send, int tag = 0) const; /*! 
* @brief This function receives an MPI message with a data @@ -716,8 +710,7 @@ public: // Member functions * be matched by the tag of the incoming message. */ template - MPI_Request Irecv(type *buf, const int length, const int send_proc, - const int tag) const; + MPI_Request Irecv(type *buf, int length, int send_proc, int tag) const; /*! * @brief This function receives an MPI message with an array of @@ -731,8 +724,8 @@ public: // Member functions * @param tag Integer argument specifying a tag which must * be matched by the tag of the incoming message. */ - MPI_Request IrecvBytes(void *buf, const int N_bytes, const int send_proc, - const int tag) const; + MPI_Request IrecvBytes(void *buf, int N_bytes, int send_proc, + int tag) const; /*! * @brief This function sends and recieves data using a blocking call @@ -741,6 +734,39 @@ public: // Member functions void sendrecv(const type *sendbuf, int sendcount, int dest, int sendtag, type *recvbuf, int recvcount, int source, int recvtag) const; + /*! + * @brief This function sets up an Isend call (see MPI_Send_init) + * @param buf Pointer to array buffer with length integers. + * @param length Number of integers in buf that we want to send. + * @param recv_proc Receiving processor number. + * @param tag Tag to send + * @return Returns an MPI_Request. + * Note this returns a unique pointer so the user does not + * need to manually free the request + */ + template + std::shared_ptr Isend_init(const type *buf, int length, int recv_proc, + int tag) const; + + /*! + * @brief This function sets up an Irecv call (see MPI_Recv_init) + * @param buf Pointer to integer array buffer with capacity of length integers. + * @param length Maximum number of values that can be stored in buf. + * @param send_proc Processor number of sender. + * @param tag Tag to match + * @return Returns an MPI_Request. + * Note this returns a unique pointer so the user does not + * need to manually free the request + */ + template + std::shared_ptr Irecv_init(type *buf, int length, int send_proc, int tag) const; + + /*! + * @brief Start the MPI communication + * @param request Request to start + */ + void Start( MPI_Request &request ); + /*! * Each processor sends every other processor a single value. * @param[in] x Input value for allGather @@ -792,7 +818,7 @@ public: // Member functions * and the sizes and displacements will be returned (if desired). */ template - int allGather(const type *send_data, const int send_cnt, type *recv_data, + int allGather(const type *send_data, int send_cnt, type *recv_data, int *recv_cnt = nullptr, int *recv_disp = nullptr, bool known_recv = false) const; @@ -822,7 +848,7 @@ public: // Member functions * @param recv_data Output array of received values (nxN) */ template - void allToAll(const int n, const type *send_data, type *recv_data) const; + void allToAll(int n, const type *send_data, type *recv_data) const; /*! * Each processor sends an array of data to the different processors. 
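The persistent-communication members declared above (`Isend_init`, `Irecv_init`, `Start`) wrap `MPI_Send_init` / `MPI_Recv_init`, so the buffers must stay at fixed addresses for the lifetime of the requests, and this patch provides only the `double` specializations. A minimal usage sketch, assuming a `Utilities::MPI` object `comm`, a valid `neighbor` rank, and the `waitAll(count, requests)` call that `ScaLBL_Communicator::wait` uses later in this patch:

    #include <memory>
    #include <vector>
    #include "common/MPI.h"

    // Sketch only: build the persistent requests once, restart them every step.
    void persistentExchange(Utilities::MPI &comm, double *sendbuf, double *recvbuf,
                            int N, int neighbor, int timesteps)
    {
        // Requests are freed automatically by the shared_ptr deleter (MPI_Request_free)
        std::vector<std::shared_ptr<MPI_Request>> reqs;
        reqs.push_back(comm.Irecv_init(recvbuf, N, neighbor, 42 /*tag*/));
        reqs.push_back(comm.Isend_init(sendbuf, N, neighbor, 42 /*tag*/));

        for (int t = 0; t < timesteps; t++) {
            // ... refill sendbuf for this step ...
            for (auto &r : reqs)
                comm.Start(*r);   // restart the same persistent requests

            // Complete them on plain MPI_Request handles, mirroring the pattern
            // used by ScaLBL_Communicator::wait later in this patch
            std::vector<MPI_Request> active;
            for (auto &r : reqs)
                active.push_back(*r);
            comm.waitAll(static_cast<int>(active.size()), active.data());
            // ... consume recvbuf ...
        }
    }
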
@@ -995,23 +1021,20 @@ public: // Member functions MPI loadBalance(double localPerformance, std::vector work); private: // Private helper functions for templated MPI operations; - template void call_sumReduce(type *x, const int n = 1) const; + template void call_sumReduce(type *x, int n = 1) const; template - void call_sumReduce(const type *x, type *y, const int n = 1) const; + void call_sumReduce(const type *x, type *y, int n = 1) const; template - void call_minReduce(type *x, const int n = 1, + void call_minReduce(type *x, int n = 1, int *rank_of_min = nullptr) const; + template + void call_minReduce(const type *x, type *y, int n = 1, int *rank_of_min = nullptr) const; template - void call_minReduce(const type *x, type *y, const int n = 1, - int *rank_of_min = nullptr) const; + void call_maxReduce(type *x, int n = 1, int *rank_of_max = nullptr) const; template - void call_maxReduce(type *x, const int n = 1, + void call_maxReduce(const type *x, type *y, int n = 1, int *rank_of_max = nullptr) const; - template - void call_maxReduce(const type *x, type *y, const int n = 1, - int *rank_of_max = nullptr) const; - template - void call_bcast(type *x, const int n, const int root) const; + template void call_bcast(type *x, int n, int root) const; template void call_allGather(const type &x_in, type *x_out) const; template diff --git a/common/Membrane.cpp b/common/Membrane.cpp new file mode 100644 index 00000000..ccc530cf --- /dev/null +++ b/common/Membrane.cpp @@ -0,0 +1,850 @@ +/* Membrane class for lattice Boltzmann models */ + +#include "common/Membrane.h" +#include "analysis/distance.h" + +Membrane::Membrane(std::shared_ptr sComm, int *dvcNeighborList, int Nsites) { + + Np = Nsites; + initialNeighborList = new int[18*Np]; + ScaLBL_AllocateDeviceMemory((void **)&NeighborList, 18*Np*sizeof(int)); + Lock=false; // unlock the communicator + //...................................................................................... + // Create a separate copy of the communicator for the device + MPI_COMM_SCALBL = sComm->MPI_COMM_SCALBL.dup(); + + ScaLBL_CopyToHost(initialNeighborList, dvcNeighborList, 18*Np*sizeof(int)); + sComm->MPI_COMM_SCALBL.barrier(); + ScaLBL_CopyToDevice(NeighborList, initialNeighborList, 18*Np*sizeof(int)); + + /* Copy communication lists */ + //...................................................................................... + //Lock=false; // unlock the communicator + //...................................................................................... + // Create a separate copy of the communicator for the device + //MPI_COMM_SCALBL = sComm->Comm.dup(); + //...................................................................................... 
+ // Copy the domain size and communication information directly from sComm + Nx = sComm->Nx; + Ny = sComm->Ny; + Nz = sComm->Nz; + N = Nx*Ny*Nz; + //next=0; + rank=sComm->rank; + rank_x=sComm->rank_x; + rank_y=sComm->rank_y; + rank_z=sComm->rank_z; + rank_X=sComm->rank_X; + rank_Y=sComm->rank_Y; + rank_Z=sComm->rank_Z; + + if (rank == 0){ + printf("**** Creating membrane data structure ****** \n"); + printf(" Number of active lattice sites (rank = %i): %i \n",rank, Np); + } + + sendCount_x=sComm->sendCount_x; + sendCount_y=sComm->sendCount_y; + sendCount_z=sComm->sendCount_z; + sendCount_X=sComm->sendCount_X; + sendCount_Y=sComm->sendCount_Y; + sendCount_Z=sComm->sendCount_Z; + + recvCount_x=sComm->recvCount_x; + recvCount_y=sComm->recvCount_y; + recvCount_z=sComm->recvCount_z; + recvCount_X=sComm->recvCount_X; + recvCount_Y=sComm->recvCount_Y; + recvCount_Z=sComm->recvCount_Z; + + ScaLBL_AllocateZeroCopy((void **) &dvcSendList_x, recvCount_x*sizeof(int)); + ScaLBL_AllocateZeroCopy((void **) &dvcSendList_y, recvCount_y*sizeof(int)); + ScaLBL_AllocateZeroCopy((void **) &dvcSendList_z, recvCount_z*sizeof(int)); + ScaLBL_AllocateZeroCopy((void **) &dvcSendList_X, recvCount_X*sizeof(int)); + ScaLBL_AllocateZeroCopy((void **) &dvcSendList_Y, recvCount_Y*sizeof(int)); + ScaLBL_AllocateZeroCopy((void **) &dvcSendList_Z, recvCount_Z*sizeof(int)); + + ScaLBL_AllocateZeroCopy((void **) &dvcRecvLinks_x, recvCount_x*sizeof(int)); + ScaLBL_AllocateZeroCopy((void **) &dvcRecvLinks_y, recvCount_y*sizeof(int)); + ScaLBL_AllocateZeroCopy((void **) &dvcRecvLinks_z, recvCount_z*sizeof(int)); + ScaLBL_AllocateZeroCopy((void **) &dvcRecvLinks_X, recvCount_X*sizeof(int)); + ScaLBL_AllocateZeroCopy((void **) &dvcRecvLinks_Y, recvCount_Y*sizeof(int)); + ScaLBL_AllocateZeroCopy((void **) &dvcRecvLinks_Z, recvCount_Z*sizeof(int)); + + ScaLBL_AllocateZeroCopy((void **) &dvcRecvDist_x, recvCount_x*sizeof(int)); + ScaLBL_AllocateZeroCopy((void **) &dvcRecvDist_y, recvCount_y*sizeof(int)); + ScaLBL_AllocateZeroCopy((void **) &dvcRecvDist_z, recvCount_z*sizeof(int)); + ScaLBL_AllocateZeroCopy((void **) &dvcRecvDist_X, recvCount_X*sizeof(int)); + ScaLBL_AllocateZeroCopy((void **) &dvcRecvDist_Y, recvCount_Y*sizeof(int)); + ScaLBL_AllocateZeroCopy((void **) &dvcRecvDist_Z, recvCount_Z*sizeof(int)); + + ScaLBL_AllocateZeroCopy((void **) &sendbuf_x, sendCount_x*sizeof(double)); + ScaLBL_AllocateZeroCopy((void **) &sendbuf_y, sendCount_y*sizeof(double)); + ScaLBL_AllocateZeroCopy((void **) &sendbuf_z, sendCount_z*sizeof(double)); + ScaLBL_AllocateZeroCopy((void **) &sendbuf_X, sendCount_X*sizeof(double)); + ScaLBL_AllocateZeroCopy((void **) &sendbuf_Y, sendCount_Y*sizeof(double)); + ScaLBL_AllocateZeroCopy((void **) &sendbuf_Z, sendCount_Z*sizeof(double)); + + ScaLBL_AllocateZeroCopy((void **) &recvbuf_x, recvCount_x*sizeof(double)); + ScaLBL_AllocateZeroCopy((void **) &recvbuf_y, recvCount_y*sizeof(double)); + ScaLBL_AllocateZeroCopy((void **) &recvbuf_z, recvCount_z*sizeof(double)); + ScaLBL_AllocateZeroCopy((void **) &recvbuf_X, recvCount_X*sizeof(double)); + ScaLBL_AllocateZeroCopy((void **) &recvbuf_Y, recvCount_Y*sizeof(double)); + ScaLBL_AllocateZeroCopy((void **) &recvbuf_Z, recvCount_Z*sizeof(double)); + + sendCount_x=sComm->copySendList("x", dvcSendList_x); + sendCount_y=sComm->copySendList("y", dvcSendList_y); + sendCount_z=sComm->copySendList("z", dvcSendList_z); + sendCount_X=sComm->copySendList("X", dvcSendList_X); + sendCount_Y=sComm->copySendList("Y", dvcSendList_Y); + sendCount_Z=sComm->copySendList("Z", 
dvcSendList_Z); + + recvCount_x=sComm->copyRecvList("x", dvcRecvDist_x); + recvCount_y=sComm->copyRecvList("y", dvcRecvDist_y); + recvCount_z=sComm->copyRecvList("z", dvcRecvDist_z); + recvCount_X=sComm->copyRecvList("X", dvcRecvDist_X); + recvCount_Y=sComm->copyRecvList("Y", dvcRecvDist_Y); + recvCount_Z=sComm->copyRecvList("Z", dvcRecvDist_Z); + +} + +Membrane::~Membrane() { + + delete [] initialNeighborList; + delete [] membraneLinks; + delete [] membraneTag; + delete [] membraneDist; + + ScaLBL_FreeDeviceMemory( coefficient_x ); + ScaLBL_FreeDeviceMemory( coefficient_X ); + ScaLBL_FreeDeviceMemory( coefficient_y ); + ScaLBL_FreeDeviceMemory( coefficient_Y ); + ScaLBL_FreeDeviceMemory( coefficient_z ); + ScaLBL_FreeDeviceMemory( coefficient_Z ); + + ScaLBL_FreeDeviceMemory( NeighborList ); + ScaLBL_FreeDeviceMemory( MembraneLinks ); + ScaLBL_FreeDeviceMemory( MembraneCoef ); + ScaLBL_FreeDeviceMemory( MembraneDistance ); + + ScaLBL_FreeDeviceMemory( sendbuf_x ); + ScaLBL_FreeDeviceMemory( sendbuf_X ); + ScaLBL_FreeDeviceMemory( sendbuf_y ); + ScaLBL_FreeDeviceMemory( sendbuf_Y ); + ScaLBL_FreeDeviceMemory( sendbuf_z ); + ScaLBL_FreeDeviceMemory( sendbuf_Z ); +/* ScaLBL_FreeDeviceMemory( sendbuf_xy ); + ScaLBL_FreeDeviceMemory( sendbuf_xY ); + ScaLBL_FreeDeviceMemory( sendbuf_Xy ); + ScaLBL_FreeDeviceMemory( sendbuf_XY ); + ScaLBL_FreeDeviceMemory( sendbuf_xz ); + ScaLBL_FreeDeviceMemory( sendbuf_xZ ); + ScaLBL_FreeDeviceMemory( sendbuf_Xz ); + ScaLBL_FreeDeviceMemory( sendbuf_XZ ); + ScaLBL_FreeDeviceMemory( sendbuf_yz ); + ScaLBL_FreeDeviceMemory( sendbuf_yZ ); + ScaLBL_FreeDeviceMemory( sendbuf_Yz ); + ScaLBL_FreeDeviceMemory( sendbuf_YZ ); + */ + ScaLBL_FreeDeviceMemory( recvbuf_x ); + ScaLBL_FreeDeviceMemory( recvbuf_X ); + ScaLBL_FreeDeviceMemory( recvbuf_y ); + ScaLBL_FreeDeviceMemory( recvbuf_Y ); + ScaLBL_FreeDeviceMemory( recvbuf_z ); + ScaLBL_FreeDeviceMemory( recvbuf_Z ); + /* + ScaLBL_FreeDeviceMemory( recvbuf_xy ); + ScaLBL_FreeDeviceMemory( recvbuf_xY ); + ScaLBL_FreeDeviceMemory( recvbuf_Xy ); + ScaLBL_FreeDeviceMemory( recvbuf_XY ); + ScaLBL_FreeDeviceMemory( recvbuf_xz ); + ScaLBL_FreeDeviceMemory( recvbuf_xZ ); + ScaLBL_FreeDeviceMemory( recvbuf_Xz ); + ScaLBL_FreeDeviceMemory( recvbuf_XZ ); + ScaLBL_FreeDeviceMemory( recvbuf_yz ); + ScaLBL_FreeDeviceMemory( recvbuf_yZ ); + ScaLBL_FreeDeviceMemory( recvbuf_Yz ); + ScaLBL_FreeDeviceMemory( recvbuf_YZ ); + */ + ScaLBL_FreeDeviceMemory( dvcSendList_x ); + ScaLBL_FreeDeviceMemory( dvcSendList_X ); + ScaLBL_FreeDeviceMemory( dvcSendList_y ); + ScaLBL_FreeDeviceMemory( dvcSendList_Y ); + ScaLBL_FreeDeviceMemory( dvcSendList_z ); + ScaLBL_FreeDeviceMemory( dvcSendList_Z ); + /* + ScaLBL_FreeDeviceMemory( dvcSendList_xy ); + ScaLBL_FreeDeviceMemory( dvcSendList_xY ); + ScaLBL_FreeDeviceMemory( dvcSendList_Xy ); + ScaLBL_FreeDeviceMemory( dvcSendList_XY ); + ScaLBL_FreeDeviceMemory( dvcSendList_xz ); + ScaLBL_FreeDeviceMemory( dvcSendList_xZ ); + ScaLBL_FreeDeviceMemory( dvcSendList_Xz ); + ScaLBL_FreeDeviceMemory( dvcSendList_XZ ); + ScaLBL_FreeDeviceMemory( dvcSendList_yz ); + ScaLBL_FreeDeviceMemory( dvcSendList_yZ ); + ScaLBL_FreeDeviceMemory( dvcSendList_Yz ); + ScaLBL_FreeDeviceMemory( dvcSendList_YZ ); + ScaLBL_FreeDeviceMemory( dvcRecvList_x ); + ScaLBL_FreeDeviceMemory( dvcRecvList_X ); + ScaLBL_FreeDeviceMemory( dvcRecvList_y ); + ScaLBL_FreeDeviceMemory( dvcRecvList_Y ); + ScaLBL_FreeDeviceMemory( dvcRecvList_z ); + ScaLBL_FreeDeviceMemory( dvcRecvList_Z ); + ScaLBL_FreeDeviceMemory( dvcRecvList_xy ); + 
ScaLBL_FreeDeviceMemory( dvcRecvList_xY ); + ScaLBL_FreeDeviceMemory( dvcRecvList_Xy ); + ScaLBL_FreeDeviceMemory( dvcRecvList_XY ); + ScaLBL_FreeDeviceMemory( dvcRecvList_xz ); + ScaLBL_FreeDeviceMemory( dvcRecvList_xZ ); + ScaLBL_FreeDeviceMemory( dvcRecvList_Xz ); + ScaLBL_FreeDeviceMemory( dvcRecvList_XZ ); + ScaLBL_FreeDeviceMemory( dvcRecvList_yz ); + ScaLBL_FreeDeviceMemory( dvcRecvList_yZ ); + ScaLBL_FreeDeviceMemory( dvcRecvList_Yz ); + ScaLBL_FreeDeviceMemory( dvcRecvList_YZ ); + */ + ScaLBL_FreeDeviceMemory( dvcRecvLinks_x ); + ScaLBL_FreeDeviceMemory( dvcRecvLinks_X ); + ScaLBL_FreeDeviceMemory( dvcRecvLinks_y ); + ScaLBL_FreeDeviceMemory( dvcRecvLinks_Y ); + ScaLBL_FreeDeviceMemory( dvcRecvLinks_z ); + ScaLBL_FreeDeviceMemory( dvcRecvLinks_Z ); + /* + ScaLBL_FreeDeviceMemory( dvcRecvLinks_xy ); + ScaLBL_FreeDeviceMemory( dvcRecvLinks_xY ); + ScaLBL_FreeDeviceMemory( dvcRecvLinks_Xy ); + ScaLBL_FreeDeviceMemory( dvcRecvLinks_XY ); + ScaLBL_FreeDeviceMemory( dvcRecvLinks_xz ); + ScaLBL_FreeDeviceMemory( dvcRecvLinks_xZ ); + ScaLBL_FreeDeviceMemory( dvcRecvLinks_Xz ); + ScaLBL_FreeDeviceMemory( dvcRecvLinks_XZ ); + ScaLBL_FreeDeviceMemory( dvcRecvLinks_yz ); + ScaLBL_FreeDeviceMemory( dvcRecvLinks_yZ ); + ScaLBL_FreeDeviceMemory( dvcRecvLinks_Yz ); + ScaLBL_FreeDeviceMemory( dvcRecvLinks_YZ ); + */ + ScaLBL_FreeDeviceMemory( dvcRecvDist_x ); + ScaLBL_FreeDeviceMemory( dvcRecvDist_X ); + ScaLBL_FreeDeviceMemory( dvcRecvDist_y ); + ScaLBL_FreeDeviceMemory( dvcRecvDist_Y ); + ScaLBL_FreeDeviceMemory( dvcRecvDist_z ); + ScaLBL_FreeDeviceMemory( dvcRecvDist_Z ); + /* + ScaLBL_FreeDeviceMemory( dvcRecvDist_xy ); + ScaLBL_FreeDeviceMemory( dvcRecvDist_xY ); + ScaLBL_FreeDeviceMemory( dvcRecvDist_Xy ); + ScaLBL_FreeDeviceMemory( dvcRecvDist_XY ); + ScaLBL_FreeDeviceMemory( dvcRecvDist_xz ); + ScaLBL_FreeDeviceMemory( dvcRecvDist_xZ ); + ScaLBL_FreeDeviceMemory( dvcRecvDist_Xz ); + ScaLBL_FreeDeviceMemory( dvcRecvDist_XZ ); + ScaLBL_FreeDeviceMemory( dvcRecvDist_yz ); + ScaLBL_FreeDeviceMemory( dvcRecvDist_yZ ); + ScaLBL_FreeDeviceMemory( dvcRecvDist_Yz ); + ScaLBL_FreeDeviceMemory( dvcRecvDist_YZ ); + */ +} + +int Membrane::Create(DoubleArray &Distance, IntArray &Map){ + int mlink = 0; + int i,j,k; + + int idx, neighbor; + double dist, locdist; + + if (rank == 0) printf(" Copy initial neighborlist... 
\n"); + int * neighborList = new int[18*Np]; + /* Copy neighborList */ + for (int idx=0; idx 7){ + neighbor=Map(i-1,j-1,k); + dist=Distance(i-1,j-1,k); + if (dist*locdist < 0.0 && !(neighbor<0)){ + neighborList[6*Np+idx]=idx + 8*Np; + } + + neighbor=Map(i+1,j+1,k); + dist=Distance(i+1,j+1,k); + if (dist*locdist < 0.0){ + neighborList[7*Np+idx]=idx + 7*Np; + mlink++; + } + + neighbor=Map(i-1,j+1,k); + dist=Distance(i-1,j+1,k); + if (dist*locdist < 0.0 && !(neighbor<0)){ + neighborList[8*Np+idx]=idx + 10*Np; + } + + neighbor=Map(i+1,j-1,k); + dist=Distance(i+1,j-1,k); + if (dist*locdist < 0.0 && !(neighbor<0)){ + neighborList[9*Np+idx]=idx + 9*Np; + mlink++; + } + + neighbor=Map(i-1,j,k-1); + dist=Distance(i-1,j,k-1); + if (dist*locdist < 0.0 && !(neighbor<0)){ + neighborList[10*Np+idx]=idx + 12*Np; + } + + neighbor=Map(i+1,j,k+1); + dist=Distance(i+1,j,k+1); + if (dist*locdist < 0.0 && !(neighbor<0)){ + neighborList[11*Np+idx]=idx + 11*Np; + mlink++; + } + + neighbor=Map(i-1,j,k+1); + dist=Distance(i-1,j,k+1); + if (dist*locdist < 0.0 && !(neighbor<0)){ + neighborList[12*Np+idx]=idx + 14*Np; + } + + neighbor=Map(i+1,j,k-1); + dist=Distance(i+1,j,k-1); + if (dist*locdist < 0.0 && !(neighbor<0)){ + neighborList[13*Np+idx]=idx + 13*Np; + mlink++; + } + + neighbor=Map(i,j-1,k-1); + dist=Distance(i,j-1,k-1); + if (dist*locdist < 0.0 && !(neighbor<0)){ + neighborList[14*Np+idx]=idx + 16*Np; + } + + neighbor=Map(i,j+1,k+1); + dist=Distance(i,j+1,k+1); + if (dist*locdist < 0.0 && !(neighbor<0)){ + neighborList[15*Np+idx]=idx + 15*Np; + mlink++; + } + + neighbor=Map(i,j-1,k+1); + dist=Distance(i,j-1,k+1); + if (dist*locdist < 0.0 && !(neighbor<0)){ + neighborList[16*Np+idx]=idx + 18*Np; + } + + neighbor=Map(i,j+1,k-1); + dist=Distance(i,j+1,k-1); + if (dist*locdist < 0.0 && !(neighbor<0)){ + neighborList[17*Np+idx]=idx + 17*Np; + mlink++; + } + } + } + } + } + } + + /* allocate memory */ + membraneTag = new int [mlink]; + membraneLinks = new int [2*mlink]; + membraneDist = new double [2*mlink]; + membraneLinkCount = mlink; + + if (rank == 0) printf(" (cut %i links crossing membrane) \n",mlink); + + /* construct the membrane*/ + /* * + * Sites inside the membrane (negative distance) -- store at 2*mlink + * Sites outside the membrane (positive distance) -- store at 2*mlink+1 + */ + if (rank == 0) printf(" Construct membrane data structures... 
\n"); + mlink = 0; + int localSite = 0; int neighborSite = 0; + for (k=1;k 7){ + + neighbor=Map(i+1,j+1,k); + dist=Distance(i+1,j+1,k); + if (dist*locdist < 0.0 && !(neighbor<0)){ + if (locdist < 0.0){ + localSite = 2*mlink; + neighborSite = 2*mlink+1; + } + else{ + localSite = 2*mlink+1; + neighborSite = 2*mlink; + } + membraneLinks[localSite] = idx + 7*Np; + membraneLinks[neighborSite] = neighbor+8*Np; + membraneDist[localSite] = locdist; + membraneDist[neighborSite] = dist; + mlink++; + } + + neighbor=Map(i+1,j-1,k); + dist=Distance(i+1,j-1,k); + if (dist*locdist < 0.0 && !(neighbor<0)){ + if (locdist < 0.0){ + localSite = 2*mlink; + neighborSite = 2*mlink+1; + } + else{ + localSite = 2*mlink+1; + neighborSite = 2*mlink; + } + membraneLinks[localSite] = idx + 9*Np; + membraneLinks[neighborSite] = neighbor + 10*Np; + membraneDist[localSite] = locdist; + membraneDist[neighborSite] = dist; + mlink++; + } + + neighbor=Map(i+1,j,k+1); + dist=Distance(i+1,j,k+1); + if (dist*locdist < 0.0 && !(neighbor<0)){ + if (locdist < 0.0){ + localSite = 2*mlink; + neighborSite = 2*mlink+1; + } + else{ + localSite = 2*mlink+1; + neighborSite = 2*mlink; + } + membraneLinks[localSite] = idx + 11*Np; + membraneLinks[neighborSite] = neighbor + 12*Np; + membraneDist[localSite] = locdist; + membraneDist[neighborSite] = dist; + mlink++; + } + + neighbor=Map(i+1,j,k-1); + dist=Distance(i+1,j,k-1); + if (dist*locdist < 0.0 && !(neighbor<0)){ + if (locdist < 0.0){ + localSite = 2*mlink; + neighborSite = 2*mlink+1; + } + else{ + localSite = 2*mlink+1; + neighborSite = 2*mlink; + } + membraneLinks[localSite] = idx + 13*Np; + membraneLinks[neighborSite] = neighbor + 14*Np; + membraneDist[localSite] = locdist; + membraneDist[neighborSite] = dist; + mlink++; + } + + neighbor=Map(i,j+1,k+1); + dist=Distance(i,j+1,k+1); + if (dist*locdist < 0.0 && !(neighbor<0)){ + if (locdist < 0.0){ + localSite = 2*mlink; + neighborSite = 2*mlink+1; + } + else{ + localSite = 2*mlink+1; + neighborSite = 2*mlink; + } + membraneLinks[localSite] = idx + 15*Np; + membraneLinks[neighborSite] = neighbor + 16*Np; + membraneDist[localSite] = locdist; + membraneDist[neighborSite] = dist; + mlink++; + } + + neighbor=Map(i,j+1,k-1); + dist=Distance(i,j+1,k-1); + if (dist*locdist < 0.0 && !(neighbor<0)){ + if (locdist < 0.0){ + localSite = 2*mlink; + neighborSite = 2*mlink+1; + } + else{ + localSite = 2*mlink+1; + neighborSite = 2*mlink; + } + membraneLinks[localSite] = idx + 17*Np; + membraneLinks[neighborSite] = neighbor + 18*Np; + membraneDist[localSite] = locdist; + membraneDist[neighborSite] = dist; + mlink++; + } + } + } + } + } + } + + if (rank == 0) printf(" Create device data structures... 
\n"); + /* Create device copies of data structures */ + ScaLBL_AllocateDeviceMemory((void **)&MembraneLinks, 2*mlink*sizeof(int)); + ScaLBL_AllocateDeviceMemory((void **)&MembraneCoef, 2*mlink*sizeof(double)); + //ScaLBL_AllocateDeviceMemory((void **)&MembraneDistance, 2*mlink*sizeof(double)); + ScaLBL_AllocateDeviceMemory((void **)&MembraneDistance, Nx*Ny*Nz*sizeof(double)); + + ScaLBL_CopyToDevice(NeighborList, neighborList, 18*Np*sizeof(int)); + ScaLBL_CopyToDevice(MembraneLinks, membraneLinks, 2*mlink*sizeof(int)); + //ScaLBL_CopyToDevice(MembraneDistance, membraneDist, 2*mlink*sizeof(double)); + ScaLBL_CopyToDevice(MembraneDistance, Distance.data(), Nx*Ny*Nz*sizeof(double)); + + + int *dvcTmpMap; + ScaLBL_AllocateDeviceMemory((void **)&dvcTmpMap, sizeof(int) * Np); + int *TmpMap; + TmpMap = new int[Np]; + for (int k = 1; k < Nz - 1; k++) { + for (int j = 1; j < Ny - 1; j++) { + for (int i = 1; i < Nx - 1; i++) { + int idx = Map(i, j, k); + if (!(idx < 0)) + TmpMap[idx] = k * Nx * Ny + j * Nx + i; + } + } + } + ScaLBL_CopyToDevice(dvcTmpMap, TmpMap, sizeof(int) * Np); + + //int Membrane::D3Q7_MapRecv(int Cqx, int Cqy, int Cqz, int *d3q19_recvlist, + // int count, int *membraneRecvLabels, DoubleArray &Distance, int *dvcMap){ + if (rank == 0) printf(" Construct communication data structures... \n"); + /* Re-organize communication based on membrane structure*/ + //...dvcMap recieve list for the X face: q=2,8,10,12,14 ................................. + linkCount_X[0] = D3Q7_MapRecv(-1,0,0, dvcRecvDist_X,recvCount_X,dvcRecvLinks_X,Distance,dvcTmpMap); + //................................................................................... + //...dvcMap recieve list for the x face: q=1,7,9,11,13.................................. + linkCount_x[0] = D3Q7_MapRecv(1,0,0, dvcRecvDist_x,recvCount_x,dvcRecvLinks_x,Distance,dvcTmpMap); + //................................................................................... + //...dvcMap recieve list for the y face: q=4,8,9,16,18 ................................... + linkCount_Y[0] = D3Q7_MapRecv(0,-1,0, dvcRecvDist_Y,recvCount_Y,dvcRecvLinks_Y,Distance,dvcTmpMap); + //................................................................................... + //...dvcMap recieve list for the Y face: q=3,7,10,15,17 .................................. + linkCount_y[0] = D3Q7_MapRecv(0,1,0, dvcRecvDist_y,recvCount_y,dvcRecvLinks_y,Distance,dvcTmpMap); + //................................................................................... + //...dvcMap recieve list for the z face<<<6,12,13,16,17).............................................. + linkCount_Z[0] = D3Q7_MapRecv(0,0,-1, dvcRecvDist_Z,recvCount_Z,dvcRecvLinks_Z,Distance,dvcTmpMap); + + //...dvcMap recieve list for the Z face<<<5,11,14,15,18).............................................. + linkCount_z[0] = D3Q7_MapRecv(0,0,1, dvcRecvDist_z,recvCount_z,dvcRecvLinks_z,Distance,dvcTmpMap); + //.................................................................................. + + //...................................................................................... + MPI_COMM_SCALBL.barrier(); + ScaLBL_DeviceBarrier(); + //....................................................................... + SendCount = sendCount_x+sendCount_X+sendCount_y+sendCount_Y+sendCount_z+sendCount_Z; + RecvCount = recvCount_x+recvCount_X+recvCount_y+recvCount_Y+recvCount_z+recvCount_Z; + CommunicationCount = SendCount+RecvCount; + //...................................................................................... 
+ + //...................................................................................... + // Allocate membrane coefficient buffers (for d3q7 recv) + ScaLBL_AllocateZeroCopy((void **) &coefficient_x, 2*(recvCount_x )*sizeof(double)); + ScaLBL_AllocateZeroCopy((void **) &coefficient_X, 2*(recvCount_X)*sizeof(double)); + ScaLBL_AllocateZeroCopy((void **) &coefficient_y, 2*(recvCount_y)*sizeof(double)); + ScaLBL_AllocateZeroCopy((void **) &coefficient_Y, 2*(recvCount_Y)*sizeof(double)); + ScaLBL_AllocateZeroCopy((void **) &coefficient_z, 2*(recvCount_z)*sizeof(double)); + ScaLBL_AllocateZeroCopy((void **) &coefficient_Z, 2*(recvCount_Z)*sizeof(double)); + //...................................................................................... + + ScaLBL_FreeDeviceMemory (dvcTmpMap); + delete [] neighborList; + delete [] TmpMap; + return mlink; +} + +int Membrane::D3Q7_MapRecv(int Cqx, int Cqy, int Cqz, int *d3q19_recvlist, + int count, int *membraneRecvLabels, DoubleArray &Distance, int *dvcMap){ + + int i,j,k,n,nn,idx; + double distanceNonLocal,distanceLocal; + int * ReturnLabels; + ReturnLabels=new int [count]; + int * list; + list=new int [count]; + ScaLBL_CopyToHost(list, d3q19_recvlist, count*sizeof(int)); + + int *TmpMap; + TmpMap=new int [Np]; + ScaLBL_CopyToHost(TmpMap, dvcMap, Np*sizeof(int)); + + int countMembraneLinks=0; + for (idx=0; idx 0.0) ReturnLabels[idx] = 1; + else ReturnLabels[idx] = 2; + countMembraneLinks++; + } + } + // Return updated version to the device + ScaLBL_CopyToDevice(membraneRecvLabels, ReturnLabels, count*sizeof(int)); + + // clean up the work arrays + delete [] ReturnLabels; + delete [] TmpMap; + delete [] list; + return countMembraneLinks; +} + +void Membrane::SendD3Q7AA(double *dist){ + + if (Lock==true){ + ERROR("Membrane Error (SendD3Q7): Membrane communicator is locked -- did you forget to match Send/Recv calls?"); + } + else{ + Lock=true; + } + // assign tag of 37 to D3Q7 communication + sendtag = recvtag = 37; + ScaLBL_DeviceBarrier(); + // Pack the distributions + //...Packing for x face(q=2)................................ + ScaLBL_D3Q19_Pack(2,dvcSendList_x,0,sendCount_x,sendbuf_x,dist,Np); + req2[0] = MPI_COMM_SCALBL.Irecv(recvbuf_X, recvCount_X,rank_X,recvtag); + req1[0] = MPI_COMM_SCALBL.Isend(sendbuf_x, sendCount_x,rank_x,sendtag); + //...Packing for X face(q=1)................................ 
+ ScaLBL_D3Q19_Pack(1,dvcSendList_X,0,sendCount_X,sendbuf_X,dist,Np); + req2[1] = MPI_COMM_SCALBL.Irecv(recvbuf_x, recvCount_x,rank_x,recvtag); + req1[1] = MPI_COMM_SCALBL.Isend(sendbuf_X, sendCount_X,rank_X,sendtag); + //for (int idx=0; idx db){ +void Membrane::AssignCoefficients(int *Map, double *Psi, double Threshold, + double MassFractionIn, double MassFractionOut, double ThresholdMassFractionIn, + double ThresholdMassFractionOut){ + /* Assign mass transfer coefficients to the membrane data structure */ + + if (membraneLinkCount > 0) + ScaLBL_D3Q7_Membrane_AssignLinkCoef(MembraneLinks, Map, MembraneDistance, Psi, MembraneCoef, + Threshold, MassFractionIn, MassFractionOut, ThresholdMassFractionIn, ThresholdMassFractionOut, + membraneLinkCount, Nx, Ny, Nz, Np); + + if (linkCount_X[0] < recvCount_X) + ScaLBL_D3Q7_Membrane_AssignLinkCoef_halo(-1,0,0,Map,MembraneDistance,Psi,Threshold, + MassFractionIn,MassFractionOut,ThresholdMassFractionIn,ThresholdMassFractionOut, + dvcRecvDist_X,dvcRecvLinks_X,coefficient_X,0,linkCount_X[0],recvCount_X, + Np,Nx,Ny,Nz); + + if (linkCount_x[0] < recvCount_x) + ScaLBL_D3Q7_Membrane_AssignLinkCoef_halo(1,0,0,Map,MembraneDistance,Psi,Threshold, + MassFractionIn,MassFractionOut,ThresholdMassFractionIn,ThresholdMassFractionOut, + dvcRecvDist_x,dvcRecvLinks_x,coefficient_x,0,linkCount_x[0],recvCount_x, + Np,Nx,Ny,Nz); + + if (linkCount_Y[0] < recvCount_Y) + ScaLBL_D3Q7_Membrane_AssignLinkCoef_halo(0,-1,0,Map,MembraneDistance,Psi,Threshold, + MassFractionIn,MassFractionOut,ThresholdMassFractionIn,ThresholdMassFractionOut, + dvcRecvDist_Y,dvcRecvLinks_Y,coefficient_Y,0,linkCount_Y[0],recvCount_Y, + Np,Nx,Ny,Nz); + + if (linkCount_y[0] +#include +#include +#include +#include +#include +#include + +#include "common/ScaLBL.h" + +/** +* \brief Unpack D3Q19 distributions after communication using links determined based on membrane location +* @param q - index for distribution based on D3Q19 discrete velocity structure +* @param list - list of distributions to communicate +* @param recvbuf - memory buffer where recieved values have been stored +* @param count - number of values to unppack +* @param dist - memory buffer to hold the distributions +* @param N - size of the distributions (derived from Domain structure) +*/ +extern "C" void ScaLBL_D3Q7_Membrane_Unpack(int q, + int *d3q7_recvlist, double *recvbuf, int count, + double *dist, int N, double *coef); + +/** +* \brief Set custom link rules for D3Q19 distribution based on membrane location +* @param q - index for distribution based on D3Q19 discrete velocity structure +* @param list - list of distributions to communicate +* @param links - list of active links based on the membrane location +* @param coef - coefficient to determine the local mass transport for each membrane link +* @param start - index to start parsing the list +* @param offset - offset to start reading membrane links +* @param count - number of values to unppack +* @param recvbuf - memory buffer where recieved values have been stored +* @param dist - memory buffer to hold the distributions +* @param N - size of the distributions (derived from Domain structure) +*/ +extern "C" void Membrane_D3Q19_Transport(int q, int *list, int *links, double *coef, int start, int offset, + int linkCount, double *recvbuf, double *dist, int N); + +/** + * \class Membrane + * @brief + * The Membrane class operates on ScaLBL data structures to insert membrane + * + */ + +class Membrane { +public: + int Np; + int Nx,Ny,Nz,N; + int membraneLinkCount; + + int 
*initialNeighborList; // original neighborlist + int *NeighborList; // modified neighborlist + + /* host data structures */ + int *membraneLinks; // D3Q7 links that cross membrane + int *membraneTag; // label each link in the membrane + double *membraneDist; // distance to membrane for each linked site + double *membraneOrientation; // distance to membrane for each linked site + + /* + * Device data structures + */ + int *MembraneLinks; + double *MembraneCoef; // mass transport coefficient for the membrane + double *MembraneDistance; + double *MembraneOrientation; + + /** + * \brief Create a flow adaptor to operate on the LB model + * @param ScaLBL - originating data structures + * @param neighborList - list of neighbors for each site + */ + //Membrane(std::shared_ptr Dm, int *initialNeighborList, int Nsites); + Membrane(std::shared_ptr sComm, int *dvcNeighborList, int Nsites); + + /** + * \brief Destructor + */ + ~Membrane(); + + /** + * \brief Create membrane + * \details Create membrane structure from signed distance function + * @param Dm - domain structure + * @param Distance - signed distance to membrane + * @param Map - mapping between regular layout and compact layout + */ + int Create(DoubleArray &Distance, IntArray &Map); + + void SendD3Q7AA(double *dist); + void RecvD3Q7AA(double *dist); + void AssignCoefficients(int *Map, double *Psi, double Threshold, + double MassFractionIn, double MassFractionOut, double ThresholdMassFractionIn, + double ThresholdMassFractionOut); + void IonTransport(double *dist, double *den); + //...................................................................................... + // Buffers to store data sent and recieved by this MPI process + double *sendbuf_x, *sendbuf_y, *sendbuf_z, *sendbuf_X, *sendbuf_Y, *sendbuf_Z; + double *recvbuf_x, *recvbuf_y, *recvbuf_z, *recvbuf_X, *recvbuf_Y, *recvbuf_Z; + //...................................................................................... + +private: + bool Lock; // use Lock to make sure only one call at a time to protect data in transit + int sendtag, recvtag; + int iproc,jproc,kproc; + int nprocx,nprocy,nprocz; + // Give the object it's own MPI communicator + RankInfoStruct rank_info; + Utilities::MPI MPI_COMM_SCALBL; // MPI Communicator for this domain + MPI_Request req1[18],req2[18]; + /** + * \brief Set up membrane communication + * \details associate p2p communication links to membrane where necessary + * returns the number of membrane links + * regular communications are stored in the first part of the list + * membrane communications are stored in the last part of the list + * @param Cqx - discrete velocity (x) + * @param Cqy - discrete velocity (y) + * @param Cqz - discrete velocity (z) + * @param d3q19_recvlist - device array with the saved list + * @param count - number recieved values + * @param membraneRecvLabels - sorted list with regular and membrane links + * @param Distance - signed distance to membrane + * @param dvcMap - data structure used to define mapping between dense and sparse representation + * @param Np - number of sites in dense representation + * */ + int D3Q7_MapRecv(int Cqx, int Cqy, int Cqz, int *d3q19_recvlist, + int count, int *membraneRecvLabels, DoubleArray &Distance, int *dvcMap); + //...................................................................................... + // MPI ranks for all 18 neighbors + //...................................................................................... 
+ // These variables are all private to prevent external things from modifying them!! + //...................................................................................... + int rank; + int rank_x,rank_y,rank_z,rank_X,rank_Y,rank_Z; + int rank_xy,rank_XY,rank_xY,rank_Xy; + int rank_xz,rank_XZ,rank_xZ,rank_Xz; + int rank_yz,rank_YZ,rank_yZ,rank_Yz; + //...................................................................................... + int SendCount, RecvCount, CommunicationCount; + //...................................................................................... + int sendCount_x, sendCount_y, sendCount_z, sendCount_X, sendCount_Y, sendCount_Z; + //...................................................................................... + int recvCount_x, recvCount_y, recvCount_z, recvCount_X, recvCount_Y, recvCount_Z; + //...................................................................................... + int linkCount_x[5], linkCount_y[5], linkCount_z[5], linkCount_X[5], linkCount_Y[5], linkCount_Z[5]; + int linkCount_xy, linkCount_yz, linkCount_xz, linkCount_Xy, linkCount_Yz, linkCount_xZ; + int linkCount_xY, linkCount_yZ, linkCount_Xz, linkCount_XY, linkCount_YZ, linkCount_XZ; + //...................................................................................... + // Send buffers that reside on the compute device + int *dvcSendList_x, *dvcSendList_y, *dvcSendList_z, *dvcSendList_X, *dvcSendList_Y, *dvcSendList_Z; + //int *dvcSendList_xy, *dvcSendList_yz, *dvcSendList_xz, *dvcSendList_Xy, *dvcSendList_Yz, *dvcSendList_xZ; + //int *dvcSendList_xY, *dvcSendList_yZ, *dvcSendList_Xz, *dvcSendList_XY, *dvcSendList_YZ, *dvcSendList_XZ; + // Recieve buffers that reside on the compute device + int *dvcRecvList_x, *dvcRecvList_y, *dvcRecvList_z, *dvcRecvList_X, *dvcRecvList_Y, *dvcRecvList_Z; + //int *dvcRecvList_xy, *dvcRecvList_yz, *dvcRecvList_xz, *dvcRecvList_Xy, *dvcRecvList_Yz, *dvcRecvList_xZ; + //int *dvcRecvList_xY, *dvcRecvList_yZ, *dvcRecvList_Xz, *dvcRecvList_XY, *dvcRecvList_YZ, *dvcRecvList_XZ; + // Link lists that reside on the compute device + int *dvcRecvLinks_x, *dvcRecvLinks_y, *dvcRecvLinks_z, *dvcRecvLinks_X, *dvcRecvLinks_Y, *dvcRecvLinks_Z; + //int *dvcRecvLinks_xy, *dvcRecvLinks_yz, *dvcRecvLinks_xz, *dvcRecvLinks_Xy, *dvcRecvLinks_Yz, *dvcRecvLinks_xZ; + //int *dvcRecvLinks_xY, *dvcRecvLinks_yZ, *dvcRecvLinks_Xz, *dvcRecvLinks_XY, *dvcRecvLinks_YZ, *dvcRecvLinks_XZ; + // Recieve buffers for the distributions + int *dvcRecvDist_x, *dvcRecvDist_y, *dvcRecvDist_z, *dvcRecvDist_X, *dvcRecvDist_Y, *dvcRecvDist_Z; + //int *dvcRecvDist_xy, *dvcRecvDist_yz, *dvcRecvDist_xz, *dvcRecvDist_Xy, *dvcRecvDist_Yz, *dvcRecvDist_xZ; + //int *dvcRecvDist_xY, *dvcRecvDist_yZ, *dvcRecvDist_Xz, *dvcRecvDist_XY, *dvcRecvDist_YZ, *dvcRecvDist_XZ; + //...................................................................................... + // mass transfer coefficient arrays + double *coefficient_x, *coefficient_X, *coefficient_y, *coefficient_Y, *coefficient_z, *coefficient_Z; + //...................................................................................... + +}; +#endif \ No newline at end of file diff --git a/common/ScaLBL.cpp b/common/ScaLBL.cpp index e3c3baf7..f83bcb85 100644 --- a/common/ScaLBL.cpp +++ b/common/ScaLBL.cpp @@ -1,5 +1,5 @@ /* - Copyright 2013--2018 James E. McClure, Virginia Polytechnic & State University + Copyright 2013--2022 James E. 
McClure, Virginia Polytechnic & State University Copyright Equnior ASA This file is part of the Open Porous Media project (OPM). @@ -15,10 +15,8 @@ along with OPM. If not, see . */ #include "common/ScaLBL.h" - #include - ScaLBL_Communicator::ScaLBL_Communicator(std::shared_ptr Dm){ //...................................................................................... Lock=false; // unlock the communicator @@ -309,12 +307,12 @@ ScaLBL_Communicator::ScaLBL_Communicator(std::shared_ptr Dm){ MPI_COMM_SCALBL.barrier(); ScaLBL_DeviceBarrier(); //...................................................................................... - SendCount = sendCount_x+sendCount_X+sendCount_y+sendCount_Y+sendCount_z+sendCount_Z+ + SendCount = 5*(sendCount_x+sendCount_X+sendCount_y+sendCount_Y+sendCount_z+sendCount_Z)+ sendCount_xy+sendCount_Xy+sendCount_xY+sendCount_XY+ sendCount_xZ+sendCount_Xz+sendCount_xZ+sendCount_XZ+ sendCount_yz+sendCount_Yz+sendCount_yZ+sendCount_YZ; - RecvCount = recvCount_x+recvCount_X+recvCount_y+recvCount_Y+recvCount_z+recvCount_Z+ + RecvCount = 5*(recvCount_x+recvCount_X+recvCount_y+recvCount_Y+recvCount_z+recvCount_Z)+ recvCount_xy+recvCount_Xy+recvCount_xY+recvCount_XY+ recvCount_xZ+recvCount_Xz+recvCount_xZ+recvCount_XZ+ recvCount_yz+recvCount_Yz+recvCount_yZ+recvCount_YZ; @@ -322,8 +320,49 @@ ScaLBL_Communicator::ScaLBL_Communicator(std::shared_ptr Dm){ CommunicationCount = SendCount+RecvCount; //...................................................................................... -} + //................................................................................... + // Set up the persistent communication for D3Q19AA (use tags 130-145) + //................................................................................... + req_D3Q19AA.clear(); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_x, 5*sendCount_x, rank_x, 130 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_X, 5*recvCount_X, rank_X, 130 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_X, 5*sendCount_X, rank_X, 131 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_x, 5*recvCount_x, rank_x, 131 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_y, 5*sendCount_y, rank_y, 132 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_Y, 5*recvCount_Y, rank_Y, 132 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_Y, 5*sendCount_Y, rank_Y, 133 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_y, 5*recvCount_y, rank_y, 133 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_z, 5*sendCount_z, rank_z, 134 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_Z, 5*recvCount_Z, rank_Z, 134 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_Z, 5*sendCount_Z, rank_Z, 135 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_z, 5*recvCount_z, rank_z, 135 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_xy, sendCount_xy, rank_xy, 136 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_XY, recvCount_XY, rank_XY, 136 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_XY, sendCount_XY, rank_XY, 137 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_xy, recvCount_xy, rank_xy, 137 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_Xy, sendCount_Xy, rank_Xy, 138 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_xY, recvCount_xY, rank_xY, 138 ) ); + 
req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_xY, sendCount_xY, rank_xY, 139 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_Xy, recvCount_Xy, rank_Xy, 139 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_xz, sendCount_xz, rank_xz, 140 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_XZ, recvCount_XZ, rank_XZ, 140 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_xZ, sendCount_xZ, rank_xZ, 143 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_Xz, recvCount_Xz, rank_Xz, 143 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_Xz, sendCount_Xz, rank_Xz, 142 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_xZ, recvCount_xZ, rank_xZ, 142 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_XZ, sendCount_XZ, rank_XZ, 141 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_xz, recvCount_xz, rank_xz, 141 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_yz, sendCount_yz, rank_yz, 144 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_YZ, recvCount_YZ, rank_YZ, 144 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_yZ, sendCount_yZ, rank_yZ, 147 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_Yz, recvCount_Yz, rank_Yz, 147 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_Yz, sendCount_Yz, rank_Yz, 146 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_yZ, recvCount_yZ, rank_yZ, 146 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Isend_init( sendbuf_YZ, sendCount_YZ, rank_YZ, 145 ) ); + req_D3Q19AA.push_back( MPI_COMM_SCALBL.Irecv_init( recvbuf_yz, recvCount_yz, rank_yz, 145 ) ); + +} ScaLBL_Communicator::~ScaLBL_Communicator() { @@ -419,6 +458,180 @@ ScaLBL_Communicator::~ScaLBL_Communicator() ScaLBL_FreeDeviceMemory( dvcRecvDist_Yz ); ScaLBL_FreeDeviceMemory( dvcRecvDist_YZ ); } + + +void ScaLBL_Communicator::start( std::vector>& requests ) +{ + for ( auto& req : requests ) + MPI_COMM_SCALBL.Start( *req ); +} +void ScaLBL_Communicator::wait( std::vector>& requests ) +{ + std::vector request2; + for ( auto& req : requests ) + request2.push_back( *req ); + MPI_COMM_SCALBL.waitAll( request2.size(), request2.data() ); +} + + +/******************************************************** + * Get send/recv lists * + ********************************************************/ +int ScaLBL_Communicator::copyRecvList(const char *dir, int *buffer) { + if (dir[0] == 'x') { + if (dir[1] == 0){ + int *TempBuffer = new int [recvCount_x]; + ScaLBL_CopyToHost(TempBuffer,dvcRecvDist_x,recvCount_x*sizeof(int)); + ScaLBL_CopyToZeroCopy(buffer,TempBuffer,recvCount_x*sizeof(int)); + delete [] TempBuffer; + return recvCount_x; + } + else if (dir[1] == 'y') + return recvCount_xy; + else if (dir[1] == 'Y') + return recvCount_xY; + else if (dir[1] == 'z') + return recvCount_xz; + else if (dir[1] == 'Z') + return recvCount_xZ; + } else if (dir[0] == 'y') { + if (dir[1] == 0){ + int *TempBuffer = new int [recvCount_y]; + ScaLBL_CopyToHost(TempBuffer,dvcRecvDist_y,recvCount_y*sizeof(int)); + ScaLBL_CopyToZeroCopy(buffer,TempBuffer,recvCount_y*sizeof(int)); + delete [] TempBuffer; + return recvCount_y; + } + else if (dir[1] == 'z') + return recvCount_yz; + else if (dir[1] == 'Z') + return recvCount_yZ; + } else if (dir[0] == 'z') { + if (dir[1] == 0){ + int *TempBuffer = new int [recvCount_z]; + ScaLBL_CopyToHost(TempBuffer,dvcRecvDist_z,recvCount_z*sizeof(int)); + 
ScaLBL_CopyToZeroCopy(buffer,TempBuffer,recvCount_z*sizeof(int)); + delete [] TempBuffer; + return recvCount_z; + } + } else if (dir[0] == 'X') { + if (dir[1] == 0){ + int *TempBuffer = new int [recvCount_X]; + ScaLBL_CopyToHost(TempBuffer,dvcRecvDist_X,recvCount_X*sizeof(int)); + ScaLBL_CopyToZeroCopy(buffer,TempBuffer,recvCount_X*sizeof(int)); + delete [] TempBuffer; + return recvCount_X; + } + else if (dir[1] == 'y') + return recvCount_Xy; + else if (dir[1] == 'Y') + return recvCount_XY; + else if (dir[1] == 'z') + return recvCount_Xz; + else if (dir[1] == 'Z') + return recvCount_XZ; + } else if (dir[0] == 'Y') { + if (dir[1] == 0){ + int *TempBuffer = new int [recvCount_Y]; + ScaLBL_CopyToHost(TempBuffer,dvcRecvDist_Y,recvCount_Y*sizeof(int)); + ScaLBL_CopyToZeroCopy(buffer,TempBuffer,recvCount_Y*sizeof(int)); + delete [] TempBuffer; + return recvCount_Y; + } + else if (dir[1] == 'z') + return recvCount_Yz; + else if (dir[1] == 'Z') + return recvCount_YZ; + } else if (dir[0] == 'Z') { + if (dir[1] == 0){ + int *TempBuffer = new int [recvCount_Z]; + ScaLBL_CopyToHost(TempBuffer,dvcRecvDist_Z,recvCount_Z*sizeof(int)); + ScaLBL_CopyToZeroCopy(buffer,TempBuffer,recvCount_Z*sizeof(int)); + delete [] TempBuffer; + return recvCount_Z; + } + } + throw std::logic_error("Internal error"); +} + +int ScaLBL_Communicator::copySendList(const char *dir, int *buffer) { + if (dir[0] == 'x') { + if (dir[1] == 0){ + int *TempBuffer = new int [sendCount_x]; + ScaLBL_CopyToHost(TempBuffer,dvcSendList_x,sendCount_x*sizeof(int)); + ScaLBL_CopyToZeroCopy(buffer,TempBuffer,sendCount_x*sizeof(int)); + delete [] TempBuffer; + return sendCount_x; + } + else if (dir[1] == 'y') + return sendCount_xy; + else if (dir[1] == 'Y') + return sendCount_xY; + else if (dir[1] == 'z') + return sendCount_xz; + else if (dir[1] == 'Z') + return sendCount_xZ; + } else if (dir[0] == 'y') { + if (dir[1] == 0){ + int *TempBuffer = new int [sendCount_y]; + ScaLBL_CopyToHost(TempBuffer,dvcSendList_y,sendCount_y*sizeof(int)); + ScaLBL_CopyToZeroCopy(buffer,TempBuffer,sendCount_y*sizeof(int)); + delete [] TempBuffer; + return sendCount_y; + } + else if (dir[1] == 'z') + return sendCount_yz; + else if (dir[1] == 'Z') + return sendCount_yZ; + } else if (dir[0] == 'z') { + if (dir[1] == 0){ + int *TempBuffer = new int [sendCount_z]; + ScaLBL_CopyToHost(TempBuffer,dvcSendList_z,sendCount_z*sizeof(int)); + ScaLBL_CopyToZeroCopy(buffer,TempBuffer,sendCount_z*sizeof(int)); + delete [] TempBuffer; + return sendCount_z; + } + } else if (dir[0] == 'X') { + if (dir[1] == 0){ + int *TempBuffer = new int [sendCount_X]; + ScaLBL_CopyToHost(TempBuffer,dvcSendList_X,sendCount_X*sizeof(int)); + ScaLBL_CopyToZeroCopy(buffer,TempBuffer,sendCount_X*sizeof(int)); + delete [] TempBuffer; + return sendCount_X; + } + else if (dir[1] == 'y') + return sendCount_Xy; + else if (dir[1] == 'Y') + return sendCount_XY; + else if (dir[1] == 'z') + return sendCount_Xz; + else if (dir[1] == 'Z') + return sendCount_XZ; + } else if (dir[0] == 'Y') { + if (dir[1] == 0){ + int *TempBuffer = new int [sendCount_Y]; + ScaLBL_CopyToHost(TempBuffer,dvcSendList_Y,sendCount_Y*sizeof(int)); + ScaLBL_CopyToZeroCopy(buffer,TempBuffer,sendCount_Y*sizeof(int)); + delete [] TempBuffer; + return sendCount_Y; + } + else if (dir[1] == 'z') + return sendCount_Yz; + else if (dir[1] == 'Z') + return sendCount_YZ; + } else if (dir[0] == 'Z') { + if (dir[1] == 0){ + int *TempBuffer = new int [sendCount_Z]; + ScaLBL_CopyToHost(TempBuffer,dvcSendList_Z,sendCount_Z*sizeof(int)); + 
ScaLBL_CopyToZeroCopy(buffer,TempBuffer,sendCount_Z*sizeof(int)); + delete [] TempBuffer; + return sendCount_Z; + } + } + throw std::logic_error("Internal error"); +} + + double ScaLBL_Communicator::GetPerformance(int *NeighborList, double *fq, int Np){ /* EACH MPI PROCESS GETS ITS OWN MEASUREMENT*/ /* use MRT kernels to check performance without communication / synchronization */ @@ -830,8 +1043,6 @@ int ScaLBL_Communicator::MemoryOptimizedLayoutAA(IntArray &Map, int *neighborLis idx=Map(n); //if (rank == 0) printf("r: mapped n=%d\n",idx); TempBuffer[i]=idx; - - } ScaLBL_CopyToDevice(dvcRecvDist_x,TempBuffer,5*recvCount_x*sizeof(int)); @@ -988,7 +1199,6 @@ int ScaLBL_Communicator::MemoryOptimizedLayoutAA(IntArray &Map, int *neighborLis return(Np); } - void ScaLBL_Communicator::SetupBounceBackList(IntArray &Map, signed char *id, int Np, bool SlippingVelBC) { @@ -1062,302 +1272,313 @@ void ScaLBL_Communicator::SetupBounceBackList(IntArray &Map, signed char *id, in } } } + if (local_count > 0){ - int *bb_dist_tmp = new int [local_count]; - int *bb_interactions_tmp = new int [local_count]; - ScaLBL_AllocateDeviceMemory((void **) &bb_dist, sizeof(int)*local_count); - ScaLBL_AllocateDeviceMemory((void **) &bb_interactions, sizeof(int)*local_count); - int *fluid_boundary_tmp; - double *lattice_weight_tmp; - float *lattice_cx_tmp; - float *lattice_cy_tmp; - float *lattice_cz_tmp; - /* allocate memory for bounce-back sites */ - fluid_boundary_tmp = new int [local_count]; - lattice_weight_tmp = new double [local_count]; - lattice_cx_tmp = new float [local_count]; - lattice_cy_tmp = new float [local_count]; - lattice_cz_tmp = new float [local_count]; - ScaLBL_AllocateDeviceMemory((void **) &fluid_boundary, sizeof(int)*local_count); - ScaLBL_AllocateDeviceMemory((void **) &lattice_weight, sizeof(double)*local_count); - ScaLBL_AllocateDeviceMemory((void **) &lattice_cx, sizeof(float)*local_count); - ScaLBL_AllocateDeviceMemory((void **) &lattice_cy, sizeof(float)*local_count); - ScaLBL_AllocateDeviceMemory((void **) &lattice_cz, sizeof(float)*local_count); + int *bb_dist_tmp = new int [local_count]; + int *bb_interactions_tmp = new int [local_count]; + ScaLBL_AllocateDeviceMemory((void **) &bb_dist, sizeof(int)*local_count); + ScaLBL_AllocateDeviceMemory((void **) &bb_interactions, sizeof(int)*local_count); + int *fluid_boundary_tmp; + double *lattice_weight_tmp; + float *lattice_cx_tmp; + float *lattice_cy_tmp; + float *lattice_cz_tmp; + /* allocate memory for bounce-back sites */ + fluid_boundary_tmp = new int [local_count]; + lattice_weight_tmp = new double [local_count]; + lattice_cx_tmp = new float [local_count]; + lattice_cy_tmp = new float [local_count]; + lattice_cz_tmp = new float [local_count]; + ScaLBL_AllocateDeviceMemory((void **) &fluid_boundary, sizeof(int)*local_count); + ScaLBL_AllocateDeviceMemory((void **) &lattice_weight, sizeof(double)*local_count); + ScaLBL_AllocateDeviceMemory((void **) &lattice_cx, sizeof(float)*local_count); + ScaLBL_AllocateDeviceMemory((void **) &lattice_cy, sizeof(float)*local_count); + ScaLBL_AllocateDeviceMemory((void **) &lattice_cz, sizeof(float)*local_count); - local_count=0; - for (k=1;k> req_D3Q19AA; + std::vector> req_BiD3Q19AA; + std::vector> req_TriD3Q19AA; + void start( std::vector>& requests ); + void wait( std::vector>& requests ); + //...................................................................................... 
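The persistent request vector req_D3Q19AA and the start()/wait() helpers declared here wrap MPI_Start and MPI_Waitall over the requests created with Isend_init/Irecv_init in the constructor. A minimal sketch of how a communication routine might drive them (illustrative only: SendD3Q19AA_Persistent and RecvD3Q19AA_Persistent are hypothetical names, the pack/unpack steps are elided, and matching declarations would be needed in ScaLBL.h):

void ScaLBL_Communicator::SendD3Q19AA_Persistent(double *dist) {
    // ... pack the face and edge send buffers (sendbuf_x, sendbuf_xy, ...) from dist ...
    ScaLBL_DeviceBarrier();   // make sure device-side packing has completed
    start(req_D3Q19AA);       // MPI_Start on every persistent send/recv request
}

void ScaLBL_Communicator::RecvD3Q19AA_Persistent(double *dist) {
    wait(req_D3Q19AA);        // MPI_Waitall on the same set of requests
    // ... unpack the receive buffers (recvbuf_x, recvbuf_xy, ...) into dist ...
}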
int *bb_dist; int *bb_interactions; int *fluid_boundary; diff --git a/common/Utilities.cpp b/common/Utilities.cpp index ddcc9e8d..70ca19a4 100644 --- a/common/Utilities.cpp +++ b/common/Utilities.cpp @@ -69,7 +69,7 @@ void Utilities::startup(int argc, char **argv, bool multiple) { "thread support, thread support will be disabled" << std::endl; } - StackTrace::globalCallStackInitialize(MPI_COMM_WORLD); + //StackTrace::globalCallStackInitialize(MPI_COMM_WORLD); } else { MPI_Init(&argc, &argv); } @@ -86,7 +86,7 @@ void Utilities::shutdown() { int rank = 0; #ifdef USE_MPI MPI_Comm_rank(MPI_COMM_WORLD, &rank); - StackTrace::globalCallStackFinalize(); + //StackTrace::globalCallStackFinalize(); MPI_Barrier(MPI_COMM_WORLD); MPI_Finalize(); #endif diff --git a/common/UtilityMacros.h b/common/UtilityMacros.h index 1e1d6fed..631b0836 100644 --- a/common/UtilityMacros.h +++ b/common/UtilityMacros.h @@ -173,8 +173,7 @@ _Pragma( "GCC diagnostic ignored \"-Wunused-local-typedefs\"" ) \ _Pragma( "GCC diagnostic ignored \"-Woverloaded-virtual\"" ) \ _Pragma( "GCC diagnostic ignored \"-Wunused-parameter\"" ) \ - _Pragma( "GCC diagnostic ignored \"-Warray-bounds\"" ) \ - _Pragma( "GCC diagnostic ignored \"-Wterminate\"" ) + _Pragma( "GCC diagnostic ignored \"-Warray-bounds\"" ) #define ENABLE_WARNINGS _Pragma( "GCC diagnostic pop" ) #else #define DISABLE_WARNINGS diff --git a/cpu/D3Q19.cpp b/cpu/D3Q19.cpp index bc2dd70f..153f497b 100644 --- a/cpu/D3Q19.cpp +++ b/cpu/D3Q19.cpp @@ -48,6 +48,7 @@ extern "C" void ScaLBL_D3Q19_Unpack(int q, int *list, int start, int count, } } + extern "C" void ScaLBL_D3Q19_AA_Init(double *f_even, double *f_odd, int Np) { int n; for (n = 0; n < Np; n++) { @@ -1883,7 +1884,7 @@ extern "C" void ScaLBL_D3Q19_AAodd_MRT(int *neighborList, double *dist, } } -extern "C" void ScaLBL_D3Q19_AAeven_Compact(char *ID, double *dist, int Np) { +extern "C" void ScaLBL_D3Q19_AAeven_Compact(double *dist, int Np) { for (int n = 0; n < Np; n++) { @@ -1941,7 +1942,7 @@ extern "C" void ScaLBL_D3Q19_AAeven_Compact(char *ID, double *dist, int Np) { } } -extern "C" void ScaLBL_D3Q19_AAodd_Compact(char *ID, int *neighborList, +extern "C" void ScaLBL_D3Q19_AAodd_Compact(int *neighborList, double *dist, int Np) { int nread; diff --git a/cpu/Ion.cpp b/cpu/Ion.cpp index eb5e3ef5..d77af87b 100644 --- a/cpu/Ion.cpp +++ b/cpu/Ion.cpp @@ -1,4 +1,144 @@ #include +#include + +extern "C" void ScaLBL_D3Q7_Membrane_AssignLinkCoef(int *membrane, int *Map, double *Distance, double *Psi, double *coef, + double Threshold, double MassFractionIn, double MassFractionOut, double ThresholdMassFractionIn, double ThresholdMassFractionOut, + int memLinks, int Nx, int Ny, int Nz, int Np){ + + int link,iq,ip,nq,np,nqm,npm; + double aq, ap, membranePotential; + //double dq, dp, dist, orientation; + /* Interior Links */ + for (link=0; link Threshold){ + aq = ThresholdMassFractionIn; ap = ThresholdMassFractionOut; + } + + /* Save the mass transfer coefficients */ + //coef[2*link] = aq*orientation; coef[2*link+1] = ap*orientation; + coef[2*link] = aq; coef[2*link+1] = ap; + } +} + +extern "C" void ScaLBL_D3Q7_Membrane_AssignLinkCoef_halo( + const int Cqx, const int Cqy, int const Cqz, + int *Map, double *Distance, double *Psi, double Threshold, + double MassFractionIn, double MassFractionOut, double ThresholdMassFractionIn, double ThresholdMassFractionOut, + int *d3q7_recvlist, int *d3q7_linkList, double *coef, int start, int nlinks, int count, + const int N, const int Nx, const int Ny, const int Nz) { + 
//.................................................................................... + // Unack distribution from the recv buffer + // Distribution q matche Cqx, Cqy, Cqz + // swap rule means that the distributions in recvbuf are OPPOSITE of q + // dist may be even or odd distributions stored by stream layout + //.................................................................................... + int n, idx, label, nqm, npm, i, j, k; + double distanceLocal;//, distanceNonlocal; + double psiLocal, psiNonlocal, membranePotential; + double ap,aq; // coefficient + + + for (idx = 0; idx < count; idx++) { + n = d3q7_recvlist[idx]; + label = d3q7_linkList[idx]; + ap = 1.0; // regular streaming rule + aq = 1.0; + if (label > 0 && !(n < 0)){ + nqm = Map[n]; + distanceLocal = Distance[nqm]; + psiLocal = Psi[nqm]; + + // Get the 3-D indices from the send process + k = nqm/(Nx*Ny); j = (nqm-Nx*Ny*k)/Nx; i = nqm-Nx*Ny*k-Nx*j; + // Streaming link the non-local distribution + i -= Cqx; j -= Cqy; k -= Cqz; + npm = k*Nx*Ny + j*Nx + i; + //distanceNonlocal = Distance[npm]; + psiNonlocal = Psi[npm]; + + membranePotential = psiLocal - psiNonlocal; + aq = MassFractionIn; + ap = MassFractionOut; + + /* link is inside membrane */ + if (distanceLocal > 0.0){ + if (membranePotential < Threshold*(-1.0)){ + ap = MassFractionIn; + aq = MassFractionOut; + } + else { + ap = ThresholdMassFractionIn; + aq = ThresholdMassFractionOut; + } + } + else if (membranePotential > Threshold){ + aq = ThresholdMassFractionIn; + ap = ThresholdMassFractionOut; + } + } + coef[2*idx]=aq; + coef[2*idx+1]=ap; + } +} + + +extern "C" void ScaLBL_D3Q7_Membrane_Unpack(int q, + int *d3q7_recvlist, double *recvbuf, int count, + double *dist, int N, double *coef) { + //.................................................................................... + // Unack distribution from the recv buffer + // Distribution q matche Cqx, Cqy, Cqz + // swap rule means that the distributions in recvbuf are OPPOSITE of q + // dist may be even or odd distributions stored by stream layout + //.................................................................................... 
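    // In short: each entry of the receive list blends the resident distribution with the
    // incoming value using the per-link coefficients written by
    // ScaLBL_D3Q7_Membrane_AssignLinkCoef / _AssignLinkCoef_halo:
    //     dist[q*N + n]  <-  (1 - aq) * dist[q*N + n]  +  ap * recvbuf[idx]
    // Entries with n < 0 (no associated site) are left untouched.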
+ int n, idx; + double fq,fp,fqq,ap,aq; // coefficient + /* First unpack the regular links */ + for (idx = 0; idx < count; idx++) { + n = d3q7_recvlist[idx]; + // update link based on mass transfer coefficients + if (!(n < 0)){ + aq = coef[2*idx]; + ap = coef[2*idx+1]; + fq = dist[q * N + n]; + fp = recvbuf[idx]; + fqq = (1-aq)*fq+ap*fp; + dist[q * N + n] = fqq; + } + //printf(" LINK: site=%i, index=%i \n", n, idx); + } +} + +extern "C" void ScaLBL_D3Q7_Membrane_IonTransport(int *membrane, double *coef, + double *dist, double *Den, int memLinks, int Np){ + + int link,iq,ip,nq,np; + double aq, ap, fq, fp, fqq, fpp, Cq, Cp; + for (link=0; link 10Np => odd part of dist) + f1 = dist[nr1]; // reading the f1 data into register fq + // q=2 + nr2 = neighborList[n + Np]; // neighbor 1 ( < 10Np => even part of dist) + f2 = dist[nr2]; // reading the f2 data into register fq + // q=3 + nr3 = neighborList[n + 2 * Np]; // neighbor 4 + f3 = dist[nr3]; + // q=4 + nr4 = neighborList[n + 3 * Np]; // neighbor 3 + f4 = dist[nr4]; + // q=5 + nr5 = neighborList[n + 4 * Np]; + f5 = dist[nr5]; + // q=6 + nr6 = neighborList[n + 5 * Np]; + f6 = dist[nr6]; + + // compute diffusive flux + Ci = f0 + f1 + f2 + f3 + f4 + f5 + f6; + flux_diffusive_x = (1.0 - 0.5 * rlx) * ((f1 - f2) - ux * Ci); + flux_diffusive_y = (1.0 - 0.5 * rlx) * ((f3 - f4) - uy * Ci); + flux_diffusive_z = (1.0 - 0.5 * rlx) * ((f5 - f6) - uz * Ci); + FluxDiffusive[n + 0 * Np] = flux_diffusive_x; + FluxDiffusive[n + 1 * Np] = flux_diffusive_y; + FluxDiffusive[n + 2 * Np] = flux_diffusive_z; + FluxAdvective[n + 0 * Np] = ux * Ci; + FluxAdvective[n + 1 * Np] = uy * Ci; + FluxAdvective[n + 2 * Np] = uz * Ci; + FluxElectrical[n + 0 * Np] = uEPx * Ci; + FluxElectrical[n + 1 * Np] = uEPy * Ci; + FluxElectrical[n + 2 * Np] = uEPz * Ci; + + Den[n] = Ci; + + /* use logistic function to prevent negative distributions*/ + //X = 4.0 * (ux + uEPx); + //Y = 4.0 * (uy + uEPy); + //Z = 4.0 * (uz + uEPz); + //factor_x = X / sqrt(1 + X*X); + //factor_y = Y / sqrt(1 + Y*Y); + //factor_z = Z / sqrt(1 + Z*Z); + + // q=0 + dist[n] = f0 * (1.0 - rlx) + rlx * 0.25 * Ci; + + // q = 1 + dist[nr2] = + f1 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (ux + uEPx)); + // f1 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_x); + + + // q=2 + dist[nr1] = + f2 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - 4.0 * (ux + uEPx)); + // f2 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_x); + + // q = 3 + dist[nr4] = + f3 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (uy + uEPy)); + // f3 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_y ); + + // q = 4 + dist[nr3] = + f4 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - 4.0 * (uy + uEPy)); + // f4 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_y); + + // q = 5 + dist[nr6] = + f5 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (uz + uEPz)); + // f5 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_z); + + // q = 6 + dist[nr5] = + f6 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - 4.0 * (uz + uEPz)); + // f6 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_z); + + } +} + +extern "C" void ScaLBL_D3Q7_AAeven_Ion( + double *dist, double *Den, double *FluxDiffusive, double *FluxAdvective, + double *FluxElectrical, double *Velocity, double *ElectricField, double Di, + int zi, double rlx, double Vt, int start, int finish, int Np) { + int n; + double Ci; + double ux, uy, uz; + double uEPx, uEPy, uEPz; //electrochemical induced velocity + double Ex, Ey, Ez; //electrical field + double flux_diffusive_x, flux_diffusive_y, flux_diffusive_z; + double f0, f1, f2, 
f3, f4, f5, f6; + //double X,Y,Z, factor_x, factor_y, factor_z; + + for (n = start; n < finish; n++) { + + //Load data + //Ci = Den[n]; + Ex = ElectricField[n + 0 * Np]; + Ey = ElectricField[n + 1 * Np]; + Ez = ElectricField[n + 2 * Np]; + ux = Velocity[n + 0 * Np]; + uy = Velocity[n + 1 * Np]; + uz = Velocity[n + 2 * Np]; + uEPx = zi * Di / Vt * Ex; + uEPy = zi * Di / Vt * Ey; + uEPz = zi * Di / Vt * Ez; + + f0 = dist[n]; + f1 = dist[2 * Np + n]; + f2 = dist[1 * Np + n]; + f3 = dist[4 * Np + n]; + f4 = dist[3 * Np + n]; + f5 = dist[6 * Np + n]; + f6 = dist[5 * Np + n]; + + // compute diffusive flux + Ci = f0 + f1 + f2 + f3 + f4 + f5 + f6; + flux_diffusive_x = (1.0 - 0.5 * rlx) * ((f1 - f2) - ux * Ci); + flux_diffusive_y = (1.0 - 0.5 * rlx) * ((f3 - f4) - uy * Ci); + flux_diffusive_z = (1.0 - 0.5 * rlx) * ((f5 - f6) - uz * Ci); + FluxDiffusive[n + 0 * Np] = flux_diffusive_x; + FluxDiffusive[n + 1 * Np] = flux_diffusive_y; + FluxDiffusive[n + 2 * Np] = flux_diffusive_z; + FluxAdvective[n + 0 * Np] = ux * Ci; + FluxAdvective[n + 1 * Np] = uy * Ci; + FluxAdvective[n + 2 * Np] = uz * Ci; + FluxElectrical[n + 0 * Np] = uEPx * Ci; + FluxElectrical[n + 1 * Np] = uEPy * Ci; + FluxElectrical[n + 2 * Np] = uEPz * Ci; + + Den[n] = Ci; + + /* use logistic function to prevent negative distributions*/ + //X = 4.0 * (ux + uEPx); + //Y = 4.0 * (uy + uEPy); + //Z = 4.0 * (uz + uEPz); + //factor_x = X / sqrt(1 + X*X); + //factor_y = Y / sqrt(1 + Y*Y); + //factor_z = Z / sqrt(1 + Z*Z); + + // q=0 + dist[n] = f0 * (1.0 - rlx) + rlx * 0.25 * Ci; + + // q = 1 + dist[1 * Np + n] = + f1 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (ux + uEPx)); + // f1 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_x); + + // q=2 + dist[2 * Np + n] = + f2 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - 4.0 * (ux + uEPx)); + // f2 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_x); + + // q = 3 + dist[3 * Np + n] = + f3 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (uy + uEPy)); + // f3 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_y); + + // q = 4 + dist[4 * Np + n] = + f4 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - 4.0 * (uy + uEPy)); + // f4 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_y); + + // q = 5 + dist[5 * Np + n] = + f5 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (uz + uEPz)); + // f5 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_z); + + // q = 6 + dist[6 * Np + n] = + f6 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - 4.0 * (uz + uEPz)); + // f6 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_z); } } @@ -289,7 +676,7 @@ extern "C" void ScaLBL_D3Q7_Ion_Init_FromFile(double *dist, double *Den, extern "C" void ScaLBL_D3Q7_Ion_ChargeDensity(double *Den, double *ChargeDensity, - int IonValence, int ion_component, + double IonValence, int ion_component, int start, int finish, int Np) { int n; diff --git a/cpu/MembraneHelper.cpp b/cpu/MembraneHelper.cpp new file mode 100644 index 00000000..960e2f20 --- /dev/null +++ b/cpu/MembraneHelper.cpp @@ -0,0 +1,42 @@ + +extern "C" void Membrane_D3Q19_Unpack(int q, int *list, int *links, int start, int linkCount, + double *recvbuf, double *dist, int N) { + //.................................................................................... + // Unack distribution from the recv buffer + // Distribution q matche Cqx, Cqy, Cqz + // swap rule means that the distributions in recvbuf are OPPOSITE of q + // dist may be even or odd distributions stored by stream layout + //.................................................................................... 
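    // Note: only the regular (non-membrane) links are unpacked here. The links[] array
    // selects those entries of the receive list, and each one simply streams the incoming
    // value from recvbuf into the matching distribution slot of dist. Links that cross
    // the membrane are excluded and handled by the D3Q7 membrane routines, which apply
    // the per-link mass-transfer coefficients instead.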
+ int n, idx, link; + for (link=0; link extern "C" void ScaLBL_D3Q7_AAodd_Poisson_ElectricPotential(int *neighborList, int *Map, @@ -444,34 +445,503 @@ extern "C" void ScaLBL_D3Q7_PoissonResidualError( // } //} -//extern "C" void ScaLBL_D3Q7_Poisson_getElectricField(double *dist, double *ElectricField, double tau, int Np){ -// int n; -// // distributions -// double f1,f2,f3,f4,f5,f6; -// double Ex,Ey,Ez; -// double rlx=1.0/tau; -// -// for (n=0; n 10Np => odd part of dist) + f1 = dist[nr1]; // reading the f1 data into register fq + + nr2 = neighborList[n + Np]; // neighbor 1 ( < 10Np => even part of dist) + f2 = dist[nr2]; // reading the f2 data into register fq + + // q=3 + nr3 = neighborList[n + 2 * Np]; // neighbor 4 + f3 = dist[nr3]; + + // q = 4 + nr4 = neighborList[n + 3 * Np]; // neighbor 3 + f4 = dist[nr4]; + + // q=5 + nr5 = neighborList[n + 4 * Np]; + f5 = dist[nr5]; + + // q = 6 + nr6 = neighborList[n + 5 * Np]; + f6 = dist[nr6]; + + // q=7 + nr7 = neighborList[n + 6 * Np]; + f7 = dist[nr7]; + + // q = 8 + nr8 = neighborList[n + 7 * Np]; + f8 = dist[nr8]; + + // q=9 + nr9 = neighborList[n + 8 * Np]; + f9 = dist[nr9]; + + // q = 10 + nr10 = neighborList[n + 9 * Np]; + f10 = dist[nr10]; + + // q=11 + nr11 = neighborList[n + 10 * Np]; + f11 = dist[nr11]; + + // q=12 + nr12 = neighborList[n + 11 * Np]; + f12 = dist[nr12]; + + // q=13 + nr13 = neighborList[n + 12 * Np]; + f13 = dist[nr13]; + + // q=14 + nr14 = neighborList[n + 13 * Np]; + f14 = dist[nr14]; + + // q=15 + nr15 = neighborList[n + 14 * Np]; + f15 = dist[nr15]; + + // q=16 + nr16 = neighborList[n + 15 * Np]; + f16 = dist[nr16]; + + // q=17 + //fq = dist[18*Np+n]; + nr17 = neighborList[n + 16 * Np]; + f17 = dist[nr17]; + + // q=18 + nr18 = neighborList[n + 17 * Np]; + f18 = dist[nr18]; + + psi = f0 + f2 + f1 + f4 + f3 + f6 + f5 + f8 + f7 + f10 + f9 + f12 + + f11 + f14 + f13 + f16 + f15 + f18 + f17; + + idx = Map[n]; + + Psi[idx] = psi - 0.5*rho_e; + } +} + +extern "C" void ScaLBL_D3Q19_AAeven_Poisson_ElectricPotential( + int *Map, double *dist, double *Den_charge, double *Psi, double epsilon_LB, bool UseSlippingVelBC, int start, int finish, int Np) { + int n; + double psi; //electric potential + double rho_e; //local charge density + double f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15, + f16, f17, f18; + //double Gs; + int idx; + + for (n = start; n < finish; n++) { + rho_e = (UseSlippingVelBC==1) ? 0.0 : Den_charge[n] / epsilon_LB; + + //........................................................................ 
+ // q=0 + f0 = dist[n]; + f1 = dist[2 * Np + n]; + f2 = dist[1 * Np + n]; + f3 = dist[4 * Np + n]; + f4 = dist[3 * Np + n]; + f5 = dist[6 * Np + n]; + f6 = dist[5 * Np + n]; + f7 = dist[8 * Np + n]; + f8 = dist[7 * Np + n]; + f9 = dist[10 * Np + n]; + f10 = dist[9 * Np + n]; + f11 = dist[12 * Np + n]; + f12 = dist[11 * Np + n]; + f13 = dist[14 * Np + n]; + f14 = dist[13 * Np + n]; + f15 = dist[16 * Np + n]; + f16 = dist[15 * Np + n]; + f17 = dist[18 * Np + n]; + f18 = dist[17 * Np + n]; + + psi = f0 + f2 + f1 + f4 + f3 + f6 + f5 + f8 + f7 + f10 + f9 + f12 + + f11 + f14 + f13 + f16 + f15 + f18 + f17; + + idx = Map[n]; + + Psi[idx] = psi - 0.5*rho_e; + } +} + +extern "C" void ScaLBL_D3Q19_AAodd_Poisson(int *neighborList, int *Map, + double *dist, double *Den_charge, + double *Psi, double *ElectricField, + double tau, double epsilon_LB, bool UseSlippingVelBC, + int start, int finish, int Np) { + int n; + double psi; //electric potential + double Ex, Ey, Ez; //electric field + double rho_e; //local charge density + double f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15, + f16, f17, f18; + int nr1, nr2, nr3, nr4, nr5, nr6, nr7, nr8, nr9, nr10, nr11, nr12, nr13, + nr14, nr15, nr16, nr17, nr18; + double sum_q; + double rlx = 1.0 / tau; + int idx; + + double W0 = 0.5; + double W1 = 1.0/24.0; + double W2 = 1.0/48.0; + + for (n = start; n < finish; n++) { + + //Load data + //When Helmholtz-Smoluchowski slipping velocity BC is used, the bulk fluid is considered as electroneutral + //and thus the net space charge density is zero. + rho_e = (UseSlippingVelBC==1) ? 0.0 : Den_charge[n] / epsilon_LB; + + // q=0 + f0 = dist[n]; + // q=1 + nr1 = neighborList[n]; // neighbor 2 ( > 10Np => odd part of dist) + f1 = dist[nr1]; // reading the f1 data into register fq + + nr2 = neighborList[n + Np]; // neighbor 1 ( < 10Np => even part of dist) + f2 = dist[nr2]; // reading the f2 data into register fq + + // q=3 + nr3 = neighborList[n + 2 * Np]; // neighbor 4 + f3 = dist[nr3]; + + // q = 4 + nr4 = neighborList[n + 3 * Np]; // neighbor 3 + f4 = dist[nr4]; + + // q=5 + nr5 = neighborList[n + 4 * Np]; + f5 = dist[nr5]; + + // q = 6 + nr6 = neighborList[n + 5 * Np]; + f6 = dist[nr6]; + + // q=7 + nr7 = neighborList[n + 6 * Np]; + f7 = dist[nr7]; + + // q = 8 + nr8 = neighborList[n + 7 * Np]; + f8 = dist[nr8]; + + // q=9 + nr9 = neighborList[n + 8 * Np]; + f9 = dist[nr9]; + + // q = 10 + nr10 = neighborList[n + 9 * Np]; + f10 = dist[nr10]; + + // q=11 + nr11 = neighborList[n + 10 * Np]; + f11 = dist[nr11]; + + // q=12 + nr12 = neighborList[n + 11 * Np]; + f12 = dist[nr12]; + + // q=13 + nr13 = neighborList[n + 12 * Np]; + f13 = dist[nr13]; + + // q=14 + nr14 = neighborList[n + 13 * Np]; + f14 = dist[nr14]; + + // q=15 + nr15 = neighborList[n + 14 * Np]; + f15 = dist[nr15]; + + // q=16 + nr16 = neighborList[n + 15 * Np]; + f16 = dist[nr16]; + + // q=17 + //fq = dist[18*Np+n]; + nr17 = neighborList[n + 16 * Np]; + f17 = dist[nr17]; + + // q=18 + nr18 = neighborList[n + 17 * Np]; + f18 = dist[nr18]; + + sum_q = f1+f2+f3+f4+f5+f6+f7+f8+f9+f10+f11+f12+f13+f14+f15+f16+f17+f18; + //error = 8.0*(sum_q - f0) + rho_e; + + psi = 2.0*(f0*(1.0 - rlx) + rlx*(sum_q + 0.125*rho_e)); + + idx = Map[n]; + Psi[idx] = psi; + + Ex = (f1 - f2 + 0.5*(f7 - f8 + f9 - f10 + f11 - f12 + f13 - f14))*4.0; //NOTE the unit of electric field here is V/lu + Ey = (f3 - f4 + 0.5*(f7 - f8 - f9 + f10 + f15 - f16 + f17 - f18))*4.0; + Ez = (f5 - f6 + 0.5*(f11 - f12 - f13 + f14 + f15 - f16 - f17 + f18))*4.0; + ElectricField[n + 0 * Np] = 
Ex; + ElectricField[n + 1 * Np] = Ey; + ElectricField[n + 2 * Np] = Ez; + + // q = 0 + dist[n] = W0*psi; //f0 * (1.0 - rlx) - (1.0-0.5*rlx)*W0*rho_e; + + // q = 1 + dist[nr2] = W1*psi; //f1 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 2 + dist[nr1] = W1*psi; //f2 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 3 + dist[nr4] = W1*psi; //f3 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 4 + dist[nr3] = W1*psi; //f4 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 5 + dist[nr6] = W1*psi; //f5 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 6 + dist[nr5] = W1*psi; //f6 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + //........................................................................ + + // q = 7 + dist[nr8] = W2*psi; //f7 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 8 + dist[nr7] = W2*psi; //f8 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 9 + dist[nr10] = W2*psi; //f9 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 10 + dist[nr9] = W2*psi; //f10 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 11 + dist[nr12] = W2*psi; //f11 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 12 + dist[nr11] = W2*psi; //f12 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 13 + dist[nr14] = W2*psi; //f13 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q= 14 + dist[nr13] = W2*psi; //f14 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 15 + dist[nr16] = W2*psi; //f15 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 16 + dist[nr15] = W2*psi; //f16 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 17 + dist[nr18] = W2*psi; //f17 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 18 + dist[nr17] = W2*psi; //f18 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + } +} + +extern "C" void ScaLBL_D3Q19_AAeven_Poisson(int *Map, double *dist, + double *Den_charge, double *Psi, + double *ElectricField, double *Error, double tau, + double epsilon_LB, bool UseSlippingVelBC, + int start, int finish, int Np) { + int n; + double psi; //electric potential + double Ex, Ey, Ez; //electric field + double rho_e; //local charge density + double f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15, + f16, f17, f18; + double error,sum_q; + double rlx = 1.0 / tau; + int idx; + double W0 = 0.5; + double W1 = 1.0/24.0; + double W2 = 1.0/48.0; + + for (n = start; n < finish; n++) { + + //Load data + //When Helmholtz-Smoluchowski slipping velocity BC is used, the bulk fluid is considered as electroneutral + //and thus the net space charge density is zero. + rho_e = (UseSlippingVelBC==1) ? 
0.0 : Den_charge[n] / epsilon_LB; + + f0 = dist[n]; + f1 = dist[2 * Np + n]; + f2 = dist[1 * Np + n]; + f3 = dist[4 * Np + n]; + f4 = dist[3 * Np + n]; + f5 = dist[6 * Np + n]; + f6 = dist[5 * Np + n]; + + f7 = dist[8 * Np + n]; + f8 = dist[7 * Np + n]; + f9 = dist[10 * Np + n]; + f10 = dist[9 * Np + n]; + f11 = dist[12 * Np + n]; + f12 = dist[11 * Np + n]; + f13 = dist[14 * Np + n]; + f14 = dist[13 * Np + n]; + f15 = dist[16 * Np + n]; + f16 = dist[15 * Np + n]; + f17 = dist[18 * Np + n]; + f18 = dist[17 * Np + n]; + + /* Ex = (f1 - f2) * rlx * + 4.0; //NOTE the unit of electric field here is V/lu + Ey = (f3 - f4) * rlx * + 4.0; //factor 4.0 is D3Q7 lattice squared speed of sound + Ez = (f5 - f6) * rlx * 4.0; + */ + Ex = (f1 - f2 + 0.5*(f7 - f8 + f9 - f10 + f11 - f12 + f13 - f14))*4.0; //NOTE the unit of electric field here is V/lu + Ey = (f3 - f4 + 0.5*(f7 - f8 - f9 + f10 + f15 - f16 + f17 - f18))*4.0; + Ez = (f5 - f6 + 0.5*(f11 - f12 - f13 + f14 + f15 - f16 - f17 + f18))*4.0; + ElectricField[n + 0 * Np] = Ex; + ElectricField[n + 1 * Np] = Ey; + ElectricField[n + 2 * Np] = Ez; + + sum_q = f1+f2+f3+f4+f5+f6+f7+f8+f9+f10+f11+f12+f13+f14+f15+f16+f17+f18; + error = 8.0*(sum_q - f0) + rho_e; + + Error[n] = error; + + psi = 2.0*(f0*(1.0 - rlx) + rlx*(sum_q + 0.125*rho_e)); + + idx = Map[n]; + Psi[idx] = psi; + + // q = 0 + dist[n] = W0*psi;// + + // q = 1 + dist[1 * Np + n] = W1*psi;//f1 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 2 + dist[2 * Np + n] = W1*psi;//f2 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 3 + dist[3 * Np + n] = W1*psi;//f3 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 4 + dist[4 * Np + n] = W1*psi;//f4 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 5 + dist[5 * Np + n] = W1*psi;//f5 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 6 + dist[6 * Np + n] = W1*psi;//f6 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + dist[7 * Np + n] = W2*psi;//f7 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + dist[8 * Np + n] = W2*psi;//f8* (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + dist[9 * Np + n] = W2*psi;//f9 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + dist[10 * Np + n] = W2*psi;//f10 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + dist[11 * Np + n] = W2*psi;//f11 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + dist[12 * Np + n] = W2*psi;//f12 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + dist[13 * Np + n] = W2*psi;//f13 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + dist[14 * Np + n] = W2*psi;//f14 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + dist[15 * Np + n] = W2*psi;//f15 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + dist[16 * Np + n] = W2*psi;//f16 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + dist[17 * Np + n] = W2*psi;//f17 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + dist[18 * Np + n] = W2*psi;//f18 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + //........................................................................ 
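        // Recap of the update above: with W0 = 1/2, W1 = 1/24 and W2 = 1/48 the nineteen
        // distributions written here sum to psi. The relaxation
        //     psi = 2*( f0*(1 - rlx) + rlx*( sum_q + rho_e/8 ) )
        // together with streaming drives the monitored residual 8*(sum_q - f0) + rho_e
        // (stored in Error[n]) toward zero, which is a discrete form of the Poisson balance
        // for the potential. Ex, Ey, Ez are weighted first moments of the distributions
        // (face terms plus half of the edge terms) scaled by 4.0, giving the field in V/lu.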
+ } +} + +extern "C" void ScaLBL_D3Q19_Poisson_Init(int *Map, double *dist, double *Psi, + int start, int finish, int Np) { + int n; + int ijk; + double W0 = 0.5; + double W1 = 1.0/24.0; + double W2 = 1.0/48.0; + for (n = start; n < finish; n++) { + ijk = Map[n]; + dist[0 * Np + n] = W0 * Psi[ijk];//3333333333333333* Psi[ijk]; + dist[1 * Np + n] = W1 * Psi[ijk]; + dist[2 * Np + n] = W1 * Psi[ijk]; + dist[3 * Np + n] = W1 * Psi[ijk]; + dist[4 * Np + n] = W1 * Psi[ijk]; + dist[5 * Np + n] = W1 * Psi[ijk]; + dist[6 * Np + n] = W1 * Psi[ijk]; + dist[7 * Np + n] = W2* Psi[ijk]; + dist[8 * Np + n] = W2* Psi[ijk]; + dist[9 * Np + n] = W2* Psi[ijk]; + dist[10 * Np + n] = W2* Psi[ijk]; + dist[11 * Np + n] = W2* Psi[ijk]; + dist[12 * Np + n] = W2* Psi[ijk]; + dist[13 * Np + n] = W2* Psi[ijk]; + dist[14 * Np + n] = W2* Psi[ijk]; + dist[15 * Np + n] = W2* Psi[ijk]; + dist[16 * Np + n] = W2* Psi[ijk]; + dist[17 * Np + n] = W2* Psi[ijk]; + dist[18 * Np + n] = W2* Psi[ijk]; + } +} diff --git a/cuda/BGK.cu b/cuda/BGK.cu index d9206a4f..07022ecb 100644 --- a/cuda/BGK.cu +++ b/cuda/BGK.cu @@ -1,7 +1,7 @@ #include #define NBLOCKS 1024 -#define NTHREADS 256 +#define NTHREADS 512 __global__ void dvc_ScaLBL_D3Q19_AAeven_BGK(double *dist, int start, int finish, int Np, double rlx, double Fx, double Fy, double Fz){ int n; diff --git a/cuda/D3Q19.cu b/cuda/D3Q19.cu index 5e56922a..afd870bf 100644 --- a/cuda/D3Q19.cu +++ b/cuda/D3Q19.cu @@ -290,7 +290,7 @@ __global__ void dvc_ScaLBL_D3Q19_Swap_Compact(int *neighborList, double *distev //__launch_bounds__(512,4) __global__ void -dvc_ScaLBL_AAodd_Compact(char * ID, int *d_neighborList, double *dist, int Np) { +dvc_ScaLBL_AAodd_Compact(int *d_neighborList, double *dist, int Np) { int n; double f0,f1,f2,f3,f4,f5,f6,f7,f8,f9; @@ -1321,7 +1321,7 @@ dvc_ScaLBL_AAeven_MRT(double *dist, int start, int finish, int Np, double rlx_se //__launch_bounds__(512,4) -__global__ void dvc_ScaLBL_AAeven_Compact(char * ID, double *dist, int Np) { +__global__ void dvc_ScaLBL_AAeven_Compact( double *dist, int Np) { int n; double f0,f1,f2,f3,f4,f5,f6,f7,f8,f9; @@ -2390,18 +2390,18 @@ extern "C" void ScaLBL_D3Q19_Swap_Compact(int *neighborList, double *disteven, d } } -extern "C" void ScaLBL_D3Q19_AAeven_Compact(char * ID, double *d_dist, int Np) { +extern "C" void ScaLBL_D3Q19_AAeven_Compact( double *d_dist, int Np) { cudaFuncSetCacheConfig(dvc_ScaLBL_AAeven_Compact, cudaFuncCachePreferL1); - dvc_ScaLBL_AAeven_Compact<<>>(ID, d_dist, Np); + dvc_ScaLBL_AAeven_Compact<<>>(d_dist, Np); cudaError_t err = cudaGetLastError(); if (cudaSuccess != err){ printf("CUDA error in ScaLBL_D3Q19_Init: %s \n",cudaGetErrorString(err)); } } -extern "C" void ScaLBL_D3Q19_AAodd_Compact(char * ID, int *d_neighborList, double *d_dist, int Np) { +extern "C" void ScaLBL_D3Q19_AAodd_Compact( int *d_neighborList, double *d_dist, int Np) { cudaFuncSetCacheConfig(dvc_ScaLBL_AAodd_Compact, cudaFuncCachePreferL1); - dvc_ScaLBL_AAodd_Compact<<>>(ID,d_neighborList, d_dist,Np); + dvc_ScaLBL_AAodd_Compact<<>>(d_neighborList, d_dist,Np); cudaError_t err = cudaGetLastError(); if (cudaSuccess != err){ printf("CUDA error in ScaLBL_D3Q19_Init: %s \n",cudaGetErrorString(err)); diff --git a/cuda/D3Q7BC.cu b/cuda/D3Q7BC.cu index 0651694d..d60b4bfb 100644 --- a/cuda/D3Q7BC.cu +++ b/cuda/D3Q7BC.cu @@ -6,6 +6,16 @@ #define NTHREADS 256 +#define CHECK_ERROR(KERNEL) \ + do { \ + auto err = cudaGetLastError(); \ + if ( cudaSuccess != err ){ \ + auto errString = cudaGetErrorString(err); \ + printf("error in %s (kernel): %s 
\n",KERNEL,errString); \ + } \ + } while(0) + + __global__ void dvc_ScaLBL_Solid_Dirichlet_D3Q7(double *dist, double *BoundaryValue, int *BounceBackDist_list, int *BounceBackSolid_list, int count) { @@ -740,28 +750,19 @@ __global__ void dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_Z(int *d_neighbor extern "C" void ScaLBL_Solid_Dirichlet_D3Q7(double *dist, double *BoundaryValue, int *BounceBackDist_list, int *BounceBackSolid_list, int count){ int GRID = count / 512 + 1; dvc_ScaLBL_Solid_Dirichlet_D3Q7<<>>(dist, BoundaryValue, BounceBackDist_list, BounceBackSolid_list, count); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_Solid_Dirichlet_D3Q7 (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_Solid_Dirichlet_D3Q7"); } extern "C" void ScaLBL_Solid_Neumann_D3Q7(double *dist, double *BoundaryValue, int *BounceBackDist_list, int *BounceBackSolid_list, int count){ int GRID = count / 512 + 1; dvc_ScaLBL_Solid_Neumann_D3Q7<<>>(dist, BoundaryValue, BounceBackDist_list, BounceBackSolid_list, count); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_Solid_Neumann_D3Q7 (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_Solid_Neumann_D3Q7"); } extern "C" void ScaLBL_Solid_DirichletAndNeumann_D3Q7(double *dist, double *BoundaryValue,int *BoundaryLabel, int *BounceBackDist_list, int *BounceBackSolid_list, int count){ int GRID = count / 512 + 1; dvc_ScaLBL_Solid_DirichletAndNeumann_D3Q7<<>>(dist, BoundaryValue, BoundaryLabel, BounceBackDist_list, BounceBackSolid_list, count); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_Solid_DirichletAndNeumann_D3Q7 (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_Solid_DirichletAndNeumann_D3Q7"); } extern "C" void ScaLBL_Solid_SlippingVelocityBC_D3Q19(double *dist, double *zeta_potential, double *ElectricField, double *SolidGrad, @@ -775,211 +776,142 @@ extern "C" void ScaLBL_Solid_SlippingVelocityBC_D3Q19(double *dist, double *zeta BounceBackDist_list, BounceBackSolid_list, FluidBoundary_list, lattice_weight, lattice_cx, lattice_cy, lattice_cz, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_Solid_SlippingVelocityBC_D3Q19 (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_Solid_SlippingVelocityBC_D3Q19"); } extern "C" void ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_z(int *list, double *dist, double Vin, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_z<<>>(list, dist, Vin, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_z"); } extern "C" void ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_Z(int *list, double *dist, double Vout, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_Z<<>>(list, dist, Vout, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_Z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_Z"); } extern "C" void ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_z(int *d_neighborList, int *list, double *dist, double Vin, int count, int Np){ int GRID = count / 512 + 1; 
dvc_ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_z<<>>(d_neighborList, list, dist, Vin, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_z"); } extern "C" void ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_Z(int *d_neighborList, int *list, double *dist, double Vout, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_Z<<>>(d_neighborList, list, dist, Vout, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_Z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_Z"); } extern "C" void ScaLBL_Poisson_D3Q7_BC_z(int *list, int *Map, double *Psi, double Vin, int count){ int GRID = count / 512 + 1; dvc_ScaLBL_Poisson_D3Q7_BC_z<<>>(list, Map, Psi, Vin, count); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_Poisson_D3Q7_BC_z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_Poisson_D3Q7_BC_z"); } extern "C" void ScaLBL_Poisson_D3Q7_BC_Z(int *list, int *Map, double *Psi, double Vout, int count){ int GRID = count / 512 + 1; dvc_ScaLBL_Poisson_D3Q7_BC_Z<<>>(list, Map, Psi, Vout, count); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_Poisson_D3Q7_BC_Z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_Poisson_D3Q7_BC_Z"); } extern "C" void ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_z(int *list, double *dist, double Cin, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_z<<>>(list, dist, Cin, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_z"); } extern "C" void ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_Z(int *list, double *dist, double Cout, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_Z<<>>(list, dist, Cout, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_Z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_Z"); } extern "C" void ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_z(int *d_neighborList, int *list, double *dist, double Cin, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_z<<>>(d_neighborList, list, dist, Cin, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_z"); } extern "C" void ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_Z(int *d_neighborList, int *list, double *dist, double Cout, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_Z<<>>(d_neighborList, list, dist, Cout, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_Z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_Z"); } 
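The CHECK_ERROR macro above consolidates the repeated inline cudaGetLastError checks in these wrappers. Since cudaGetLastError only reports launch-time failures, a stricter debug-only variant (a sketch, not part of this patch) could also synchronize so that asynchronous kernel errors surface at the call site:

#define CHECK_ERROR_SYNC(KERNEL)                                        \
    do {                                                                \
        cudaError_t err = cudaGetLastError();     /* launch errors  */  \
        if (cudaSuccess == err)                                         \
            err = cudaDeviceSynchronize();        /* runtime errors */  \
        if (cudaSuccess != err)                                         \
            printf("error in %s (kernel): %s \n", KERNEL,               \
                   cudaGetErrorString(err));                            \
    } while (0)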
//------------Diff----------------- extern "C" void ScaLBL_D3Q7_AAeven_Ion_Flux_Diff_BC_z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_Diff_BC_z<<>>(list, dist, FluxIn, tau, VelocityZ, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAeven_Ion_Flux_Diff_BC_z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Ion_Flux_Diff_BC_z"); } extern "C" void ScaLBL_D3Q7_AAeven_Ion_Flux_Diff_BC_Z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_Diff_BC_Z<<>>(list, dist, FluxIn, tau, VelocityZ, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAeven_Ion_Flux_Diff_BC_Z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Ion_Flux_Diff_BC_Z"); } extern "C" void ScaLBL_D3Q7_AAodd_Ion_Flux_Diff_BC_z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_Diff_BC_z<<>>(d_neighborList, list, dist, FluxIn, tau, VelocityZ, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAodd_Ion_Flux_Diff_BC_z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Ion_Flux_Diff_BC_z"); } extern "C" void ScaLBL_D3Q7_AAodd_Ion_Flux_Diff_BC_Z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_Diff_BC_Z<<>>(d_neighborList, list, dist, FluxIn, tau, VelocityZ, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAodd_Ion_Flux_Diff_BC_Z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Ion_Flux_Diff_BC_Z"); } //----------DiffAdvc------------- extern "C" void ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvc_BC_z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvc_BC_z<<>>(list, dist, FluxIn, tau, VelocityZ, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvc_BC_z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvc_BC_z"); } extern "C" void ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvc_BC_Z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvc_BC_Z<<>>(list, dist, FluxIn, tau, VelocityZ, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvc_BC_Z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvc_BC_Z"); } extern "C" void ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvc_BC_z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvc_BC_z<<>>(d_neighborList, list, dist, FluxIn, tau, VelocityZ, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - 
printf("CUDA error in ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvc_BC_z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvc_BC_z"); } extern "C" void ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvc_BC_Z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvc_BC_Z<<>>(d_neighborList, list, dist, FluxIn, tau, VelocityZ, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvc_BC_Z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvc_BC_Z"); } //----------DiffAdvcElec------------- extern "C" void ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvcElec_BC_z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, double *ElectricField_Z, double Di, double zi, double Vt, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvcElec_BC_z<<>>(list, dist, FluxIn, tau, VelocityZ, ElectricField_Z, Di, zi, Vt, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvcElec_BC_z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvcElec_BC_z"); } extern "C" void ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvcElec_BC_Z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, double *ElectricField_Z, double Di, double zi, double Vt, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvcElec_BC_Z<<>>(list, dist, FluxIn, tau, VelocityZ, ElectricField_Z, Di, zi, Vt, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvcElec_BC_Z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvcElec_BC_Z"); } extern "C" void ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, double *ElectricField_Z, double Di, double zi, double Vt, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_z<<>>(d_neighborList, list, dist, FluxIn, tau, VelocityZ, ElectricField_Z, Di, zi, Vt, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_z"); } extern "C" void ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_Z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, double *ElectricField_Z, double Di, double zi, double Vt, int count, int Np){ int GRID = count / 512 + 1; dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_Z<<>>(d_neighborList, list, dist, FluxIn, tau, VelocityZ, ElectricField_Z, Di, zi, Vt, count, Np); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("CUDA error in ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_Z (kernel): %s \n",cudaGetErrorString(err)); - } + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_Z"); } //------------------------------- diff --git a/cuda/Ion.cu b/cuda/Ion.cu index 49d0b80a..559ef7ec 100644 --- a/cuda/Ion.cu +++ b/cuda/Ion.cu @@ -3,7 +3,207 @@ //#include #define NBLOCKS 1024 -#define NTHREADS 256 +#define NTHREADS 512 + +extern "C" 
void Membrane_D3Q19_Unpack(int q, int *list, int *links, int start, int linkCount, + double *recvbuf, double *dist, int N) { + //.................................................................................... + // Unack distribution from the recv buffer + // Distribution q matche Cqx, Cqy, Cqz + // swap rule means that the distributions in recvbuf are OPPOSITE of q + // dist may be even or odd distributions stored by stream layout + //.................................................................................... + int n, idx, link; + for (link=0; link Threshold){ + aq = ThresholdMassFractionIn; ap = ThresholdMassFractionOut; + } + + /* Save the mass transfer coefficients */ + coef[2*link] = aq; coef[2*link+1] = ap; + } + } +} + +__global__ void dvc_ScaLBL_D3Q7_Membrane_AssignLinkCoef_halo( + const int Cqx, const int Cqy, int const Cqz, + int *Map, double *Distance, double *Psi, double Threshold, + double MassFractionIn, double MassFractionOut, double ThresholdMassFractionIn, double ThresholdMassFractionOut, + int *d3q7_recvlist, int *d3q7_linkList, double *coef, int start, int nlinks, int count, + const int N, const int Nx, const int Ny, const int Nz) { + //.................................................................................... + // Unack distribution from the recv buffer + // Distribution q matche Cqx, Cqy, Cqz + // swap rule means that the distributions in recvbuf are OPPOSITE of q + // dist may be even or odd distributions stored by stream layout + //.................................................................................... + int n, idx, nqm, npm, label, i, j, k; + double distanceLocal, distanceNonlocal; + double psiLocal, psiNonlocal, membranePotential; + double ap,aq; // coefficient + + /* second enforce custom rule for membrane links */ + int S = (count-nlinks)/NBLOCKS/NTHREADS + 1; + for (int s=0; s 0 && !(n < 0)){ + nqm = Map[n]; + distanceLocal = Distance[nqm]; + psiLocal = Psi[nqm]; + + // Get the 3-D indices from the send process + k = nqm/(Nx*Ny); j = (nqm-Nx*Ny*k)/Nx; i = nqm-Nx*Ny*k-Nx*j; + // Streaming link the non-local distribution + i -= Cqx; j -= Cqy; k -= Cqz; + npm = k*Nx*Ny + j*Nx + i; + distanceNonlocal = Distance[npm]; + psiNonlocal = Psi[npm]; + + membranePotential = psiLocal - psiNonlocal; + aq = MassFractionIn; + ap = MassFractionOut; + + /* link is inside membrane */ + if (distanceLocal > 0.0){ + if (membranePotential < Threshold*(-1.0)){ + ap = MassFractionIn; + aq = MassFractionOut; + } + else { + ap = ThresholdMassFractionIn; + aq = ThresholdMassFractionOut; + } + } + else if (membranePotential > Threshold){ + aq = ThresholdMassFractionIn; + ap = ThresholdMassFractionOut; + } + } + coef[2*idx]=aq; + coef[2*idx+1]=ap; + } + } +} + +__global__ void dvc_ScaLBL_D3Q7_Membrane_Unpack(int q, + int *d3q7_recvlist, double *recvbuf, int count, + double *dist, int N, double *coef) { + //.................................................................................... + // Unack distribution from the recv buffer + // Distribution q matche Cqx, Cqy, Cqz + // swap rule means that the distributions in recvbuf are OPPOSITE of q + // dist may be even or odd distributions stored by stream layout + //.................................................................................... 
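    // Coefficient bookkeeping shared by the membrane kernels in this file (see the
    // AssignLinkCoef kernels above): every membrane link stores an interleaved pair,
    //   coef[2*link]   = aq    (mass-transfer coefficient for the local side of the link)
    //   coef[2*link+1] = ap    (mass-transfer coefficient for the opposite side)
    // The pair defaults to (MassFractionIn, MassFractionOut) and is switched to the
    // Threshold* values once the local membrane potential, psiLocal - psiNonlocal,
    // crosses +/- Threshold.  The unpack and transport kernels below then use this
    // pair to weight the local and incoming distributions at each membrane link.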
+ int n, idx, link; + double fq,fp,fqq,ap,aq; // coefficient + + /* second enforce custom rule for membrane links */ + int S = count/NBLOCKS/NTHREADS + 1; + for (int s=0; s 10Np => odd part of dist) - f1 = dist[nr1]; // reading the f1 data into register fq - // q=2 - nr2 = neighborList[n+Np]; // neighbor 1 ( < 10Np => even part of dist) - f2 = dist[nr2]; // reading the f2 data into register fq - // q=3 - nr3 = neighborList[n+2*Np]; // neighbor 4 - f3 = dist[nr3]; - // q=4 - nr4 = neighborList[n+3*Np]; // neighbor 3 - f4 = dist[nr4]; - // q=5 - nr5 = neighborList[n+4*Np]; - f5 = dist[nr5]; - // q=6 - nr6 = neighborList[n+5*Np]; - f6 = dist[nr6]; - - // compute diffusive flux - flux_diffusive_x = (1.0-0.5*rlx)*((f1-f2)-ux*Ci); - flux_diffusive_y = (1.0-0.5*rlx)*((f3-f4)-uy*Ci); - flux_diffusive_z = (1.0-0.5*rlx)*((f5-f6)-uz*Ci); - FluxDiffusive[n+0*Np] = flux_diffusive_x; - FluxDiffusive[n+1*Np] = flux_diffusive_y; - FluxDiffusive[n+2*Np] = flux_diffusive_z; - FluxAdvective[n+0*Np] = ux*Ci; - FluxAdvective[n+1*Np] = uy*Ci; - FluxAdvective[n+2*Np] = uz*Ci; - FluxElectrical[n+0*Np] = uEPx*Ci; - FluxElectrical[n+1*Np] = uEPy*Ci; - FluxElectrical[n+2*Np] = uEPz*Ci; + // q=0 + f0 = dist[n]; + // q=1 + nr1 = neighborList[n]; // neighbor 2 ( > 10Np => odd part of dist) + f1 = dist[nr1]; // reading the f1 data into register fq + // q=2 + nr2 = neighborList[n + Np]; // neighbor 1 ( < 10Np => even part of dist) + f2 = dist[nr2]; // reading the f2 data into register fq + // q=3 + nr3 = neighborList[n + 2 * Np]; // neighbor 4 + f3 = dist[nr3]; + // q=4 + nr4 = neighborList[n + 3 * Np]; // neighbor 3 + f4 = dist[nr4]; + // q=5 + nr5 = neighborList[n + 4 * Np]; + f5 = dist[nr5]; + // q=6 + nr6 = neighborList[n + 5 * Np]; + f6 = dist[nr6]; - // q=0 - dist[n] = f0*(1.0-rlx)+rlx*0.25*Ci; - //dist[n] = f0*(1.0-rlx)+rlx*0.25*Ci*(1.0 - 2.0*((ux+uEPx)*(ux+uEPx) + (uy+uEPy)*(uy+uEPy) + (uz+uEPz)*(uz+uEPz))); + // compute diffusive flux + Ci = f0 + f1 + f2 + f3 + f4 + f5 + f6; + flux_diffusive_x = (1.0 - 0.5 * rlx) * ((f1 - f2) - ux * Ci); + flux_diffusive_y = (1.0 - 0.5 * rlx) * ((f3 - f4) - uy * Ci); + flux_diffusive_z = (1.0 - 0.5 * rlx) * ((f5 - f6) - uz * Ci); + FluxDiffusive[n + 0 * Np] = flux_diffusive_x; + FluxDiffusive[n + 1 * Np] = flux_diffusive_y; + FluxDiffusive[n + 2 * Np] = flux_diffusive_z; + FluxAdvective[n + 0 * Np] = ux * Ci; + FluxAdvective[n + 1 * Np] = uy * Ci; + FluxAdvective[n + 2 * Np] = uz * Ci; + FluxElectrical[n + 0 * Np] = uEPx * Ci; + FluxElectrical[n + 1 * Np] = uEPy * Ci; + FluxElectrical[n + 2 * Np] = uEPz * Ci; + + Den[n] = Ci; - // q = 1 - dist[nr2] = f1*(1.0-rlx) + rlx*0.125*Ci*(1.0+4.0*(ux+uEPx)); - //dist[nr2] = f1*(1.0-rlx) + rlx*0.125*Ci*(1.0+4.0*(ux+uEPx)+8.0*(ux+uEPx)*(ux+uEPx)- 2.0*((ux+uEPx)*(ux+uEPx) + (uy+uEPy)*(uy+uEPy) + (uz+uEPz)*(uz+uEPz))); + /* use logistic function to prevent negative distributions*/ + X = 4.0 * (ux + uEPx); + Y = 4.0 * (uy + uEPy); + Z = 4.0 * (uz + uEPz); + factor_x = X / sqrt(1 + X*X); + factor_y = Y / sqrt(1 + Y*Y); + factor_z = Z / sqrt(1 + Z*Z); - // q=2 - dist[nr1] = f2*(1.0-rlx) + rlx*0.125*Ci*(1.0-4.0*(ux+uEPx)); - //dist[nr1] = f2*(1.0-rlx) + rlx*0.125*Ci*(1.0-4.0*(ux+uEPx)+8.0*(ux+uEPx)*(ux+uEPx)- 2.0*((ux+uEPx)*(ux+uEPx) + (uy+uEPy)*(uy+uEPy) + (uz+uEPz)*(uz+uEPz))); + // q=0 + dist[n] = f0 * (1.0 - rlx) + rlx * 0.25 * Ci; - // q = 3 - dist[nr4] = f3*(1.0-rlx) + rlx*0.125*Ci*(1.0+4.0*(uy+uEPy)); - //dist[nr4] = f3*(1.0-rlx) + rlx*0.125*Ci*(1.0+4.0*(uy+uEPy)+8.0*(uy+uEPy)*(uy+uEPy)- 2.0*((ux+uEPx)*(ux+uEPx) + (uy+uEPy)*(uy+uEPy) + 
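    /* The factor_x/y/z terms introduced above, factor = X / sqrt(1 + X*X) with
     * X = 4*(u + uEP), map the drift term smoothly into (-1, 1), so the equilibrium
     * weights 0.125*Ci*(1 +/- factor) can never go negative.  For |X| << 1 this
     * reduces to factor ~ X, i.e. the original linear form 0.125*Ci*(1 +/- 4*(u + uEP))
     * that is kept in the comments for reference.  For example, X = 0.1 gives
     * factor ~ 0.0995, while X = 10 saturates at factor ~ 0.995.
     */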
(uz+uEPz)*(uz+uEPz))); + // q = 1 + dist[nr2] = + f1 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_x); + //f1 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (ux + uEPx)); - // q = 4 - dist[nr3] = f4*(1.0-rlx) + rlx*0.125*Ci*(1.0-4.0*(uy+uEPy)); - //dist[nr3] = f4*(1.0-rlx) + rlx*0.125*Ci*(1.0-4.0*(uy+uEPy)+8.0*(uy+uEPy)*(uy+uEPy)- 2.0*((ux+uEPx)*(ux+uEPx) + (uy+uEPy)*(uy+uEPy) + (uz+uEPz)*(uz+uEPz))); - // q = 5 - dist[nr6] = f5*(1.0-rlx) + rlx*0.125*Ci*(1.0+4.0*(uz+uEPz)); - //dist[nr6] = f5*(1.0-rlx) + rlx*0.125*Ci*(1.0+4.0*(uz+uEPz)+8.0*(uz+uEPz)*(uz+uEPz)- 2.0*((ux+uEPx)*(ux+uEPx) + (uy+uEPy)*(uy+uEPy) + (uz+uEPz)*(uz+uEPz))); + // q=2 + dist[nr1] = + f2 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_x); + //f2 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - 4.0 * (ux + uEPx)); + + // q = 3 + dist[nr4] = + f3 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_y ); + //f3 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (uy + uEPy)); + + // q = 4 + dist[nr3] = + f4 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_y); + //f4 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - 4.0 * (uy + uEPy)); + + // q = 5 + dist[nr6] = + f5 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_z); + //f5 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (uz + uEPz)); + + // q = 6 + dist[nr5] = + f6 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_z); - // q = 6 - dist[nr5] = f6*(1.0-rlx) + rlx*0.125*Ci*(1.0-4.0*(uz+uEPz)); - //dist[nr5] = f6*(1.0-rlx) + rlx*0.125*Ci*(1.0-4.0*(uz+uEPz)+8.0*(uz+uEPz)*(uz+uEPz)- 2.0*((ux+uEPx)*(ux+uEPx) + (uy+uEPy)*(uy+uEPy) + (uz+uEPz)*(uz+uEPz))); } } } @@ -201,6 +418,7 @@ __global__ void dvc_ScaLBL_D3Q7_AAeven_Ion(double *dist, double *Den, double *F double Ex,Ey,Ez;//electrical field double flux_diffusive_x,flux_diffusive_y,flux_diffusive_z; double f0,f1,f2,f3,f4,f5,f6; + double X,Y,Z,factor_x,factor_y,factor_z; int S = Np/NBLOCKS/NTHREADS + 1; for (int s=0; s0) + CD_tmp; + ChargeDensity[n] = CD + CD_tmp; + + // Ci = Den[n+ion_component*Np]; + // CD = ChargeDensity[n]; + // CD_tmp = F*IonValence*Ci; + // ChargeDensity[n] = CD*(ion_component>0) + CD_tmp; } } } +__global__ void dvc_ScaLBL_D3Q7_AAodd_Ion_v0(int *neighborList, double *dist, + double *Den, double *FluxDiffusive, + double *FluxAdvective, + double *FluxElectrical, double *Velocity, + double *ElectricField, double Di, int zi, + double rlx, double Vt, int start, + int finish, int Np) { + int n; + double Ci; + double ux, uy, uz; + double uEPx, uEPy, uEPz; //electrochemical induced velocity + double Ex, Ey, Ez; //electrical field + double flux_diffusive_x, flux_diffusive_y, flux_diffusive_z; + double f0, f1, f2, f3, f4, f5, f6; + //double X,Y,Z,factor_x, factor_y, factor_z; + int nr1, nr2, nr3, nr4, nr5, nr6; + + int S = Np/NBLOCKS/NTHREADS + 1; + for (int s=0; s 10Np => odd part of dist) + f1 = dist[nr1]; // reading the f1 data into register fq + // q=2 + nr2 = neighborList[n + Np]; // neighbor 1 ( < 10Np => even part of dist) + f2 = dist[nr2]; // reading the f2 data into register fq + // q=3 + nr3 = neighborList[n + 2 * Np]; // neighbor 4 + f3 = dist[nr3]; + // q=4 + nr4 = neighborList[n + 3 * Np]; // neighbor 3 + f4 = dist[nr4]; + // q=5 + nr5 = neighborList[n + 4 * Np]; + f5 = dist[nr5]; + // q=6 + nr6 = neighborList[n + 5 * Np]; + f6 = dist[nr6]; + + // compute diffusive flux + //Ci = f0 + f1 + f2 + f3 + f4 + f5 + f6; + flux_diffusive_x = (1.0 - 0.5 * rlx) * ((f1 - f2) - ux * Ci); + flux_diffusive_y = (1.0 - 0.5 * rlx) * ((f3 - f4) - uy * Ci); + flux_diffusive_z = (1.0 - 0.5 * rlx) * ((f5 - f6) - uz * Ci); + FluxDiffusive[n + 
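    /* Two changes in this hunk, restated for clarity (no new code):
     * 1. dvc_ScaLBL_D3Q7_Ion_ChargeDensity now accumulates unconditionally,
     *        ChargeDensity[n] = CD + CD_tmp;
     *    instead of masking the running sum with (ion_component > 0).  This
     *    presumably relies on the ChargeDensity buffer being zeroed before the
     *    first ion component is added.
     * 2. The *_Ion_v0 kernels that begin here keep the original linear equilibrium
     *    0.125*Ci*(1 +/- 4*(u + uEP)), so the saturating form used above can be
     *    compared against the previous behaviour.
     */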
0 * Np] = flux_diffusive_x; + FluxDiffusive[n + 1 * Np] = flux_diffusive_y; + FluxDiffusive[n + 2 * Np] = flux_diffusive_z; + FluxAdvective[n + 0 * Np] = ux * Ci; + FluxAdvective[n + 1 * Np] = uy * Ci; + FluxAdvective[n + 2 * Np] = uz * Ci; + FluxElectrical[n + 0 * Np] = uEPx * Ci; + FluxElectrical[n + 1 * Np] = uEPy * Ci; + FluxElectrical[n + 2 * Np] = uEPz * Ci; + + //Den[n] = Ci; + + /* use logistic function to prevent negative distributions*/ + //X = 4.0 * (ux + uEPx); + //Y = 4.0 * (uy + uEPy); + //Z = 4.0 * (uz + uEPz); + //factor_x = X / sqrt(1 + X*X); + //factor_y = Y / sqrt(1 + Y*Y); + //factor_z = Z / sqrt(1 + Z*Z); + + // q=0 + dist[n] = f0 * (1.0 - rlx) + rlx * 0.25 * Ci; + + // q = 1 + dist[nr2] = + f1 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (ux + uEPx)); + // f1 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_x); + + + // q=2 + dist[nr1] = + f2 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - 4.0 * (ux + uEPx)); + // f2 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_x); + + // q = 3 + dist[nr4] = + f3 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (uy + uEPy)); + // f3 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_y ); + + // q = 4 + dist[nr3] = + f4 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - 4.0 * (uy + uEPy)); + // f4 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_y); + + // q = 5 + dist[nr6] = + f5 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (uz + uEPz)); + // f5 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_z); + + // q = 6 + dist[nr5] = + f6 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - 4.0 * (uz + uEPz)); + // f6 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_z); + + } + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAeven_Ion_v0( + double *dist, double *Den, double *FluxDiffusive, double *FluxAdvective, + double *FluxElectrical, double *Velocity, double *ElectricField, double Di, + int zi, double rlx, double Vt, int start, int finish, int Np) { + int n; + double Ci; + double ux, uy, uz; + double uEPx, uEPy, uEPz; //electrochemical induced velocity + double Ex, Ey, Ez; //electrical field + double flux_diffusive_x, flux_diffusive_y, flux_diffusive_z; + double f0, f1, f2, f3, f4, f5, f6; + //double X,Y,Z, factor_x, factor_y, factor_z; + + int S = Np/NBLOCKS/NTHREADS + 1; + for (int s=0; s>>(dist, + Den, FluxDiffusive, FluxAdvective, + FluxElectrical, Velocity, + ElectricField, Di, zi, + rlx, Vt, start, finish, Np); + + cudaError_t err = cudaGetLastError(); + if (cudaSuccess != err){ + printf("cuda error in dvc_ScaLBL_D3Q7_AAeven_Ion_v0: %s \n",cudaGetErrorString(err)); + } +} + +extern "C" void ScaLBL_D3Q7_AAodd_Ion_v0(int *neighborList, double *dist, + double *Den, double *FluxDiffusive, + double *FluxAdvective, + double *FluxElectrical, double *Velocity, + double *ElectricField, double Di, int zi, + double rlx, double Vt, int start, + int finish, int Np) { + + dvc_ScaLBL_D3Q7_AAodd_Ion_v0<<>>(neighborList, dist, + Den, FluxDiffusive, FluxAdvective, + FluxElectrical, Velocity, + ElectricField, Di, zi, + rlx, Vt, start, + finish, Np); + + cudaError_t err = cudaGetLastError(); + if (cudaSuccess != err){ + printf("cuda error in dvc_ScaLBL_D3Q7_AAodd_Ion_v0: %s \n",cudaGetErrorString(err)); + } +} + extern "C" void ScaLBL_D3Q7_AAodd_IonConcentration(int *neighborList, double *dist, double *Den, int start, int finish, int Np){ @@ -408,7 +907,7 @@ extern "C" void ScaLBL_D3Q7_Ion_Init_FromFile(double *dist, double *Den, int Np) //cudaProfilerStop(); } -extern "C" void ScaLBL_D3Q7_Ion_ChargeDensity(double *Den, double *ChargeDensity, int IonValence, int ion_component, 
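/* Thread-indexing convention used by the wrappers and kernels in this file (a
 * sketch under the assumption of the usual <<<NBLOCKS, NTHREADS>>> launch with the
 * values #defined at the top of the file; the exact index expression varies a
 * little between kernels):
 *
 *   int S = Np / NBLOCKS / NTHREADS + 1;            // passes needed to cover Np sites
 *   for (int s = 0; s < S; s++) {
 *       int n = S * blockIdx.x * blockDim.x + s * blockDim.x + threadIdx.x + start;
 *       if (n < finish) {
 *           // ... update site n ...
 *       }
 *   }
 *
 * The guard against n >= finish (or n >= count) keeps the surplus threads of the
 * last pass idle.
 */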
int start, int finish, int Np){ +extern "C" void ScaLBL_D3Q7_Ion_ChargeDensity(double *Den, double *ChargeDensity, double IonValence, int ion_component, int start, int finish, int Np){ //cudaProfilerStart(); dvc_ScaLBL_D3Q7_Ion_ChargeDensity<<>>(Den,ChargeDensity,IonValence,ion_component,start,finish,Np); @@ -419,3 +918,65 @@ extern "C" void ScaLBL_D3Q7_Ion_ChargeDensity(double *Den, double *ChargeDensity } //cudaProfilerStop(); } + +extern "C" void ScaLBL_D3Q7_Membrane_AssignLinkCoef(int *membrane, int *Map, double *Distance, double *Psi, double *coef, + double Threshold, double MassFractionIn, double MassFractionOut, double ThresholdMassFractionIn, double ThresholdMassFractionOut, + int memLinks, int Nx, int Ny, int Nz, int Np){ + + dvc_ScaLBL_D3Q7_Membrane_AssignLinkCoef<<>>(membrane, Map, Distance, Psi, coef, + Threshold, MassFractionIn, MassFractionOut, ThresholdMassFractionIn, ThresholdMassFractionOut, + memLinks, Nx, Ny, Nz, Np); + + cudaError_t err = cudaGetLastError(); + if (cudaSuccess != err){ + printf("CUDA error in dvc_ScaLBL_D3Q7_Membrane_AssignLinkCoef: %s \n",cudaGetErrorString(err)); + } +} + +extern "C" void ScaLBL_D3Q7_Membrane_AssignLinkCoef_halo( + const int Cqx, const int Cqy, int const Cqz, + int *Map, double *Distance, double *Psi, double Threshold, + double MassFractionIn, double MassFractionOut, double ThresholdMassFractionIn, double ThresholdMassFractionOut, + int *d3q7_recvlist, int *d3q7_linkList, double *coef, int start, int nlinks, int count, + const int N, const int Nx, const int Ny, const int Nz) { + + int GRID = count / NTHREADS + 1; + + dvc_ScaLBL_D3Q7_Membrane_AssignLinkCoef_halo<<>>( + Cqx, Cqy, Cqz, Map, Distance, Psi, Threshold, + MassFractionIn, MassFractionOut, ThresholdMassFractionIn, ThresholdMassFractionOut, + d3q7_recvlist, d3q7_linkList, coef, start, nlinks, count, N, Nx, Ny, Nz); + + cudaError_t err = cudaGetLastError(); + if (cudaSuccess != err){ + printf("CUDA error in dvc_ScaLBL_D3Q7_Membrane_AssignLinkCoef_halo: %s \n",cudaGetErrorString(err)); + } +} + + +extern "C" void ScaLBL_D3Q7_Membrane_Unpack(int q, + int *d3q7_recvlist, double *recvbuf, int count, + double *dist, int N, double *coef){ + + int GRID = count / NTHREADS + 1; + + dvc_ScaLBL_D3Q7_Membrane_Unpack<<>>(q, d3q7_recvlist, recvbuf,count, + dist, N, coef); + + cudaError_t err = cudaGetLastError(); + if (cudaSuccess != err){ + printf("CUDA error in dvc_ScaLBL_D3Q7_Membrane_Unpack: %s \n",cudaGetErrorString(err)); + } +} + +extern "C" void ScaLBL_D3Q7_Membrane_IonTransport(int *membrane, double *coef, + double *dist, double *Den, int memLinks, int Np){ + + dvc_ScaLBL_D3Q7_Membrane_IonTransport<<>>(membrane, coef, dist, Den, memLinks, Np); + + cudaError_t err = cudaGetLastError(); + if (cudaSuccess != err){ + printf("CUDA error in dvc_ScaLBL_D3Q7_Membrane_IonTransport: %s \n",cudaGetErrorString(err)); + } +} + diff --git a/cuda/MRT.cu b/cuda/MRT.cu index e79a41d1..d2508cea 100644 --- a/cuda/MRT.cu +++ b/cuda/MRT.cu @@ -4,8 +4,8 @@ //************************************************************************* #include -#define NBLOCKS 560 -#define NTHREADS 128 +#define NBLOCKS 1024 +#define NTHREADS 512 __global__ void INITIALIZE(char *ID, double *f_even, double *f_odd, int Nx, int Ny, int Nz) { diff --git a/cuda/Poisson.cu b/cuda/Poisson.cu index dd0cb4bc..1e3b75a1 100644 --- a/cuda/Poisson.cu +++ b/cuda/Poisson.cu @@ -271,6 +271,413 @@ __global__ void dvc_ScaLBL_D3Q7_Poisson_Init(int *Map, double *dist, double *Ps } } +__global__ void 
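/* The four ScaLBL_D3Q7_Membrane_* entry points defined above are expected to be
 * driven from the host-side ion model roughly in this order (a hedged sketch only;
 * the actual orchestration lives in the Membrane/IonModel classes, which are not
 * part of this hunk):
 *
 *   ScaLBL_D3Q7_Membrane_AssignLinkCoef(...);       // coefficients for interior membrane links
 *   ScaLBL_D3Q7_Membrane_AssignLinkCoef_halo(...);  // coefficients for links crossing a rank boundary
 *   // ... D3Q7 halo exchange of the ion distributions ...
 *   ScaLBL_D3Q7_Membrane_Unpack(...);               // apply coefficients to received distributions
 *   ScaLBL_D3Q7_Membrane_IonTransport(...);         // apply coefficients on interior links
 *
 * Note also that ScaLBL_D3Q7_Ion_ChargeDensity now takes IonValence as a double
 * rather than an int.
 */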
dvc_ScaLBL_D3Q19_AAeven_Poisson_ElectricPotential( + int *Map, double *dist, double *Den_charge, double *Psi, double epsilon_LB, bool UseSlippingVelBC, int start, int finish, int Np) { + int n; + double psi,sum; //electric potential + double rho_e; //local charge density + double f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15, + f16, f17, f18; + double Gs; + int idx; + + for (n = start; n < finish; n++) { + rho_e = (UseSlippingVelBC==1) ? 0.0 : Den_charge[n] / epsilon_LB; + + //........................................................................ + // q=0 + f0 = dist[n]; + f1 = dist[2 * Np + n]; + f2 = dist[1 * Np + n]; + f3 = dist[4 * Np + n]; + f4 = dist[3 * Np + n]; + f5 = dist[6 * Np + n]; + f6 = dist[5 * Np + n]; + f7 = dist[8 * Np + n]; + f8 = dist[7 * Np + n]; + f9 = dist[10 * Np + n]; + f10 = dist[9 * Np + n]; + f11 = dist[12 * Np + n]; + f12 = dist[11 * Np + n]; + f13 = dist[14 * Np + n]; + f14 = dist[13 * Np + n]; + f15 = dist[16 * Np + n]; + f16 = dist[15 * Np + n]; + f17 = dist[18 * Np + n]; + f18 = dist[17 * Np + n]; + + psi = f0 + f2 + f1 + f4 + f3 + f6 + f5 + f8 + f7 + f10 + f9 + f12 + + f11 + f14 + f13 + f16 + f15 + f18 + f17; + + idx = Map[n]; + + Psi[idx] = psi - 0.5*rho_e; + } +} + +__global__ void dvc_ScaLBL_D3Q19_AAodd_Poisson(int *neighborList, int *Map, + double *dist, double *Den_charge, + double *Psi, double *ElectricField, + double tau, double epsilon_LB, bool UseSlippingVelBC, + int start, int finish, int Np) { + int n; + double psi; //electric potential + double Ex, Ey, Ez; //electric field + double rho_e; //local charge density + double f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15, + f16, f17, f18; + int nr1, nr2, nr3, nr4, nr5, nr6, nr7, nr8, nr9, nr10, nr11, nr12, nr13, + nr14, nr15, nr16, nr17, nr18; + double error,sum_q; + double rlx = 1.0 / tau; + int idx; + + double W0 = 0.5; + double W1 = 1.0/24.0; + double W2 = 1.0/48.0; + + int S = Np/NBLOCKS/NTHREADS + 1; + for (int s=0; s 10Np => odd part of dist) + f1 = dist[nr1]; // reading the f1 data into register fq + + nr2 = neighborList[n + Np]; // neighbor 1 ( < 10Np => even part of dist) + f2 = dist[nr2]; // reading the f2 data into register fq + + // q=3 + nr3 = neighborList[n + 2 * Np]; // neighbor 4 + f3 = dist[nr3]; + + // q = 4 + nr4 = neighborList[n + 3 * Np]; // neighbor 3 + f4 = dist[nr4]; + + // q=5 + nr5 = neighborList[n + 4 * Np]; + f5 = dist[nr5]; + + // q = 6 + nr6 = neighborList[n + 5 * Np]; + f6 = dist[nr6]; + + // q=7 + nr7 = neighborList[n + 6 * Np]; + f7 = dist[nr7]; + + // q = 8 + nr8 = neighborList[n + 7 * Np]; + f8 = dist[nr8]; + + // q=9 + nr9 = neighborList[n + 8 * Np]; + f9 = dist[nr9]; + + // q = 10 + nr10 = neighborList[n + 9 * Np]; + f10 = dist[nr10]; + + // q=11 + nr11 = neighborList[n + 10 * Np]; + f11 = dist[nr11]; + + // q=12 + nr12 = neighborList[n + 11 * Np]; + f12 = dist[nr12]; + + // q=13 + nr13 = neighborList[n + 12 * Np]; + f13 = dist[nr13]; + + // q=14 + nr14 = neighborList[n + 13 * Np]; + f14 = dist[nr14]; + + // q=15 + nr15 = neighborList[n + 14 * Np]; + f15 = dist[nr15]; + + // q=16 + nr16 = neighborList[n + 15 * Np]; + f16 = dist[nr16]; + + // q=17 + //fq = dist[18*Np+n]; + nr17 = neighborList[n + 16 * Np]; + f17 = dist[nr17]; + + // q=18 + nr18 = neighborList[n + 17 * Np]; + f18 = dist[nr18]; + + sum_q = f1+f2+f3+f4+f5+f6+f7+f8+f9+f10+f11+f12+f13+f14+f15+f16+f17+f18; + error = 8.0*(sum_q - f0) + rho_e; + + psi = 2.0*(f0*(1.0 - rlx) + rlx*(sum_q + 0.125*rho_e)); + + idx = Map[n]; + Psi[idx] = psi; + + Ex = (f1 - f2 + 
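            /* Restating the D3Q19 Poisson update used in this kernel (no new code):
             *   weights:    W0 = 1/2, W1 = 1/24, W2 = 1/48   (so W0 + 6*W1 + 12*W2 = 1)
             *   potential:  psi   = 2 * ( f0*(1 - rlx) + rlx*( sum_{q=1..18} fq + rho_e/8 ) )
             *   residual:   error = 8 * ( sum_{q=1..18} fq - f0 ) + rho_e
             * with rho_e = Den_charge/epsilon_LB (zeroed when UseSlippingVelBC is set,
             * as in the ElectricPotential kernel above).  Every outgoing distribution is
             * then set to Wq * psi, i.e. the solver relaxes directly toward the
             * equilibrium of the local potential, with rho_e acting as the source term.
             */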
0.5*(f7 - f8 + f9 - f10 + f11 - f12 + f13 - f14))*4.0; //NOTE the unit of electric field here is V/lu + Ey = (f3 - f4 + 0.5*(f7 - f8 - f9 + f10 + f15 - f16 + f17 - f18))*4.0; + Ez = (f5 - f6 + 0.5*(f11 - f12 - f13 + f14 + f15 - f16 - f17 + f18))*4.0; + ElectricField[n + 0 * Np] = Ex; + ElectricField[n + 1 * Np] = Ey; + ElectricField[n + 2 * Np] = Ez; + + // q = 0 + dist[n] = W0*psi; //f0 * (1.0 - rlx) - (1.0-0.5*rlx)*W0*rho_e; + + // q = 1 + dist[nr2] = W1*psi; //f1 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 2 + dist[nr1] = W1*psi; //f2 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 3 + dist[nr4] = W1*psi; //f3 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 4 + dist[nr3] = W1*psi; //f4 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 5 + dist[nr6] = W1*psi; //f5 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 6 + dist[nr5] = W1*psi; //f6 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + //........................................................................ + + // q = 7 + dist[nr8] = W2*psi; //f7 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 8 + dist[nr7] = W2*psi; //f8 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 9 + dist[nr10] = W2*psi; //f9 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 10 + dist[nr9] = W2*psi; //f10 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 11 + dist[nr12] = W2*psi; //f11 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 12 + dist[nr11] = W2*psi; //f12 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 13 + dist[nr14] = W2*psi; //f13 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q= 14 + dist[nr13] = W2*psi; //f14 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 15 + dist[nr16] = W2*psi; //f15 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 16 + dist[nr15] = W2*psi; //f16 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 17 + dist[nr18] = W2*psi; //f17 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 18 + dist[nr17] = W2*psi; //f18 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + } + } +} + +__global__ void dvc_ScaLBL_D3Q19_AAeven_Poisson(int *Map, double *dist, + double *Den_charge, double *Psi, + double *ElectricField, double *Error, double tau, + double epsilon_LB, bool UseSlippingVelBC, + int start, int finish, int Np) { + int n; + double psi; //electric potential + double Ex, Ey, Ez; //electric field + double rho_e; //local charge density + double f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15, + f16, f17, f18; + double error,sum_q; + double rlx = 1.0 / tau; + int idx; + double W0 = 0.5; + double W1 = 1.0/24.0; + double W2 = 1.0/48.0; + + int S = Np/NBLOCKS/NTHREADS + 1; + for (int s=0; s>>(neighborList, Map, + dist, Den_charge, Psi, ElectricField, tau, epsilon_LB, UseSlippingVelBC, start, finish, Np); + + cudaError_t err = cudaGetLastError(); + if (cudaSuccess != err){ + printf("CUDA error in dvc_ScaLBL_D3Q19_AAodd_Poisson: %s \n",cudaGetErrorString(err)); + } +} + +extern "C" void 
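/* The kernels above read the electric field directly off the distributions, e.g.
 *   Ex = 4 * ( f1 - f2 + 0.5*(f7 - f8 + f9 - f10 + f11 - f12 + f13 - f14) ),
 * and similarly for Ey, Ez with the corresponding link pairs: axis links enter with
 * weight 1, diagonal links with weight 1/2.  Per the note in the kernel, the result
 * is in V per lattice unit (V/lu).
 */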
ScaLBL_D3Q19_AAeven_Poisson(int *Map, double *dist, + double *Den_charge, double *Psi, + double *ElectricField, double *Error, double tau, + double epsilon_LB, bool UseSlippingVelBC, + int start, int finish, int Np) { + + dvc_ScaLBL_D3Q19_AAeven_Poisson<<>>( Map, dist, Den_charge, Psi, + ElectricField, Error, tau, epsilon_LB, UseSlippingVelBC, start, finish, Np); + + cudaError_t err = cudaGetLastError(); + if (cudaSuccess != err){ + printf("CUDA error in dvc_ScaLBL_D3Q19_AAeven_Poisson: %s \n",cudaGetErrorString(err)); + } +} + +extern "C" void ScaLBL_D3Q19_Poisson_Init(int *Map, double *dist, double *Psi, + int start, int finish, int Np){ + //cudaProfilerStart(); + + dvc_ScaLBL_D3Q19_Poisson_Init<<>>(Map, dist, Psi, start, finish, Np); + + cudaError_t err = cudaGetLastError(); + if (cudaSuccess != err){ + printf("CUDA error in ScaLBL_D3Q19_Poisson_Init: %s \n",cudaGetErrorString(err)); + } + +} + extern "C" void ScaLBL_D3Q7_AAodd_Poisson_ElectricPotential(int *neighborList,int *Map, double *dist, double *Psi, int start, int finish, int Np){ //cudaProfilerStart(); diff --git a/example/Bubble/CreateBubble.py b/example/Bubble/CreateBubble.py new file mode 100644 index 00000000..76fbefab --- /dev/null +++ b/example/Bubble/CreateBubble.py @@ -0,0 +1,18 @@ +import numpy as np +import matplotlib.pylab as plt + +D=np.ones((40,40,40),dtype="uint8") + +cx = 20 +cy = 20 +cz = 20 + +for i in range(0, 40): + for j in range (0, 40): + for k in range (0,40): + dist = np.sqrt((i-cx)*(i-cx) + (j-cx)*(j-cx) + (k-cz)*(k-cz)) + if (dist < 12.5 ) : + D[i,j,k] = 2 + +D.tofile("bubble_40x40x40.raw") + diff --git a/example/Bubble/CreateCell.py b/example/Bubble/CreateCell.py new file mode 100644 index 00000000..ac62863f --- /dev/null +++ b/example/Bubble/CreateCell.py @@ -0,0 +1,77 @@ +import numpy as np +import matplotlib.pylab as plt + +D=np.ones((40,40,40),dtype="uint8") + +cx = 20 +cy = 20 +cz = 20 + +for i in range(0, 40): + for j in range (0, 40): + for k in range (0,40): + dist = np.sqrt((i-cx)*(i-cx) + (j-cx)*(j-cx) + (k-cz)*(k-cz)) + if (dist < 15.5 ) : + D[i,j,k] = 2 + +D.tofile("cell_40x40x40.raw") + + +C1=np.zeros((40,40,40),dtype="double") +C2=np.zeros((40,40,40),dtype="double") +C3=np.zeros((40,40,40),dtype="double") +C4=np.zeros((40,40,40),dtype="double") +C5=np.zeros((40,40,40),dtype="double") +C6=np.zeros((40,40,40),dtype="double") + +for i in range(0, 40): + for j in range (0, 40): + for k in range (0,40): + #outside the cell + C1[i,j,k] = 4.0e-6 # K + C2[i,j,k] = 150.0e-6 # Na + C3[i,j,k] = 116.0e-6 # Cl + C4[i,j,k] = 29.0e-6 # HC03 + #C5[i,j,k] = 2.4e-6 # Ca + dist = np.sqrt((i-cx)*(i-cx) + (j-cx)*(j-cx) + (k-cz)*(k-cz)) + # inside the cell + if (dist < 15.5 ) : + C1[i,j,k] = 145.0e-6 + C2[i,j,k] = 12.0e-6 + C3[i,j,k] = 4.0e-6 + C4[i,j,k] = 12.0e-6 # 12 mmol / L + #C5[i,j,k] = 0.10e-6 # 100 nmol / L + + +# add up the total charge to make sure it is zero +TotalCharge = 0 +for i in range(0, 40): + for j in range (0, 40): + for k in range (0,40): + TotalCharge += C1[i,j,k] + C2[i,j,k] - C3[i,j,k] - C4[i,j,k] + +TotalCharge /= (40*40*40) + +print("Total charge " + str(TotalCharge)) + + +for i in range(0, 40): + for j in range (0, 40): + for k in range (0,40): + if TotalCharge < 0 : + # need more cation + C5[i,j,k] = abs(TotalCharge) + C6[i,j,k] = 0.0 + else : + # need more anion + C5[i,j,k] = 0.0 + C6[i,j,k] = abs(TotalCharge) + + +C1.tofile("cell_concentration_K_40x40x40.raw") +C2.tofile("cell_concentration_Na_40x40x40.raw") +C3.tofile("cell_concentration_Cl_40x40x40.raw") 
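# Two remarks on the geometry/initialisation code above (comments only, no new code):
# - the distance formula uses (j-cx) for the y term; with cx == cy == cz == 20 this is
#   harmless, but (j-cy) is presumably what is intended.
# - the C5/C6 balancing is meant to drive the volume-averaged charge to zero; a quick
#   check of that invariant (same arbitrary units as C1..C6) would be:
#       residual = (C1 + C2 - C3 - C4 + C5 - C6).mean()   # expected ~0 up to round-off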
+C4.tofile("cell_concentration_HCO3_40x40x40.raw") +C5.tofile("cell_concentration_cation_40x40x40.raw") +C6.tofile("cell_concentration_anion_40x40x40.raw") + diff --git a/example/Bubble/cell.db b/example/Bubble/cell.db new file mode 100644 index 00000000..ccbf37e6 --- /dev/null +++ b/example/Bubble/cell.db @@ -0,0 +1,75 @@ +MultiphysController { + timestepMax = 60 + num_iter_Ion_List = 2 + analysis_interval = 50 + tolerance = 1.0e-9 + visualization_interval = 100 // Frequency to write visualization data + analysis_interval = 50 // Frequency to perform analysis +} +Stokes { + tau = 1.0 + F = 0, 0, 0 + ElectricField = 0, 0, 0 //body electric field; user-input unit: [V/m] + nu_phys = 0.889e-6 //fluid kinematic viscosity; user-input unit: [m^2/sec] +} +Ions { + IonConcentrationFile = "cell_concentration_K_40x40x40.raw", "double", "cell_concentration_Na_40x40x40.raw", "double", "cell_concentration_Cl_40x40x40.raw", "double", "cell_concentration_HCO3_40x40x40.raw", "double", "cell_concentration_anion_40x40x40.raw", "double", "cell_concentration_cation_40x40x40.raw", "double" + temperature = 293.15 //unit [K] + number_ion_species = 6 //number of ions + tauList = 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 + IonDiffusivityList = 1.0e-9, 1.0e-9, 1.0e-9, 1.0e-9, 1.0e-9, 1.0e-9 //user-input unit: [m^2/sec] + IonValenceList = 1, 1, -1, -1, 1, -1 //valence charge of ions; dimensionless; positive/negative integer + IonConcentrationList = 1.0e-6, 1.0e-6, 1.0e-6, 1.0e-6, 1.0e-6, 1.0e-6 //user-input unit: [mol/m^3] + BC_Solid = 0 //solid boundary condition; 0=non-flux BC; 1=surface ion concentration + //SolidLabels = 0 //solid labels for assigning solid boundary condition; ONLY for BC_Solid=1 + //SolidValues = 1.0e-5 // user-input surface ion concentration unit: [mol/m^2]; ONLY for BC_Solid=1 + FluidVelDummy = 0.0, 0.0, 1.0e-2 // dummy fluid velocity for debugging +} +Poisson { + epsilonR = 78.5 //fluid dielectric constant [dimensionless] + BC_Inlet = 0 // ->1: fixed electric potential; ->2: sine/cosine periodic electric potential + BC_Outlet = 0 // ->1: fixed electric potential; ->2: sine/cosine periodic electric potential + //-------------------------------------------------------------------------- + //-------------------------------------------------------------------------- + BC_Solid = 2 //solid boundary condition; 1=surface potential; 2=surface charge density + SolidLabels = 0 //solid labels for assigning solid boundary condition + SolidValues = 0 //if surface potential, unit=[V]; if surface charge density, unit=[C/m^2] + WriteLog = true //write convergence log for LB-Poisson solver + // ------------------------------- Testing Utilities ---------------------------------------- + // ONLY for code debugging; the followings test sine/cosine voltage BCs; disabled by default + TestPeriodic = false + TestPeriodicTime = 1.0 //unit:[sec] + TestPeriodicTimeConv = 0.01 //unit:[sec] + TestPeriodicSaveInterval = 0.2 //unit:[sec] + //------------------------------ advanced setting ------------------------------------ + timestepMax = 100000 //max timestep for obtaining steady-state electrical potential + analysis_interval = 200 //timestep checking steady-state convergence + tolerance = 1.0e-6 //stopping criterion for steady-state solution +} +Domain { + Filename = "cell_40x40x40.raw" + nproc = 1, 1, 1 // Number of processors (Npx,Npy,Npz) + n = 40, 40, 40 // Size of local domain (Nx,Ny,Nz) + N = 40, 40, 40 // size of the input image + voxel_length = 1.0 //resolution; user-input unit: [um] + BC = 0 // Boundary condition type + 
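  // BC = 0 selects fully periodic boundaries; the BC codes are listed in
  // example/systems/spock/input.db (BC = 1 pressure BC, BC = 4 flux BC, both applied in z)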
ReadType = "8bit" + ReadValues = 0, 1, 2 + WriteValues = 0, 1, 2 +} +Analysis { + analysis_interval = 100 + subphase_analysis_interval = 50 // Frequency to perform analysis + restart_interval = 5000 // Frequency to write restart data + restart_file = "Restart" // Filename to use for restart file (will append rank) + N_threads = 4 // Number of threads to use + load_balance = "independent" // Load balance method to use: "none", "default", "independent" +} +Visualization { + save_electric_potential = true + save_concentration = true + save_velocity = true +} +Membrane { + MembraneLabels = 2 +} diff --git a/example/PlaneMembrane/plane_membrane.db b/example/PlaneMembrane/plane_membrane.db new file mode 100644 index 00000000..3414af24 --- /dev/null +++ b/example/PlaneMembrane/plane_membrane.db @@ -0,0 +1,117 @@ +MultiphysController { + timestepMax = 20000 + visualization_interval = 1000 // Frequency to write visualization data + analysis_interval = 20 // Frequency to perform analysis +} +Stokes { + epsilonR = 78.5 //fluid dielectric constant [dimensionless] + tau = 1.0 + F = 0, 0, 0 + rho_phys = 998.2 + nu_phys = 1.003e-6 //fluid kinematic viscosity; user-input unit: [m^2/sec] + BC = 3 // Pressure constant BC + din = 1.0 // Inlet pressure + dout = 1.0 // Outlet pressure + UseElectroosmoticVelocityBC = true + SolidLabels = 0, -1 + ZetaPotentialSolidList = -0.005, -0.03 // unit [v] +} +Ions { + temperature = 310.15 //unit [K] + //number_ion_species = 5 //number of ions + //tauList = 1.0, 1.0, 1.0, 1.0, 1.0 // H+, OH-, Na+, Cl-, Fe3+ + //IonDiffusivityList = 9.3e-9, 5.3e-9, 1.3e-9, 2.0e-9, 0.604e-9 //user-input unit: [m^2/sec] + //IonValenceList = 1, -1, 1, -1, 3 //valence charge of ions; dimensionless; positive/negative integer + //IonConcentrationList = 1.0e-4, 1.0e-4, 100, 100, 0 //user-input unit: [mol/m^3] + number_ion_species = 2 //number of ions + //IonConcentrationFile = "Pseudo3D_plane_membrane_concentration_Na_z192_xy64.raw", "double", "Pseudo3D_plane_membrane_concentration_Na_z192_xy64.raw", "double" + tauList = 1.0,1.0 // Na+, anion + IonDiffusivityList = 1e-9,1e-9 //user-input unit: [m^2/sec] + IonValenceList = 1,-1 //valence charge of ions; dimensionless; positive/negative integer + IonConcentrationList = 145e-3,145e-3 //user-input unit: [mol/m^3] + MembraneIonConcentrationList = 15e-3, 15e-3 + BC_InletList = 0,0 //boundary condition for inlet; 0=periodic; 1=ion concentration; 2=ion flux + BC_OutletList = 0,0 //boundary condition for outlet; 0=periodic; 1=ion concentration; 2=ion flux + InletValueList = 15e-3, 15e-3 //if ion concentration unit=[mol/m^3]; if flux (inward) unit=[mol/m^2/sec] + OutletValueList = 145e-3, 145e-3 //if ion concentration unit=[mol/m^3]; if flux (inward) unit=[mol/m^2/sec] + BC_Solid = 0 //solid boundary condition; 0=non-flux BC; 1=surface ion concentration + //SolidLabels = 0 olid labels for assigning solid boundary condition; ONLY for BC_Solid=1 + //SolidValues = 1.0e-5 // user-input surface ion concentration unit: [mol/m^2]; ONLY for BC_Solid=1 + FluidVelDummy = 0.0, 0.0, 0.0 // dummy fluid velocity for debugging +} +Poisson { + epsilonR = 80.4 //fluid dielectric constant [dimensionless] + tau = 4.5 + BC_Inlet = 0 // ->1: fixed electric potential; ->2: sine/cosine periodic electric potential + BC_Outlet = 0 // ->1: fixed electric potential; ->2: sine/cosine periodic electric potential + InitialValueLabels = 1,2//a list of labels of fluid nodes + InitialValues = 60.6e-3, 0 //unit: [V] + //------- Boundary Voltage for BC = 1 (Inlet & Outlet) 
--------------------- + Vin = 60.6e-3 //ONLY for BC_Inlet = 1; electrical potential at inlet + Vout = 0 //ONLY for BC_Outlet = 1; electrical potential at outlet + //-------------------------------------------------------------------------- + //------- Boundary Voltage for BC = 2 (Inlet & Outlet) --------------------- + //Vin0 = 0.01 //(ONLY for BC_Inlet = 2); unit:[Volt] + //freqIn = 1.0 //(ONLY for BC_Inlet = 2); unit:[Hz] + //t0_In = 0.0 //(ONLY for BC_Inlet = 1); unit:[sec] + //Vin_Type = 1 //(ONLY for BC_Inlet = 1); 1->sin(); 2->cos() + //Vout0 = 0.01 //(ONLY for BC_Outlet = 1); unit:[Volt] + //freqOut = 1.0 //(ONLY for BC_Outlet = 1); unit:[Hz] + //t0_Out = 0.0 //(ONLY for BC_Outlet = 1); unit:[sec] + //Vout_Type = 1 //(ONLY for BC_Outlet = 1); 1->sin(); 2->cos() + //-------------------------------------------------------------------------- + BC_SolidList = 1 //solid boundary condition; 1=surface potential; 2=surface charge density + SolidLabels = 0 //solid labels for assigning solid boundary condition + SolidValues = -0.001 //if surface potential, unit=[V]; if surface charge density, unit=[C/m^2] + WriteLog = true //write convergence log for LB-Poisson solver + // ------------------------------- Testing Utilities ---------------------------------------- + // ONLY for code debugging; the followings test sine/cosine voltage BCs; disabled by default + TestPeriodic = false + TestPeriodicTime = 1.0 //unit:[sec] + TestPeriodicTimeConv = 0.01 //unit:[sec] + TestPeriodicSaveInterval = 0.2 //unit:[sec] + //------------------------------ advanced setting ------------------------------------ + timestepMax = 10000 //max timestep for obtaining steady-state electrical potential + analysis_interval = 200 //timestep checking steady-state convergence + tolerance = 1.0e-6 //stopping criterion for steady-state solution +} +Membrane { + MembraneLabels = 1 + VoltageThreshold = 100.0, 100.0 + MassFractionIn = 1,0 + MassFractionOut = 1,0 + ThresholdMassFractionIn = 1, 0 + ThresholdMassFractionOut = 1, 0 + +} +Domain { + Filename = "Pseudo3D_double_plane_membrane_z192_xy64_InsideLabel1_OutsideLabel2.raw" + nproc = 1, 1, 3 // Number of processors (Npx,Npy,Npz) + n = 64, 64, 64 // Size of local domain (Nx,Ny,Nz) + N = 64, 64, 192 // size of the input image + voxel_length = 0.01 //resolution; user-input unit: [um] + BC = 0 // Boundary condition type0 + ReadType = "8bit" + ReadValues = 2, 1 + WriteValues = 2, 1 + //InletLayers = 0, 0, 1 + //OutletLayers = 0, 0, 1 + //InletLayersPhase = 1 + //OutletLayersPhase = 1 + //checkerSize = 3 // size of the checker to use +} +Analysis { +} +Visualization { + save_electric_potential = true + save_concentration = true + #save_velocity = true + #save_pressure = true + save_8bit_raw = true +} + + + + + + diff --git a/example/PlaneMembrane/plane_membrane.py b/example/PlaneMembrane/plane_membrane.py new file mode 100755 index 00000000..a525dda0 --- /dev/null +++ b/example/PlaneMembrane/plane_membrane.py @@ -0,0 +1,90 @@ +import numpy as np +import math +import matplotlib.pyplot as plt + +#physical constant +k_B_const = 1.380649e-23 #[J/K] +N_A_const = 6.02214076e23 #[1/mol] +e_const = 1.602176634e-19 #[C] +epsilon0_const = 8.85418782e-12 #[C/V/m] + +#other material property parameters +epsilonr_water = 80.4 +T=310.15 #[K] + +#input ion concentration +C_Na_in = 15e-3 #[mol/m^3] +C_Na_out = 145e-3 #[mol/m^3] +C_K_in = 150e-3 #[mol/m^3] +C_K_out = 4e-3 #[mol/m^3] +C_Cl_in = 10e-3 #[mol/m^3] +C_Cl_out = 110e-3 #[mol/m^3] + +#calculating Debye length +#For the definition of Debye 
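# (Restating the formula coded below: the Debye length for a single ionic species of
#  bulk concentration C [mol/m^3] is
#      lambda_D = sqrt( eps_r*eps_0*k_B*T / (2*N_A*e^2*C) ),
#  which is what prefactor*np.sqrt(1/C) evaluates.  For C = 145e-3 mol/m^3 at
#  T = 310.15 K this comes out to roughly 2.6e-8 m, i.e. a few tens of nanometres.)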
lenght in electrolyte solution, see: +#DOI:10.1016/j.cnsns.2014.03.005 +#Eq(42) in Yoshida etal., Coupled LB method for simulator electrokinetic flows +prefactor= math.sqrt(epsilonr_water*epsilon0_const*k_B_const*T/2.0/N_A_const/e_const**2) +debye_length_in = prefactor*np.sqrt(np.array([1.0/C_Na_in,1.0/C_K_in,1.0/C_Cl_in])) +debye_length_out = prefactor*np.sqrt(np.array([1.0/C_Na_out,1.0/C_K_out,1.0/C_Cl_out])) +print("Debye length inside membrane in [m]") +print(debye_length_in) +print("Debye length outside membrane in [m]") +print(debye_length_out) + + +#setup domain +cube_length_z = 192 +cube_length_xy = 64 +#set LBPM domain resoluiton +h=0.01 #[um] +print("Image resolution = %.6g [um] (= %.6g [m])"%(h,h*1e-6)) + +domain=2*np.ones((cube_length_z,cube_length_xy,cube_length_xy),dtype=np.int8) +zgrid,ygrid,xgrid=np.meshgrid(np.arange(cube_length_z),np.arange(cube_length_xy),np.arange(cube_length_xy),indexing='ij') +domain_centre=cube_length_xy/2 +make_bubble = np.logical_and(zgrid>=cube_length_z/4,zgrid<=cube_length_z*0.75) +domain[make_bubble]=1 + +##save domain +file_name= "Pseudo3D_double_plane_membrane_z192_xy64_InsideLabel1_OutsideLabel2.raw" +domain.tofile(file_name) +print("save file: "+file_name) + +#debug plot +#plt.figure(1) +#plt.pcolormesh(domain[:,int(domain_centre),:]) +#plt.colorbar() +#plt.axis("equal") +#plt.show() + +##generate initial ion concentration - 3D +#domain_Na = C_Na_out*np.ones_like(domain,dtype=np.float64) +#domain_Na[make_bubble] = C_Na_in +#domain_K = C_K_out*np.ones_like(domain,dtype=np.float64) +#domain_K[make_bubble] = C_K_in +#domain_Cl = C_Cl_out*np.ones_like(domain,dtype=np.float64) +#domain_Cl[make_bubble] = C_Cl_in +# +#domain_Na.tofile("Pseudo3D_plane_membrane_concentration_Na_z192_xy64.raw") +#domain_K.tofile("Pseudo3D_plane_membrane_concentration_K_z192_xy64.raw") +#domain_Cl.tofile("Pseudo3D_plane_membrane_concentration_Cl_z192_xy64.raw") + +##debug plot +#plt.figure(2) +#plt.subplot(1,3,1) +#plt.title("Na concentration") +#plt.pcolormesh(domain_Na[:,int(bubble_centre),:]) +#plt.colorbar() +#plt.axis("equal") +#plt.subplot(1,3,2) +#plt.title("K concentration") +#plt.pcolormesh(domain_K[:,int(bubble_centre),:]) +#plt.colorbar() +#plt.axis("equal") +#plt.subplot(1,3,3) +#plt.title("Cl concentration") +#plt.pcolormesh(domain_Cl[:,int(bubble_centre),:]) +#plt.colorbar() +#plt.axis("equal") +#plt.show() diff --git a/example/SingleCell/Bacterium.db b/example/SingleCell/Bacterium.db new file mode 100644 index 00000000..2df4f1bd --- /dev/null +++ b/example/SingleCell/Bacterium.db @@ -0,0 +1,86 @@ +MultiphysController { + timestepMax = 25000 + num_iter_Ion_List = 4 + analysis_interval = 100 + tolerance = 1.0e-9 + visualization_interval = 1000 // Frequency to write visualization data +} +Stokes { + tau = 1.0 + F = 0, 0, 0 + ElectricField = 0, 0, 0 //body electric field; user-input unit: [V/m] + nu_phys = 0.889e-6 //fluid kinematic viscosity; user-input unit: [m^2/sec] +} +Ions { + MembraneIonConcentrationList = 150.0e-3, 10.0e-3, 15.0e-3, 155.0e-3 //user-input unit: [mol/m^3] + temperature = 293.15 //unit [K] + number_ion_species = 4 //number of ions + tauList = 1.0, 1.0, 1.0, 1.0 + IonDiffusivityList = 1.0e-9, 1.0e-9, 1.0e-9, 1.0e-9 //user-input unit: [m^2/sec] + IonValenceList = 1, -1, 1, -1 //valence charge of ions; dimensionless; positive/negative integer + IonConcentrationList = 4.0e-3, 20.0e-3, 16.0e-3, 0.0e-3 //user-input unit: [mol/m^3] + BC_Solid = 0 //solid boundary condition; 0=non-flux BC; 1=surface ion concentration + //SolidLabels = 0 //solid 
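  // Note: MembraneIonConcentrationList above presumably sets the initial ion
  // concentrations inside the region tagged by Membrane { MembraneLabels } (label 2,
  // the cell body built from Bacterium.swc), while IonConcentrationList provides the
  // bulk values outside it -- the same pattern as in example/PlaneMembrane/plane_membrane.db.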
labels for assigning solid boundary condition; ONLY for BC_Solid=1 + //SolidValues = 1.0e-5 // user-input surface ion concentration unit: [mol/m^2]; ONLY for BC_Solid=1 + FluidVelDummy = 0.0, 0.0, 0.0 // dummy fluid velocity for debugging + BC_InletList = 0, 0, 0, 0 + BC_OutletList = 0, 0, 0, 0 + +} +Poisson { + lattice_scheme = "D3Q19" + epsilonR = 78.5 //fluid dielectric constant [dimensionless] + BC_Inlet = 0 // ->1: fixed electric potential; ->2: sine/cosine periodic electric potential + BC_Outlet = 0 // ->1: fixed electric potential; ->2: sine/cosine periodic electric potential + //-------------------------------------------------------------------------- + //-------------------------------------------------------------------------- + BC_Solid = 2 //solid boundary condition; 1=surface potential; 2=surface charge density + SolidLabels = 0 //solid labels for assigning solid boundary condition + SolidValues = 0 //if surface potential, unit=[V]; if surface charge density, unit=[C/m^2] + WriteLog = true //write convergence log for LB-Poisson solver + // ------------------------------- Testing Utilities ---------------------------------------- + // ONLY for code debugging; the followings test sine/cosine voltage BCs; disabled by default + TestPeriodic = false + TestPeriodicTime = 1.0 //unit:[sec] + TestPeriodicTimeConv = 0.01 //unit:[sec] + TestPeriodicSaveInterval = 0.2 //unit:[sec] + //------------------------------ advanced setting ------------------------------------ + timestepMax = 4000 //max timestep for obtaining steady-state electrical potential + analysis_interval = 25 //timestep checking steady-state convergence + tolerance = 1.0e-10 //stopping criterion for steady-state solution + InitialValueLabels = 1, 2 + InitialValues = 0.0, 0.0 + +} +Domain { + Filename = "Bacterium.swc" + nproc = 2, 1, 1 // Number of processors (Npx,Npy,Npz) + n = 64, 64, 64 // Size of local domain (Nx,Ny,Nz) + N = 128, 64, 64 // size of the input image + voxel_length = 0.01 //resolution; user-input unit: [um] + BC = 0 // Boundary condition type + ReadType = "swc" + ReadValues = 0, 1, 2 + WriteValues = 0, 1, 2 +} +Analysis { + analysis_interval = 100 + subphase_analysis_interval = 50 // Frequency to perform analysis + restart_interval = 5000 // Frequency to write restart data + restart_file = "Restart" // Filename to use for restart file (will append rank) + N_threads = 4 // Number of threads to use + load_balance = "independent" // Load balance method to use: "none", "default", "independent" +} +Visualization { + save_electric_potential = true + save_concentration = true + save_velocity = false +} +Membrane { + MembraneLabels = 2 + VoltageThreshold = 0.0, 0.0, 0.0, 0.0 + MassFractionIn = 1e-1, 1.0, 5e-3, 0.0 + MassFractionOut = 1e-1, 1.0, 5e-3, 0.0 + ThresholdMassFractionIn = 1e-1, 1.0, 5e-3, 0.0 + ThresholdMassFractionOut = 1e-1, 1.0, 5e-3, 0.0 +} diff --git a/example/SingleCell/Bacterium.swc b/example/SingleCell/Bacterium.swc new file mode 100644 index 00000000..18103720 --- /dev/null +++ b/example/SingleCell/Bacterium.swc @@ -0,0 +1,8 @@ +# id,type,x,y,z,r,pid +1 1 0.30 0.32 0.32 0.15 -1 +2 1 0.35 0.32 0.32 0.16 1 +3 1 0.43 0.32 0.32 0.17 2 +4 1 0.60 0.32 0.32 0.18 3 +5 1 0.77 0.32 0.32 0.17 4 +6 1 0.85 0.32 0.32 0.16 5 +7 1 0.90 0.32 0.32 0.15 6 diff --git a/example/SingleCell/NaCl-cell.py b/example/SingleCell/NaCl-cell.py new file mode 100644 index 00000000..dcb0adbf --- /dev/null +++ b/example/SingleCell/NaCl-cell.py @@ -0,0 +1,41 @@ +import numpy as np +import matplotlib.pylab as plt + +Nx = 64 +Ny 
= 64 +Nz = 64 +cx = Nx/2 +cy = Ny/2 +cz = Nz/2 +radius = 12 + +D=np.ones((Nx,Ny,Nz),dtype="uint8") + +for i in range(0, Nx): + for j in range (0, Ny): + for k in range (0,Nz): + dist = np.sqrt((i-cx)*(i-cx) + (j-cx)*(j-cx) + (k-cz)*(k-cz)) + if (dist < radius ) : + D[i,j,k] = 2 + +D.tofile("cell_64x64x64.raw") + + +C1=np.zeros((Nx,Ny,Nz),dtype="double") +C2=np.zeros((Nx,Ny,Nz),dtype="double") + +for i in range(0, Nx): + for j in range (0, Ny): + for k in range (0,Nz): + #outside the cell + C1[i,j,k] = 125.0e-6 # Na + C2[i,j,k] = 125.0e-6 # Cl + dist = np.sqrt((i-cx)*(i-cx) + (j-cx)*(j-cx) + (k-cz)*(k-cz)) + # inside the cell + if (dist < radius ) : + C1[i,j,k] = 110.0e-6 + C2[i,j,k] = 110.0e-6 + +C1.tofile("cell_concentration_Na_64x64x64.raw") +C2.tofile("cell_concentration_Cl_64x64x64.raw") + diff --git a/example/SingleCell/NaCl.db b/example/SingleCell/NaCl.db new file mode 100644 index 00000000..95d222e0 --- /dev/null +++ b/example/SingleCell/NaCl.db @@ -0,0 +1,80 @@ +MultiphysController { + timestepMax = 2000 + num_iter_Ion_List = 2 + analysis_interval = 40 + tolerance = 1.0e-9 + visualization_interval = 40 // Frequency to write visualization data +} +Stokes { + tau = 1.0 + F = 0, 0, 0 + ElectricField = 0, 0, 0 //body electric field; user-input unit: [V/m] + nu_phys = 0.889e-6 //fluid kinematic viscosity; user-input unit: [m^2/sec] +} +Ions { + IonConcentrationFile = "cell_concentration_Na_64x64x64.raw", "double", "cell_concentration_Cl_64x64x64.raw", "double" + temperature = 293.15 //unit [K] + number_ion_species = 2 //number of ions + tauList = 1.0, 1.0 + IonDiffusivityList = 1.0e-9, 1.0e-9 //user-input unit: [m^2/sec] + IonValenceList = 1, -1 //valence charge of ions; dimensionless; positive/negative integer + IonConcentrationList = 1.0e-6, 1.0e-6 //user-input unit: [mol/m^3] + BC_Solid = 0 //solid boundary condition; 0=non-flux BC; 1=surface ion concentration + //SolidLabels = 0 //solid labels for assigning solid boundary condition; ONLY for BC_Solid=1 + //SolidValues = 1.0e-5 // user-input surface ion concentration unit: [mol/m^2]; ONLY for BC_Solid=1 + FluidVelDummy = 0.0, 0.0, 0.0 // dummy fluid velocity for debugging +} +Poisson { + epsilonR = 78.5 //fluid dielectric constant [dimensionless] + BC_Inlet = 0 // ->1: fixed electric potential; ->2: sine/cosine periodic electric potential + BC_Outlet = 0 // ->1: fixed electric potential; ->2: sine/cosine periodic electric potential + //-------------------------------------------------------------------------- + //-------------------------------------------------------------------------- + BC_Solid = 2 //solid boundary condition; 1=surface potential; 2=surface charge density + SolidLabels = 0 //solid labels for assigning solid boundary condition + SolidValues = 0 //if surface potential, unit=[V]; if surface charge density, unit=[C/m^2] + WriteLog = true //write convergence log for LB-Poisson solver + // ------------------------------- Testing Utilities ---------------------------------------- + // ONLY for code debugging; the followings test sine/cosine voltage BCs; disabled by default + TestPeriodic = false + TestPeriodicTime = 1.0 //unit:[sec] + TestPeriodicTimeConv = 0.01 //unit:[sec] + TestPeriodicSaveInterval = 0.2 //unit:[sec] + //------------------------------ advanced setting ------------------------------------ + timestepMax = 4000 //max timestep for obtaining steady-state electrical potential + analysis_interval = 25 //timestep checking steady-state convergence + tolerance = 1.0e-10 //stopping criterion for steady-state 
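  // The Membrane block below pairs these settings with strongly asymmetric transfer
  // coefficients (MassFractionIn/Out = 1e-2 for Na+ vs 1e-8 for Cl-), so Na+ presumably
  // exchanges across the membrane several orders of magnitude faster than Cl-; starting
  // from the electroneutral concentrations written by NaCl-cell.py above, that asymmetry
  // is what would build up a potential difference across the membrane.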
solution +} +Domain { + Filename = "cell_64x64x64.raw" + nproc = 1, 1, 1 // Number of processors (Npx,Npy,Npz) + n = 64, 64, 64 // Size of local domain (Nx,Ny,Nz) + N = 64, 64, 64 // size of the input image + voxel_length = 0.1 //resolution; user-input unit: [um] + BC = 0 // Boundary condition type + ReadType = "8bit" + ReadValues = 0, 1, 2 + WriteValues = 0, 1, 2 +} +Analysis { + analysis_interval = 100 + subphase_analysis_interval = 50 // Frequency to perform analysis + restart_interval = 5000 // Frequency to write restart data + restart_file = "Restart" // Filename to use for restart file (will append rank) + N_threads = 4 // Number of threads to use + load_balance = "independent" // Load balance method to use: "none", "default", "independent" +} +Visualization { + save_electric_potential = true + save_concentration = true + save_velocity = true +} +Membrane { + MembraneLabels = 2 + VoltageThreshold = 0.0, 0.0 + MassFractionIn = 1e-2, 1e-8 + MassFractionOut = 1e-2, 1e-8 + ThresholdMassFractionIn = 1e-2, 1e-8 + ThresholdMassFractionOut = 1e-2, 1e-8 + +} diff --git a/example/systems/crusher/A2-8GPU/Color.sh b/example/systems/crusher/A2-8GPU/Color.sh new file mode 100644 index 00000000..f8acf3b2 --- /dev/null +++ b/example/systems/crusher/A2-8GPU/Color.sh @@ -0,0 +1,36 @@ +#!/bin/bash + +#SBATCH -A CSC380 +#SBATCH -J Color-dense +#SBATCH -o %x-%j.out +#SBATCH -t 0:10:00 +#SBATCH -p batch +#SBATCH -N 1 +#SBATCH --exclusive + +# MODULE ENVIRONMENT +module load PrgEnv-amd +module load rocm/4.5.0 +module load cray-mpich +module load cray-hdf5-parallel +#module load craype-accel-amd-gfx908 + +## These must be set before compiling so the executable picks up GTL +export PE_MPICH_GTL_DIR_amd_gfx90a="-L${CRAY_MPICH_ROOTDIR}/gtl/lib" +export PE_MPICH_GTL_LIBS_amd_gfx90a="-lmpi_gtl_hsa" +export MPICH_GPU_SUPPORT_ENABLED=1 + +#export MPL_MBX_SIZE=1024000000 + +export LD_LIBRARY_PATH=${CRAY_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH} + +export LBPM_BIN=/ccs/proj/csc380/mcclurej/crusher/LBPM/bin + +echo "Running Color LBM" + +MYCPUBIND="--cpu-bind=verbose,map_cpu:57,33,25,1,9,17,41,49" + +srun --verbose -N1 -n8 --cpus-per-gpu=8 --gpus-per-task=1 --gpu-bind=closest ${MYCPUBIND} $LBPM_BIN/lbpm_color_simulator input.db +#srun --verbose -N1 -n2 --mem-per-gpu=8g --cpus-per-gpu=1 --gpus-per-node=2 --gpu-bind=closest $LBPM_BIN/lbpm_permeability_simulator input.db + +exit; diff --git a/example/systems/crusher/A2-8GPU/MRT.sh b/example/systems/crusher/A2-8GPU/MRT.sh new file mode 100644 index 00000000..1da58a06 --- /dev/null +++ b/example/systems/crusher/A2-8GPU/MRT.sh @@ -0,0 +1,36 @@ +#!/bin/bash + +#SBATCH -A CSC380 +#SBATCH -J MRT-a2 +#SBATCH -o %x-%j.out +#SBATCH -t 0:10:00 +#SBATCH -p batch +#SBATCH -N 1 +#SBATCH --exclusive + +# MODULE ENVIRONMENT +module load PrgEnv-amd +module load rocm/4.5.0 +module load cray-mpich +module load cray-hdf5-parallel +#module load craype-accel-amd-gfx908 + +## These must be set before compiling so the executable picks up GTL +export PE_MPICH_GTL_DIR_amd_gfx90a="-L${CRAY_MPICH_ROOTDIR}/gtl/lib" +export PE_MPICH_GTL_LIBS_amd_gfx90a="-lmpi_gtl_hsa" +export MPICH_GPU_SUPPORT_ENABLED=1 + +#export MPL_MBX_SIZE=1024000000 + +export LD_LIBRARY_PATH=${CRAY_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH} + +export LBPM_BIN=/ccs/proj/csc380/mcclurej/crusher/LBPM/bin + +echo "Running Color LBM" + +MYCPUBIND="--cpu-bind=verbose,map_cpu:57,33,25,1,9,17,41,49" + +srun --verbose -N1 -n8 --cpus-per-gpu=8 --gpus-per-task=1 --gpu-bind=closest ${MYCPUBIND} $LBPM_BIN/lbpm_permeability_simulator input.db +#srun 
--verbose -N1 -n2 --mem-per-gpu=8g --cpus-per-gpu=1 --gpus-per-node=2 --gpu-bind=closest $LBPM_BIN/lbpm_permeability_simulator input.db + +exit; diff --git a/example/systems/crusher/A2-8GPU/input.db b/example/systems/crusher/A2-8GPU/input.db new file mode 100644 index 00000000..1d2322e7 --- /dev/null +++ b/example/systems/crusher/A2-8GPU/input.db @@ -0,0 +1,69 @@ +MRT { + timestepMax = 10000 + analysis_interval = 20000 + tau = 0.7 + F = 0, 0, 5.0e-5 + Restart = false + din = 1.0 + dout = 1.0 + flux = 0.0 +} + +Color { + tauA = 0.7; + tauB = 0.7; + rhoA = 1.0; + rhoB = 1.0; + alpha = 1e-2; + beta = 0.95; + F = 0, 0, 1.0e-5 + Restart = false + flux = 0.0 // voxels per timestep + timestepMax = 10000 + // rescale_force_after_timestep = 100000 + ComponentLabels = 0, -1, -2 + ComponentAffinity = -1.0, -1.0, -0.9 + // protocol = "image sequence" + // capillary_number = 1e-5 +} + +Domain { + Filename = "a2_2048x2048x8192.raw" + nproc = 2, 2, 2 // Number of processors (Npx,Npy,Npz) + offset = 0, 0, 0 + n = 382, 382, 382 // Size of local domain (Nx,Ny,Nz) + N = 2048, 2048, 1024 // size of the input image + + voxel_length = 1.0 // Length of domain (x,y,z) + BC = 0 // Boundary condition type + //Sw = 0.2 + ReadType = "8bit" + ReadValues = 0, 1, 2, -1, -2 + WriteValues = 0, 1, 2, -1, -2 + ComponentLabels = 0, -1, -2 + InletLayers = 0, 0, 5 + OutletLayers = 0, 0, 5 +} + +Analysis { + visualization_interval = 1000000 + //morph_interval = 100000 + //morph_delta = -0.08 + analysis_interval = 20000 // Frequency to perform analysis + min_steady_timesteps = 15000000 + max_steady_timesteps = 15000000 + restart_interval = 500000 // Frequency to write restart data + + restart_file = "Restart" // Filename to use for restart file (will append rank) + N_threads = 0 // Number of threads to use + load_balance = "default" // Load balance method to use: "none", "default", "independent" +} + +Visualization { + save_8bit_raw = true + write_silo = true + +} + +FlowAdaptor { +} diff --git a/example/systems/crusher/Dense-8GPU/MPI-multinode.sh b/example/systems/crusher/Dense-8GPU/MPI-multinode.sh new file mode 100644 index 00000000..0304243f --- /dev/null +++ b/example/systems/crusher/Dense-8GPU/MPI-multinode.sh @@ -0,0 +1,36 @@ +#!/bin/bash + +#SBATCH -A CSC380 +#SBATCH -J MPI-multinode +#SBATCH -o %x-%j.out +#SBATCH -t 6:00:00 +#SBATCH -p batch +#SBATCH -N 8 +#SBATCH --exclusive + +# MODULE ENVIRONMENT +module load PrgEnv-amd +module load rocm/4.5.0 +module load cray-mpich +module load cray-hdf5-parallel +#module load craype-accel-amd-gfx908 + +## These must be set before compiling so the executable picks up GTL +export PE_MPICH_GTL_DIR_amd_gfx90a="-L${CRAY_MPICH_ROOTDIR}/gtl/lib" +export PE_MPICH_GTL_LIBS_amd_gfx90a="-lmpi_gtl_hsa" +export MPICH_GPU_SUPPORT_ENABLED=1 + +#export MPL_MBX_SIZE=1024000000 + +export LD_LIBRARY_PATH=${CRAY_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH} + +export LBPM_BIN=/ccs/proj/csc380/mcclurej/crusher/LBPM/tests + +echo "Running Color LBM" + +MYCPUBIND="--cpu-bind=verbose,map_cpu:57" + +srun --verbose -N8 -n8 --cpus-per-gpu=8 --gpus-per-task=1 --gpu-bind=closest ${MYCPUBIND} $LBPM_BIN/TestCommD3Q19 multinode.db +#srun --verbose -N1 -n2 --mem-per-gpu=8g --cpus-per-gpu=1 --gpus-per-node=2 --gpu-bind=closest $LBPM_BIN/lbpm_permeability_simulator input.db + +exit; diff --git a/example/systems/crusher/Dense-8GPU/MPI-singlenode.sh b/example/systems/crusher/Dense-8GPU/MPI-singlenode.sh new file mode 100644 index 00000000..ac0b81b3 --- /dev/null +++ b/example/systems/crusher/Dense-8GPU/MPI-singlenode.sh @@ 
-0,0 +1,36 @@ +#!/bin/bash + +#SBATCH -A CSC380 +#SBATCH -J MPI-singlenode +#SBATCH -o %x-%j.out +#SBATCH -t 0:10:00 +#SBATCH -p batch +#SBATCH -N 1 +#SBATCH --exclusive + +# MODULE ENVIRONMENT +module load PrgEnv-amd +module load rocm/4.5.0 +module load cray-mpich +module load cray-hdf5-parallel +#module load craype-accel-amd-gfx908 + +## These must be set before compiling so the executable picks up GTL +export PE_MPICH_GTL_DIR_amd_gfx90a="-L${CRAY_MPICH_ROOTDIR}/gtl/lib" +export PE_MPICH_GTL_LIBS_amd_gfx90a="-lmpi_gtl_hsa" +export MPICH_GPU_SUPPORT_ENABLED=1 + +#export MPL_MBX_SIZE=1024000000 + +export LD_LIBRARY_PATH=${CRAY_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH} + +export LBPM_BIN=/ccs/proj/csc380/mcclurej/crusher/LBPM/tests + +echo "Running Color LBM" + +MYCPUBIND="--cpu-bind=verbose,map_cpu:57,33,25,1,9,17,41,49" + +srun --verbose -N1 -n8 --cpus-per-gpu=8 --gpus-per-task=1 --gpu-bind=closest ${MYCPUBIND} $LBPM_BIN/TestCommD3Q19 multinode.db +#srun --verbose -N1 -n2 --mem-per-gpu=8g --cpus-per-gpu=1 --gpus-per-node=2 --gpu-bind=closest $LBPM_BIN/lbpm_permeability_simulator input.db + +exit; diff --git a/example/systems/crusher/Dense-8GPU/RandomSystem.py b/example/systems/crusher/Dense-8GPU/RandomSystem.py new file mode 100644 index 00000000..64e1380e --- /dev/null +++ b/example/systems/crusher/Dense-8GPU/RandomSystem.py @@ -0,0 +1,9 @@ +import numpy as np + +N = 1024 + +data = np.random.randint(low=1,high=3,size=(N,N,N),dtype=np.uint8) + +data.tofile("dense_1024x1024x1024.raw") + + diff --git a/example/systems/crusher/Dense-8GPU/multinode.db b/example/systems/crusher/Dense-8GPU/multinode.db new file mode 100644 index 00000000..605e03c0 --- /dev/null +++ b/example/systems/crusher/Dense-8GPU/multinode.db @@ -0,0 +1,69 @@ +MRT { + timestepMax = 100 + analysis_interval = 20000 + tau = 0.7 + F = 0, 0, 5.0e-5 + Restart = false + din = 1.0 + dout = 1.0 + flux = 0.0 +} + +Color { + tauA = 0.7; + tauB = 0.7; + rhoA = 1.0; + rhoB = 1.0; + alpha = 1e-2; + beta = 0.95; + F = 0, 0, 0.0 + Restart = false + flux = 0.0 // voxels per timestep + timestepMax = 10 + // rescale_force_after_timestep = 100000 + ComponentLabels = 0, -1, -2 + ComponentAffinity = -1.0, -1.0, -0.9 + // protocol = "image sequence" + // capillary_number = 1e-5 +} + +Domain { + Filename = "dense_1024x1024x1024.raw" + nproc = 2, 2, 2 // Number of processors (Npx,Npy,Npz) + offset = 0, 0, 0 + n = 222, 222, 222 // Size of local domain (Nx,Ny,Nz) + N = 1024, 1024, 1024 // size of the input image + + voxel_length = 1.0 // Length of domain (x,y,z) + BC = 0 // Boundary condition type + //Sw = 0.2 + ReadType = "8bit" + ReadValues = 0, 1, 2, -1, -2 + WriteValues = 0, 1, 2, -1, -2 + ComponentLabels = 0, -1, -2 + InletLayers = 0, 0, 5 + OutletLayers = 0, 0, 5 +} + +Analysis { + visualization_interval = 1000000 + //morph_interval = 100000 + //morph_delta = -0.08 + analysis_interval = 20000 // Frequency to perform analysis + min_steady_timesteps = 15000000 + max_steady_timesteps = 15000000 + restart_interval = 500000 // Frequency to write restart data + + restart_file = "Restart" // Filename to use for restart file (will append rank) + N_threads = 0 // Number of threads to use + load_balance = "default" // Load balance method to use: "none", "default", "independent" +} + +Visualization { + save_8bit_raw = true + write_silo = true + +} + +FlowAdaptor { +} diff --git a/example/systems/spock/GenerateSphereTest.sh b/example/systems/spock/GenerateSphereTest.sh new file mode 100644 index 00000000..ff15ab4f --- /dev/null +++ 
b/example/systems/spock/GenerateSphereTest.sh @@ -0,0 +1,14 @@ +#!/bin/bash +#SBATCH -A CSC380 +#SBATCH -J sphere_test +#SBATCH -o %x-%j.out +#SBATCH -t 00:05:00 +#SBATCH -p caar +#SBATCH -N 1 + +module load rocm/4.2.0 +export LBPM_DIR=/ccs/proj/csc380/mcclurej/spock/install/lbpm/tests + +srun -n1 --ntasks-per-node=1 $LBPM_DIR/GenerateSphereTest input.db + + diff --git a/example/systems/spock/Spock.sh b/example/systems/spock/Spock.sh new file mode 100644 index 00000000..349aba8a --- /dev/null +++ b/example/systems/spock/Spock.sh @@ -0,0 +1,17 @@ +#!/bin/bash +#SBATCH -A CSC380 +#SBATCH -J sphere_test +#SBATCH -o %x-%j.out +#SBATCH -t 00:05:00 +#SBATCH -p caar +#SBATCH -N 1 + +module load rocm/4.2.0 +export LBPM_DIR=/ccs/proj/csc380/mcclurej/spock/install/lbpm/tests +export MPICH_SMP_SINGLE_COPY_MODE=CMA + +#srun -n1 --ntasks-per-node=1 --accel-bind=g --gpus-per-task=1 $LBPM_DIR/lbpm_color_simulator spheres322.db + + +srun -n1 --ntasks-per-node=1 --accel-bind=g --gpus-per-task=1 $LBPM_DIR/TestCommD3Q19 spheres322.db + diff --git a/example/systems/spock/Test.sh b/example/systems/spock/Test.sh new file mode 100644 index 00000000..66a94ac5 --- /dev/null +++ b/example/systems/spock/Test.sh @@ -0,0 +1,32 @@ +#!/bin/bash +#SBATCH -A CSC380 +#SBATCH -J sphere_test +#SBATCH -o %x-%j.out +#SBATCH -e %x-%j.err +#SBATCH -t 00:05:00 +#SBATCH -p caar +#SBATCH -N 1 + +module load craype-accel-amd-gfx908 +module load PrgEnv-cray +#module load rocm +module load rocm/4.2.0 + +export LBPM_DIR=/ccs/proj/csc380/mcclurej/spock/install/lbpm/tests +#export MPICH_RDMA_ENABLED_CUDA=1 +#export MPICH_ENV_DISPLAY=1 +#export MPICH_GPU_SUPPORT_ENABLED=1 +export MPICH_GPU_NO_ASYNC_MEMCPY=0 +export MPICH_SMP_SINGLE_COPY_MODE=CMA +#export MPICH_DBG_FILENAME="./mpich-dbg.log" +export MPICH_DBG_CLASS=ALL +export MPICH_DBG_LEVEL=VERBOSE +export MPICH_DBG=yes +#export PMI_DEBUG=1 +export MPIR_CVAR_GPU_EAGER_DEVICE_MEM=0 +export MPICH_GPU_SUPPORT_ENABLED=1 +#srun -n1 --ntasks-per-node=1 --accel-bind=g --gpus-per-task=1 $LBPM_DIR/lbpm_color_simulator spheres322.db + + +srun -n1 --ntasks-per-node=1 --accel-bind=g --gpus-per-task=1 --verbose --export=ALL $LBPM_DIR/TestCommD3Q19 test.db + diff --git a/example/systems/spock/input.db b/example/systems/spock/input.db new file mode 100644 index 00000000..9870c7a1 --- /dev/null +++ b/example/systems/spock/input.db @@ -0,0 +1,54 @@ +MRT { + tau = 1.0 // relaxation time + F = 0, 0, 1e-4 // external body force applied to system + timestepMax = 1000 // max number of timesteps + din = 1.0 + dout = 1.0 + Restart = false + flux = 0.0 +} + +Color { + tauA = 0.7; // relaxation time for fluid A + tauB = 0.7; // relaxation time for fluid B + rhoA = 1.0; // mass density for fluid A + rhoB = 1.0; // mass density for fluid B + alpha = 1e-3; // controls interfacial tension between fluids + beta = 0.95; // controls interface width + F = 0, 0, 1.0e-5 // external body force applied to the system + Restart = false // restart from checkpoint file? 
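The crusher and spock job scripts above lean on GPU-aware MPI: the GTL library is pulled in through the PE_MPICH_GTL_DIR / PE_MPICH_GTL_LIBS exports and activated at run time with MPICH_GPU_SUPPORT_ENABLED=1, so MPI can be handed device pointers directly (TestCommD3Q19 is the test these scripts actually drive). For reference only, a minimal standalone sketch of that capability; the file and variable names are hypothetical and not part of the patch:

#include "hip/hip_runtime.h"
#include <mpi.h>
#include <vector>
#include <cstdio>

// standalone illustration of a GPU-aware MPI exchange; not part of LBPM
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1024;
    double *d_send, *d_recv;
    hipMalloc((void **)&d_send, N * sizeof(double));
    hipMalloc((void **)&d_recv, N * sizeof(double));
    std::vector<double> h(N, double(rank));
    hipMemcpy(d_send, h.data(), N * sizeof(double), hipMemcpyHostToDevice);

    // ring exchange: send to the right neighbor, receive from the left.
    // With MPICH_GPU_SUPPORT_ENABLED=1 (and GTL linked at build time),
    // the device pointers go straight to MPI without host staging.
    int right = (rank + 1) % size;
    int left = (rank - 1 + size) % size;
    MPI_Sendrecv(d_send, N, MPI_DOUBLE, right, 0,
                 d_recv, N, MPI_DOUBLE, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    hipMemcpy(h.data(), d_recv, N * sizeof(double), hipMemcpyDeviceToHost);
    printf("rank %d received %f from rank %d\n", rank, h[0], left);

    hipFree(d_send);
    hipFree(d_recv);
    MPI_Finalize();
    return 0;
}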
+ din = 1.0 // density at inlet (if external BC is applied) + dout = 1.0 // density at outlet (if external BC is applied ) + timestepMax = 10 // maximum number of timesteps to simulate + flux = 0.0 // volumetric flux in voxels per timestep (if flux BC is applied) + ComponentLabels = 0 // comma separated list of solid mineral labels + ComponentAffinity = -1.0 // comma separated list of phase indicato field value to assign for each mineral label +} + +Domain { + nproc = 1, 1, 1 // Number of processors (Npx,Npy,Npz) + n = 318, 320, 320 // Size of local domain (Nx,Ny,Nz) + N = 320, 320, 320 + nspheres = 1896 // Number of spheres (only needed if using a sphere packing) + L = 1, 1, 1 // Length of domain (x,y,z) + BC = 0 // Boundary condition type + // BC = 0 for periodic BC + // BC = 1 for pressure BC (applied in z direction) + // BC = 4 for flux BC (applied in z direction + ReadType = "8bit" + ReadValues = 0, 1, 2 // list of labels within the binary file (read) + WriteValues = 0, 1, 2 // list of labels within the output files (write) +} + +Analysis { + analysis_interval = 1000 // Frequency to perform analysis + restart_interval = 50000 // Frequency to write restart data + visualization_interval = 50000 // Frequency to write visualization data + restart_file = "Restart" // Filename to use for restart file (will append rank) + N_threads = 4 // Number of threads to use + load_balance = "independent" // Load balance method to use: "none", "default", "independent" +} + +Visualization { + +} diff --git a/example/systems/spock/sphere322.db b/example/systems/spock/sphere322.db new file mode 100644 index 00000000..e69de29b diff --git a/example/systems/spock/test.db b/example/systems/spock/test.db new file mode 100644 index 00000000..2fdf8fce --- /dev/null +++ b/example/systems/spock/test.db @@ -0,0 +1,54 @@ +MRT { + tau = 1.0 // relaxation time + F = 0, 0, 1e-4 // external body force applied to system + timestepMax = 1000 // max number of timesteps + din = 1.0 + dout = 1.0 + Restart = false + flux = 0.0 +} + +Color { + tauA = 0.7; // relaxation time for fluid A + tauB = 0.7; // relaxation time for fluid B + rhoA = 1.0; // mass density for fluid A + rhoB = 1.0; // mass density for fluid B + alpha = 1e-3; // controls interfacial tension between fluids + beta = 0.95; // controls interface width + F = 0, 0, 1.0e-5 // external body force applied to the system + Restart = false // restart from checkpoint file? 
+ din = 1.0 // density at inlet (if external BC is applied) + dout = 1.0 // density at outlet (if external BC is applied ) + timestepMax = 10 // maximum number of timesteps to simulate + flux = 0.0 // volumetric flux in voxels per timestep (if flux BC is applied) + ComponentLabels = 0 // comma separated list of solid mineral labels + ComponentAffinity = -1.0 // comma separated list of phase indicato field value to assign for each mineral label +} + +Domain { + nproc = 1, 1, 1 // Number of processors (Npx,Npy,Npz) + n = 240, 240, 240 // Size of local domain (Nx,Ny,Nz) + N = 320, 320, 320 + nspheres = 1896 // Number of spheres (only needed if using a sphere packing) + L = 1, 1, 1 // Length of domain (x,y,z) + BC = 0 // Boundary condition type + // BC = 0 for periodic BC + // BC = 1 for pressure BC (applied in z direction) + // BC = 4 for flux BC (applied in z direction + ReadType = "8bit" + ReadValues = 0, 1, 2 // list of labels within the binary file (read) + WriteValues = 0, 1, 2 // list of labels within the output files (write) +} + +Analysis { + analysis_interval = 1000 // Frequency to perform analysis + restart_interval = 50000 // Frequency to write restart data + visualization_interval = 50000 // Frequency to write visualization data + restart_file = "Restart" // Filename to use for restart file (will append rank) + N_threads = 4 // Number of threads to use + load_balance = "independent" // Load balance method to use: "none", "default", "independent" +} + +Visualization { + +} diff --git a/hip/BGK.cu b/hip/BGK.hip similarity index 96% rename from hip/BGK.cu rename to hip/BGK.hip index c75e7324..aef19bfe 100644 --- a/hip/BGK.cu +++ b/hip/BGK.hip @@ -18,9 +18,11 @@ #include "hip/hip_runtime.h" #define NBLOCKS 1024 -#define NTHREADS 256 +#define NTHREADS 512 -__global__ void dvc_ScaLBL_D3Q19_AAeven_BGK(double *dist, int start, int finish, int Np, double rlx, double Fx, double Fy, double Fz){ + +__global__ void +__launch_bounds__(512,1) dvc_ScaLBL_D3Q19_AAeven_BGK(double *dist, int start, int finish, int Np, double rlx, double Fx, double Fy, double Fz){ int n; // conserved momemnts double rho,ux,uy,uz,uu; @@ -138,7 +140,8 @@ __global__ void dvc_ScaLBL_D3Q19_AAeven_BGK(double *dist, int start, int finish, } } -__global__ void dvc_ScaLBL_D3Q19_AAodd_BGK(int *neighborList, double *dist, int start, int finish, int Np, double rlx, double Fx, double Fy, double Fz){ +__global__ void +__launch_bounds__(512,1) dvc_ScaLBL_D3Q19_AAodd_BGK(int *neighborList, double *dist, int start, int finish, int Np, double rlx, double Fx, double Fy, double Fz){ int n; // conserved momemnts double rho,ux,uy,uz,uu; diff --git a/hip/CMakeLists.txt b/hip/CMakeLists.txt deleted file mode 100644 index 9e613960..00000000 --- a/hip/CMakeLists.txt +++ /dev/null @@ -1,10 +0,0 @@ -SET( HIP_SEPERABLE_COMPILATION ON ) -FILE( GLOB HIP_SOURCES "*.cu" ) -SET_SOURCE_FILES_PROPERTIES( ${HIP_SOURCES} PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1 ) -HIP_ADD_LIBRARY( lbpm-hip ${HIP_SOURCES} SHARED HIPCC_OPTIONS ${HIP_HIPCC_OPTIONS} HCC_OPTIONS ${HIP_HCC_OPTIONS} NVCC_OPTIONS ${HIP_NVCC_OPTIONS} ${HIP_NVCC_FLAGS} ) -#TARGET_LINK_LIBRARIES( lbpm-hip /opt/rocm-3.3.0/lib/libhip_hcc.so ) -#TARGET_LINK_LIBRARIES( lbpm-wia lbpm-hip ) -#ADD_DEPENDENCIES( lbpm-hip copy-include ) - - - diff --git a/hip/Color.cu b/hip/Color.hip similarity index 97% rename from hip/Color.cu rename to hip/Color.hip index 4d200820..0199043f 100644 --- a/hip/Color.cu +++ b/hip/Color.hip @@ -21,6 +21,21 @@ #define NBLOCKS 1024 #define NTHREADS 256 + + +__device__ 
__constant__ double mrt_V1=0.05263157894736842; +__device__ __constant__ double mrt_V2=0.012531328320802; +__device__ __constant__ double mrt_V3=0.04761904761904762; +__device__ __constant__ double mrt_V4=0.004594820384294068; +__device__ __constant__ double mrt_V5=0.01587301587301587; +__device__ __constant__ double mrt_V6=0.0555555555555555555555555; +__device__ __constant__ double mrt_V7=0.02777777777777778; +__device__ __constant__ double mrt_V8=0.08333333333333333; +__device__ __constant__ double mrt_V9=0.003341687552213868; +__device__ __constant__ double mrt_V10=0.003968253968253968; +__device__ __constant__ double mrt_V11=0.01388888888888889; +__device__ __constant__ double mrt_V12=0.04166666666666666; + __global__ void dvc_ScaLBL_Color_Init(char *ID, double *Den, double *Phi, double das, double dbs, int Nx, int Ny, int Nz) { //int i,j,k; @@ -541,7 +556,7 @@ __global__ void dvc_ColorCollide( char *ID, double *disteven, double *distodd, } __global__ void -__launch_bounds__(512,2) +__launch_bounds__(256,1) dvc_ScaLBL_D3Q19_ColorCollide( char *ID, double *disteven, double *distodd, double *phi, double *ColorGrad, double *Velocity, int Nx, int Ny, int Nz, double rlx_setA, double rlx_setB, double alpha, double beta, double Fx, double Fy, double Fz) @@ -1257,7 +1272,8 @@ __global__ void dvc_ScaLBL_SetSlice_z(double *Phi, double value, int Nx, int Ny -__global__ void dvc_ScaLBL_D3Q19_AAeven_Color(int *Map, double *dist, double *Aq, double *Bq, double *Den, double *Phi, +__global__ void +__launch_bounds__(256,1) dvc_ScaLBL_D3Q19_AAeven_Color(int *Map, double *dist, double *Aq, double *Bq, double *Den, double *Phi, double *Velocity, double rhoA, double rhoB, double tauA, double tauB, double alpha, double beta, double Fx, double Fy, double Fz, int strideY, int strideZ, int start, int finish, int Np){ int ijk,nn,n; @@ -1273,19 +1289,6 @@ __global__ void dvc_ScaLBL_D3Q19_AAeven_Color(int *Map, double *dist, double *A double ux,uy,uz; double phi,tau,rho0,rlx_setA,rlx_setB; - const double mrt_V1=0.05263157894736842; - const double mrt_V2=0.012531328320802; - const double mrt_V3=0.04761904761904762; - const double mrt_V4=0.004594820384294068; - const double mrt_V5=0.01587301587301587; - const double mrt_V6=0.0555555555555555555555555; - const double mrt_V7=0.02777777777777778; - const double mrt_V8=0.08333333333333333; - const double mrt_V9=0.003341687552213868; - const double mrt_V10=0.003968253968253968; - const double mrt_V11=0.01388888888888889; - const double mrt_V12=0.04166666666666666; - int S = Np/NBLOCKS/NTHREADS + 1; for (int s=0; s>>(q, list, start, count, sendbuf, dist, N); + dvc_ScaLBL_D3Q19_Pack <<>>(q, list, start, count, sendbuf, dist, N); } extern "C" void ScaLBL_D3Q19_Unpack(int q, int *list, int start, int count, double *recvbuf, double *dist, int N){ int GRID = count / 512 + 1; - dvc_ScaLBL_D3Q19_Unpack <<>>(q, list, start, count, recvbuf, dist, N); + dvc_ScaLBL_D3Q19_Unpack <<>>(q, list, start, count, recvbuf, dist, N); } //************************************************************************* @@ -2423,19 +2425,17 @@ extern "C" void ScaLBL_D3Q19_Swap_Compact(int *neighborList, double *disteven, d printf("CUDA error in ScaLBL_D3Q19_Swap: %s \n",hipGetErrorString(err)); } } - -extern "C" void ScaLBL_D3Q19_AAeven_Compact(char * ID, double *d_dist, int Np) { +extern "C" void ScaLBL_D3Q19_AAeven_Compact( double *d_dist, int Np) { hipFuncSetCacheConfig( (void*) dvc_ScaLBL_AAeven_Compact, hipFuncCachePreferL1); - dvc_ScaLBL_AAeven_Compact<<>>(ID, d_dist, Np); + 
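In the BGK.hip, Color.hip and D3Q19.hip hunks above, the shared MRT weights move from per-thread const locals into __device__ __constant__ memory, the collision kernels pick up __launch_bounds__ annotations, and BGK.hip raises NTHREADS to 512. A minimal sketch of that pattern under the same NBLOCKS/NTHREADS grid-stride convention; the kernel and symbol names below are illustrative only and not taken from the patch:

#include "hip/hip_runtime.h"
#include <cstdio>

#define NBLOCKS 1024
#define NTHREADS 256

// illustrative sketch, not LBPM code: one copy of the weight lives in
// constant memory and is shared by every thread
__device__ __constant__ double example_weight = 0.05263157894736842;

__global__ void __launch_bounds__(256, 1)
dvc_example_scale(double *dist, int Np) {
    // same sweep structure as the ScaLBL kernels: S passes of
    // NBLOCKS*NTHREADS threads cover the Np lattice sites
    int S = Np / NBLOCKS / NTHREADS + 1;
    for (int s = 0; s < S; s++) {
        int n = S * blockIdx.x * blockDim.x + s * blockDim.x + threadIdx.x;
        if (n < Np)
            dist[n] *= example_weight;
    }
}

extern "C" void Example_Scale(double *dist, int Np) {
    dvc_example_scale<<<NBLOCKS, NTHREADS>>>(dist, Np);
    hipError_t err = hipGetLastError();
    if (hipSuccess != err)
        printf("hip error in Example_Scale: %s \n", hipGetErrorString(err));
}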
dvc_ScaLBL_AAeven_Compact<<>>( d_dist, Np); hipError_t err = hipGetLastError(); if (hipSuccess != err){ printf("CUDA error in ScaLBL_D3Q19_Init: %s \n",hipGetErrorString(err)); } } - -extern "C" void ScaLBL_D3Q19_AAodd_Compact(char * ID, int *d_neighborList, double *d_dist, int Np) { +extern "C" void ScaLBL_D3Q19_AAodd_Compact( int *d_neighborList, double *d_dist, int Np) { hipFuncSetCacheConfig( (void*) dvc_ScaLBL_AAodd_Compact, hipFuncCachePreferL1); - dvc_ScaLBL_AAodd_Compact<<>>(ID,d_neighborList, d_dist,Np); + dvc_ScaLBL_AAodd_Compact<<>>(d_neighborList, d_dist,Np); hipError_t err = hipGetLastError(); if (hipSuccess != err){ printf("CUDA error in ScaLBL_D3Q19_Init: %s \n",hipGetErrorString(err)); diff --git a/hip/D3Q7.cu b/hip/D3Q7.hip similarity index 100% rename from hip/D3Q7.cu rename to hip/D3Q7.hip diff --git a/hip/D3Q7BC.cu b/hip/D3Q7BC.cu deleted file mode 100644 index c594791a..00000000 --- a/hip/D3Q7BC.cu +++ /dev/null @@ -1,567 +0,0 @@ -#include -#include -#include "hip/hip_runtime.h" - -#define NBLOCKS 560 -#define NTHREADS 128 - -__global__ void dvc_ScaLBL_Solid_Dirichlet_D3Q7(double *dist, double *BoundaryValue, int *BounceBackDist_list, int *BounceBackSolid_list, int count) -{ - - int idx; - int iq,ib; - double value_b,value_q; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - iq = BounceBackDist_list[idx]; - ib = BounceBackSolid_list[idx]; - value_b = BoundaryValue[ib];//get boundary value from a solid site - value_q = dist[iq]; - dist[iq] = -1.0*value_q + value_b*0.25;//NOTE 0.25 is the speed of sound for D3Q7 lattice - } -} - -__global__ void dvc_ScaLBL_Solid_Neumann_D3Q7(double *dist, double *BoundaryValue, int *BounceBackDist_list, int *BounceBackSolid_list, int count) -{ - - int idx; - int iq,ib; - double value_b,value_q; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - iq = BounceBackDist_list[idx]; - ib = BounceBackSolid_list[idx]; - value_b = BoundaryValue[ib];//get boundary value from a solid site - value_q = dist[iq]; - dist[iq] = value_q + value_b; - } -} - -__global__ void dvc_ScaLBL_Solid_DirichletAndNeumann_D3Q7(double *dist, double *BoundaryValue,int *BoundaryLabel, int *BounceBackDist_list, int *BounceBackSolid_list, int count) -{ - - int idx; - int iq,ib; - double value_b,value_b_label,value_q; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - iq = BounceBackDist_list[idx]; - ib = BounceBackSolid_list[idx]; - value_b = BoundaryValue[ib];//get boundary value from a solid site - value_b_label = BoundaryLabel[ib];//get boundary label (i.e. type of BC) from a solid site - value_q = dist[iq]; - if (value_b_label==1){//Dirichlet BC - dist[iq] = -1.0*value_q + value_b*0.25;//NOTE 0.25 is the speed of sound for D3Q7 lattice - } - if (value_b_label==2){//Neumann BC - dist[iq] = value_q + value_b; - } - } -} - -__global__ void dvc_ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_z(int *list, double *dist, double Vin, int count, int Np) -{ - int idx,n; - double f0,f1,f2,f3,f4,f5,f6; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - n = list[idx]; - f0 = dist[n]; - f1 = dist[2*Np+n]; - f2 = dist[1*Np+n]; - f3 = dist[4*Np+n]; - f4 = dist[3*Np+n]; - f6 = dist[5*Np+n]; - //................................................... 
- f5 = Vin - (f0+f1+f2+f3+f4+f6); - dist[6*Np+n] = f5; - } -} - -__global__ void dvc_ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_Z(int *list, double *dist, double Vout, int count, int Np) -{ - int idx,n; - double f0,f1,f2,f3,f4,f5,f6; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - n = list[idx]; - f0 = dist[n]; - f1 = dist[2*Np+n]; - f2 = dist[1*Np+n]; - f3 = dist[4*Np+n]; - f4 = dist[3*Np+n]; - f5 = dist[6*Np+n]; - //................................................... - f6 = Vout - (f0+f1+f2+f3+f4+f5); - dist[5*Np+n] = f6; - } -} - -__global__ void dvc_ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_z(int *d_neighborList, int *list, double *dist, double Vin, int count, int Np) -{ - int idx, n; - int nread,nr5; - double f0,f1,f2,f3,f4,f5,f6; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - n = list[idx]; - f0 = dist[n]; - - nread = d_neighborList[n]; - f1 = dist[nread]; - - nread = d_neighborList[n+2*Np]; - f3 = dist[nread]; - - nread = d_neighborList[n+Np]; - f2 = dist[nread]; - - nread = d_neighborList[n+3*Np]; - f4 = dist[nread]; - - nread = d_neighborList[n+5*Np]; - f6 = dist[nread]; - - // Unknown distributions - nr5 = d_neighborList[n+4*Np]; - f5 = Vin - (f0+f1+f2+f3+f4+f6); - dist[nr5] = f5; - } -} - -__global__ void dvc_ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_Z(int *d_neighborList, int *list, double *dist, double Vout, int count, int Np) -{ - int idx, n; - int nread,nr6; - double f0,f1,f2,f3,f4,f5,f6; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - n = list[idx]; - f0 = dist[n]; - - nread = d_neighborList[n]; - f1 = dist[nread]; - - nread = d_neighborList[n+2*Np]; - f3 = dist[nread]; - - nread = d_neighborList[n+4*Np]; - f5 = dist[nread]; - - nread = d_neighborList[n+Np]; - f2 = dist[nread]; - - nread = d_neighborList[n+3*Np]; - f4 = dist[nread]; - - // unknown distributions - nr6 = d_neighborList[n+5*Np]; - f6 = Vout - (f0+f1+f2+f3+f4+f5); - dist[nr6] = f6; - } -} - -__global__ void dvc_ScaLBL_Poisson_D3Q7_BC_z(int *list, int *Map, double *Psi, double Vin, int count) -{ - int idx,n,nm; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - n = list[idx]; - nm = Map[n]; - Psi[nm] = Vin; - } -} - - -__global__ void dvc_ScaLBL_Poisson_D3Q7_BC_Z(int *list, int *Map, double *Psi, double Vout, int count) -{ - int idx,n,nm; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - n = list[idx]; - nm = Map[n]; - Psi[nm] = Vout; - } -} - -__global__ void dvc_ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_z(int *list, double *dist, double Cin, int count, int Np) -{ - int idx,n; - double f0,f1,f2,f3,f4,f5,f6; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - n = list[idx]; - f0 = dist[n]; - f1 = dist[2*Np+n]; - f2 = dist[1*Np+n]; - f3 = dist[4*Np+n]; - f4 = dist[3*Np+n]; - f6 = dist[5*Np+n]; - //................................................... - f5 = Cin - (f0+f1+f2+f3+f4+f6); - dist[6*Np+n] = f5; - } -} - -__global__ void dvc_ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_Z(int *list, double *dist, double Cout, int count, int Np) -{ - int idx,n; - double f0,f1,f2,f3,f4,f5,f6; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - n = list[idx]; - f0 = dist[n]; - f1 = dist[2*Np+n]; - f2 = dist[1*Np+n]; - f3 = dist[4*Np+n]; - f4 = dist[3*Np+n]; - f5 = dist[6*Np+n]; - //................................................... 
- f6 = Cout - (f0+f1+f2+f3+f4+f5); - dist[5*Np+n] = f6; - } -} - -__global__ void dvc_ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_z(int *d_neighborList, int *list, double *dist, double Cin, int count, int Np) -{ - int idx, n; - int nread,nr5; - double f0,f1,f2,f3,f4,f5,f6; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - n = list[idx]; - f0 = dist[n]; - - nread = d_neighborList[n]; - f1 = dist[nread]; - - nread = d_neighborList[n+2*Np]; - f3 = dist[nread]; - - nread = d_neighborList[n+Np]; - f2 = dist[nread]; - - nread = d_neighborList[n+3*Np]; - f4 = dist[nread]; - - nread = d_neighborList[n+5*Np]; - f6 = dist[nread]; - - // Unknown distributions - nr5 = d_neighborList[n+4*Np]; - f5 = Cin - (f0+f1+f2+f3+f4+f6); - dist[nr5] = f5; - } -} - -__global__ void dvc_ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_Z(int *d_neighborList, int *list, double *dist, double Cout, int count, int Np) -{ - int idx, n; - int nread,nr6; - double f0,f1,f2,f3,f4,f5,f6; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - n = list[idx]; - f0 = dist[n]; - - nread = d_neighborList[n]; - f1 = dist[nread]; - - nread = d_neighborList[n+2*Np]; - f3 = dist[nread]; - - nread = d_neighborList[n+4*Np]; - f5 = dist[nread]; - - nread = d_neighborList[n+Np]; - f2 = dist[nread]; - - nread = d_neighborList[n+3*Np]; - f4 = dist[nread]; - - // unknown distributions - nr6 = d_neighborList[n+5*Np]; - f6 = Cout - (f0+f1+f2+f3+f4+f5); - dist[nr6] = f6; - } -} - -__global__ void dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_BC_z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np) -{ - //NOTE: FluxIn is the inward flux - int idx,n; - double f0,f1,f2,f3,f4,f5,f6; - double fsum_partial; - double uz; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - n = list[idx]; - f0 = dist[n]; - f1 = dist[2*Np+n]; - f2 = dist[1*Np+n]; - f3 = dist[4*Np+n]; - f4 = dist[3*Np+n]; - f6 = dist[5*Np+n]; - fsum_partial = f0+f1+f2+f3+f4+f6; - uz = VelocityZ[n]; - //................................................... - f5 =(FluxIn+(1.0-0.5/tau)*f6-0.5*uz*fsum_partial/tau)/(1.0-0.5/tau+0.5*uz/tau); - dist[6*Np+n] = f5; - } -} - -__global__ void dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_BC_Z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np) -{ - //NOTE: FluxIn is the inward flux - int idx,n; - double f0,f1,f2,f3,f4,f5,f6; - double fsum_partial; - double uz; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - n = list[idx]; - f0 = dist[n]; - f1 = dist[2*Np+n]; - f2 = dist[1*Np+n]; - f3 = dist[4*Np+n]; - f4 = dist[3*Np+n]; - f5 = dist[6*Np+n]; - fsum_partial = f0+f1+f2+f3+f4+f5; - uz = VelocityZ[n]; - //................................................... 
- f6 =(FluxIn+(1.0-0.5/tau)*f5+0.5*uz*fsum_partial/tau)/(1.0-0.5/tau-0.5*uz/tau); - dist[5*Np+n] = f6; - } -} - -__global__ void dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_BC_z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np) -{ - //NOTE: FluxIn is the inward flux - int idx, n; - int nread,nr5; - double f0,f1,f2,f3,f4,f5,f6; - double fsum_partial; - double uz; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - n = list[idx]; - f0 = dist[n]; - - nread = d_neighborList[n]; - f1 = dist[nread]; - - nread = d_neighborList[n+2*Np]; - f3 = dist[nread]; - - nread = d_neighborList[n+Np]; - f2 = dist[nread]; - - nread = d_neighborList[n+3*Np]; - f4 = dist[nread]; - - nread = d_neighborList[n+5*Np]; - f6 = dist[nread]; - - fsum_partial = f0+f1+f2+f3+f4+f6; - uz = VelocityZ[n]; - //................................................... - f5 =(FluxIn+(1.0-0.5/tau)*f6-0.5*uz*fsum_partial/tau)/(1.0-0.5/tau+0.5*uz/tau); - - // Unknown distributions - nr5 = d_neighborList[n+4*Np]; - dist[nr5] = f5; - } -} - -__global__ void dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_BC_Z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np) -{ - //NOTE: FluxIn is the inward flux - int idx, n; - int nread,nr6; - double f0,f1,f2,f3,f4,f5,f6; - double fsum_partial; - double uz; - idx = blockIdx.x*blockDim.x + threadIdx.x; - if (idx < count){ - n = list[idx]; - f0 = dist[n]; - - nread = d_neighborList[n]; - f1 = dist[nread]; - - nread = d_neighborList[n+2*Np]; - f3 = dist[nread]; - - nread = d_neighborList[n+4*Np]; - f5 = dist[nread]; - - nread = d_neighborList[n+Np]; - f2 = dist[nread]; - - nread = d_neighborList[n+3*Np]; - f4 = dist[nread]; - - fsum_partial = f0+f1+f2+f3+f4+f5; - uz = VelocityZ[n]; - //................................................... 
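Each of the Ion_Flux_BC kernels in this deleted file solves the same one-unknown balance at the boundary. Reading the inlet (z) case directly off the code, with J_in = FluxIn, u_z the local velocity, and the known populations f_0..f_4, f_6:

    f_5 = \frac{ J_{\mathrm{in}} + \left(1 - \frac{1}{2\tau}\right) f_6 - \frac{u_z}{2\tau}\,(f_0+f_1+f_2+f_3+f_4+f_6) }{ 1 - \frac{1}{2\tau} + \frac{u_z}{2\tau} }

The outlet (Z) kernels mirror this with the roles of f_5 and f_6 swapped and the sign of the u_z terms flipped; the AAodd variants apply the identical closure but gather the known populations through the neighbor list.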
- f6 =(FluxIn+(1.0-0.5/tau)*f5+0.5*uz*fsum_partial/tau)/(1.0-0.5/tau-0.5*uz/tau); - - // unknown distributions - nr6 = d_neighborList[n+5*Np]; - dist[nr6] = f6; - } -} -//************************************************************************* - -extern "C" void ScaLBL_Solid_Dirichlet_D3Q7(double *dist, double *BoundaryValue, int *BounceBackDist_list, int *BounceBackSolid_list, int count){ - int GRID = count / 512 + 1; - dvc_ScaLBL_Solid_Dirichlet_D3Q7<<>>(dist, BoundaryValue, BounceBackDist_list, BounceBackSolid_list, count); - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_Solid_Dirichlet_D3Q7 (kernel): %s \n",hipGetErrorString(err)); - } -} - -extern "C" void ScaLBL_Solid_Neumann_D3Q7(double *dist, double *BoundaryValue, int *BounceBackDist_list, int *BounceBackSolid_list, int count){ - int GRID = count / 512 + 1; - dvc_ScaLBL_Solid_Neumann_D3Q7<<>>(dist, BoundaryValue, BounceBackDist_list, BounceBackSolid_list, count); - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_Solid_Neumann_D3Q7 (kernel): %s \n",hipGetErrorString(err)); - } -} - -extern "C" void ScaLBL_Solid_DirichletAndNeumann_D3Q7(double *dist, double *BoundaryValue,int *BoundaryLabel, int *BounceBackDist_list, int *BounceBackSolid_list, int count){ - int GRID = count / 512 + 1; - dvc_ScaLBL_Solid_DirichletAndNeumann_D3Q7<<>>(dist, BoundaryValue, BoundaryLabel, BounceBackDist_list, BounceBackSolid_list, count); - cudaError_t err = cudaGetLastError(); - if (cudaSuccess != err){ - printf("hip error in ScaLBL_Solid_DirichletAndNeumann_D3Q7 (kernel): %s \n",cudaGetErrorString(err)); - } -} - -extern "C" void ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_z(int *list, double *dist, double Vin, int count, int Np){ - int GRID = count / 512 + 1; - dvc_ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_z<<>>(list, dist, Vin, count, Np); - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_z (kernel): %s \n",hipGetErrorString(err)); - } -} - -extern "C" void ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_Z(int *list, double *dist, double Vout, int count, int Np){ - int GRID = count / 512 + 1; - dvc_ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_Z<<>>(list, dist, Vout, count, Np); - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_Z (kernel): %s \n",hipGetErrorString(err)); - } -} - -extern "C" void ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_z(int *d_neighborList, int *list, double *dist, double Vin, int count, int Np){ - int GRID = count / 512 + 1; - dvc_ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_z<<>>(d_neighborList, list, dist, Vin, count, Np); - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_z (kernel): %s \n",hipGetErrorString(err)); - } -} - -extern "C" void ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_Z(int *d_neighborList, int *list, double *dist, double Vout, int count, int Np){ - int GRID = count / 512 + 1; - dvc_ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_Z<<>>(d_neighborList, list, dist, Vout, count, Np); - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_Z (kernel): %s \n",hipGetErrorString(err)); - } -} - -extern "C" void ScaLBL_Poisson_D3Q7_BC_z(int *list, int *Map, double *Psi, double Vin, int count){ - int GRID = count / 512 + 1; - dvc_ScaLBL_Poisson_D3Q7_BC_z<<>>(list, Map, Psi, Vin, 
count); - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_Poisson_D3Q7_BC_z (kernel): %s \n",hipGetErrorString(err)); - } -} - -extern "C" void ScaLBL_Poisson_D3Q7_BC_Z(int *list, int *Map, double *Psi, double Vout, int count){ - int GRID = count / 512 + 1; - dvc_ScaLBL_Poisson_D3Q7_BC_Z<<>>(list, Map, Psi, Vout, count); - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_Poisson_D3Q7_BC_Z (kernel): %s \n",hipGetErrorString(err)); - } -} - -extern "C" void ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_z(int *list, double *dist, double Cin, int count, int Np){ - int GRID = count / 512 + 1; - dvc_ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_z<<>>(list, dist, Cin, count, Np); - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_z (kernel): %s \n",hipGetErrorString(err)); - } -} - -extern "C" void ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_Z(int *list, double *dist, double Cout, int count, int Np){ - int GRID = count / 512 + 1; - dvc_ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_Z<<>>(list, dist, Cout, count, Np); - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_Z (kernel): %s \n",hipGetErrorString(err)); - } -} - -extern "C" void ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_z(int *d_neighborList, int *list, double *dist, double Cin, int count, int Np){ - int GRID = count / 512 + 1; - dvc_ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_z<<>>(d_neighborList, list, dist, Cin, count, Np); - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_z (kernel): %s \n",hipGetErrorString(err)); - } -} - -extern "C" void ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_Z(int *d_neighborList, int *list, double *dist, double Cout, int count, int Np){ - int GRID = count / 512 + 1; - dvc_ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_Z<<>>(d_neighborList, list, dist, Cout, count, Np); - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_Z (kernel): %s \n",hipGetErrorString(err)); - } -} - -extern "C" void ScaLBL_D3Q7_AAeven_Ion_Flux_BC_z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ - int GRID = count / 512 + 1; - dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_BC_z<<>>(list, dist, FluxIn, tau, VelocityZ, count, Np); - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAeven_Ion_Flux_BC_z (kernel): %s \n",hipGetErrorString(err)); - } -} - -extern "C" void ScaLBL_D3Q7_AAeven_Ion_Flux_BC_Z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ - int GRID = count / 512 + 1; - dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_BC_Z<<>>(list, dist, FluxIn, tau, VelocityZ, count, Np); - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAeven_Ion_Flux_BC_Z (kernel): %s \n",hipGetErrorString(err)); - } -} - -extern "C" void ScaLBL_D3Q7_AAodd_Ion_Flux_BC_z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ - int GRID = count / 512 + 1; - dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_BC_z<<>>(d_neighborList, list, dist, FluxIn, tau, VelocityZ, count, Np); - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAodd_Ion_Flux_BC_z (kernel): %s \n",hipGetErrorString(err)); 
- } -} - -extern "C" void ScaLBL_D3Q7_AAodd_Ion_Flux_BC_Z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ - int GRID = count / 512 + 1; - dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_BC_Z<<>>(d_neighborList, list, dist, FluxIn, tau, VelocityZ, count, Np); - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAodd_Ion_Flux_BC_Z (kernel): %s \n",hipGetErrorString(err)); - } -} diff --git a/hip/D3Q7BC.hip b/hip/D3Q7BC.hip new file mode 100644 index 00000000..fd3c54fa --- /dev/null +++ b/hip/D3Q7BC.hip @@ -0,0 +1,917 @@ +#include +#include +#include "hip/hip_runtime.h" + +#define NBLOCKS 1024 +#define NTHREADS 256 + + +#define CHECK_ERROR(KERNEL) \ + do { \ + auto err = hipGetLastError(); \ + if ( hipSuccess != err ){ \ + auto errString = hipGetErrorString(err); \ + printf("error in %s (kernel): %s \n",KERNEL,errString); \ + } \ + } while(0) + + +__global__ void dvc_ScaLBL_Solid_Dirichlet_D3Q7(double *dist, double *BoundaryValue, int *BounceBackDist_list, int *BounceBackSolid_list, int count) +{ + + int idx; + int iq,ib; + double value_b,value_q; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + iq = BounceBackDist_list[idx]; + ib = BounceBackSolid_list[idx]; + value_b = BoundaryValue[ib];//get boundary value from a solid site + value_q = dist[iq]; + dist[iq] = -1.0*value_q + value_b*0.25;//NOTE 0.25 is the speed of sound for D3Q7 lattice + } +} + +__global__ void dvc_ScaLBL_Solid_Neumann_D3Q7(double *dist, double *BoundaryValue, int *BounceBackDist_list, int *BounceBackSolid_list, int count) +{ + + int idx; + int iq,ib; + double value_b,value_q; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + iq = BounceBackDist_list[idx]; + ib = BounceBackSolid_list[idx]; + value_b = BoundaryValue[ib];//get boundary value from a solid site + value_q = dist[iq]; + dist[iq] = value_q + value_b; + } +} + +__global__ void dvc_ScaLBL_Solid_DirichletAndNeumann_D3Q7(double *dist, double *BoundaryValue,int *BoundaryLabel, int *BounceBackDist_list, int *BounceBackSolid_list, int count) +{ + + int idx; + int iq,ib; + double value_b,value_b_label,value_q; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + iq = BounceBackDist_list[idx]; + ib = BounceBackSolid_list[idx]; + value_b = BoundaryValue[ib];//get boundary value from a solid site + value_b_label = BoundaryLabel[ib];//get boundary label (i.e. 
type of BC) from a solid site + value_q = dist[iq]; + if (value_b_label==1){//Dirichlet BC + dist[iq] = -1.0*value_q + value_b*0.25;//NOTE 0.25 is the speed of sound for D3Q7 lattice + } + if (value_b_label==2){//Neumann BC + dist[iq] = value_q + value_b; + } + } +} + +__global__ void dvc_ScaLBL_Solid_SlippingVelocityBC_D3Q19(double *dist, double *zeta_potential, double *ElectricField, double *SolidGrad, + double epsilon_LB, double tau, double rho0,double den_scale, double h, double time_conv, + int *BounceBackDist_list, int *BounceBackSolid_list, int *FluidBoundary_list, + double *lattice_weight, float *lattice_cx, float *lattice_cy, float *lattice_cz, + int count, int Np) +{ + int idx; + int iq,ib,ifluidBC; + double value_b,value_q; + double Ex,Ey,Ez; + double Etx,Ety,Etz;//tangential part of electric field + double E_mag_normal; + double nsx,nsy,nsz;//unit normal solid gradient + double ubx,uby,ubz;//slipping velocity at fluid boundary nodes + float cx,cy,cz;//lattice velocity (D3Q19) + double LB_weight;//lattice weighting coefficient (D3Q19) + double cs2_inv = 3.0;//inverse of cs^2 for D3Q19 + double nu_LB = (tau-0.5)/cs2_inv; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + iq = BounceBackDist_list[idx]; + ib = BounceBackSolid_list[idx]; + ifluidBC = FluidBoundary_list[idx]; + value_b = zeta_potential[ib];//get zeta potential from a solid site + value_q = dist[iq]; + + //Load electric field and compute its tangential componet + Ex = ElectricField[ifluidBC+0*Np]; + Ey = ElectricField[ifluidBC+1*Np]; + Ez = ElectricField[ifluidBC+2*Np]; + nsx = SolidGrad[ifluidBC+0*Np]; + nsy = SolidGrad[ifluidBC+1*Np]; + nsz = SolidGrad[ifluidBC+2*Np]; + E_mag_normal = Ex*nsx+Ey*nsy+Ez*nsz;//magnitude of electric field in the direction normal to solid nodes + //compute tangential electric field + Etx = Ex - E_mag_normal*nsx; + Ety = Ey - E_mag_normal*nsy; + Etz = Ez - E_mag_normal*nsz; + ubx = -epsilon_LB*value_b*Etx/(nu_LB*rho0)*time_conv*time_conv/(h*h*1.0e-12)/den_scale; + uby = -epsilon_LB*value_b*Ety/(nu_LB*rho0)*time_conv*time_conv/(h*h*1.0e-12)/den_scale; + ubz = -epsilon_LB*value_b*Etz/(nu_LB*rho0)*time_conv*time_conv/(h*h*1.0e-12)/den_scale; + + //compute bounce-back distribution + LB_weight = lattice_weight[idx]; + cx = lattice_cx[idx]; + cy = lattice_cy[idx]; + cz = lattice_cz[idx]; + dist[iq] = value_q - 2.0*LB_weight*rho0*cs2_inv*(cx*ubx+cy*uby+cz*ubz); + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_z(int *list, double *dist, double Vin, int count, int Np) +{ + int idx,n; + double f0,f1,f2,f3,f4,f5,f6; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + f1 = dist[2*Np+n]; + f2 = dist[1*Np+n]; + f3 = dist[4*Np+n]; + f4 = dist[3*Np+n]; + f6 = dist[5*Np+n]; + //................................................... + f5 = Vin - (f0+f1+f2+f3+f4+f6); + dist[6*Np+n] = f5; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_Z(int *list, double *dist, double Vout, int count, int Np) +{ + int idx,n; + double f0,f1,f2,f3,f4,f5,f6; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + f1 = dist[2*Np+n]; + f2 = dist[1*Np+n]; + f3 = dist[4*Np+n]; + f4 = dist[3*Np+n]; + f5 = dist[6*Np+n]; + //................................................... 
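The new dvc_ScaLBL_Solid_SlippingVelocityBC_D3Q19 kernel above imposes what reads as a Helmholtz-Smoluchowski electro-osmotic slip applied through a moving-wall bounce-back. With the tangential field E_t = E - (E . n_s) n_s, \nu_{LB} = (\tau - 1/2)/3 and c_s^2 = 1/3, the code computes

    \mathbf{u}_b = -\frac{\varepsilon_{LB}\,\zeta\,\mathbf{E}_t}{\nu_{LB}\,\rho_0}\cdot\frac{\mathrm{time\_conv}^2}{h^2 \times 10^{-12}\,\mathrm{den\_scale}}, \qquad \mathrm{dist}[iq] = f_q - \frac{2\,w_q\,\rho_0}{c_s^2}\,(\mathbf{c}_q\cdot\mathbf{u}_b)

where the trailing factor converts to lattice units (the h^2 x 10^{-12} suggests h is supplied in micrometers) and w_q, c_q are the D3Q19 weight and lattice velocity passed in through lattice_weight and lattice_cx/cy/cz.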
+ f6 = Vout - (f0+f1+f2+f3+f4+f5); + dist[5*Np+n] = f6; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_z(int *d_neighborList, int *list, double *dist, double Vin, int count, int Np) +{ + int idx, n; + int nread,nr5; + double f0,f1,f2,f3,f4,f5,f6; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + + nread = d_neighborList[n]; + f1 = dist[nread]; + + nread = d_neighborList[n+2*Np]; + f3 = dist[nread]; + + nread = d_neighborList[n+Np]; + f2 = dist[nread]; + + nread = d_neighborList[n+3*Np]; + f4 = dist[nread]; + + nread = d_neighborList[n+5*Np]; + f6 = dist[nread]; + + // Unknown distributions + nr5 = d_neighborList[n+4*Np]; + f5 = Vin - (f0+f1+f2+f3+f4+f6); + dist[nr5] = f5; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_Z(int *d_neighborList, int *list, double *dist, double Vout, int count, int Np) +{ + int idx, n; + int nread,nr6; + double f0,f1,f2,f3,f4,f5,f6; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + + nread = d_neighborList[n]; + f1 = dist[nread]; + + nread = d_neighborList[n+2*Np]; + f3 = dist[nread]; + + nread = d_neighborList[n+4*Np]; + f5 = dist[nread]; + + nread = d_neighborList[n+Np]; + f2 = dist[nread]; + + nread = d_neighborList[n+3*Np]; + f4 = dist[nread]; + + // unknown distributions + nr6 = d_neighborList[n+5*Np]; + f6 = Vout - (f0+f1+f2+f3+f4+f5); + dist[nr6] = f6; + } +} + +__global__ void dvc_ScaLBL_Poisson_D3Q7_BC_z(int *list, int *Map, double *Psi, double Vin, int count) +{ + int idx,n,nm; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + nm = Map[n]; + Psi[nm] = Vin; + } +} + + +__global__ void dvc_ScaLBL_Poisson_D3Q7_BC_Z(int *list, int *Map, double *Psi, double Vout, int count) +{ + int idx,n,nm; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + nm = Map[n]; + Psi[nm] = Vout; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_z(int *list, double *dist, double Cin, int count, int Np) +{ + int idx,n; + double f0,f1,f2,f3,f4,f5,f6; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + f1 = dist[2*Np+n]; + f2 = dist[1*Np+n]; + f3 = dist[4*Np+n]; + f4 = dist[3*Np+n]; + f6 = dist[5*Np+n]; + //................................................... + f5 = Cin - (f0+f1+f2+f3+f4+f6); + dist[6*Np+n] = f5; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_Z(int *list, double *dist, double Cout, int count, int Np) +{ + int idx,n; + double f0,f1,f2,f3,f4,f5,f6; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + f1 = dist[2*Np+n]; + f2 = dist[1*Np+n]; + f3 = dist[4*Np+n]; + f4 = dist[3*Np+n]; + f5 = dist[6*Np+n]; + //................................................... 
+ f6 = Cout - (f0+f1+f2+f3+f4+f5); + dist[5*Np+n] = f6; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_z(int *d_neighborList, int *list, double *dist, double Cin, int count, int Np) +{ + int idx, n; + int nread,nr5; + double f0,f1,f2,f3,f4,f5,f6; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + + nread = d_neighborList[n]; + f1 = dist[nread]; + + nread = d_neighborList[n+2*Np]; + f3 = dist[nread]; + + nread = d_neighborList[n+Np]; + f2 = dist[nread]; + + nread = d_neighborList[n+3*Np]; + f4 = dist[nread]; + + nread = d_neighborList[n+5*Np]; + f6 = dist[nread]; + + // Unknown distributions + nr5 = d_neighborList[n+4*Np]; + f5 = Cin - (f0+f1+f2+f3+f4+f6); + dist[nr5] = f5; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_Z(int *d_neighborList, int *list, double *dist, double Cout, int count, int Np) +{ + int idx, n; + int nread,nr6; + double f0,f1,f2,f3,f4,f5,f6; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + + nread = d_neighborList[n]; + f1 = dist[nread]; + + nread = d_neighborList[n+2*Np]; + f3 = dist[nread]; + + nread = d_neighborList[n+4*Np]; + f5 = dist[nread]; + + nread = d_neighborList[n+Np]; + f2 = dist[nread]; + + nread = d_neighborList[n+3*Np]; + f4 = dist[nread]; + + // unknown distributions + nr6 = d_neighborList[n+5*Np]; + f6 = Cout - (f0+f1+f2+f3+f4+f5); + dist[nr6] = f6; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_Diff_BC_z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np) +{ + //NOTE: FluxIn is the inward flux + int idx,n; + double f0,f1,f2,f3,f4,f5,f6; + double fsum_partial; + double uz; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + f1 = dist[2*Np+n]; + f2 = dist[1*Np+n]; + f3 = dist[4*Np+n]; + f4 = dist[3*Np+n]; + f6 = dist[5*Np+n]; + fsum_partial = f0+f1+f2+f3+f4+f6; + uz = VelocityZ[n]; + //................................................... + f5 =(FluxIn+(1.0-0.5/tau)*(f6+uz*fsum_partial))/(1.0-0.5/tau)/(1.0-uz); + dist[6*Np+n] = f5; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_Diff_BC_Z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np) +{ + //NOTE: FluxIn is the inward flux + int idx,n; + double f0,f1,f2,f3,f4,f5,f6; + double fsum_partial; + double uz; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + f1 = dist[2*Np+n]; + f2 = dist[1*Np+n]; + f3 = dist[4*Np+n]; + f4 = dist[3*Np+n]; + f5 = dist[6*Np+n]; + fsum_partial = f0+f1+f2+f3+f4+f5; + uz = VelocityZ[n]; + //................................................... 
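The Ion_Flux_Diff kernels introduced here use a different algebraic closure from the Ion_Flux_BC kernels of the deleted D3Q7BC.cu (that older form survives below as the DiffAdvc variants). As read from the inlet (z) kernel above:

    f_5 = \frac{ J_{\mathrm{in}} + \left(1 - \frac{1}{2\tau}\right)\big(f_6 + u_z\,(f_0+f_1+f_2+f_3+f_4+f_6)\big) }{ \left(1 - \frac{1}{2\tau}\right)(1 - u_z) }

with the outlet (Z) case mirrored: f_5 and f_6 swap roles and the sign of u_z flips.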
+ f6 =(FluxIn+(1.0-0.5/tau)*(f5-uz*fsum_partial))/(1.0-0.5/tau)/(1.0+uz); + dist[5*Np+n] = f6; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_Diff_BC_z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np) +{ + //NOTE: FluxIn is the inward flux + int idx, n; + int nread,nr5; + double f0,f1,f2,f3,f4,f5,f6; + double fsum_partial; + double uz; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + + nread = d_neighborList[n]; + f1 = dist[nread]; + + nread = d_neighborList[n+2*Np]; + f3 = dist[nread]; + + nread = d_neighborList[n+Np]; + f2 = dist[nread]; + + nread = d_neighborList[n+3*Np]; + f4 = dist[nread]; + + nread = d_neighborList[n+5*Np]; + f6 = dist[nread]; + + fsum_partial = f0+f1+f2+f3+f4+f6; + uz = VelocityZ[n]; + //................................................... + f5 =(FluxIn+(1.0-0.5/tau)*(f6+uz*fsum_partial))/(1.0-0.5/tau)/(1.0-uz); + + // Unknown distributions + nr5 = d_neighborList[n+4*Np]; + dist[nr5] = f5; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_Diff_BC_Z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np) +{ + //NOTE: FluxIn is the inward flux + int idx, n; + int nread,nr6; + double f0,f1,f2,f3,f4,f5,f6; + double fsum_partial; + double uz; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + + nread = d_neighborList[n]; + f1 = dist[nread]; + + nread = d_neighborList[n+2*Np]; + f3 = dist[nread]; + + nread = d_neighborList[n+4*Np]; + f5 = dist[nread]; + + nread = d_neighborList[n+Np]; + f2 = dist[nread]; + + nread = d_neighborList[n+3*Np]; + f4 = dist[nread]; + + fsum_partial = f0+f1+f2+f3+f4+f5; + uz = VelocityZ[n]; + //................................................... + f6 =(FluxIn+(1.0-0.5/tau)*(f5-uz*fsum_partial))/(1.0-0.5/tau)/(1.0+uz); + + // unknown distributions + nr6 = d_neighborList[n+5*Np]; + dist[nr6] = f6; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvc_BC_z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np) +{ + //NOTE: FluxIn is the inward flux + int idx,n; + double f0,f1,f2,f3,f4,f5,f6; + double fsum_partial; + double uz; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + f1 = dist[2*Np+n]; + f2 = dist[1*Np+n]; + f3 = dist[4*Np+n]; + f4 = dist[3*Np+n]; + f6 = dist[5*Np+n]; + fsum_partial = f0+f1+f2+f3+f4+f6; + uz = VelocityZ[n]; + //................................................... + f5 =(FluxIn+(1.0-0.5/tau)*f6-0.5*uz*fsum_partial/tau)/(1.0-0.5/tau+0.5*uz/tau); + dist[6*Np+n] = f5; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvc_BC_Z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np) +{ + //NOTE: FluxIn is the inward flux + int idx,n; + double f0,f1,f2,f3,f4,f5,f6; + double fsum_partial; + double uz; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + f1 = dist[2*Np+n]; + f2 = dist[1*Np+n]; + f3 = dist[4*Np+n]; + f4 = dist[3*Np+n]; + f5 = dist[6*Np+n]; + fsum_partial = f0+f1+f2+f3+f4+f5; + uz = VelocityZ[n]; + //................................................... 
+ f6 =(FluxIn+(1.0-0.5/tau)*f5+0.5*uz*fsum_partial/tau)/(1.0-0.5/tau-0.5*uz/tau); + dist[5*Np+n] = f6; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvc_BC_z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np) +{ + //NOTE: FluxIn is the inward flux + int idx, n; + int nread,nr5; + double f0,f1,f2,f3,f4,f5,f6; + double fsum_partial; + double uz; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + + nread = d_neighborList[n]; + f1 = dist[nread]; + + nread = d_neighborList[n+2*Np]; + f3 = dist[nread]; + + nread = d_neighborList[n+Np]; + f2 = dist[nread]; + + nread = d_neighborList[n+3*Np]; + f4 = dist[nread]; + + nread = d_neighborList[n+5*Np]; + f6 = dist[nread]; + + fsum_partial = f0+f1+f2+f3+f4+f6; + uz = VelocityZ[n]; + //................................................... + f5 =(FluxIn+(1.0-0.5/tau)*f6-0.5*uz*fsum_partial/tau)/(1.0-0.5/tau+0.5*uz/tau); + + // Unknown distributions + nr5 = d_neighborList[n+4*Np]; + dist[nr5] = f5; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvc_BC_Z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np) +{ + //NOTE: FluxIn is the inward flux + int idx, n; + int nread,nr6; + double f0,f1,f2,f3,f4,f5,f6; + double fsum_partial; + double uz; + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + + nread = d_neighborList[n]; + f1 = dist[nread]; + + nread = d_neighborList[n+2*Np]; + f3 = dist[nread]; + + nread = d_neighborList[n+4*Np]; + f5 = dist[nread]; + + nread = d_neighborList[n+Np]; + f2 = dist[nread]; + + nread = d_neighborList[n+3*Np]; + f4 = dist[nread]; + + fsum_partial = f0+f1+f2+f3+f4+f5; + uz = VelocityZ[n]; + //................................................... + f6 =(FluxIn+(1.0-0.5/tau)*f5+0.5*uz*fsum_partial/tau)/(1.0-0.5/tau-0.5*uz/tau); + + // unknown distributions + nr6 = d_neighborList[n+5*Np]; + dist[nr6] = f6; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvcElec_BC_z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, double *ElectricField_Z, + double Di, double zi, double Vt, int count, int Np) +{ + //NOTE: FluxIn is the inward flux + int idx,n; + double f0,f1,f2,f3,f4,f5,f6; + double fsum_partial; + double uz; + double uEPz;//electrochemical induced velocity + double Ez;//electrical field + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + f1 = dist[2*Np+n]; + f2 = dist[1*Np+n]; + f3 = dist[4*Np+n]; + f4 = dist[3*Np+n]; + f6 = dist[5*Np+n]; + fsum_partial = f0+f1+f2+f3+f4+f6; + uz = VelocityZ[n]; + Ez = ElectricField_Z[n]; + uEPz=zi*Di/Vt*Ez; + //................................................... 
+ f5 =(FluxIn+(1.0-0.5/tau)*f6-(0.5*uz/tau+uEPz)*fsum_partial)/(1.0-0.5/tau+0.5*uz/tau+uEPz); + dist[6*Np+n] = f5; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvcElec_BC_Z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, double *ElectricField_Z, + double Di, double zi, double Vt, int count, int Np) +{ + //NOTE: FluxIn is the inward flux + int idx,n; + double f0,f1,f2,f3,f4,f5,f6; + double fsum_partial; + double uz; + double uEPz;//electrochemical induced velocity + double Ez;//electrical field + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + f1 = dist[2*Np+n]; + f2 = dist[1*Np+n]; + f3 = dist[4*Np+n]; + f4 = dist[3*Np+n]; + f5 = dist[6*Np+n]; + fsum_partial = f0+f1+f2+f3+f4+f5; + uz = VelocityZ[n]; + Ez = ElectricField_Z[n]; + uEPz=zi*Di/Vt*Ez; + //................................................... + f6 =(FluxIn+(1.0-0.5/tau)*f5+(0.5*uz/tau+uEPz)*fsum_partial)/(1.0-0.5/tau-0.5*uz/tau-uEPz); + dist[5*Np+n] = f6; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, double *ElectricField_Z, + double Di, double zi, double Vt, int count, int Np) +{ + //NOTE: FluxIn is the inward flux + int idx, n; + int nread,nr5; + double f0,f1,f2,f3,f4,f5,f6; + double fsum_partial; + double uz; + double uEPz;//electrochemical induced velocity + double Ez;//electrical field + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + + nread = d_neighborList[n]; + f1 = dist[nread]; + + nread = d_neighborList[n+2*Np]; + f3 = dist[nread]; + + nread = d_neighborList[n+Np]; + f2 = dist[nread]; + + nread = d_neighborList[n+3*Np]; + f4 = dist[nread]; + + nread = d_neighborList[n+5*Np]; + f6 = dist[nread]; + + fsum_partial = f0+f1+f2+f3+f4+f6; + uz = VelocityZ[n]; + Ez = ElectricField_Z[n]; + uEPz=zi*Di/Vt*Ez; + //................................................... + f5 =(FluxIn+(1.0-0.5/tau)*f6-(0.5*uz/tau+uEPz)*fsum_partial)/(1.0-0.5/tau+0.5*uz/tau+uEPz); + + // Unknown distributions + nr5 = d_neighborList[n+4*Np]; + dist[nr5] = f5; + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_Z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, double *ElectricField_Z, + double Di, double zi, double Vt, int count, int Np) +{ + //NOTE: FluxIn is the inward flux + int idx, n; + int nread,nr6; + double f0,f1,f2,f3,f4,f5,f6; + double fsum_partial; + double uz; + double uEPz;//electrochemical induced velocity + double Ez;//electrical field + idx = blockIdx.x*blockDim.x + threadIdx.x; + if (idx < count){ + n = list[idx]; + f0 = dist[n]; + + nread = d_neighborList[n]; + f1 = dist[nread]; + + nread = d_neighborList[n+2*Np]; + f3 = dist[nread]; + + nread = d_neighborList[n+4*Np]; + f5 = dist[nread]; + + nread = d_neighborList[n+Np]; + f2 = dist[nread]; + + nread = d_neighborList[n+3*Np]; + f4 = dist[nread]; + + fsum_partial = f0+f1+f2+f3+f4+f5; + uz = VelocityZ[n]; + Ez = ElectricField_Z[n]; + uEPz=zi*Di/Vt*Ez; + //................................................... 
+ f6 =(FluxIn+(1.0-0.5/tau)*f5+(0.5*uz/tau+uEPz)*fsum_partial)/(1.0-0.5/tau-0.5*uz/tau-uEPz); + + // unknown distributions + nr6 = d_neighborList[n+5*Np]; + dist[nr6] = f6; + } +} +//************************************************************************* + +extern "C" void ScaLBL_Solid_Dirichlet_D3Q7(double *dist, double *BoundaryValue, int *BounceBackDist_list, int *BounceBackSolid_list, int count){ + int GRID = count / 512 + 1; + dvc_ScaLBL_Solid_Dirichlet_D3Q7<<>>(dist, BoundaryValue, BounceBackDist_list, BounceBackSolid_list, count); + CHECK_ERROR("ScaLBL_Solid_Dirichlet_D3Q7"); +} + +extern "C" void ScaLBL_Solid_Neumann_D3Q7(double *dist, double *BoundaryValue, int *BounceBackDist_list, int *BounceBackSolid_list, int count){ + int GRID = count / 512 + 1; + dvc_ScaLBL_Solid_Neumann_D3Q7<<>>(dist, BoundaryValue, BounceBackDist_list, BounceBackSolid_list, count); + CHECK_ERROR("ScaLBL_Solid_Neumann_D3Q7"); +} + +extern "C" void ScaLBL_Solid_DirichletAndNeumann_D3Q7(double *dist, double *BoundaryValue,int *BoundaryLabel, int *BounceBackDist_list, int *BounceBackSolid_list, int count){ + int GRID = count / 512 + 1; + dvc_ScaLBL_Solid_DirichletAndNeumann_D3Q7<<>>(dist, BoundaryValue, BoundaryLabel, BounceBackDist_list, BounceBackSolid_list, count); + CHECK_ERROR("ScaLBL_Solid_DirichletAndNeumann_D3Q7"); +} + +extern "C" void ScaLBL_Solid_SlippingVelocityBC_D3Q19(double *dist, double *zeta_potential, double *ElectricField, double *SolidGrad, + double epsilon_LB, double tau, double rho0,double den_scale, double h, double time_conv, + int *BounceBackDist_list, int *BounceBackSolid_list, int *FluidBoundary_list, + double *lattice_weight, float *lattice_cx, float *lattice_cy, float *lattice_cz, + int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_Solid_SlippingVelocityBC_D3Q19<<>>(dist, zeta_potential, ElectricField, SolidGrad, + epsilon_LB, tau, rho0, den_scale, h, time_conv, + BounceBackDist_list, BounceBackSolid_list, FluidBoundary_list, + lattice_weight, lattice_cx, lattice_cy, lattice_cz, + count, Np); + CHECK_ERROR("ScaLBL_Solid_SlippingVelocityBC_D3Q19"); +} + +extern "C" void ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_z(int *list, double *dist, double Vin, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_z<<>>(list, dist, Vin, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_z"); +} + +extern "C" void ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_Z(int *list, double *dist, double Vout, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_Z<<>>(list, dist, Vout, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Poisson_Potential_BC_Z"); +} + +extern "C" void ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_z(int *d_neighborList, int *list, double *dist, double Vin, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_z<<>>(d_neighborList, list, dist, Vin, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_z"); +} + +extern "C" void ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_Z(int *d_neighborList, int *list, double *dist, double Vout, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_Z<<>>(d_neighborList, list, dist, Vout, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Poisson_Potential_BC_Z"); +} + +extern "C" void ScaLBL_Poisson_D3Q7_BC_z(int *list, int *Map, double *Psi, double Vin, int count){ + int GRID = count / 512 + 1; + dvc_ScaLBL_Poisson_D3Q7_BC_z<<>>(list, Map, Psi, Vin, count); + 
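The DiffAdvcElec variants extend the DiffAdvc closure by folding an electromigration drift into the advective velocity: with u_{EP,z} = z_i D_i E_z / V_t as computed above, the inlet closure reads

    f_5 = \frac{ J_{\mathrm{in}} + \left(1 - \frac{1}{2\tau}\right) f_6 - \left(\frac{u_z}{2\tau} + u_{EP,z}\right)(f_0+f_1+f_2+f_3+f_4+f_6) }{ 1 - \frac{1}{2\tau} + \frac{u_z}{2\tau} + u_{EP,z} }

which falls back to the DiffAdvc form when the electric field E_z vanishes.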
CHECK_ERROR("ScaLBL_Poisson_D3Q7_BC_z"); +} + +extern "C" void ScaLBL_Poisson_D3Q7_BC_Z(int *list, int *Map, double *Psi, double Vout, int count){ + int GRID = count / 512 + 1; + dvc_ScaLBL_Poisson_D3Q7_BC_Z<<>>(list, Map, Psi, Vout, count); + CHECK_ERROR("ScaLBL_Poisson_D3Q7_BC_Z"); +} + +extern "C" void ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_z(int *list, double *dist, double Cin, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_z<<>>(list, dist, Cin, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_z"); +} + +extern "C" void ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_Z(int *list, double *dist, double Cout, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_Z<<>>(list, dist, Cout, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Ion_Concentration_BC_Z"); +} + +extern "C" void ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_z(int *d_neighborList, int *list, double *dist, double Cin, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_z<<>>(d_neighborList, list, dist, Cin, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_z"); +} + +extern "C" void ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_Z(int *d_neighborList, int *list, double *dist, double Cout, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_Z<<>>(d_neighborList, list, dist, Cout, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Ion_Concentration_BC_Z"); +} +//------------Diff----------------- +extern "C" void ScaLBL_D3Q7_AAeven_Ion_Flux_Diff_BC_z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_Diff_BC_z<<>>(list, dist, FluxIn, tau, VelocityZ, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Ion_Flux_Diff_BC_z"); +} + +extern "C" void ScaLBL_D3Q7_AAeven_Ion_Flux_Diff_BC_Z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_Diff_BC_Z<<>>(list, dist, FluxIn, tau, VelocityZ, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Ion_Flux_Diff_BC_Z"); +} + +extern "C" void ScaLBL_D3Q7_AAodd_Ion_Flux_Diff_BC_z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_Diff_BC_z<<>>(d_neighborList, list, dist, FluxIn, tau, VelocityZ, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Ion_Flux_Diff_BC_z"); +} + +extern "C" void ScaLBL_D3Q7_AAodd_Ion_Flux_Diff_BC_Z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_Diff_BC_Z<<>>(d_neighborList, list, dist, FluxIn, tau, VelocityZ, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Ion_Flux_Diff_BC_Z"); +} +//----------DiffAdvc------------- +extern "C" void ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvc_BC_z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvc_BC_z<<>>(list, dist, FluxIn, tau, VelocityZ, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvc_BC_z"); +} + +extern "C" void ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvc_BC_Z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ + int GRID = count / 512 + 1; + 
dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvc_BC_Z<<>>(list, dist, FluxIn, tau, VelocityZ, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvc_BC_Z"); +} + +extern "C" void ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvc_BC_z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvc_BC_z<<>>(d_neighborList, list, dist, FluxIn, tau, VelocityZ, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvc_BC_z"); +} + +extern "C" void ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvc_BC_Z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvc_BC_Z<<>>(d_neighborList, list, dist, FluxIn, tau, VelocityZ, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvc_BC_Z"); +} +//----------DiffAdvcElec------------- +extern "C" void ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvcElec_BC_z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, double *ElectricField_Z, + double Di, double zi, double Vt, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvcElec_BC_z<<>>(list, dist, FluxIn, tau, VelocityZ, ElectricField_Z, Di, zi, Vt, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvcElec_BC_z"); +} + +extern "C" void ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvcElec_BC_Z(int *list, double *dist, double FluxIn, double tau, double *VelocityZ, double *ElectricField_Z, + double Di, double zi, double Vt, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvcElec_BC_Z<<>>(list, dist, FluxIn, tau, VelocityZ, ElectricField_Z, Di, zi, Vt, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAeven_Ion_Flux_DiffAdvcElec_BC_Z"); +} + +extern "C" void ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, double *ElectricField_Z, + double Di, double zi, double Vt, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_z<<>>(d_neighborList, list, dist, FluxIn, tau, VelocityZ, ElectricField_Z, Di, zi, Vt, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_z"); +} + +extern "C" void ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_Z(int *d_neighborList, int *list, double *dist, double FluxIn, double tau, double *VelocityZ, double *ElectricField_Z, + double Di, double zi, double Vt, int count, int Np){ + int GRID = count / 512 + 1; + dvc_ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_Z<<>>(d_neighborList, list, dist, FluxIn, tau, VelocityZ, ElectricField_Z, Di, zi, Vt, count, Np); + CHECK_ERROR("ScaLBL_D3Q7_AAodd_Ion_Flux_DiffAdvcElec_BC_Z"); +} +//------------------------------- diff --git a/hip/Extras.cu b/hip/Extras.hip similarity index 100% rename from hip/Extras.cu rename to hip/Extras.hip diff --git a/hip/FreeLee.cu b/hip/FreeLee.hip similarity index 99% rename from hip/FreeLee.cu rename to hip/FreeLee.hip index ed8ed611..98b99f6d 100644 --- a/hip/FreeLee.cu +++ b/hip/FreeLee.hip @@ -2726,10 +2726,10 @@ __global__ void dvc_ScaLBL_D3Q19_AAeven_FreeLeeModel_Combined(int *Map, double * } } -__global__ void dvc_ScaLBL_D3Q7_AAodd_FreeLeeModel_PhaseField(int *neighborList, int *Map, double *hq, double *Den, double *Phi, +__global__ void dvc_ScaLBL_D3Q7_AAodd_FreeLeeModel_PhaseField_alt(int *neighborList, int *Map, double *hq, double *Den, double *Phi, double rhoA, double rhoB, 
int start, int finish, int Np){ - int idx,nread; + int n,idx,nread; double fq,phi; // for (int n=start; n>>(neighborList, Map, hq, Den, Phi, ColorGrad, Vel, rhoA, rhoB, tauM, W, start, finish, Np); hipError_t err = hipGetLastError(); @@ -3406,9 +3405,9 @@ extern "C" void ScaLBL_D3Q7_AAodd_FreeLee_PhaseField(int *neighborList, int *Map } extern "C" void ScaLBL_D3Q7_AAeven_FreeLee_PhaseField( int *Map, double *hq, double *Den, double *Phi, double *ColorGrad, double *Vel, - double rhoA, double rhoB, double tauM, double W, int start, int finish, int Np){ - - hipFuncSetCacheConfig((void*)dvc_ScaLBL_D3Q7_AAeven_FreeLee_PhaseField, hipFuncCachePreferL1); + double rhoA, double rhoB, double tauM, double W, int start, int finish, int Np) +{ + //hipFuncSetCacheConfig((void*)dvc_ScaLBL_D3Q7_AAeven_FreeLee_PhaseField, hipFuncCachePreferL1); dvc_ScaLBL_D3Q7_AAeven_FreeLee_PhaseField<<>>( Map, hq, Den, Phi, ColorGrad, Vel, rhoA, rhoB, tauM, W, start, finish, Np); hipError_t err = hipGetLastError(); if (hipSuccess != err){ @@ -3419,7 +3418,7 @@ extern "C" void ScaLBL_D3Q7_AAeven_FreeLee_PhaseField( int *Map, double *hq, dou extern "C" void ScaLBL_D3Q7_ComputePhaseField(int *Map, double *hq, double *Den, double *Phi, double rhoA, double rhoB, int start, int finish, int Np){ - hipFuncSetCacheConfig((void*)dvc_ScaLBL_D3Q7_ComputePhaseField, hipFuncCachePreferL1); + //hipFuncSetCacheConfig((void*)dvc_ScaLBL_D3Q7_ComputePhaseField, hipFuncCachePreferL1); dvc_ScaLBL_D3Q7_ComputePhaseField<<>>( Map, hq, Den, Phi, rhoA, rhoB, start, finish, Np); hipError_t err = hipGetLastError(); if (hipSuccess != err){ @@ -3432,7 +3431,7 @@ extern "C" void ScaLBL_D3Q19_AAodd_FreeLeeModel(int *neighborList, int *Map, dou double rhoA, double rhoB, double tauA, double tauB, double kappa, double beta, double W, double Fx, double Fy, double Fz, int strideY, int strideZ, int start, int finish, int Np){ - hipFuncSetCacheConfig((void*)dvc_ScaLBL_D3Q19_AAodd_FreeLeeModel, hipFuncCachePreferL1); + //hipFuncSetCacheConfig((void*)dvc_ScaLBL_D3Q19_AAodd_FreeLeeModel, hipFuncCachePreferL1); dvc_ScaLBL_D3Q19_AAodd_FreeLeeModel<<>>(neighborList, Map, dist, Den, Phi, mu_phi, Vel, Pressure, ColorGrad, rhoA, rhoB, tauA, tauB, kappa, beta, W, Fx, Fy, Fz, strideY, strideZ, start, finish, Np); hipError_t err = hipGetLastError(); @@ -3445,7 +3444,7 @@ extern "C" void ScaLBL_D3Q19_AAeven_FreeLeeModel(int *Map, double *dist, double double rhoA, double rhoB, double tauA, double tauB, double kappa, double beta, double W, double Fx, double Fy, double Fz, int strideY, int strideZ, int start, int finish, int Np){ - hipFuncSetCacheConfig((void*)dvc_ScaLBL_D3Q19_AAeven_FreeLeeModel, hipFuncCachePreferL1); + //hipFuncSetCacheConfig((void*)dvc_ScaLBL_D3Q19_AAeven_FreeLeeModel, hipFuncCachePreferL1); dvc_ScaLBL_D3Q19_AAeven_FreeLeeModel<<>>(Map, dist, Den, Phi, mu_phi, Vel, Pressure, ColorGrad, rhoA, rhoB, tauA, tauB, kappa, beta, W, Fx, Fy, Fz, strideY, strideZ, start, finish, Np); hipError_t err = hipGetLastError(); @@ -3458,7 +3457,7 @@ extern "C" void ScaLBL_D3Q19_AAeven_FreeLeeModel(int *Map, double *dist, double extern "C" void ScaLBL_D3Q19_AAeven_FreeLeeModel_Combined(int *Map, double *dist, double *hq, double *Den, double *Phi, double *mu_phi, double *Vel, double *Pressure, double *ColorGrad, double rhoA, double rhoB, double tauA, double tauB, double tauM, double kappa, double beta, double W, double Fx, double Fy, double Fz, int strideY, int strideZ, int start, int finish, int Np){ - cudaFuncSetCacheConfig(dvc_ScaLBL_D3Q19_AAeven_FreeLeeModel_Combined, 
cudaFuncCachePreferL1); + //hipFuncSetCacheConfig(dvc_ScaLBL_D3Q19_AAeven_FreeLeeModel_Combined, hipFuncCachePreferL1); dvc_ScaLBL_D3Q19_AAeven_FreeLeeModel_Combined<<>>(Map, dist, Den, hq, Phi, mu_phi, Vel, Pressure, ColorGrad, rhoA, rhoB, tauA, tauB, tauM, kappa, beta, W, Fx, Fy, Fz, strideY, strideZ, start, finish, Np); hipError_t err = hipGetLastError(); @@ -3471,7 +3470,7 @@ extern "C" void ScaLBL_D3Q19_AAodd_FreeLeeModel_Combined(int *neighborList, int double rhoA, double rhoB, double tauA, double tauB, double tauM, double kappa, double beta, double W, double Fx, double Fy, double Fz, int strideY, int strideZ, int start, int finish, int Np){ - cudaFuncSetCacheConfig(dvc_ScaLBL_D3Q19_AAodd_FreeLeeModel_Combined, cudaFuncCachePreferL1); + //hipFuncSetCacheConfig(dvc_ScaLBL_D3Q19_AAodd_FreeLeeModel_Combined, hipFuncCachePreferL1); dvc_ScaLBL_D3Q19_AAodd_FreeLeeModel_Combined<<>>(neighborList, Map, dist, hq, Den, Phi, mu_phi, Vel, Pressure, ColorGrad, rhoA, rhoB, tauA, tauB, tauM, kappa, beta, W, Fx, Fy, Fz, strideY, strideZ, start, finish, Np); hipError_t err = hipGetLastError(); @@ -3482,7 +3481,7 @@ extern "C" void ScaLBL_D3Q19_AAodd_FreeLeeModel_Combined(int *neighborList, int extern "C" void ScaLBL_D3Q7_AAodd_FreeLeeModel_PhaseField(int *neighborList, int *Map, double *hq, double *Den, double *Phi, double rhoA, double rhoB, int start, int finish, int Np){ - cudaFuncSetCacheConfig(dvc_ScaLBL_D3Q7_AAodd_FreeLeeModel_PhaseField, cudaFuncCachePreferL1); + //hipFuncSetCacheConfig(dvc_ScaLBL_D3Q7_AAodd_FreeLeeModel_PhaseField, hipFuncCachePreferL1); dvc_ScaLBL_D3Q7_AAodd_FreeLeeModel_PhaseField<<>>( neighborList, Map, hq, Den, Phi, rhoA, rhoB, start, finish, Np); hipError_t err = hipGetLastError(); if (hipSuccess != err){ @@ -3492,7 +3491,7 @@ extern "C" void ScaLBL_D3Q7_AAodd_FreeLeeModel_PhaseField(int *neighborList, int extern "C" void ScaLBL_D3Q7_AAeven_FreeLeeModel_PhaseField(int *Map, double *hq, double *Den, double *Phi, double rhoA, double rhoB, int start, int finish, int Np){ - cudaFuncSetCacheConfig(dvc_ScaLBL_D3Q7_AAeven_FreeLeeModel_PhaseField, cudaFuncCachePreferL1); + //hipFuncSetCacheConfig(dvc_ScaLBL_D3Q7_AAeven_FreeLeeModel_PhaseField, hipFuncCachePreferL1); dvc_ScaLBL_D3Q7_AAeven_FreeLeeModel_PhaseField<<>>( Map, hq, Den, Phi, rhoA, rhoB, start, finish, Np); hipError_t err = hipGetLastError(); if (hipSuccess != err){ @@ -3503,7 +3502,7 @@ extern "C" void ScaLBL_D3Q7_AAeven_FreeLeeModel_PhaseField(int *Map, double *hq, extern "C" void ScaLBL_D3Q19_AAodd_FreeLeeModel_SingleFluid_BGK(int *neighborList, double *dist, double *Vel, double *Pressure, double tau, double rho0, double Fx, double Fy, double Fz, int start, int finish, int Np){ - hipFuncSetCacheConfig((void*)dvc_ScaLBL_D3Q19_AAodd_FreeLeeModel_SingleFluid_BGK, hipFuncCachePreferL1); + //hipFuncSetCacheConfig((void*)dvc_ScaLBL_D3Q19_AAodd_FreeLeeModel_SingleFluid_BGK, hipFuncCachePreferL1); dvc_ScaLBL_D3Q19_AAodd_FreeLeeModel_SingleFluid_BGK<<>>(neighborList, dist, Vel, Pressure, tau, rho0, Fx, Fy, Fz, start, finish, Np); hipError_t err = hipGetLastError(); @@ -3513,9 +3512,9 @@ extern "C" void ScaLBL_D3Q19_AAodd_FreeLeeModel_SingleFluid_BGK(int *neighborLis } extern "C" void ScaLBL_D3Q19_AAeven_FreeLeeModel_SingleFluid_BGK(double *dist, double *Vel, double *Pressure, - double tau, double rho0, double Fx, double Fy, double Fz, int start, int finish, int Np){ - - hipFuncSetCacheConfig((void*)dvc_ScaLBL_D3Q19_AAeven_FreeLeeModel_SingleFluid_BGK, hipFuncCachePreferL1); + double tau, double rho0, double Fx, double Fy, 
double Fz, int start, int finish, int Np) +{ + //hipFuncSetCacheConfig((void*)dvc_ScaLBL_D3Q19_AAeven_FreeLeeModel_SingleFluid_BGK, hipFuncCachePreferL1); dvc_ScaLBL_D3Q19_AAeven_FreeLeeModel_SingleFluid_BGK<<>>(dist, Vel, Pressure, tau, rho0, Fx, Fy, Fz, start, finish, Np); hipError_t err = hipGetLastError(); diff --git a/hip/Greyscale.cu b/hip/Greyscale.hip similarity index 100% rename from hip/Greyscale.cu rename to hip/Greyscale.hip diff --git a/hip/GreyscaleColor.cu b/hip/GreyscaleColor.hip similarity index 99% rename from hip/GreyscaleColor.cu rename to hip/GreyscaleColor.hip index aff7b39a..0607bc0a 100644 --- a/hip/GreyscaleColor.cu +++ b/hip/GreyscaleColor.hip @@ -1,5 +1,6 @@ #include #include +#include "hip/hip_runtime.h" #define NBLOCKS 1024 #define NTHREADS 256 @@ -1609,7 +1610,9 @@ __global__ void dvc_ScaLBL_D3Q19_AAodd_GreyscaleColor_CP(int *neighborList, int Fcpy = ny; Fcpz = nz; double Fcp_mag=sqrt(Fcpx*Fcpx+Fcpy*Fcpy+Fcpz*Fcpz); - if (Fcp_mag==0.0); Fcpx=Fcpy=Fcpz=0.0; + if (Fcp_mag==0.0) { + Fcpx=Fcpy=Fcpz=0.0; + } //NOTE for open node (porosity=1.0),Fcp=0.0 Fcpx *= alpha*W*(1.0-porosity)/sqrt(perm); Fcpy *= alpha*W*(1.0-porosity)/sqrt(perm); @@ -2396,7 +2399,9 @@ __global__ void dvc_ScaLBL_D3Q19_AAeven_GreyscaleColor_CP(int *Map, double *dis Fcpy = ny; Fcpz = nz; double Fcp_mag=sqrt(Fcpx*Fcpx+Fcpy*Fcpy+Fcpz*Fcpz); - if (Fcp_mag==0.0); Fcpx=Fcpy=Fcpz=0.0; + if (Fcp_mag==0.0) { + Fcpx=Fcpy=Fcpz=0.0; + } //NOTE for open node (porosity=1.0),Fcp=0.0 Fcpx *= alpha*W*(1.0-porosity)/sqrt(perm); Fcpy *= alpha*W*(1.0-porosity)/sqrt(perm); diff --git a/hip/Ion.cu b/hip/Ion.cu deleted file mode 100644 index b1d9636e..00000000 --- a/hip/Ion.cu +++ /dev/null @@ -1,422 +0,0 @@ -#include -#include -#include "hip/hip_runtime.h" - -#define NBLOCKS 1024 -#define NTHREADS 256 - - -__global__ void dvc_ScaLBL_D3Q7_AAodd_IonConcentration(int *neighborList, double *dist, double *Den, int start, int finish, int Np){ - int n,nread; - double fq,Ci; - - int S = Np/NBLOCKS/NTHREADS + 1; - for (int s=0; s 10Np => odd part of dist) - f1 = dist[nr1]; // reading the f1 data into register fq - // q=2 - nr2 = neighborList[n+Np]; // neighbor 1 ( < 10Np => even part of dist) - f2 = dist[nr2]; // reading the f2 data into register fq - // q=3 - nr3 = neighborList[n+2*Np]; // neighbor 4 - f3 = dist[nr3]; - // q=4 - nr4 = neighborList[n+3*Np]; // neighbor 3 - f4 = dist[nr4]; - // q=5 - nr5 = neighborList[n+4*Np]; - f5 = dist[nr5]; - // q=6 - nr6 = neighborList[n+5*Np]; - f6 = dist[nr6]; - - // compute diffusive flux - flux_diffusive_x = (1.0-0.5*rlx)*((f1-f2)-ux*Ci); - flux_diffusive_y = (1.0-0.5*rlx)*((f3-f4)-uy*Ci); - flux_diffusive_z = (1.0-0.5*rlx)*((f5-f6)-uz*Ci); - FluxDiffusive[n+0*Np] = flux_diffusive_x; - FluxDiffusive[n+1*Np] = flux_diffusive_y; - FluxDiffusive[n+2*Np] = flux_diffusive_z; - FluxAdvective[n+0*Np] = ux*Ci; - FluxAdvective[n+1*Np] = uy*Ci; - FluxAdvective[n+2*Np] = uz*Ci; - FluxElectrical[n+0*Np] = uEPx*Ci; - FluxElectrical[n+1*Np] = uEPy*Ci; - FluxElectrical[n+2*Np] = uEPz*Ci; - - // q=0 - dist[n] = f0*(1.0-rlx)+rlx*0.25*Ci; - //dist[n] = f0*(1.0-rlx)+rlx*0.25*Ci*(1.0 - 2.0*((ux+uEPx)*(ux+uEPx) + (uy+uEPy)*(uy+uEPy) + (uz+uEPz)*(uz+uEPz))); - - // q = 1 - dist[nr2] = f1*(1.0-rlx) + rlx*0.125*Ci*(1.0+4.0*(ux+uEPx)); - //dist[nr2] = f1*(1.0-rlx) + rlx*0.125*Ci*(1.0+4.0*(ux+uEPx)+8.0*(ux+uEPx)*(ux+uEPx)- 2.0*((ux+uEPx)*(ux+uEPx) + (uy+uEPy)*(uy+uEPy) + (uz+uEPz)*(uz+uEPz))); - - // q=2 - dist[nr1] = f2*(1.0-rlx) + rlx*0.125*Ci*(1.0-4.0*(ux+uEPx)); - //dist[nr1] = f2*(1.0-rlx) + 
rlx*0.125*Ci*(1.0-4.0*(ux+uEPx)+8.0*(ux+uEPx)*(ux+uEPx)- 2.0*((ux+uEPx)*(ux+uEPx) + (uy+uEPy)*(uy+uEPy) + (uz+uEPz)*(uz+uEPz))); - - // q = 3 - dist[nr4] = f3*(1.0-rlx) + rlx*0.125*Ci*(1.0+4.0*(uy+uEPy)); - //dist[nr4] = f3*(1.0-rlx) + rlx*0.125*Ci*(1.0+4.0*(uy+uEPy)+8.0*(uy+uEPy)*(uy+uEPy)- 2.0*((ux+uEPx)*(ux+uEPx) + (uy+uEPy)*(uy+uEPy) + (uz+uEPz)*(uz+uEPz))); - - // q = 4 - dist[nr3] = f4*(1.0-rlx) + rlx*0.125*Ci*(1.0-4.0*(uy+uEPy)); - //dist[nr3] = f4*(1.0-rlx) + rlx*0.125*Ci*(1.0-4.0*(uy+uEPy)+8.0*(uy+uEPy)*(uy+uEPy)- 2.0*((ux+uEPx)*(ux+uEPx) + (uy+uEPy)*(uy+uEPy) + (uz+uEPz)*(uz+uEPz))); - - // q = 5 - dist[nr6] = f5*(1.0-rlx) + rlx*0.125*Ci*(1.0+4.0*(uz+uEPz)); - //dist[nr6] = f5*(1.0-rlx) + rlx*0.125*Ci*(1.0+4.0*(uz+uEPz)+8.0*(uz+uEPz)*(uz+uEPz)- 2.0*((ux+uEPx)*(ux+uEPx) + (uy+uEPy)*(uy+uEPy) + (uz+uEPz)*(uz+uEPz))); - - // q = 6 - dist[nr5] = f6*(1.0-rlx) + rlx*0.125*Ci*(1.0-4.0*(uz+uEPz)); - //dist[nr5] = f6*(1.0-rlx) + rlx*0.125*Ci*(1.0-4.0*(uz+uEPz)+8.0*(uz+uEPz)*(uz+uEPz)- 2.0*((ux+uEPx)*(ux+uEPx) + (uy+uEPy)*(uy+uEPy) + (uz+uEPz)*(uz+uEPz))); - } - } -} - -__global__ void dvc_ScaLBL_D3Q7_AAeven_Ion(double *dist, double *Den, double *FluxDiffusive, double *FluxAdvective, double *FluxElectrical, double *Velocity, double *ElectricField, - double Di, int zi, double rlx, double Vt, int start, int finish, int Np){ - int n; - double Ci; - double ux,uy,uz; - double uEPx,uEPy,uEPz;//electrochemical induced velocity - double Ex,Ey,Ez;//electrical field - double flux_diffusive_x,flux_diffusive_y,flux_diffusive_z; - double f0,f1,f2,f3,f4,f5,f6; - - int S = Np/NBLOCKS/NTHREADS + 1; - for (int s=0; s0) + CD_tmp; - } - } -} - - -extern "C" void ScaLBL_D3Q7_AAodd_IonConcentration(int *neighborList, double *dist, double *Den, int start, int finish, int Np){ - - //cudaProfilerStart(); - dvc_ScaLBL_D3Q7_AAodd_IonConcentration<<>>(neighborList,dist,Den,start,finish,Np); - - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAodd_IonConcentration: %s \n",hipGetErrorString(err)); - } - //cudaProfilerStop(); -} - -extern "C" void ScaLBL_D3Q7_AAeven_IonConcentration(double *dist, double *Den, int start, int finish, int Np){ - - //cudaProfilerStart(); - dvc_ScaLBL_D3Q7_AAeven_IonConcentration<<>>(dist,Den,start,finish,Np); - - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAeven_IonConcentration: %s \n",hipGetErrorString(err)); - } - //cudaProfilerStop(); -} - -extern "C" void ScaLBL_D3Q7_AAodd_Ion(int *neighborList, double *dist, double *Den, double *FluxDiffusive, double *FluxAdvective, double *FluxElectrical, double *Velocity, double *ElectricField, - double Di, int zi, double rlx, double Vt, int start, int finish, int Np){ - //cudaProfilerStart(); - dvc_ScaLBL_D3Q7_AAodd_Ion<<>>(neighborList,dist,Den,FluxDiffusive,FluxAdvective,FluxElectrical,Velocity,ElectricField,Di,zi,rlx,Vt,start,finish,Np); - - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAodd_Ion: %s \n",hipGetErrorString(err)); - } - //cudaProfilerStop(); -} - -extern "C" void ScaLBL_D3Q7_AAeven_Ion(double *dist, double *Den, double *FluxDiffusive, double *FluxAdvective, double *FluxElectrical, double *Velocity, double *ElectricField, - double Di, int zi, double rlx, double Vt, int start, int finish, int Np){ - //cudaProfilerStart(); - dvc_ScaLBL_D3Q7_AAeven_Ion<<>>(dist,Den,FluxDiffusive,FluxAdvective,FluxElectrical,Velocity,ElectricField,Di,zi,rlx,Vt,start,finish,Np); - - hipError_t err = 
hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAeven_Ion: %s \n",hipGetErrorString(err)); - } - //cudaProfilerStop(); -} - -extern "C" void ScaLBL_D3Q7_Ion_Init(double *dist, double *Den, double DenInit, int Np){ - - //cudaProfilerStart(); - dvc_ScaLBL_D3Q7_Ion_Init<<>>(dist,Den,DenInit,Np); - - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_Ion_Init: %s \n",hipGetErrorString(err)); - } - //cudaProfilerStop(); -} - -extern "C" void ScaLBL_D3Q7_Ion_Init_FromFile(double *dist, double *Den, int Np){ - - //cudaProfilerStart(); - dvc_ScaLBL_D3Q7_Ion_Init_FromFile<<>>(dist,Den,Np); - - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_Ion_Init_FromFile: %s \n",hipGetErrorString(err)); - } - //cudaProfilerStop(); -} - -extern "C" void ScaLBL_D3Q7_Ion_ChargeDensity(double *Den, double *ChargeDensity, int IonValence, int ion_component, int start, int finish, int Np){ - - //cudaProfilerStart(); - dvc_ScaLBL_D3Q7_Ion_ChargeDensity<<>>(Den,ChargeDensity,IonValence,ion_component,start,finish,Np); - - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_Ion_ChargeDensity: %s \n",hipGetErrorString(err)); - } - //cudaProfilerStop(); -} diff --git a/hip/Ion.hip b/hip/Ion.hip new file mode 100644 index 00000000..4da89201 --- /dev/null +++ b/hip/Ion.hip @@ -0,0 +1,969 @@ +#include +#include +#include "hip/hip_runtime.h" + +#define NBLOCKS 1024 +#define NTHREADS 512 + +extern "C" void Membrane_D3Q19_Unpack(int q, int *list, int *links, int start, int linkCount, + double *recvbuf, double *dist, int N) { + //.................................................................................... + // Unack distribution from the recv buffer + // Distribution q matche Cqx, Cqy, Cqz + // swap rule means that the distributions in recvbuf are OPPOSITE of q + // dist may be even or odd distributions stored by stream layout + //.................................................................................... + int n, idx, link; + for (link=0; link Threshold){ + aq = ThresholdMassFractionIn; ap = ThresholdMassFractionOut; + } + + /* Save the mass transfer coefficients */ + coef[2*link] = aq; coef[2*link+1] = ap; + } + } +} + +__global__ void dvc_ScaLBL_D3Q7_Membrane_AssignLinkCoef_halo( + const int Cqx, const int Cqy, int const Cqz, + int *Map, double *Distance, double *Psi, double Threshold, + double MassFractionIn, double MassFractionOut, double ThresholdMassFractionIn, double ThresholdMassFractionOut, + int *d3q7_recvlist, int *d3q7_linkList, double *coef, int start, int nlinks, int count, + const int N, const int Nx, const int Ny, const int Nz) { + //.................................................................................... + // Unack distribution from the recv buffer + // Distribution q matche Cqx, Cqy, Cqz + // swap rule means that the distributions in recvbuf are OPPOSITE of q + // dist may be even or odd distributions stored by stream layout + //.................................................................................... 
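Both membrane coefficient kernels in this file apply the same voltage-gated rule: each membrane link stores a pair of mass-transfer fractions (aq and ap, saved as coef[2*link] and coef[2*link+1]), and the pair switches from the baseline values to the threshold values once the trans-membrane potential difference crosses Threshold. A host-side sketch of that rule is given below; it mirrors the branch structure of the halo kernel that follows, and the standalone function form is illustrative only.

    // Sketch of the voltage-gated coefficient selection (same branches as the
    // halo kernel below). distanceLocal > 0 marks a link that starts inside
    // the membrane; membranePotential is psiLocal - psiNonlocal.
    static void membrane_link_coefficients(double distanceLocal, double membranePotential,
                                           double Threshold,
                                           double MassFractionIn, double MassFractionOut,
                                           double ThresholdMassFractionIn,
                                           double ThresholdMassFractionOut,
                                           double &aq, double &ap) {
        aq = MassFractionIn;
        ap = MassFractionOut;
        if (distanceLocal > 0.0) {
            if (membranePotential < -Threshold) {
                ap = MassFractionIn;
                aq = MassFractionOut;
            } else {
                ap = ThresholdMassFractionIn;
                aq = ThresholdMassFractionOut;
            }
        } else if (membranePotential > Threshold) {
            aq = ThresholdMassFractionIn;
            ap = ThresholdMassFractionOut;
        }
    }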
+ int n, idx, nqm, npm, label, i, j, k; + double distanceLocal, distanceNonlocal; + double psiLocal, psiNonlocal, membranePotential; + double ap,aq; // coefficient + + /* second enforce custom rule for membrane links */ + int S = count/NBLOCKS/NTHREADS + 1; + for (int s=0; s 0 && !(n < 0)){ + nqm = Map[n]; + distanceLocal = Distance[nqm]; + psiLocal = Psi[nqm]; + + // Get the 3-D indices from the send process + k = nqm/(Nx*Ny); j = (nqm-Nx*Ny*k)/Nx; i = nqm-Nx*Ny*k-Nx*j; + // Streaming link the non-local distribution + i -= Cqx; j -= Cqy; k -= Cqz; + npm = k*Nx*Ny + j*Nx + i; + distanceNonlocal = Distance[npm]; + psiNonlocal = Psi[npm]; + + membranePotential = psiLocal - psiNonlocal; + aq = MassFractionIn; + ap = MassFractionOut; + + /* link is inside membrane */ + if (distanceLocal > 0.0){ + if (membranePotential < Threshold*(-1.0)){ + ap = MassFractionIn; + aq = MassFractionOut; + } + else { + ap = ThresholdMassFractionIn; + aq = ThresholdMassFractionOut; + } + } + else if (membranePotential > Threshold){ + aq = ThresholdMassFractionIn; + ap = ThresholdMassFractionOut; + } + } + coef[2*idx]=aq; + coef[2*idx+1]=ap; + } + } +} + +__global__ void dvc_ScaLBL_D3Q7_Membrane_Unpack(int q, + int *d3q7_recvlist, double *recvbuf, int count, + double *dist, int N, double *coef) { + //.................................................................................... + // Unack distribution from the recv buffer + // Distribution q matche Cqx, Cqy, Cqz + // swap rule means that the distributions in recvbuf are OPPOSITE of q + // dist may be even or odd distributions stored by stream layout + //.................................................................................... + int n, idx, link; + double fq,fp,fqq,ap,aq; // coefficient + + /* second enforce custom rule for membrane links */ + int S = count/NBLOCKS/NTHREADS + 1; + for (int s=0; s 10Np => odd part of dist) + f1 = dist[nr1]; // reading the f1 data into register fq + // q=2 + nr2 = neighborList[n + Np]; // neighbor 1 ( < 10Np => even part of dist) + f2 = dist[nr2]; // reading the f2 data into register fq + // q=3 + nr3 = neighborList[n + 2 * Np]; // neighbor 4 + f3 = dist[nr3]; + // q=4 + nr4 = neighborList[n + 3 * Np]; // neighbor 3 + f4 = dist[nr4]; + // q=5 + nr5 = neighborList[n + 4 * Np]; + f5 = dist[nr5]; + // q=6 + nr6 = neighborList[n + 5 * Np]; + f6 = dist[nr6]; + + // compute diffusive flux + Ci = f0 + f1 + f2 + f3 + f4 + f5 + f6; + flux_diffusive_x = (1.0 - 0.5 * rlx) * ((f1 - f2) - ux * Ci); + flux_diffusive_y = (1.0 - 0.5 * rlx) * ((f3 - f4) - uy * Ci); + flux_diffusive_z = (1.0 - 0.5 * rlx) * ((f5 - f6) - uz * Ci); + FluxDiffusive[n + 0 * Np] = flux_diffusive_x; + FluxDiffusive[n + 1 * Np] = flux_diffusive_y; + FluxDiffusive[n + 2 * Np] = flux_diffusive_z; + FluxAdvective[n + 0 * Np] = ux * Ci; + FluxAdvective[n + 1 * Np] = uy * Ci; + FluxAdvective[n + 2 * Np] = uz * Ci; + FluxElectrical[n + 0 * Np] = uEPx * Ci; + FluxElectrical[n + 1 * Np] = uEPy * Ci; + FluxElectrical[n + 2 * Np] = uEPz * Ci; + + Den[n] = Ci; + + /* use logistic function to prevent negative distributions*/ + X = 4.0 * (ux + uEPx); + Y = 4.0 * (uy + uEPy); + Z = 4.0 * (uz + uEPz); + factor_x = X / sqrt(1 + X*X); + factor_y = Y / sqrt(1 + Y*Y); + factor_z = Z / sqrt(1 + Z*Z); + + // q=0 + dist[n] = f0 * (1.0 - rlx) + rlx * 0.25 * Ci; + + // q = 1 + dist[nr2] = + f1 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_x); + //f1 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (ux + uEPx)); + + + // q=2 + dist[nr1] = + f2 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 
- factor_x); + //f2 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - 4.0 * (ux + uEPx)); + + // q = 3 + dist[nr4] = + f3 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_y ); + //f3 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (uy + uEPy)); + + // q = 4 + dist[nr3] = + f4 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_y); + //f4 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - 4.0 * (uy + uEPy)); + + // q = 5 + dist[nr6] = + f5 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_z); + //f5 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (uz + uEPz)); + + // q = 6 + dist[nr5] = + f6 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_z); + + } + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAeven_Ion(double *dist, double *Den, double *FluxDiffusive, double *FluxAdvective, double *FluxElectrical, double *Velocity, double *ElectricField, + double Di, int zi, double rlx, double Vt, int start, int finish, int Np){ + int n; + double Ci; + double ux,uy,uz; + double uEPx,uEPy,uEPz;//electrochemical induced velocity + double Ex,Ey,Ez;//electrical field + double flux_diffusive_x,flux_diffusive_y,flux_diffusive_z; + double f0,f1,f2,f3,f4,f5,f6; + double X,Y,Z,factor_x,factor_y,factor_z; + + int S = Np/NBLOCKS/NTHREADS + 1; + for (int s=0; s 10Np => odd part of dist) + f1 = dist[nr1]; // reading the f1 data into register fq + // q=2 + nr2 = neighborList[n + Np]; // neighbor 1 ( < 10Np => even part of dist) + f2 = dist[nr2]; // reading the f2 data into register fq + // q=3 + nr3 = neighborList[n + 2 * Np]; // neighbor 4 + f3 = dist[nr3]; + // q=4 + nr4 = neighborList[n + 3 * Np]; // neighbor 3 + f4 = dist[nr4]; + // q=5 + nr5 = neighborList[n + 4 * Np]; + f5 = dist[nr5]; + // q=6 + nr6 = neighborList[n + 5 * Np]; + f6 = dist[nr6]; + + // compute diffusive flux + //Ci = f0 + f1 + f2 + f3 + f4 + f5 + f6; + flux_diffusive_x = (1.0 - 0.5 * rlx) * ((f1 - f2) - ux * Ci); + flux_diffusive_y = (1.0 - 0.5 * rlx) * ((f3 - f4) - uy * Ci); + flux_diffusive_z = (1.0 - 0.5 * rlx) * ((f5 - f6) - uz * Ci); + FluxDiffusive[n + 0 * Np] = flux_diffusive_x; + FluxDiffusive[n + 1 * Np] = flux_diffusive_y; + FluxDiffusive[n + 2 * Np] = flux_diffusive_z; + FluxAdvective[n + 0 * Np] = ux * Ci; + FluxAdvective[n + 1 * Np] = uy * Ci; + FluxAdvective[n + 2 * Np] = uz * Ci; + FluxElectrical[n + 0 * Np] = uEPx * Ci; + FluxElectrical[n + 1 * Np] = uEPy * Ci; + FluxElectrical[n + 2 * Np] = uEPz * Ci; + + //Den[n] = Ci; + + /* use logistic function to prevent negative distributions*/ + //X = 4.0 * (ux + uEPx); + //Y = 4.0 * (uy + uEPy); + //Z = 4.0 * (uz + uEPz); + //factor_x = X / sqrt(1 + X*X); + //factor_y = Y / sqrt(1 + Y*Y); + //factor_z = Z / sqrt(1 + Z*Z); + + // q=0 + dist[n] = f0 * (1.0 - rlx) + rlx * 0.25 * Ci; + + // q = 1 + dist[nr2] = + f1 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (ux + uEPx)); + // f1 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_x); + + + // q=2 + dist[nr1] = + f2 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - 4.0 * (ux + uEPx)); + // f2 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_x); + + // q = 3 + dist[nr4] = + f3 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (uy + uEPy)); + // f3 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_y ); + + // q = 4 + dist[nr3] = + f4 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - 4.0 * (uy + uEPy)); + // f4 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_y); + + // q = 5 + dist[nr6] = + f5 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + 4.0 * (uz + uEPz)); + // f5 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 + factor_z); + + // q = 6 + dist[nr5] = + f6 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 
- 4.0 * (uz + uEPz)); + // f6 * (1.0 - rlx) + rlx * 0.125 * Ci * (1.0 - factor_z); + + } + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAeven_Ion_v0( + double *dist, double *Den, double *FluxDiffusive, double *FluxAdvective, + double *FluxElectrical, double *Velocity, double *ElectricField, double Di, + int zi, double rlx, double Vt, int start, int finish, int Np) { + int n; + double Ci; + double ux, uy, uz; + double uEPx, uEPy, uEPz; //electrochemical induced velocity + double Ex, Ey, Ez; //electrical field + double flux_diffusive_x, flux_diffusive_y, flux_diffusive_z; + double f0, f1, f2, f3, f4, f5, f6; + //double X,Y,Z, factor_x, factor_y, factor_z; + + int S = Np/NBLOCKS/NTHREADS + 1; + for (int s=0; s>>(dist, + Den, FluxDiffusive, FluxAdvective, + FluxElectrical, Velocity, + ElectricField, Di, zi, + rlx, Vt, start, finish, Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("hip error in dvc_ScaLBL_D3Q7_AAeven_Ion_v0: %s \n",hipGetErrorString(err)); + } +} + +extern "C" void ScaLBL_D3Q7_AAodd_Ion_v0(int *neighborList, double *dist, + double *Den, double *FluxDiffusive, + double *FluxAdvective, + double *FluxElectrical, double *Velocity, + double *ElectricField, double Di, int zi, + double rlx, double Vt, int start, + int finish, int Np) { + + dvc_ScaLBL_D3Q7_AAodd_Ion_v0<<>>(neighborList, dist, + Den, FluxDiffusive, FluxAdvective, + FluxElectrical, Velocity, + ElectricField, Di, zi, + rlx, Vt, start, + finish, Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("hip error in dvc_ScaLBL_D3Q7_AAodd_Ion_v0: %s \n",hipGetErrorString(err)); + } +} + + +extern "C" void ScaLBL_D3Q7_AAodd_IonConcentration(int *neighborList, double *dist, double *Den, int start, int finish, int Np){ + + //cudaProfilerStart(); + dvc_ScaLBL_D3Q7_AAodd_IonConcentration<<>>(neighborList,dist,Den,start,finish,Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("hip error in ScaLBL_D3Q7_AAodd_IonConcentration: %s \n",hipGetErrorString(err)); + } + //cudaProfilerStop(); +} + +extern "C" void ScaLBL_D3Q7_AAeven_IonConcentration(double *dist, double *Den, int start, int finish, int Np){ + + //cudaProfilerStart(); + dvc_ScaLBL_D3Q7_AAeven_IonConcentration<<>>(dist,Den,start,finish,Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("hip error in ScaLBL_D3Q7_AAeven_IonConcentration: %s \n",hipGetErrorString(err)); + } + //cudaProfilerStop(); +} + +extern "C" void ScaLBL_D3Q7_AAodd_Ion(int *neighborList, double *dist, double *Den, double *FluxDiffusive, double *FluxAdvective, double *FluxElectrical, double *Velocity, double *ElectricField, + double Di, int zi, double rlx, double Vt, int start, int finish, int Np){ + //cudaProfilerStart(); + dvc_ScaLBL_D3Q7_AAodd_Ion<<>>(neighborList,dist,Den,FluxDiffusive,FluxAdvective,FluxElectrical,Velocity,ElectricField,Di,zi,rlx,Vt,start,finish,Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("hip error in ScaLBL_D3Q7_AAodd_Ion: %s \n",hipGetErrorString(err)); + } + //cudaProfilerStop(); +} + +extern "C" void ScaLBL_D3Q7_AAeven_Ion(double *dist, double *Den, double *FluxDiffusive, double *FluxAdvective, double *FluxElectrical, double *Velocity, double *ElectricField, + double Di, int zi, double rlx, double Vt, int start, int finish, int Np){ + //cudaProfilerStart(); + dvc_ScaLBL_D3Q7_AAeven_Ion<<>>(dist,Den,FluxDiffusive,FluxAdvective,FluxElectrical,Velocity,ElectricField,Di,zi,rlx,Vt,start,finish,Np); + + hipError_t err = hipGetLastError(); + if 
(hipSuccess != err){ + printf("hip error in ScaLBL_D3Q7_AAeven_Ion: %s \n",hipGetErrorString(err)); + } + //cudaProfilerStop(); +} + +extern "C" void ScaLBL_D3Q7_Ion_Init(double *dist, double *Den, double DenInit, int Np){ + + //cudaProfilerStart(); + dvc_ScaLBL_D3Q7_Ion_Init<<>>(dist,Den,DenInit,Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("hip error in ScaLBL_D3Q7_Ion_Init: %s \n",hipGetErrorString(err)); + } + //cudaProfilerStop(); +} + +extern "C" void ScaLBL_D3Q7_Ion_Init_FromFile(double *dist, double *Den, int Np){ + + //cudaProfilerStart(); + dvc_ScaLBL_D3Q7_Ion_Init_FromFile<<>>(dist,Den,Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("hip error in ScaLBL_D3Q7_Ion_Init_FromFile: %s \n",hipGetErrorString(err)); + } + //cudaProfilerStop(); +} + +extern "C" void ScaLBL_D3Q7_Ion_ChargeDensity(double *Den, double *ChargeDensity, double IonValence, int ion_component, int start, int finish, int Np){ + + //cudaProfilerStart(); + dvc_ScaLBL_D3Q7_Ion_ChargeDensity<<>>(Den,ChargeDensity,IonValence,ion_component,start,finish,Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("hip error in ScaLBL_D3Q7_Ion_ChargeDensity: %s \n",hipGetErrorString(err)); + } + //cudaProfilerStop(); +} + +extern "C" void ScaLBL_D3Q7_Membrane_AssignLinkCoef(int *membrane, int *Map, double *Distance, double *Psi, double *coef, + double Threshold, double MassFractionIn, double MassFractionOut, double ThresholdMassFractionIn, double ThresholdMassFractionOut, + int memLinks, int Nx, int Ny, int Nz, int Np){ + + dvc_ScaLBL_D3Q7_Membrane_AssignLinkCoef<<>>(membrane, Map, Distance, Psi, coef, + Threshold, MassFractionIn, MassFractionOut, ThresholdMassFractionIn, ThresholdMassFractionOut, + memLinks, Nx, Ny, Nz, Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("CUDA error in dvc_ScaLBL_D3Q7_Membrane_AssignLinkCoef: %s \n",hipGetErrorString(err)); + } +} + +extern "C" void ScaLBL_D3Q7_Membrane_AssignLinkCoef_halo( + const int Cqx, const int Cqy, int const Cqz, + int *Map, double *Distance, double *Psi, double Threshold, + double MassFractionIn, double MassFractionOut, double ThresholdMassFractionIn, double ThresholdMassFractionOut, + int *d3q7_recvlist, int *d3q7_linkList, double *coef, int start, int nlinks, int count, + const int N, const int Nx, const int Ny, const int Nz) { + + dvc_ScaLBL_D3Q7_Membrane_AssignLinkCoef_halo<<>>( + Cqx, Cqy, Cqz, Map, Distance, Psi, Threshold, + MassFractionIn, MassFractionOut, ThresholdMassFractionIn, ThresholdMassFractionOut, + d3q7_recvlist, d3q7_linkList, coef, start, nlinks, count, N, Nx, Ny, Nz); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("CUDA error in dvc_ScaLBL_D3Q7_Membrane_AssignLinkCoef_halo: %s \n",hipGetErrorString(err)); + } +} + + +extern "C" void ScaLBL_D3Q7_Membrane_Unpack(int q, + int *d3q7_recvlist, double *recvbuf, int count, + double *dist, int N, double *coef){ + + dvc_ScaLBL_D3Q7_Membrane_Unpack<<>>(q, d3q7_recvlist, recvbuf,count, + dist, N, coef); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("CUDA error in dvc_ScaLBL_D3Q7_Membrane_Unpack: %s \n",hipGetErrorString(err)); + } +} + +extern "C" void ScaLBL_D3Q7_Membrane_IonTransport(int *membrane, double *coef, + double *dist, double *Den, int memLinks, int Np){ + + dvc_ScaLBL_D3Q7_Membrane_IonTransport<<>>(membrane, coef, dist, Den, memLinks, Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("CUDA error 
in dvc_ScaLBL_D3Q7_Membrane_IonTransport: %s \n",hipGetErrorString(err)); + } +} diff --git a/hip/MRT.cu b/hip/MRT.hip similarity index 98% rename from hip/MRT.cu rename to hip/MRT.hip index f08bd9b2..e4ef3bfb 100644 --- a/hip/MRT.cu +++ b/hip/MRT.hip @@ -19,8 +19,8 @@ //************************************************************************* #include "hip/hip_runtime.h" -#define NBLOCKS 560 -#define NTHREADS 128 +#define NBLOCKS 1024 +#define NTHREADS 256 __global__ void INITIALIZE(char *ID, double *f_even, double *f_odd, int Nx, int Ny, int Nz) { @@ -122,8 +122,7 @@ __global__ void Compute_VELOCITY(char *ID, double *disteven, double *distodd, do //************************************************************************* __global__ void -__launch_bounds__(512,2) -D3Q19_MRT(char *ID, double *disteven, double *distodd, int Nx, int Ny, int Nz, +__launch_bounds__(256,4) D3Q19_MRT(char *ID, double *disteven, double *distodd, int Nx, int Ny, int Nz, double rlx_setA, double rlx_setB, double Fx, double Fy, double Fz) { diff --git a/hip/MixedGradient.cu b/hip/MixedGradient.hip similarity index 98% rename from hip/MixedGradient.cu rename to hip/MixedGradient.hip index 31518ee5..3c8f705e 100644 --- a/hip/MixedGradient.cu +++ b/hip/MixedGradient.hip @@ -65,13 +65,11 @@ __global__ void dvc_ScaLBL_D3Q19_MixedGradient(int *Map, double *Phi, double *Gr extern "C" void ScaLBL_D3Q19_MixedGradient(int *Map, double *Phi, double *Gradient, int start, int finish, int Np, int Nx, int Ny, int Nz) { - hipProfilerStart(); dvc_ScaLBL_D3Q19_MixedGradient<<>>(Map, Phi, Gradient, start, finish, Np, Nx, Ny, Nz); hipError_t err = hipGetLastError(); if (hipSuccess != err){ printf("hip error in ScaLBL_D3Q19_MixedGradient: %s \n",hipGetErrorString(err)); } - hipProfilerStop(); } diff --git a/hip/Poisson.cu b/hip/Poisson.cu deleted file mode 100644 index de6cc8eb..00000000 --- a/hip/Poisson.cu +++ /dev/null @@ -1,332 +0,0 @@ -#include -#include -#include "hip/hip_runtime.h" - -#define NBLOCKS 1024 -#define NTHREADS 256 - -__global__ void dvc_ScaLBL_D3Q7_AAodd_Poisson_ElectricPotential(int *neighborList,int *Map, double *dist, double *Psi, int start, int finish, int Np){ - int n; - double psi;//electric potential - double fq; - int nread; - int idx; - - int S = Np/NBLOCKS/NTHREADS + 1; - for (int s=0; s 10Np => odd part of dist) - f1 = dist[nr1]; // reading the f1 data into register fq - - nr2 = neighborList[n+Np]; // neighbor 1 ( < 10Np => even part of dist) - f2 = dist[nr2]; // reading the f2 data into register fq - - // q=3 - nr3 = neighborList[n+2*Np]; // neighbor 4 - f3 = dist[nr3]; - - // q = 4 - nr4 = neighborList[n+3*Np]; // neighbor 3 - f4 = dist[nr4]; - - // q=5 - nr5 = neighborList[n+4*Np]; - f5 = dist[nr5]; - - // q = 6 - nr6 = neighborList[n+5*Np]; - f6 = dist[nr6]; - - Ex = (f1-f2)*rlx*4.0;//NOTE the unit of electric field here is V/lu - Ey = (f3-f4)*rlx*4.0;//factor 4.0 is D3Q7 lattice speed of sound - Ez = (f5-f6)*rlx*4.0; - ElectricField[n+0*Np] = Ex; - ElectricField[n+1*Np] = Ey; - ElectricField[n+2*Np] = Ez; - - // q = 0 - dist[n] = f0*(1.0-rlx) + 0.25*(rlx*psi+rho_e); - - // q = 1 - dist[nr2] = f1*(1.0-rlx) + 0.125*(rlx*psi+rho_e); - - // q = 2 - dist[nr1] = f2*(1.0-rlx) + 0.125*(rlx*psi+rho_e); - - // q = 3 - dist[nr4] = f3*(1.0-rlx) + 0.125*(rlx*psi+rho_e); - - // q = 4 - dist[nr3] = f4*(1.0-rlx) + 0.125*(rlx*psi+rho_e); - - // q = 5 - dist[nr6] = f5*(1.0-rlx) + 0.125*(rlx*psi+rho_e); - - // q = 6 - dist[nr5] = f6*(1.0-rlx) + 0.125*(rlx*psi+rho_e); - 
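The D3Q7 Poisson kernels, both in the deleted CUDA file here and in the HIP replacement below, share one site-local update: each distribution relaxes toward its lattice weight times the potential and picks up the local charge density as a source (rest weight 0.25, link weights 0.125), while the electric field is recovered from the first moments scaled by 4.0, the factor the code attributes to the D3Q7 lattice speed of sound. The sketch below restates that algebra for a single site; it omits the AA-pattern index swapping done by the actual kernels, and the function name is illustrative.

    // Site-local D3Q7 Poisson update, restated from the kernels in this file.
    // f[0..6] are the seven distributions at one site, psi the local potential,
    // rho_e the local charge density, rlx = 1/tau; E[0..2] receives the field.
    static void d3q7_poisson_site_update(double f[7], double psi, double rho_e,
                                         double rlx, double E[3]) {
        E[0] = (f[1] - f[2]) * rlx * 4.0; // field in V/lu; 4.0 from the D3Q7 sound speed
        E[1] = (f[3] - f[4]) * rlx * 4.0;
        E[2] = (f[5] - f[6]) * rlx * 4.0;
        f[0] = f[0] * (1.0 - rlx) + 0.25 * (rlx * psi + rho_e);
        for (int q = 1; q < 7; q++)
            f[q] = f[q] * (1.0 - rlx) + 0.125 * (rlx * psi + rho_e);
    }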
//........................................................................ - } - } -} - -__global__ void dvc_ScaLBL_D3Q7_AAeven_Poisson(int *Map, double *dist, double *Den_charge, double *Psi, double *ElectricField, double tau, double epsilon_LB,bool UseSlippingVelBC,int start, int finish, int Np){ - - int n; - double psi;//electric potential - double Ex,Ey,Ez;//electric field - double rho_e;//local charge density - double f0,f1,f2,f3,f4,f5,f6; - double rlx=1.0/tau; - int idx; - - int S = Np/NBLOCKS/NTHREADS + 1; - for (int s=0; s>>(neighborList,Map,dist,Psi,start,finish,Np); - - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAodd_Poisson_ElectricPotential: %s \n",hipGetErrorString(err)); - } - //cudaProfilerStop(); -} - -extern "C" void ScaLBL_D3Q7_AAeven_Poisson_ElectricPotential(int *Map, double *dist, double *Psi, int start, int finish, int Np){ - - //cudaProfilerStart(); - dvc_ScaLBL_D3Q7_AAeven_Poisson_ElectricPotential<<>>(Map,dist,Psi,start,finish,Np); - - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAeven_Poisson_ElectricPotential: %s \n",hipGetErrorString(err)); - } - //cudaProfilerStop(); -} - -extern "C" void ScaLBL_D3Q7_AAodd_Poisson(int *neighborList, int *Map, double *dist, double *Den_charge, double *Psi, double *ElectricField, double tau, double epsilon_LB,bool UseSlippingVelBC,int start, int finish, int Np){ - - //cudaProfilerStart(); - dvc_ScaLBL_D3Q7_AAodd_Poisson<<>>(neighborList,Map,dist,Den_charge,Psi,ElectricField,tau,epsilon_LB,UseSlippingVelBC,start,finish,Np); - - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAodd_Poisson: %s \n",hipGetErrorString(err)); - } - //cudaProfilerStop(); -} - -extern "C" void ScaLBL_D3Q7_AAeven_Poisson(int *Map, double *dist, double *Den_charge, double *Psi, double *ElectricField, double tau, double epsilon_LB,bool UseSlippingVelBC,int start, int finish, int Np){ - - //cudaProfilerStart(); - dvc_ScaLBL_D3Q7_AAeven_Poisson<<>>(Map,dist,Den_charge,Psi,ElectricField,tau,epsilon_LB,UseSlippingVelBC,start,finish,Np); - - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_AAeven_Poisson: %s \n",hipGetErrorString(err)); - } - //cudaProfilerStop(); -} - -extern "C" void ScaLBL_D3Q7_Poisson_Init(int *Map, double *dist, double *Psi, int start, int finish, int Np){ - - //cudaProfilerStart(); - dvc_ScaLBL_D3Q7_Poisson_Init<<>>(Map,dist,Psi,start,finish,Np); - - hipError_t err = hipGetLastError(); - if (hipSuccess != err){ - printf("hip error in ScaLBL_D3Q7_Poisson_Init: %s \n",hipGetErrorString(err)); - } - //cudaProfilerStop(); -} diff --git a/hip/Poisson.hip b/hip/Poisson.hip new file mode 100644 index 00000000..2808ffa1 --- /dev/null +++ b/hip/Poisson.hip @@ -0,0 +1,705 @@ +#include +#include +#include "hip/hip_runtime.h" + +#define NBLOCKS 1024 +#define NTHREADS 256 + +__global__ void dvc_ScaLBL_D3Q7_AAodd_Poisson_ElectricPotential(int *neighborList,int *Map, double *dist, double *Psi, int start, int finish, int Np){ + int n; + double psi;//electric potential + double fq; + int nread; + int idx; + + int S = Np/NBLOCKS/NTHREADS + 1; + for (int s=0; s 10Np => odd part of dist) + f1 = dist[nr1]; // reading the f1 data into register fq + + nr2 = neighborList[n+Np]; // neighbor 1 ( < 10Np => even part of dist) + f2 = dist[nr2]; // reading the f2 data into register fq + + // q=3 + nr3 = neighborList[n+2*Np]; // neighbor 4 + f3 = dist[nr3]; + + // q = 
4 + nr4 = neighborList[n+3*Np]; // neighbor 3 + f4 = dist[nr4]; + + // q=5 + nr5 = neighborList[n+4*Np]; + f5 = dist[nr5]; + + // q = 6 + nr6 = neighborList[n+5*Np]; + f6 = dist[nr6]; + + Ex = (f1-f2)*rlx*4.0;//NOTE the unit of electric field here is V/lu + Ey = (f3-f4)*rlx*4.0;//factor 4.0 is D3Q7 lattice speed of sound + Ez = (f5-f6)*rlx*4.0; + ElectricField[n+0*Np] = Ex; + ElectricField[n+1*Np] = Ey; + ElectricField[n+2*Np] = Ez; + + // q = 0 + dist[n] = f0*(1.0-rlx) + 0.25*(rlx*psi+rho_e); + + // q = 1 + dist[nr2] = f1*(1.0-rlx) + 0.125*(rlx*psi+rho_e); + + // q = 2 + dist[nr1] = f2*(1.0-rlx) + 0.125*(rlx*psi+rho_e); + + // q = 3 + dist[nr4] = f3*(1.0-rlx) + 0.125*(rlx*psi+rho_e); + + // q = 4 + dist[nr3] = f4*(1.0-rlx) + 0.125*(rlx*psi+rho_e); + + // q = 5 + dist[nr6] = f5*(1.0-rlx) + 0.125*(rlx*psi+rho_e); + + // q = 6 + dist[nr5] = f6*(1.0-rlx) + 0.125*(rlx*psi+rho_e); + //........................................................................ + } + } +} + +__global__ void dvc_ScaLBL_D3Q7_AAeven_Poisson(int *Map, double *dist, double *Den_charge, double *Psi, double *ElectricField, double tau, double epsilon_LB,bool UseSlippingVelBC,int start, int finish, int Np){ + + int n; + double psi;//electric potential + double Ex,Ey,Ez;//electric field + double rho_e;//local charge density + double f0,f1,f2,f3,f4,f5,f6; + double rlx=1.0/tau; + int idx; + + int S = Np/NBLOCKS/NTHREADS + 1; + for (int s=0; s 10Np => odd part of dist) + f1 = dist[nr1]; // reading the f1 data into register fq + + nr2 = neighborList[n + Np]; // neighbor 1 ( < 10Np => even part of dist) + f2 = dist[nr2]; // reading the f2 data into register fq + + // q=3 + nr3 = neighborList[n + 2 * Np]; // neighbor 4 + f3 = dist[nr3]; + + // q = 4 + nr4 = neighborList[n + 3 * Np]; // neighbor 3 + f4 = dist[nr4]; + + // q=5 + nr5 = neighborList[n + 4 * Np]; + f5 = dist[nr5]; + + // q = 6 + nr6 = neighborList[n + 5 * Np]; + f6 = dist[nr6]; + + // q=7 + nr7 = neighborList[n + 6 * Np]; + f7 = dist[nr7]; + + // q = 8 + nr8 = neighborList[n + 7 * Np]; + f8 = dist[nr8]; + + // q=9 + nr9 = neighborList[n + 8 * Np]; + f9 = dist[nr9]; + + // q = 10 + nr10 = neighborList[n + 9 * Np]; + f10 = dist[nr10]; + + // q=11 + nr11 = neighborList[n + 10 * Np]; + f11 = dist[nr11]; + + // q=12 + nr12 = neighborList[n + 11 * Np]; + f12 = dist[nr12]; + + // q=13 + nr13 = neighborList[n + 12 * Np]; + f13 = dist[nr13]; + + // q=14 + nr14 = neighborList[n + 13 * Np]; + f14 = dist[nr14]; + + // q=15 + nr15 = neighborList[n + 14 * Np]; + f15 = dist[nr15]; + + // q=16 + nr16 = neighborList[n + 15 * Np]; + f16 = dist[nr16]; + + // q=17 + //fq = dist[18*Np+n]; + nr17 = neighborList[n + 16 * Np]; + f17 = dist[nr17]; + + // q=18 + nr18 = neighborList[n + 17 * Np]; + f18 = dist[nr18]; + + sum_q = f1+f2+f3+f4+f5+f6+f7+f8+f9+f10+f11+f12+f13+f14+f15+f16+f17+f18; + error = 8.0*(sum_q - f0) + rho_e; + + psi = 2.0*(f0*(1.0 - rlx) + rlx*(sum_q + 0.125*rho_e)); + + idx = Map[n]; + Psi[idx] = psi; + + Ex = (f1 - f2 + 0.5*(f7 - f8 + f9 - f10 + f11 - f12 + f13 - f14))*4.0; //NOTE the unit of electric field here is V/lu + Ey = (f3 - f4 + 0.5*(f7 - f8 - f9 + f10 + f15 - f16 + f17 - f18))*4.0; + Ez = (f5 - f6 + 0.5*(f11 - f12 - f13 + f14 + f15 - f16 - f17 + f18))*4.0; + ElectricField[n + 0 * Np] = Ex; + ElectricField[n + 1 * Np] = Ey; + ElectricField[n + 2 * Np] = Ez; + + // q = 0 + dist[n] = W0*psi; //f0 * (1.0 - rlx) - (1.0-0.5*rlx)*W0*rho_e; + + // q = 1 + dist[nr2] = W1*psi; //f1 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 2 + 
dist[nr1] = W1*psi; //f2 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 3 + dist[nr4] = W1*psi; //f3 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 4 + dist[nr3] = W1*psi; //f4 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 5 + dist[nr6] = W1*psi; //f5 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + + // q = 6 + dist[nr5] = W1*psi; //f6 * (1.0 - rlx) +W1* (rlx * psi) - (1.0-0.5*rlx)*0.05555555555555555*rho_e; + //........................................................................ + + // q = 7 + dist[nr8] = W2*psi; //f7 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 8 + dist[nr7] = W2*psi; //f8 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 9 + dist[nr10] = W2*psi; //f9 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 10 + dist[nr9] = W2*psi; //f10 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 11 + dist[nr12] = W2*psi; //f11 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 12 + dist[nr11] = W2*psi; //f12 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 13 + dist[nr14] = W2*psi; //f13 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q= 14 + dist[nr13] = W2*psi; //f14 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 15 + dist[nr16] = W2*psi; //f15 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 16 + dist[nr15] = W2*psi; //f16 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 17 + dist[nr18] = W2*psi; //f17 * (1.0 - rlx) +W2* (rlx * psi) - (1.0-0.5*rlx)*0.02777777777777778*rho_e; + + // q = 18 + dist[nr17] = W2*psi; + } + } +} + +__global__ void dvc_ScaLBL_D3Q19_AAeven_Poisson(int *Map, double *dist, + double *Den_charge, double *Psi, + double *ElectricField, double *Error, double tau, + double epsilon_LB, bool UseSlippingVelBC, + int start, int finish, int Np) { + int n; + double psi; //electric potential + double Ex, Ey, Ez; //electric field + double rho_e; //local charge density + double f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15, + f16, f17, f18; + double error,sum_q; + double rlx = 1.0 / tau; + int idx; + double W0 = 0.5; + double W1 = 1.0/24.0; + double W2 = 1.0/48.0; + + int S = Np/NBLOCKS/NTHREADS + 1; + for (int s=0; s>>(neighborList, Map, + dist, Den_charge, Psi, ElectricField, tau, epsilon_LB, UseSlippingVelBC, start, finish, Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("Hip error in dvc_ScaLBL_D3Q19_AAodd_Poisson: %s \n",hipGetErrorString(err)); + } +} + +extern "C" void ScaLBL_D3Q19_AAeven_Poisson(int *Map, double *dist, + double *Den_charge, double *Psi, + double *ElectricField, double *Error, double tau, + double epsilon_LB, bool UseSlippingVelBC, + int start, int finish, int Np) { + + hipFuncSetCacheConfig( (void*) dvc_ScaLBL_D3Q19_AAeven_Poisson, hipFuncCachePreferL1); + + + dvc_ScaLBL_D3Q19_AAeven_Poisson<<>>( Map, dist, Den_charge, Psi, + ElectricField, Error, tau, epsilon_LB, UseSlippingVelBC, start, finish, Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("Hip error in dvc_ScaLBL_D3Q19_AAeven_Poisson: %s \n",hipGetErrorString(err)); + } +} + +extern "C" void 
ScaLBL_D3Q19_Poisson_Init(int *Map, double *dist, double *Psi, + int start, int finish, int Np){ + //hipProfilerStart(); + + dvc_ScaLBL_D3Q19_Poisson_Init<<>>(Map, dist, Psi, start, finish, Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("Hip error in ScaLBL_D3Q19_Poisson_Init: %s \n",hipGetErrorString(err)); + } + +} + + +extern "C" void ScaLBL_D3Q7_AAodd_Poisson_ElectricPotential(int *neighborList,int *Map, double *dist, double *Psi, int start, int finish, int Np){ + + //hipProfilerStart(); + dvc_ScaLBL_D3Q7_AAodd_Poisson_ElectricPotential<<>>(neighborList,Map,dist,Psi,start,finish,Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("hip error in ScaLBL_D3Q7_AAodd_Poisson_ElectricPotential: %s \n",hipGetErrorString(err)); + } + //hipProfilerStop(); +} + +extern "C" void ScaLBL_D3Q7_AAeven_Poisson_ElectricPotential(int *Map, double *dist, double *Psi, int start, int finish, int Np){ + + //hipProfilerStart(); + dvc_ScaLBL_D3Q7_AAeven_Poisson_ElectricPotential<<>>(Map,dist,Psi,start,finish,Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("hip error in ScaLBL_D3Q7_AAeven_Poisson_ElectricPotential: %s \n",hipGetErrorString(err)); + } + //hipProfilerStop(); +} + +extern "C" void ScaLBL_D3Q7_AAodd_Poisson(int *neighborList, int *Map, double *dist, double *Den_charge, double *Psi, double *ElectricField, double tau, double epsilon_LB,bool UseSlippingVelBC,int start, int finish, int Np){ + + //hipProfilerStart(); + dvc_ScaLBL_D3Q7_AAodd_Poisson<<>>(neighborList,Map,dist,Den_charge,Psi,ElectricField,tau,epsilon_LB,UseSlippingVelBC,start,finish,Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("hip error in ScaLBL_D3Q7_AAodd_Poisson: %s \n",hipGetErrorString(err)); + } + //hipProfilerStop(); +} + +extern "C" void ScaLBL_D3Q7_AAeven_Poisson(int *Map, double *dist, double *Den_charge, double *Psi, double *ElectricField, double tau, double epsilon_LB,bool UseSlippingVelBC,int start, int finish, int Np){ + + //hipProfilerStart(); + dvc_ScaLBL_D3Q7_AAeven_Poisson<<>>(Map,dist,Den_charge,Psi,ElectricField,tau,epsilon_LB,UseSlippingVelBC,start,finish,Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("hip error in ScaLBL_D3Q7_AAeven_Poisson: %s \n",hipGetErrorString(err)); + } + //hipProfilerStop(); +} + +extern "C" void ScaLBL_D3Q7_Poisson_Init(int *Map, double *dist, double *Psi, int start, int finish, int Np){ + + //hipProfilerStart(); + dvc_ScaLBL_D3Q7_Poisson_Init<<>>(Map,dist,Psi,start,finish,Np); + + hipError_t err = hipGetLastError(); + if (hipSuccess != err){ + printf("hip error in ScaLBL_D3Q7_Poisson_Init: %s \n",hipGetErrorString(err)); + } + //hipProfilerStop(); +} diff --git a/hip/Stokes.cu b/hip/Stokes.hip similarity index 100% rename from hip/Stokes.cu rename to hip/Stokes.hip diff --git a/hip/dfh.cu b/hip/dfh.hip similarity index 100% rename from hip/dfh.cu rename to hip/dfh.hip diff --git a/models/BGKModel.cpp b/models/BGKModel.cpp new file mode 100644 index 00000000..d2aa5216 --- /dev/null +++ b/models/BGKModel.cpp @@ -0,0 +1,538 @@ +/* + Copyright 2013--2018 James E. McClure, Virginia Polytechnic & State University + Copyright Equnior ASA + + This file is part of the Open Porous Media project (OPM). + OPM is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. 
+ OPM is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + You should have received a copy of the GNU General Public License + along with OPM. If not, see . +*/ +/* + * Multi-relaxation time LBM Model + */ +#include "models/BGKModel.h" +#include "analysis/distance.h" +#include "common/ReadMicroCT.h" +ScaLBL_BGKModel::ScaLBL_BGKModel(int RANK, int NP, const Utilities::MPI &COMM) + : rank(RANK), nprocs(NP), Restart(0), timestep(0), timestepMax(0), tau(0), + Fx(0), Fy(0), Fz(0), flux(0), din(0), dout(0), mu(0), Nx(0), Ny(0), Nz(0), + N(0), Np(0), nprocx(0), nprocy(0), nprocz(0), BoundaryCondition(0), Lx(0), + Ly(0), Lz(0), comm(COMM) {} +ScaLBL_BGKModel::~ScaLBL_BGKModel() {} + +void ScaLBL_BGKModel::ReadParams(string filename) { + // read the input database + db = std::make_shared(filename); + domain_db = db->getDatabase("Domain"); + mrt_db = db->getDatabase("BGK"); + vis_db = db->getDatabase("Visualization"); + + tau = 1.0; + timestepMax = 100000; + ANALYSIS_INTERVAL = 1000; + tolerance = 1.0e-8; + Fx = Fy = 0.0; + Fz = 1.0e-5; + dout = 1.0; + din = 1.0; + + // Color Model parameters + if (mrt_db->keyExists("timestepMax")) { + timestepMax = mrt_db->getScalar("timestepMax"); + } + if (mrt_db->keyExists("analysis_interval")) { + ANALYSIS_INTERVAL = mrt_db->getScalar("analysis_interval"); + } + if (mrt_db->keyExists("tolerance")) { + tolerance = mrt_db->getScalar("tolerance"); + } + if (mrt_db->keyExists("tau")) { + tau = mrt_db->getScalar("tau"); + } + if (mrt_db->keyExists("F")) { + Fx = mrt_db->getVector("F")[0]; + Fy = mrt_db->getVector("F")[1]; + Fz = mrt_db->getVector("F")[2]; + } + if (mrt_db->keyExists("Restart")) { + Restart = mrt_db->getScalar("Restart"); + } + if (mrt_db->keyExists("din")) { + din = mrt_db->getScalar("din"); + } + if (mrt_db->keyExists("dout")) { + dout = mrt_db->getScalar("dout"); + } + if (mrt_db->keyExists("flux")) { + flux = mrt_db->getScalar("flux"); + } + + // Read domain parameters + if (mrt_db->keyExists("BoundaryCondition")) { + BoundaryCondition = mrt_db->getScalar("BC"); + } else if (domain_db->keyExists("BC")) { + BoundaryCondition = domain_db->getScalar("BC"); + } + + mu = (tau - 0.5) / 3.0; +} +void ScaLBL_BGKModel::SetDomain() { + Dm = std::shared_ptr( + new Domain(domain_db, comm)); // full domain for analysis + Mask = std::shared_ptr( + new Domain(domain_db, comm)); // mask domain removes immobile phases + + // domain parameters + Nx = Dm->Nx; + Ny = Dm->Ny; + Nz = Dm->Nz; + Lx = Dm->Lx; + Ly = Dm->Ly; + Lz = Dm->Lz; + + N = Nx * Ny * Nz; + Distance.resize(Nx, Ny, Nz); + Velocity_x.resize(Nx, Ny, Nz); + Velocity_y.resize(Nx, Ny, Nz); + Velocity_z.resize(Nx, Ny, Nz); + + for (int i = 0; i < Nx * Ny * Nz; i++) + Dm->id[i] = 1; // initialize this way + //Averages = std::shared_ptr ( new TwoPhase(Dm) ); // TwoPhase analysis object + comm.barrier(); + Dm->CommInit(); + comm.barrier(); + + rank = Dm->rank(); + nprocx = Dm->nprocx(); + nprocy = Dm->nprocy(); + nprocz = Dm->nprocz(); +} + +void ScaLBL_BGKModel::ReadInput() { + + sprintf(LocalRankString, "%05d", Dm->rank()); + sprintf(LocalRankFilename, "%s%s", "ID.", LocalRankString); + sprintf(LocalRestartFile, "%s%s", "Restart.", LocalRankString); + + if (domain_db->keyExists("Filename")) { + auto Filename = domain_db->getScalar("Filename"); + Mask->Decomp(Filename); + } else if (domain_db->keyExists("GridFile")) { + // Read the 
local domain data + auto input_id = readMicroCT(*domain_db, comm); + // Fill the halo (assuming GCW of 1) + array size0 = {(int)input_id.size(0), (int)input_id.size(1), + (int)input_id.size(2)}; + ArraySize size1 = {(size_t)Mask->Nx, (size_t)Mask->Ny, + (size_t)Mask->Nz}; + ASSERT((int)size1[0] == size0[0] + 2 && (int)size1[1] == size0[1] + 2 && + (int)size1[2] == size0[2] + 2); + fillHalo fill(comm, Mask->rank_info, size0, {1, 1, 1}, 0, + 1); + Array id_view; + id_view.viewRaw(size1, Mask->id.data()); + fill.copy(input_id, id_view); + fill.fill(id_view); + } else { + Mask->ReadIDs(); + } + + // Generate the signed distance map + // Initialize the domain and communication + Array id_solid(Nx, Ny, Nz); + // Solve for the position of the solid phase + for (int k = 0; k < Nz; k++) { + for (int j = 0; j < Ny; j++) { + for (int i = 0; i < Nx; i++) { + int n = k * Nx * Ny + j * Nx + i; + // Initialize the solid phase + if (Mask->id[n] > 0) + id_solid(i, j, k) = 1; + else + id_solid(i, j, k) = 0; + } + } + } + // Initialize the signed distance function + for (int k = 0; k < Nz; k++) { + for (int j = 0; j < Ny; j++) { + for (int i = 0; i < Nx; i++) { + // Initialize distance to +/- 1 + Distance(i, j, k) = 2.0 * double(id_solid(i, j, k)) - 1.0; + } + } + } + // MeanFilter(Averages->SDs); + if (rank == 0) + printf("Initialized solid phase -- Converting to Signed Distance " + "function \n"); + CalcDist(Distance, id_solid, *Dm); + if (rank == 0) + cout << "Domain set." << endl; +} + +void ScaLBL_BGKModel::Create() { + /* + * This function creates the variables needed to run a LBM + */ + int rank = Mask->rank(); + //......................................................... + // Initialize communication structures in averaging domain + for (int i = 0; i < Nx * Ny * Nz; i++) + Dm->id[i] = Mask->id[i]; + Mask->CommInit(); + Np = Mask->PoreCount(); + //........................................................................... + if (rank == 0) + printf("Create ScaLBL_Communicator \n"); + // Create a communicator for the device (will use optimized layout) + // ScaLBL_Communicator ScaLBL_Comm(Mask); // original + ScaLBL_Comm = + std::shared_ptr(new ScaLBL_Communicator(Mask)); + + int Npad = (Np / 16 + 2) * 16; + if (rank == 0) + printf("Set up memory efficient layout \n"); + Map.resize(Nx, Ny, Nz); + Map.fill(-2); + auto neighborList = new int[18 * Npad]; + Np = ScaLBL_Comm->MemoryOptimizedLayoutAA(Map, neighborList, + Mask->id.data(), Np, 1); + comm.barrier(); + + //........................................................................... + // MAIN VARIABLES ALLOCATED HERE + //........................................................................... + // LBM variables + if (rank == 0) + printf("Allocating distributions \n"); + //......................device distributions................................. + int dist_mem_size = Np * sizeof(double); + int neighborSize = 18 * (Np * sizeof(int)); + //........................................................................... + ScaLBL_AllocateDeviceMemory((void **)&NeighborList, neighborSize); + ScaLBL_AllocateDeviceMemory((void **)&fq, 19 * dist_mem_size); + ScaLBL_AllocateDeviceMemory((void **)&Pressure, sizeof(double) * Np); + ScaLBL_AllocateDeviceMemory((void **)&Velocity, 3 * sizeof(double) * Np); + //........................................................................... 
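Create() sizes everything from the pore count: Npad = (Np/16 + 2)*16 pads the count to a multiple of 16 before the memory-optimized layout is built, and the device allocations then follow the final Np with 19 distributions, one pressure value, three velocity components, and an 18-entry neighbor list per site. The helper below only makes that byte accounting explicit; it is an illustrative standalone function, not part of the model class.

    #include <cstddef>
    // Illustrative total of the device memory allocated above for a given Np.
    static size_t bgk_device_bytes(int Np) {
        size_t dist_mem_size = (size_t)Np * sizeof(double);
        size_t bytes = 18 * (size_t)Np * sizeof(int); // NeighborList
        bytes += 19 * dist_mem_size;                  // fq (D3Q19 distributions)
        bytes += dist_mem_size;                       // Pressure
        bytes += 3 * dist_mem_size;                   // Velocity
        return bytes;
    }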
+    // Update GPU data structures
+    if (rank == 0)
+        printf("Setting up device map and neighbor list \n");
+    // copy the neighbor list
+    ScaLBL_CopyToDevice(NeighborList, neighborList, neighborSize);
+    comm.barrier();
+    double MLUPS = ScaLBL_Comm->GetPerformance(NeighborList, fq, Np);
+    printf(" MLUPS=%f from rank %i\n", MLUPS, rank);
+}
+
+void ScaLBL_BGKModel::Initialize() {
+    /*
+     * This function initializes the model
+     */
+    if (rank == 0)
+        printf("Initializing distributions \n");
+    ScaLBL_D3Q19_Init(fq, Np);
+}
+
+void ScaLBL_BGKModel::Run() {
+    double rlx = 1.0 / tau;
+
+    Minkowski Morphology(Mask);
+
+    if (rank == 0) {
+        bool WriteHeader = false;
+        FILE *log_file = fopen("Permeability.csv", "r");
+        if (log_file != NULL)
+            fclose(log_file);
+        else
+            WriteHeader = true;
+
+        if (WriteHeader) {
+            log_file = fopen("Permeability.csv", "a+");
+            fprintf(log_file, "time Fx Fy Fz mu Vs As Js Xs vx vy vz k\n");
+            fclose(log_file);
+        }
+    }
+
+    //.......create and start timer............
+    ScaLBL_DeviceBarrier();
+    comm.barrier();
+    if (rank == 0)
+        printf("Beginning AA timesteps, timestepMax = %i \n", timestepMax);
+    if (rank == 0)
+        printf("********************************************************\n");
+    timestep = 0;
+    double error = 1.0;
+    double flow_rate_previous = 0.0;
+    auto t1 = std::chrono::system_clock::now();
+    while (timestep < timestepMax && error > tolerance) {
+        //************************************************************************/
+        /* timestep++;
+        ScaLBL_Comm.SendD3Q19AA(dist); //READ FROM NORMAL
+        ScaLBL_D3Q19_AAodd_BGK(NeighborList, dist, ScaLBL_Comm.first_interior, ScaLBL_Comm.last_interior, Np, rlx, Fx, Fy, Fz);
+        ScaLBL_Comm.RecvD3Q19AA(dist); //WRITE INTO OPPOSITE
+        ScaLBL_D3Q19_AAodd_BGK(NeighborList, dist, 0, ScaLBL_Comm.next, Np, rlx, Fx, Fy, Fz);
+        ScaLBL_DeviceBarrier(); MPI_Barrier(comm);
+
+        timestep++;
+        ScaLBL_Comm.SendD3Q19AA(dist); //READ FROM NORMAL
+        ScaLBL_D3Q19_AAeven_BGK(dist, ScaLBL_Comm.first_interior, ScaLBL_Comm.last_interior, Np, rlx, Fx, Fy, Fz);
+        ScaLBL_Comm.RecvD3Q19AA(dist); //WRITE INTO OPPOSITE
+        ScaLBL_D3Q19_AAeven_BGK(dist, 0, ScaLBL_Comm.next, Np, rlx, Fx, Fy, Fz);
+        ScaLBL_DeviceBarrier(); MPI_Barrier(comm);
+        */
+        timestep++;
+        ScaLBL_Comm->SendD3Q19AA(fq); //READ FROM NORMAL
+        ScaLBL_D3Q19_AAodd_BGK(NeighborList, fq, ScaLBL_Comm->FirstInterior(),
+                               ScaLBL_Comm->LastInterior(), Np, rlx, Fx, Fy, Fz);
+        ScaLBL_Comm->RecvD3Q19AA(fq); //WRITE INTO OPPOSITE
+        // Set boundary conditions
+        if (BoundaryCondition == 3) {
+            ScaLBL_Comm->D3Q19_Pressure_BC_z(NeighborList, fq, din, timestep);
+            ScaLBL_Comm->D3Q19_Pressure_BC_Z(NeighborList, fq, dout, timestep);
+        } else if (BoundaryCondition == 4) {
+            din =
+                ScaLBL_Comm->D3Q19_Flux_BC_z(NeighborList, fq, flux, timestep);
+            ScaLBL_Comm->D3Q19_Pressure_BC_Z(NeighborList, fq, dout, timestep);
+        } else if (BoundaryCondition == 5) {
+            ScaLBL_Comm->D3Q19_Reflection_BC_z(fq);
+            ScaLBL_Comm->D3Q19_Reflection_BC_Z(fq);
+        }
+        ScaLBL_D3Q19_AAodd_BGK(NeighborList, fq, 0, ScaLBL_Comm->LastExterior(),
+                               Np, rlx, Fx, Fy, Fz);
+        ScaLBL_DeviceBarrier();
+        comm.barrier();
+        timestep++;
+        ScaLBL_Comm->SendD3Q19AA(fq); //READ FROM NORMAL
+        ScaLBL_D3Q19_AAeven_BGK(fq, ScaLBL_Comm->FirstInterior(),
+                                ScaLBL_Comm->LastInterior(), Np, rlx, Fx, Fy, Fz);
+        ScaLBL_Comm->RecvD3Q19AA(fq); //WRITE INTO OPPOSITE
+        // Set boundary conditions
+        if (BoundaryCondition == 3) {
+            ScaLBL_Comm->D3Q19_Pressure_BC_z(NeighborList, fq, din, timestep);
+            ScaLBL_Comm->D3Q19_Pressure_BC_Z(NeighborList, fq, dout, timestep);
+        } else if (BoundaryCondition
== 4) { + din = + ScaLBL_Comm->D3Q19_Flux_BC_z(NeighborList, fq, flux, timestep); + ScaLBL_Comm->D3Q19_Pressure_BC_Z(NeighborList, fq, dout, timestep); + } else if (BoundaryCondition == 5) { + ScaLBL_Comm->D3Q19_Reflection_BC_z(fq); + ScaLBL_Comm->D3Q19_Reflection_BC_Z(fq); + } + ScaLBL_D3Q19_AAeven_BGK(fq, 0, ScaLBL_Comm->LastExterior(), Np, + rlx, Fx, Fy, Fz); + ScaLBL_DeviceBarrier(); + comm.barrier(); + //************************************************************************/ + + if (timestep % ANALYSIS_INTERVAL == 0) { + ScaLBL_D3Q19_Momentum(fq, Velocity, Np); + ScaLBL_DeviceBarrier(); + comm.barrier(); + ScaLBL_Comm->RegularLayout(Map, &Velocity[0], Velocity_x); + ScaLBL_Comm->RegularLayout(Map, &Velocity[Np], Velocity_y); + ScaLBL_Comm->RegularLayout(Map, &Velocity[2 * Np], Velocity_z); + + double count_loc = 0; + double count; + double vax, vay, vaz; + double vax_loc, vay_loc, vaz_loc; + vax_loc = vay_loc = vaz_loc = 0.f; + for (int k = 1; k < Nz - 1; k++) { + for (int j = 1; j < Ny - 1; j++) { + for (int i = 1; i < Nx - 1; i++) { + if (Distance(i, j, k) > 0) { + vax_loc += Velocity_x(i, j, k); + vay_loc += Velocity_y(i, j, k); + vaz_loc += Velocity_z(i, j, k); + count_loc += 1.0; + } + } + } + } + vax = Dm->Comm.sumReduce(vax_loc); + vay = Dm->Comm.sumReduce(vay_loc); + vaz = Dm->Comm.sumReduce(vaz_loc); + count = Dm->Comm.sumReduce(count_loc); + + vax /= count; + vay /= count; + vaz /= count; + + double force_mag = sqrt(Fx * Fx + Fy * Fy + Fz * Fz); + double dir_x = Fx / force_mag; + double dir_y = Fy / force_mag; + double dir_z = Fz / force_mag; + if (force_mag == 0.0) { + // default to z direction + dir_x = 0.0; + dir_y = 0.0; + dir_z = 1.0; + force_mag = 1.0; + } + double flow_rate = (vax * dir_x + vay * dir_y + vaz * dir_z); + + error = fabs(flow_rate - flow_rate_previous) / fabs(flow_rate); + flow_rate_previous = flow_rate; + + //if (rank==0) printf("Computing Minkowski functionals \n"); + Morphology.ComputeScalar(Distance, 0.f); + //Morphology.PrintAll(); + double mu = (tau - 0.5) / 3.f; + double Vs = Morphology.V(); + double As = Morphology.A(); + double Hs = Morphology.H(); + double Xs = Morphology.X(); + Vs = Dm->Comm.sumReduce(Vs); + As = Dm->Comm.sumReduce(As); + Hs = Dm->Comm.sumReduce(Hs); + Xs = Dm->Comm.sumReduce(Xs); + + double h = Dm->voxel_length; + double absperm = + h * h * mu * Mask->Porosity() * flow_rate / force_mag; + if (rank == 0) { + printf(" %f\n", absperm); + FILE *log_file = fopen("Permeability.csv", "a"); + fprintf(log_file, + "%i %.8g %.8g %.8g %.8g %.8g %.8g %.8g %.8g %.8g %.8g " + "%.8g %.8g\n", + timestep, Fx, Fy, Fz, mu, h * h * h * Vs, h * h * As, + h * Hs, Xs, vax, vay, vaz, absperm); + fclose(log_file); + } + } + } + //************************************************************************/ + if (rank == 0) + printf("---------------------------------------------------------------" + "----\n"); + // Compute the walltime per timestep + auto t2 = std::chrono::system_clock::now(); + double cputime = std::chrono::duration(t2 - t1).count() / timestep; + // Performance obtained from each node + double MLUPS = double(Np) / cputime / 1000000; + + if (rank == 0) + printf("********************************************************\n"); + if (rank == 0) + printf("CPU time = %f \n", cputime); + if (rank == 0) + printf("Lattice update rate (per core)= %f MLUPS \n", MLUPS); + MLUPS *= nprocs; + if (rank == 0) + printf("Lattice update rate (total)= %f MLUPS \n", MLUPS); + if (rank == 0) + 
printf("********************************************************\n"); +} + +void ScaLBL_BGKModel::VelocityField() { + + auto format = vis_db->getWithDefault("format", "silo"); + + /* memcpy(Morphology.SDn.data(), Distance.data(), Nx*Ny*Nz*sizeof(double)); + Morphology.Initialize(); + Morphology.UpdateMeshValues(); + Morphology.ComputeLocal(); + Morphology.Reduce(); + + double count_loc=0; + double count; + double vax,vay,vaz; + double vax_loc,vay_loc,vaz_loc; + vax_loc = vay_loc = vaz_loc = 0.f; + for (int n=0; nLastExterior(); n++){ + vax_loc += VELOCITY[n]; + vay_loc += VELOCITY[Np+n]; + vaz_loc += VELOCITY[2*Np+n]; + count_loc+=1.0; + } + + for (int n=ScaLBL_Comm->FirstInterior(); nLastInterior(); n++){ + vax_loc += VELOCITY[n]; + vay_loc += VELOCITY[Np+n]; + vaz_loc += VELOCITY[2*Np+n]; + count_loc+=1.0; + } + MPI_Allreduce(&vax_loc,&vax,1,MPI_DOUBLE,MPI_SUM,Mask->Comm); + MPI_Allreduce(&vay_loc,&vay,1,MPI_DOUBLE,MPI_SUM,Mask->Comm); + MPI_Allreduce(&vaz_loc,&vaz,1,MPI_DOUBLE,MPI_SUM,Mask->Comm); + MPI_Allreduce(&count_loc,&count,1,MPI_DOUBLE,MPI_SUM,Mask->Comm); + + vax /= count; + vay /= count; + vaz /= count; + + double mu = (tau-0.5)/3.f; + if (rank==0) printf("Fx Fy Fz mu Vs As Js Xs vx vy vz\n"); + if (rank==0) printf("%.8g %.8g %.8g %.8g %.8g %.8g %.8g %.8g %.8g %.8g %.8g\n",Fx, Fy, Fz, mu, + Morphology.V(),Morphology.A(),Morphology.J(),Morphology.X(),vax,vay,vaz); + */ + vis_db = db->getDatabase("Visualization"); + if (vis_db->getWithDefault("write_silo", false)) { + + std::vector visData; + fillHalo fillData(Dm->Comm, Dm->rank_info, + {Dm->Nx - 2, Dm->Ny - 2, Dm->Nz - 2}, + {1, 1, 1}, 0, 1); + + auto VxVar = std::make_shared(); + auto VyVar = std::make_shared(); + auto VzVar = std::make_shared(); + auto SignDistVar = std::make_shared(); + + IO::initialize("", format, "false"); + // Create the MeshDataStruct + visData.resize(1); + visData[0].meshName = "domain"; + visData[0].mesh = std::make_shared( + Dm->rank_info, Dm->Nx - 2, Dm->Ny - 2, Dm->Nz - 2, Dm->Lx, Dm->Ly, + Dm->Lz); + SignDistVar->name = "SignDist"; + SignDistVar->type = IO::VariableType::VolumeVariable; + SignDistVar->dim = 1; + SignDistVar->data.resize(Dm->Nx - 2, Dm->Ny - 2, Dm->Nz - 2); + visData[0].vars.push_back(SignDistVar); + + VxVar->name = "Velocity_x"; + VxVar->type = IO::VariableType::VolumeVariable; + VxVar->dim = 1; + VxVar->data.resize(Dm->Nx - 2, Dm->Ny - 2, Dm->Nz - 2); + visData[0].vars.push_back(VxVar); + VyVar->name = "Velocity_y"; + VyVar->type = IO::VariableType::VolumeVariable; + VyVar->dim = 1; + VyVar->data.resize(Dm->Nx - 2, Dm->Ny - 2, Dm->Nz - 2); + visData[0].vars.push_back(VyVar); + VzVar->name = "Velocity_z"; + VzVar->type = IO::VariableType::VolumeVariable; + VzVar->dim = 1; + VzVar->data.resize(Dm->Nx - 2, Dm->Ny - 2, Dm->Nz - 2); + visData[0].vars.push_back(VzVar); + + Array &SignData = visData[0].vars[0]->data; + Array &VelxData = visData[0].vars[1]->data; + Array &VelyData = visData[0].vars[2]->data; + Array &VelzData = visData[0].vars[3]->data; + + ASSERT(visData[0].vars[0]->name == "SignDist"); + ASSERT(visData[0].vars[1]->name == "Velocity_x"); + ASSERT(visData[0].vars[2]->name == "Velocity_y"); + ASSERT(visData[0].vars[3]->name == "Velocity_z"); + + fillData.copy(Distance, SignData); + fillData.copy(Velocity_x, VelxData); + fillData.copy(Velocity_y, VelyData); + fillData.copy(Velocity_z, VelzData); + + IO::writeData(timestep, visData, Dm->Comm); + } +} diff --git a/models/BGKModel.h b/models/BGKModel.h new file mode 100644 index 00000000..612f7a72 --- /dev/null +++ 
b/models/BGKModel.h @@ -0,0 +1,94 @@ +/* + Copyright 2013--2018 James E. McClure, Virginia Polytechnic & State University + Copyright Equnior ASA + + This file is part of the Open Porous Media project (OPM). + OPM is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + OPM is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + You should have received a copy of the GNU General Public License + along with OPM. If not, see . +*/ +/* + * Multi-relaxation time LBM Model + */ +#include +#include +#include +#include +#include +#include +#include + +#include "common/ScaLBL.h" +#include "common/Communication.h" +#include "common/MPI.h" +#include "analysis/Minkowski.h" +#include "ProfilerApp.h" + +class ScaLBL_BGKModel { +public: + ScaLBL_BGKModel(int RANK, int NP, const Utilities::MPI &COMM); + ~ScaLBL_BGKModel(); + + // functions in they should be run + void ReadParams(string filename); + void ReadParams(std::shared_ptr db0); + void SetDomain(); + void ReadInput(); + void Create(); + void Initialize(); + void Run(); + void VelocityField(); + + bool Restart, pBC; + int timestep, timestepMax; + int ANALYSIS_INTERVAL; + int BoundaryCondition; + double tau, mu; + double Fx, Fy, Fz, flux; + double din, dout; + double tolerance; + + int Nx, Ny, Nz, N, Np; + int rank, nprocx, nprocy, nprocz, nprocs; + double Lx, Ly, Lz; + + std::shared_ptr Dm; // this domain is for analysis + std::shared_ptr Mask; // this domain is for lbm + std::shared_ptr ScaLBL_Comm; + // input database + std::shared_ptr db; + std::shared_ptr domain_db; + std::shared_ptr mrt_db; + std::shared_ptr vis_db; + + IntArray Map; + DoubleArray Distance; + int *NeighborList; + double *fq; + double *Velocity; + double *Pressure; + + //Minkowski Morphology; + + DoubleArray Velocity_x; + DoubleArray Velocity_y; + DoubleArray Velocity_z; + +private: + Utilities::MPI comm; + + // filenames + char LocalRankString[8]; + char LocalRankFilename[40]; + char LocalRestartFile[40]; + + //int rank,nprocs; + void LoadParams(std::shared_ptr db0); +}; diff --git a/models/ColorModel.cpp b/models/ColorModel.cpp index 840fada0..ae3c9fe7 100644 --- a/models/ColorModel.cpp +++ b/models/ColorModel.cpp @@ -115,8 +115,7 @@ void ScaLBL_ColorModel::ReadParams(string filename) { inletB = 0.f; outletA = 0.f; outletB = 1.f; - - + BoundaryCondition = 0; if (color_db->keyExists("BC")) { BoundaryCondition = color_db->getScalar("BC"); @@ -388,6 +387,10 @@ void ScaLBL_ColorModel::AssignComponentLabels(double *phase) { AFFINITY, volume_fraction); } } + + // clean up + delete [] label_count; + delete [] label_count_global; } void ScaLBL_ColorModel::Create() { @@ -483,12 +486,22 @@ void ScaLBL_ColorModel::Create() { // copy the neighbor list ScaLBL_CopyToDevice(NeighborList, neighborList, neighborSize); + ScaLBL_Comm->Barrier(); delete[] neighborList; + // initialize phi based on PhaseLabel (include solid component labels) double *PhaseLabel; - PhaseLabel = new double[N]; + PhaseLabel = new double[Nx*Ny*Nz]; + ScaLBL_Comm->Barrier(); + AssignComponentLabels(PhaseLabel); - ScaLBL_CopyToDevice(Phi, PhaseLabel, N * sizeof(double)); + ScaLBL_Comm->Barrier(); + + ScaLBL_CopyToDevice(Phi, PhaseLabel, Nx*Ny*Nz * sizeof(double)); + 
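/*
 * Note: unlike the distributions, the phase indicator Phi lives on the full regular
 * grid, so the host buffer PhaseLabel and the device copy above are sized Nx*Ny*Nz
 * rather than Np. The barriers added around AssignComponentLabels and the copy
 * appear intended to keep all ranks in step before Phi is used; that reading is an
 * inference from the placement of the calls, not stated in the source.
 */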
ScaLBL_Comm->Barrier(); + + if (rank == 0) + printf("Model created \n"); delete[] PhaseLabel; } @@ -815,28 +828,28 @@ double ScaLBL_ColorModel::Run(int returntime) { } if (TRIGGER_FORCE_RESCALE) { - RESCALE_FORCE = false; - TRIGGER_FORCE_RESCALE = false; - double RESCALE_FORCE_FACTOR = capillary_number / Ca; - if (RESCALE_FORCE_FACTOR > 2.0) - RESCALE_FORCE_FACTOR = 2.0; - if (RESCALE_FORCE_FACTOR < 0.5) - RESCALE_FORCE_FACTOR = 0.5; - Fx *= RESCALE_FORCE_FACTOR; - Fy *= RESCALE_FORCE_FACTOR; - Fz *= RESCALE_FORCE_FACTOR; - force_mag = sqrt(Fx * Fx + Fy * Fy + Fz * Fz); - if (force_mag > 1e-3) { - Fx *= 1e-3 / force_mag; // impose ceiling for stability - Fy *= 1e-3 / force_mag; - Fz *= 1e-3 / force_mag; - } - if (rank == 0) - printf(" -- adjust force by factor %f \n ", - capillary_number / Ca); - Averages->SetParams(rhoA, rhoB, tauA, tauB, Fx, Fy, Fz, alpha, - beta); - color_db->putVector("F", {Fx, Fy, Fz}); + RESCALE_FORCE = false; + TRIGGER_FORCE_RESCALE = false; + double RESCALE_FORCE_FACTOR = capillary_number / Ca; + if (RESCALE_FORCE_FACTOR > 2.0) + RESCALE_FORCE_FACTOR = 2.0; + if (RESCALE_FORCE_FACTOR < 0.5) + RESCALE_FORCE_FACTOR = 0.5; + Fx *= RESCALE_FORCE_FACTOR; + Fy *= RESCALE_FORCE_FACTOR; + Fz *= RESCALE_FORCE_FACTOR; + force_mag = sqrt(Fx * Fx + Fy * Fy + Fz * Fz); + if (force_mag > 1e-3) { + Fx *= 1e-3 / force_mag; // impose ceiling for stability + Fy *= 1e-3 / force_mag; + Fz *= 1e-3 / force_mag; + } + if (rank == 0) + printf(" -- adjust force by factor %f \n ", + capillary_number / Ca); + Averages->SetParams(rhoA, rhoB, tauA, tauB, Fx, Fy, Fz, alpha, + beta); + color_db->putVector("F", {Fx, Fy, Fz}); } if (isSteady) { Averages->Full(); diff --git a/models/IonModel.cpp b/models/IonModel.cpp index 7213cfce..21f55225 100644 --- a/models/IonModel.cpp +++ b/models/IonModel.cpp @@ -13,7 +13,20 @@ ScaLBL_IonModel::ScaLBL_IonModel(int RANK, int NP, const Utilities::MPI &COMM) nprocy(0), nprocz(0), fluidVelx_dummy(0), fluidVely_dummy(0), fluidVelz_dummy(0), BoundaryConditionInlet(0), BoundaryConditionOutlet(0), BoundaryConditionSolid(0), Lx(0), Ly(0), Lz(0), comm(COMM) {} -ScaLBL_IonModel::~ScaLBL_IonModel() {} + +ScaLBL_IonModel::~ScaLBL_IonModel() { + + ScaLBL_FreeDeviceMemory(NeighborList); + ScaLBL_FreeDeviceMemory(dvcMap); + ScaLBL_FreeDeviceMemory(fq); + ScaLBL_FreeDeviceMemory(Ci); + ScaLBL_FreeDeviceMemory(ChargeDensity); + ScaLBL_FreeDeviceMemory(FluxDiffusive); + ScaLBL_FreeDeviceMemory(FluxAdvective); + ScaLBL_FreeDeviceMemory(FluxElectrical); + ScaLBL_FreeDeviceMemory(IonSolid); + ScaLBL_FreeDeviceMemory(FluidVelocityDummy); +} void ScaLBL_IonModel::ReadParams(string filename, vector &num_iter) { @@ -48,8 +61,13 @@ void ScaLBL_IonModel::ReadParams(string filename, vector &num_iter) { Ex_dummy = 0.0; //for debugging, unit [V/m] Ey_dummy = 0.0; //for debugging, unit [V/m] Ez_dummy = 0.0; //for debugging, unit [V/m] + + sprintf(LocalRankString, "%05d", rank); + sprintf(LocalRestartFile, "%s%s", "Restart.", LocalRankString); //--------------------------------------------------------------------------// + BoundaryConditionSolid = 0; + // Read domain parameters if (domain_db->keyExists("voxel_length")) { //default unit: um/lu h = domain_db->getScalar("voxel_length"); @@ -170,7 +188,7 @@ void ScaLBL_IonModel::ReadParams(string filename, vector &num_iter) { BoundaryConditionSolid = ion_db->getScalar("BC_Solid"); } // Read boundary condition for ion transport - // BC = 0: normal periodic BC + // BC = 0: zero-flux bounce-back BC // BC = 1: fixed ion concentration; 
unit=[mol/m^3] // BC = 2: fixed ion flux (inward flux); unit=[mol/m^2/sec] BoundaryConditionInlet.push_back(0); @@ -282,7 +300,7 @@ void ScaLBL_IonModel::ReadParams(string filename, vector &num_iter) { void ScaLBL_IonModel::ReadParams(string filename) { //NOTE: the maximum iteration timesteps for ions are left unspecified // it relies on the multiphys controller to compute the max timestep - + USE_MEMBRANE = true; // read the input database db = std::make_shared(filename); domain_db = db->getDatabase("Domain"); @@ -314,13 +332,18 @@ void ScaLBL_IonModel::ReadParams(string filename) { Ex_dummy = 0.0; //for debugging, unit [V/m] Ey_dummy = 0.0; //for debugging, unit [V/m] Ez_dummy = 0.0; //for debugging, unit [V/m] + + sprintf(LocalRankString, "%05d", rank); + sprintf(LocalRestartFile, "%s%s", "Restart.", LocalRankString); //--------------------------------------------------------------------------// // Read domain parameters if (domain_db->keyExists("voxel_length")) { //default unit: um/lu h = domain_db->getScalar("voxel_length"); } - + if (ion_db->keyExists("use_membrane")) { + USE_MEMBRANE = ion_db->getScalar("use_membrane"); + } // LB-Ion Model parameters //if (ion_db->keyExists( "timestepMax" )){ // timestepMax = ion_db->getScalar( "timestepMax" ); @@ -328,6 +351,9 @@ void ScaLBL_IonModel::ReadParams(string filename) { if (ion_db->keyExists("tolerance")) { tolerance = ion_db->getScalar("tolerance"); } + if (ion_db->keyExists("Restart")) { + Restart = ion_db->getScalar("Restart"); + } if (ion_db->keyExists("temperature")) { T = ion_db->getScalar("temperature"); //re-calculate thermal voltage @@ -421,14 +447,113 @@ void ScaLBL_IonModel::ReadParams(string filename) { 1.0e-18); //LB ion concentration has unit [mol/lu^3] } } + + if (USE_MEMBRANE){ + membrane_db = db->getDatabase("Membrane"); + /* get membrane permeability parameters*/ + if (membrane_db->keyExists("MassFractionIn")) { + if (rank == 0) printf(".... Read membrane permeability (MassFractionIn) \n"); + MassFractionIn.clear(); + MassFractionIn = membrane_db->getVector("MassFractionIn"); + if (MassFractionIn.size() != number_ion_species) { + ERROR("Error: number_ion_species and membrane permeability (MassFractionIn) must be " + "the same length! \n"); + } + } + else{ + MassFractionIn.resize(IonConcentration.size()); + for (size_t i = 0; i < IonConcentration.size(); i++) { + MassFractionIn[i] = 0.0; + } + } + if (membrane_db->keyExists("MassFractionOut")) { + if (rank == 0) printf(".... Read membrane permeability (MassFractionOut) \n"); + MassFractionOut.clear(); + MassFractionOut = membrane_db->getVector("MassFractionOut"); + if (MassFractionIn.size() != number_ion_species) { + ERROR("Error: number_ion_species and membrane permeability (MassFractionOut) must be " + "the same length! \n"); + } + } + else{ + MassFractionOut.resize(IonConcentration.size()); + for (size_t i = 0; i < IonConcentration.size(); i++) { + MassFractionOut[i] = 0.0; + } + } + if (membrane_db->keyExists("ThresholdMassFractionIn")) { + if (rank == 0) printf(".... Read membrane permeability (ThresholdMassFractionIn) \n"); + ThresholdMassFractionIn.clear(); + ThresholdMassFractionIn = membrane_db->getVector("ThresholdMassFractionIn"); + if (ThresholdMassFractionIn.size() != number_ion_species) { + ERROR("Error: number_ion_species and membrane permeability (ThresholdMassFractionIn) must be " + "the same length! 
\n"); + } + } + else{ + ThresholdMassFractionIn.resize(IonConcentration.size()); + for (size_t i = 0; i < IonConcentration.size(); i++) { + ThresholdMassFractionIn[i] = 0.0; + } + } + if (membrane_db->keyExists("ThresholdMassFractionOut")) { + if (rank == 0) printf(".... Read membrane permeability (ThresholdMassFractionOut) \n"); + ThresholdMassFractionOut.clear(); + ThresholdMassFractionOut = membrane_db->getVector("ThresholdMassFractionOut"); + if (ThresholdMassFractionOut.size() != number_ion_species) { + ERROR("Error: number_ion_species and membrane permeability (ThresholdMassFractionOut) must be " + "the same length! \n"); + } + } + else{ + ThresholdMassFractionOut.resize(IonConcentration.size()); + for (size_t i = 0; i < IonConcentration.size(); i++) { + ThresholdMassFractionOut[i] = 0.0; + } + } + if (membrane_db->keyExists("ThresholdVoltage")) { + if (rank == 0) printf(".... Read membrane threshold (ThresholdVoltage) \n"); + ThresholdVoltage.clear(); + ThresholdVoltage = membrane_db->getVector("ThresholdVoltage"); + if (ThresholdVoltage.size() != number_ion_species) { + ERROR("Error: number_ion_species and membrane voltage threshold (ThresholdVoltage) must be " + "the same length! \n"); + } + } + else{ + ThresholdVoltage.resize(IonConcentration.size()); + for (size_t i = 0; i < IonConcentration.size(); i++) { + ThresholdVoltage[i] = 0.0; + } + } + + if (ion_db->keyExists("MembraneIonConcentrationList")) { + if (rank == 0) printf(".... Read MembraneIonConcentrationList \n"); + MembraneIonConcentration.clear(); + MembraneIonConcentration = ion_db->getVector("MembraneIonConcentrationList"); + if (MembraneIonConcentration.size() != number_ion_species) { + ERROR("Error: number_ion_species and MembraneIonConcentrationList must be " + "the same length! 
\n"); + } + else { + for (size_t i = 0; i < MembraneIonConcentration.size(); i++) { + MembraneIonConcentration[i] = + MembraneIonConcentration[i] * + (h * h * h * + 1.0e-18); //LB ion concentration has unit [mol/lu^3] + } + } + } + + } //Read solid boundary condition specific to Ion model BoundaryConditionSolid = 0; if (ion_db->keyExists("BC_Solid")) { BoundaryConditionSolid = ion_db->getScalar("BC_Solid"); } // Read boundary condition for ion transport - // BC = 0: normal periodic BC + // BC = 0: zero-flux bounce-back BC // BC = 1: fixed ion concentration; unit=[mol/m^3] // BC = 2: fixed ion flux (inward flux); unit=[mol/m^2/sec] BoundaryConditionInlet.push_back(0); @@ -583,6 +708,72 @@ void ScaLBL_IonModel::SetDomain() { nprocz = Dm->nprocz(); } +void ScaLBL_IonModel::SetMembrane() { + + membrane_db = db->getDatabase("Membrane"); + + /* set distance based on labels inside the membrane (all other labels will be outside) */ + auto MembraneLabels = membrane_db->getVector("MembraneLabels"); + + IonMembrane = std::shared_ptr(new Membrane(ScaLBL_Comm, NeighborList, Np)); + size_t NLABELS = MembraneLabels.size(); + signed char LABEL = 0; + double *label_count; + double *label_count_global; + Array membrane_id(Nx,Ny,Nz); + label_count = new double[NLABELS]; + label_count_global = new double[NLABELS]; + // Assign the labels + for (size_t idx = 0; idx < NLABELS; idx++) + label_count[idx] = 0; + /* set the distance to the membrane */ + MembraneDistance.resize(Nx, Ny, Nz); + MembraneDistance.fill(0); + for (int k = 0; k < Nz; k++) { + for (int j = 0; j < Ny; j++) { + for (int i = 0; i < Nx; i++) { + membrane_id(i,j,k) = 1; // default value + LABEL = Dm->id[k*Nx*Ny + j*Nx + i]; + for (size_t m=0; mComm.sumReduce(label_count[m]); + } + if (rank == 0) { + printf(" Membrane labels: %lu \n", MembraneLabels.size()); + for (size_t m=0; mCreate(MembraneDistance, Map); + + // clean up + delete [] label_count; + delete [] label_count_global; +} + void ScaLBL_IonModel::ReadInput() { sprintf(LocalRankString, "%05d", Dm->rank()); @@ -709,6 +900,33 @@ void ScaLBL_IonModel::AssignSolidBoundary(double *ion_solid) { } } +void ScaLBL_IonModel::AssignIonConcentrationMembrane( double *Ci, int ic) { + // double *Ci, const vector MembraneIonConcentration, const vector IonConcentration, int ic) { + + double VALUE = 0.f; + + if (rank == 0){ + printf(".... Set concentration(%i): inside=%.6g [mol/m^3], outside=%.6g [mol/m^3] \n", ic, + MembraneIonConcentration[ic]/(h*h*h*1.0e-18), IonConcentration[ic]/(h*h*h*1.0e-18)); + } + for (int k = 0; k < Nz; k++) { + for (int j = 0; j < Ny; j++) { + for (int i = 0; i < Nx; i++) { + int idx = Map(i, j, k); + if (!(idx < 0)) { + if (MembraneDistance(i,j,k) < 0.0) { + VALUE = MembraneIonConcentration[ic];//* (h * h * h * 1.0e-18); + } else { + VALUE = IonConcentration[ic];//* (h * h * h * 1.0e-18); + + } + Ci[idx] = VALUE; + } + } + } + } +} + void ScaLBL_IonModel::AssignIonConcentration_FromFile( double *Ci, const vector &File_ion, int ic) { double *Ci_host; @@ -764,7 +982,7 @@ void ScaLBL_IonModel::Create() { Map.fill(-2); auto neighborList = new int[18 * Npad]; Np = ScaLBL_Comm->MemoryOptimizedLayoutAA(Map, neighborList, - Mask->id.data(), Np, 1); + Mask->id.data(), Npad, 1); comm.barrier(); //........................................................................... @@ -778,6 +996,7 @@ void ScaLBL_IonModel::Create() { int neighborSize = 18 * (Np * sizeof(int)); //........................................................................... 
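/*
 * Memory-layout sketch, inferred from the kernel launches and the checkpoint/restart
 * code in this file: ion data are stored species-major over the compact index n in
 * [0,Np), so for species ic and D3Q7 direction q,
 *
 *     fq[ic * 7 * Np + q * Np + n]   // distributions
 *     Ci[ic * Np + n]                // concentrations
 *
 * dvcMap, allocated just below, holds the regular-grid index k*Nx*Ny + j*Nx + i for
 * each compact node, presumably so that membrane coefficients can be matched against
 * the regular-grid potential Psi in RunMembrane (an inference, not stated here).
 */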
ScaLBL_AllocateDeviceMemory((void **)&NeighborList, neighborSize); + ScaLBL_AllocateDeviceMemory((void **)&dvcMap, sizeof(int) * Np); ScaLBL_AllocateDeviceMemory((void **)&fq, number_ion_species * 7 * dist_mem_size); ScaLBL_AllocateDeviceMemory((void **)&Ci, @@ -794,6 +1013,36 @@ void ScaLBL_IonModel::Create() { if (rank == 0) printf("LB Ion Solver: Setting up device map and neighbor list \n"); // copy the neighbor list + int *TmpMap; + TmpMap = new int[Np]; + for (int k = 1; k < Nz - 1; k++) { + for (int j = 1; j < Ny - 1; j++) { + for (int i = 1; i < Nx - 1; i++) { + int idx = Map(i, j, k); + if (!(idx < 0)) + TmpMap[idx] = k * Nx * Ny + j * Nx + i; + } + } + } + // check that TmpMap is valid + for (int idx = 0; idx < ScaLBL_Comm->LastExterior(); idx++) { + auto n = TmpMap[idx]; + if (n > Nx * Ny * Nz) { + printf("Bad value! idx=%i \n", n); + TmpMap[idx] = Nx * Ny * Nz - 1; + } + } + for (int idx = ScaLBL_Comm->FirstInterior(); + idx < ScaLBL_Comm->LastInterior(); idx++) { + auto n = TmpMap[idx]; + if (n > Nx * Ny * Nz) { + printf("Bad value! idx=%i \n", n); + TmpMap[idx] = Nx * Ny * Nz - 1; + } + } + ScaLBL_CopyToDevice(dvcMap, TmpMap, sizeof(int) * Np); + ScaLBL_Comm->Barrier(); + ScaLBL_CopyToDevice(NeighborList, neighborList, neighborSize); comm.barrier(); @@ -814,6 +1063,13 @@ void ScaLBL_IonModel::Create() { ScaLBL_Comm->Barrier(); delete[] IonSolid_host; } + else { + IonSolid = NULL; + } + + + delete[] TmpMap; + delete[] neighborList; } void ScaLBL_IonModel::Initialize() { @@ -822,10 +1078,26 @@ void ScaLBL_IonModel::Initialize() { */ if (rank == 0) printf("LB Ion Solver: initializing D3Q7 distributions\n"); - if (ion_db->keyExists("IonConcentrationFile")) { + //USE_MEMBRANE = true; + if (USE_MEMBRANE){ + double *Ci_host; + if (rank == 0) + printf(" ...initializing based on membrane list \n"); + Ci_host = new double[number_ion_species * Np]; + for (size_t ic = 0; ic < number_ion_species; ic++) { + AssignIonConcentrationMembrane( &Ci_host[ic * Np], ic); + } + ScaLBL_CopyToDevice(Ci, Ci_host, number_ion_species * sizeof(double) * Np); + comm.barrier(); + for (size_t ic = 0; ic < number_ion_species; ic++) { + ScaLBL_D3Q7_Ion_Init_FromFile(&fq[ic * Np * 7], &Ci[ic * Np], Np); + } + delete[] Ci_host; + } + else if (ion_db->keyExists("IonConcentrationFile")) { //NOTE: "IonConcentrationFile" is a vector, including "file_name, datatype" auto File_ion = ion_db->getVector("IonConcentrationFile"); - if (File_ion.size() == 2 * number_ion_species) { + if (File_ion.size() == 2*number_ion_species) { double *Ci_host; Ci_host = new double[number_ion_species * Np]; for (size_t ic = 0; ic < number_ion_species; ic++) { @@ -844,12 +1116,50 @@ void ScaLBL_IonModel::Initialize() { ERROR("Error: Number of user-input ion concentration files should " "be equal to number of ion species!\n"); } - } else { + } + else { for (size_t ic = 0; ic < number_ion_species; ic++) { ScaLBL_D3Q7_Ion_Init(&fq[ic * Np * 7], &Ci[ic * Np], IonConcentration[ic], Np); } } + /** RESTART **/ + if (Restart == true) { + if (rank == 0) { + printf(" ION MODEL: Reading restart file! 
\n"); + } + double*cDist; + double *Ci_host; + cDist = new double[7 * number_ion_species * Np]; + Ci_host = new double[number_ion_species * Np]; + + ifstream File(LocalRestartFile, ios::binary); + int idx; + double value,sum; + // Read the distributions + for (size_t ic = 0; ic < number_ion_species; ic++){ + for (int n = 0; n < Np; n++) { + sum = 0.0; + for (int q = 0; q < 7; q++) { + File.read((char *)&value, sizeof(value)); + cDist[ic * 7 * Np + q * Np + n] = value; + sum += value; + } + Ci_host[ic * Np + n] = sum; + } + } + File.close(); + + // Copy the restart data to the GPU + ScaLBL_CopyToDevice(Ci, Ci_host, Np * number_ion_species* sizeof(double)); + ScaLBL_CopyToDevice(fq, cDist, 7 * Np * number_ion_species *sizeof(double)); + ScaLBL_Comm->Barrier(); + comm.barrier(); + delete[] Ci_host; + delete[] cDist; + } + /** END RESTART **/ + if (rank == 0) printf("LB Ion Solver: initializing charge density\n"); for (size_t ic = 0; ic < number_ion_species; ic++) { @@ -984,6 +1294,7 @@ void ScaLBL_IonModel::Run(double *Velocity, double *ElectricField) { //ScaLBL_Comm->Barrier(); comm.barrier(); //auto t1 = std::chrono::system_clock::now(); + auto t1 = std::chrono::system_clock::now(); for (size_t ic = 0; ic < number_ion_species; ic++) { timestep = 0; while (timestep < timestepMax[ic]) { @@ -1052,13 +1363,13 @@ void ScaLBL_IonModel::Run(double *Velocity, double *ElectricField) { ScaLBL_Comm->LastExterior(), Np); //LB-Ion collison - ScaLBL_D3Q7_AAodd_Ion( + ScaLBL_D3Q7_AAodd_Ion_v0( NeighborList, &fq[ic * Np * 7], &Ci[ic * Np], &FluxDiffusive[3 * ic * Np], &FluxAdvective[3 * ic * Np], &FluxElectrical[3 * ic * Np], Velocity, ElectricField, IonDiffusivity[ic], IonValence[ic], rlx[ic], Vt, ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); - ScaLBL_D3Q7_AAodd_Ion( + ScaLBL_D3Q7_AAodd_Ion_v0( NeighborList, &fq[ic * Np * 7], &Ci[ic * Np], &FluxDiffusive[3 * ic * Np], &FluxAdvective[3 * ic * Np], &FluxElectrical[3 * ic * Np], Velocity, ElectricField, @@ -1136,13 +1447,13 @@ void ScaLBL_IonModel::Run(double *Velocity, double *ElectricField) { Np); //LB-Ion collison - ScaLBL_D3Q7_AAeven_Ion( + ScaLBL_D3Q7_AAeven_Ion_v0( &fq[ic * Np * 7], &Ci[ic * Np], &FluxDiffusive[3 * ic * Np], &FluxAdvective[3 * ic * Np], &FluxElectrical[3 * ic * Np], Velocity, ElectricField, IonDiffusivity[ic], IonValence[ic], rlx[ic], Vt, ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); - ScaLBL_D3Q7_AAeven_Ion( + ScaLBL_D3Q7_AAeven_Ion_v0( &fq[ic * Np * 7], &Ci[ic * Np], &FluxDiffusive[3 * ic * Np], &FluxAdvective[3 * ic * Np], &FluxElectrical[3 * ic * Np], Velocity, ElectricField, IonDiffusivity[ic], IonValence[ic], @@ -1166,6 +1477,154 @@ void ScaLBL_IonModel::Run(double *Velocity, double *ElectricField) { ScaLBL_Comm->LastExterior(), Np); } //************************************************************************/ + if (rank == 0) + printf("---------------------------------------------------------------" + "----\n"); + // Compute the walltime per timestep + auto t2 = std::chrono::system_clock::now(); + double cputime = std::chrono::duration(t2 - t1).count() / timestep; + // Performance obtained from each node + double MLUPS = double(Np) / cputime / 1000000; + if (rank == 0) + printf("********************************************************\n"); + if (rank == 0) + printf("CPU time = %f \n", cputime); + if (rank == 0) + printf("Lattice update rate (per core)= %f MLUPS \n", MLUPS); + MLUPS *= nprocs; + if (rank == 0) + printf("Lattice update rate (total)= %f MLUPS \n", MLUPS); + if (rank == 0) + 
printf("********************************************************\n"); +} + +void ScaLBL_IonModel::RunMembrane(double *Velocity, double *ElectricField, double *Psi) { + + //Input parameter: + //1. Velocity is from StokesModel + //2. ElectricField is from Poisson model + + //LB-related parameter + vector rlx; + for (size_t ic = 0; ic < tau.size(); ic++) { + rlx.push_back(1.0 / tau[ic]); + } + + //.......create and start timer............ + //double starttime,stoptime,cputime; + //ScaLBL_Comm->Barrier(); comm.barrier(); + //auto t1 = std::chrono::system_clock::now(); + + for (size_t ic = 0; ic < number_ion_species; ic++) { + + /* set the mass transfer coefficients for the membrane */ + IonMembrane->AssignCoefficients(dvcMap, Psi, ThresholdVoltage[ic],MassFractionIn[ic], + MassFractionOut[ic],ThresholdMassFractionIn[ic],ThresholdMassFractionOut[ic]); + + timestep = 0; + while (timestep < timestepMax[ic]) { + //************************************************************************/ + // *************ODD TIMESTEP*************// + timestep++; + + //LB-Ion collison + IonMembrane->SendD3Q7AA(&fq[ic * Np * 7]); //READ FORM NORMAL + + ScaLBL_D3Q7_AAodd_Ion( + IonMembrane->NeighborList, &fq[ic * Np * 7], &Ci[ic * Np], + &FluxDiffusive[3 * ic * Np], &FluxAdvective[3 * ic * Np], + &FluxElectrical[3 * ic * Np], Velocity, ElectricField, + IonDiffusivity[ic], IonValence[ic], rlx[ic], Vt, + ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); + + IonMembrane->RecvD3Q7AA(&fq[ic * Np * 7]); //WRITE INTO OPPOSITE + + ScaLBL_D3Q7_AAodd_Ion( + IonMembrane->NeighborList, &fq[ic * Np * 7], &Ci[ic * Np], + &FluxDiffusive[3 * ic * Np], &FluxAdvective[3 * ic * Np], + &FluxElectrical[3 * ic * Np], Velocity, ElectricField, + IonDiffusivity[ic], IonValence[ic], rlx[ic], Vt, 0, + ScaLBL_Comm->LastExterior(), Np); + + + IonMembrane->IonTransport(&fq[ic * Np * 7],&Ci[ic * Np]); + + + /* if (BoundaryConditionSolid == 1) { + //TODO IonSolid may also be species-dependent + ScaLBL_Comm->SolidDirichletD3Q7(&fq[ic * Np * 7], IonSolid); + } + ScaLBL_Comm->Barrier(); + comm.barrier(); + */ + // *************EVEN TIMESTEP*************// + timestep++; + + //LB-Ion collison + IonMembrane->SendD3Q7AA(&fq[ic * Np * 7]); //READ FORM NORMAL + + ScaLBL_D3Q7_AAeven_Ion( + &fq[ic * Np * 7], &Ci[ic * Np], &FluxDiffusive[3 * ic * Np], + &FluxAdvective[3 * ic * Np], &FluxElectrical[3 * ic * Np], + Velocity, ElectricField, IonDiffusivity[ic], IonValence[ic], + rlx[ic], Vt, ScaLBL_Comm->FirstInterior(), + ScaLBL_Comm->LastInterior(), Np); + + + IonMembrane->RecvD3Q7AA(&fq[ic * Np * 7]); //WRITE INTO OPPOSITE + + ScaLBL_D3Q7_AAeven_Ion( + &fq[ic * Np * 7], &Ci[ic * Np], &FluxDiffusive[3 * ic * Np], + &FluxAdvective[3 * ic * Np], &FluxElectrical[3 * ic * Np], + Velocity, ElectricField, IonDiffusivity[ic], IonValence[ic], + rlx[ic], Vt, 0, ScaLBL_Comm->LastExterior(), Np); + + IonMembrane->IonTransport(&fq[ic * Np * 7],&Ci[ic * Np]); + + ScaLBL_Comm->Barrier(); + comm.barrier(); + + /* + if (BoundaryConditionSolid == 1) { + //TODO IonSolid may also be species-dependent + ScaLBL_Comm->SolidDirichletD3Q7(&fq[ic * Np * 7], IonSolid); + } + ScaLBL_Comm->Barrier(); + comm.barrier(); + */ + } + } + + //Compute charge density for Poisson equation + for (size_t ic = 0; ic < number_ion_species; ic++) { + int Valence = IonValence[ic]; + ScaLBL_D3Q7_Ion_ChargeDensity(Ci, ChargeDensity, Valence, ic, + ScaLBL_Comm->FirstInterior(), + ScaLBL_Comm->LastInterior(), Np); + ScaLBL_D3Q7_Ion_ChargeDensity(Ci, ChargeDensity, Valence, ic, 0, + 
ScaLBL_Comm->LastExterior(), Np); + } + + /* DoubleArray Charge(Nx,Ny,Nz); + ScaLBL_Comm->RegularLayout(Map, ChargeDensity, Charge); + double charge_sum=0.0; + double charge_sum_total=0.0; + for (int k=1; kBarrier(); + comm.barrier(); + */ + ScaLBL_Comm->Barrier(); + comm.barrier(); + //if (rank==0) printf(" IonMembrane: completeted full step \n"); + //fflush(stdout); + //************************************************************************/ //if (rank==0) printf("-------------------------------------------------------------------\n"); //// Compute the walltime per timestep //auto t2 = std::chrono::system_clock::now(); @@ -1181,6 +1640,33 @@ void ScaLBL_IonModel::Run(double *Velocity, double *ElectricField) { //if (rank==0) printf("********************************************************\n"); } +void ScaLBL_IonModel::Checkpoint(){ + + if (rank == 0) { + printf(" ION MODEL: Writing restart file! \n"); + } + int idx; + double value,sum; + double*cDist; + cDist = new double[7 * number_ion_species * Np]; + ScaLBL_CopyToHost(cDist, fq, 7 * Np * number_ion_species *sizeof(double)); + + ofstream File(LocalRestartFile, ios::binary); + for (size_t ic = 0; ic < number_ion_species; ic++){ + for (int n = 0; n < Np; n++) { + // Write the distributions + for (int q = 0; q < 7; q++) { + value = cDist[ic * Np * 7 + q * Np + n]; + File.write((char *)&value, sizeof(value)); + } + } + } + File.close(); + + delete[] cDist; + +} + void ScaLBL_IonModel::getIonConcentration(DoubleArray &IonConcentration, const size_t ic) { //This function wirte out the data in a normal layout (by aggregating all decomposed domains) diff --git a/models/IonModel.h b/models/IonModel.h index 8930b1df..9a6756e6 100644 --- a/models/IonModel.h +++ b/models/IonModel.h @@ -1,7 +1,6 @@ /* * Ion transporte LB Model */ - #ifndef ScaLBL_IonModel_INC #define ScaLBL_IonModel_INC @@ -16,6 +15,7 @@ #include "common/ScaLBL.h" #include "common/Communication.h" +#include "common/Membrane.h" #include "common/MPI.h" #include "analysis/Minkowski.h" #include "ProfilerApp.h" @@ -30,10 +30,12 @@ public: void ReadParams(string filename); void ReadParams(std::shared_ptr db0); void SetDomain(); + void SetMembrane(); void ReadInput(); void Create(); void Initialize(); void Run(double *Velocity, double *ElectricField); + void RunMembrane(double *Velocity, double *ElectricField, double *Psi); void getIonConcentration(DoubleArray &IonConcentration, const size_t ic); void getIonConcentration_debug(int timestep); void getIonFluxDiffusive(DoubleArray &IonFlux_x, DoubleArray &IonFlux_y, @@ -47,9 +49,10 @@ public: void getIonFluxElectrical_debug(int timestep); void DummyFluidVelocity(); void DummyElectricField(); + void Checkpoint(); double CalIonDenConvergence(vector &ci_avg_previous); - //bool Restart,pBC; + bool Restart; int timestep; vector timestepMax; int BoundaryConditionSolid; @@ -66,10 +69,14 @@ public: vector IonDiffusivity; //User input unit [m^2/sec] vector IonValence; vector IonConcentration; //unit [mol/m^3] - vector - Cin; //inlet boundary value, can be either concentration [mol/m^3] or flux [mol/m^2/sec] - vector - Cout; //outlet boundary value, can be either concentration [mol/m^3] or flux [mol/m^2/sec] + vector MembraneIonConcentration; //unit [mol/m^3] + vector ThresholdVoltage; + vector MassFractionIn; + vector MassFractionOut; + vector ThresholdMassFractionIn; + vector ThresholdMassFractionOut; + vector Cin; //inlet boundary value, can be either concentration [mol/m^3] or flux [mol/m^2/sec] + vector Cout; //outlet boundary value, can be 
either concentration [mol/m^3] or flux [mol/m^2/sec] vector tau; vector time_conv; @@ -80,7 +87,7 @@ public: std::shared_ptr Dm; // this domain is for analysis std::shared_ptr Mask; // this domain is for lbm std::shared_ptr ScaLBL_Comm; - // input database + // input databaseF std::shared_ptr db; std::shared_ptr domain_db; std::shared_ptr ion_db; @@ -88,6 +95,7 @@ public: IntArray Map; DoubleArray Distance; int *NeighborList; + int *dvcMap; double *fq; double *Ci; double *ChargeDensity; @@ -97,7 +105,14 @@ public: double *FluxDiffusive; double *FluxAdvective; double *FluxElectrical; - + + /* these support membrane capabilities */ + bool USE_MEMBRANE; + std::shared_ptr membrane_db; + std::shared_ptr IonMembrane; + DoubleArray MembraneDistance; + int MembraneCount; // number of links the cross the membrane + private: Utilities::MPI comm; @@ -113,6 +128,8 @@ private: void AssignIonConcentration_FromFile(double *Ci, const vector &File_ion, int ic); + void AssignIonConcentrationMembrane( double *Ci, int ic); + void IonConcentration_LB_to_Phys(DoubleArray &Den_reg); void IonFlux_LB_to_Phys(DoubleArray &Den_reg, const size_t ic); }; diff --git a/models/MRTModel.cpp b/models/MRTModel.cpp index 6d5d41b2..83a5c95e 100644 --- a/models/MRTModel.cpp +++ b/models/MRTModel.cpp @@ -41,6 +41,7 @@ void ScaLBL_MRTModel::ReadParams(string filename) { tau = 1.0; timestepMax = 100000; + ANALYSIS_INTERVAL = 1000; tolerance = 1.0e-8; Fx = Fy = 0.0; Fz = 1.0e-5; @@ -51,6 +52,9 @@ void ScaLBL_MRTModel::ReadParams(string filename) { if (mrt_db->keyExists("timestepMax")) { timestepMax = mrt_db->getScalar("timestepMax"); } + if (mrt_db->keyExists("analysis_interval")) { + ANALYSIS_INTERVAL = mrt_db->getScalar("analysis_interval"); + } if (mrt_db->keyExists("tolerance")) { tolerance = mrt_db->getScalar("tolerance"); } @@ -323,7 +327,7 @@ void ScaLBL_MRTModel::Run() { comm.barrier(); //************************************************************************/ - if (timestep % 1000 == 0) { + if (timestep % ANALYSIS_INTERVAL == 0) { ScaLBL_D3Q19_Momentum(fq, Velocity, Np); ScaLBL_DeviceBarrier(); comm.barrier(); diff --git a/models/MRTModel.h b/models/MRTModel.h index f1489aa8..e2d907da 100644 --- a/models/MRTModel.h +++ b/models/MRTModel.h @@ -48,6 +48,7 @@ public: bool Restart, pBC; int timestep, timestepMax; + int ANALYSIS_INTERVAL; int BoundaryCondition; double tau, mu; double Fx, Fy, Fz, flux; diff --git a/models/MultiPhysController.cpp b/models/MultiPhysController.cpp index 974de530..b0d3c817 100644 --- a/models/MultiPhysController.cpp +++ b/models/MultiPhysController.cpp @@ -4,7 +4,7 @@ ScaLBL_Multiphys_Controller::ScaLBL_Multiphys_Controller( int RANK, int NP, const Utilities::MPI &COMM) : rank(RANK), nprocs(NP), Restart(0), timestepMax(0), num_iter_Stokes(0), num_iter_Ion(0), analysis_interval(0), visualization_interval(0), - tolerance(0), time_conv_max(0), comm(COMM) {} + tolerance(0), time_conv_max(0), time_conv_MainLoop(0), comm(COMM) {} ScaLBL_Multiphys_Controller::~ScaLBL_Multiphys_Controller() {} void ScaLBL_Multiphys_Controller::ReadParams(string filename) { @@ -16,12 +16,14 @@ void ScaLBL_Multiphys_Controller::ReadParams(string filename) { // Default parameters timestepMax = 10000; Restart = false; + restart_interval = 100000; num_iter_Stokes = 1; num_iter_Ion.push_back(1); analysis_interval = 500; visualization_interval = 10000; tolerance = 1.0e-6; time_conv_max = 0.0; + time_conv_MainLoop = 0.0; // load input parameters if (study_db->keyExists("timestepMax")) { @@ -34,6 +36,9 @@ void 
ScaLBL_Multiphys_Controller::ReadParams(string filename) { visualization_interval = study_db->getScalar("visualization_interval"); } + if (study_db->keyExists("restart_interval")) { + restart_interval = study_db->getScalar("restart_interval"); + } if (study_db->keyExists("tolerance")) { tolerance = study_db->getScalar("tolerance"); } @@ -150,6 +155,53 @@ vector ScaLBL_Multiphys_Controller::getIonNumIter_PNP_coupling( return num_iter_ion; } +vector ScaLBL_Multiphys_Controller::getIonNumIter_NernstPlanck_coupling( + const vector &IonTimeConv) { + //Return number of internal iterations for the Ion transport solver + vector TimeConv; + TimeConv.assign(IonTimeConv.begin(), IonTimeConv.end()); + vector num_iter_ion; + vector::iterator it_max = max_element(TimeConv.begin(), TimeConv.end()); + unsigned int idx_max = distance(TimeConv.begin(), it_max); + if (idx_max == 0) { + num_iter_ion.push_back(2); + for (unsigned int idx = 1; idx < TimeConv.size(); idx++) { + double temp = + 2 * TimeConv[idx_max] / + TimeConv + [idx]; //the factor 2 is the number of iterations for the element has max time_conv + num_iter_ion.push_back(int(round(temp / 2) * 2)); + } + } else if (idx_max == TimeConv.size() - 1) { + for (unsigned int idx = 0; idx < TimeConv.size() - 1; idx++) { + double temp = + 2 * TimeConv[idx_max] / + TimeConv + [idx]; //the factor 2 is the number of iterations for the element has max time_conv + num_iter_ion.push_back(int(round(temp / 2) * 2)); + } + num_iter_ion.push_back(2); + } else { + for (unsigned int idx = 0; idx < idx_max; idx++) { + double temp = + 2 * TimeConv[idx_max] / + TimeConv + [idx]; //the factor 2 is the number of iterations for the element has max time_conv + num_iter_ion.push_back(int(round(temp / 2) * 2)); + } + num_iter_ion.push_back(2); + for (unsigned int idx = idx_max + 1; idx < TimeConv.size(); idx++) { + double temp = + 2 * TimeConv[idx_max] / + TimeConv + [idx]; //the factor 2 is the number of iterations for the element has max time_conv + num_iter_ion.push_back(int(round(temp / 2) * 2)); + } + } + return num_iter_ion; +} + + void ScaLBL_Multiphys_Controller::getTimeConvMax_PNP_coupling( double StokesTimeConv, const vector &IonTimeConv) { //Return maximum of the time converting factor from Stokes and ion solvers diff --git a/models/MultiPhysController.h b/models/MultiPhysController.h index 9ce31d10..baee9966 100644 --- a/models/MultiPhysController.h +++ b/models/MultiPhysController.h @@ -28,7 +28,7 @@ public: const vector &IonTimeConv); vector getIonNumIter_PNP_coupling(double StokesTimeConv, const vector &IonTimeConv); - //void getIonNumIter_PNP_coupling(double StokesTimeConv,vector &IonTimeConv,vector &IonTimeMax); + vector getIonNumIter_NernstPlanck_coupling(const vector &IonTimeConv); void getTimeConvMax_PNP_coupling(double StokesTimeConv, const vector &IonTimeConv); @@ -38,8 +38,10 @@ public: vector num_iter_Ion; int analysis_interval; int visualization_interval; + int restart_interval; double tolerance; double time_conv_max; + double time_conv_MainLoop; //double SchmidtNum;//Schmidt number = kinematic_viscosity/mass_diffusivity int rank, nprocs; diff --git a/models/PoissonSolver.cpp b/models/PoissonSolver.cpp index e3dccfd2..4dab7323 100644 --- a/models/PoissonSolver.cpp +++ b/models/PoissonSolver.cpp @@ -32,6 +32,14 @@ ScaLBL_Poisson::ScaLBL_Poisson(int RANK, int NP, const Utilities::MPI& COMM): } ScaLBL_Poisson::~ScaLBL_Poisson() { + ScaLBL_FreeDeviceMemory(NeighborList); + ScaLBL_FreeDeviceMemory(dvcMap); + ScaLBL_FreeDeviceMemory(Psi); + 
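/*
 * Note: the frees in this destructor mirror the device allocations made in
 * ScaLBL_Poisson::Create() (NeighborList, dvcMap and Psi above; Psi_BCLabel,
 * ElectricField, ResidualError and the fq distributions below), so tearing down the
 * Poisson solver releases its device buffers.
 */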
ScaLBL_FreeDeviceMemory(Psi_BCLabel); + ScaLBL_FreeDeviceMemory(ElectricField); + ScaLBL_FreeDeviceMemory(ResidualError); + ScaLBL_FreeDeviceMemory(fq); + if ( TIMELOG ) fclose( TIMELOG ); } @@ -42,7 +50,7 @@ void ScaLBL_Poisson::ReadParams(string filename){ domain_db = db->getDatabase( "Domain" ); electric_db = db->getDatabase( "Poisson" ); - k2_inv = 4.0;//speed of sound for D3Q7 lattice + k2_inv = 3.0;//speed of sound for D3Q19 lattice tau = 0.5+k2_inv; timestepMax = 100000; tolerance = 1.0e-6;//stopping criterion for obtaining steady-state electricla potential @@ -58,11 +66,18 @@ void ScaLBL_Poisson::ReadParams(string filename){ TestPeriodicTime = 1.0;//unit: [sec] TestPeriodicTimeConv = 0.01; //unit [sec/lt] TestPeriodicSaveInterval = 0.1; //unit [sec] + Restart = "false"; // LB-Poisson Model parameters + if (electric_db->keyExists( "Restart" )){ + Restart = electric_db->getScalar("Restart"); + } if (electric_db->keyExists( "timestepMax" )){ timestepMax = electric_db->getScalar( "timestepMax" ); } + if (electric_db->keyExists( "tau" )){ + tau = electric_db->getScalar( "tau" ); + } if (electric_db->keyExists( "analysis_interval" )){ analysis_interval = electric_db->getScalar( "analysis_interval" ); } @@ -71,6 +86,7 @@ void ScaLBL_Poisson::ReadParams(string filename){ } //'tolerance_method' can be {"MSE","MSE_max"} tolerance_method = electric_db->getWithDefault( "tolerance_method", "MSE" ); + lattice_scheme = electric_db->getWithDefault( "lattice_scheme", "D3Q7" ); if (electric_db->keyExists( "epsilonR" )){ epsilonR = electric_db->getScalar( "epsilonR" ); } @@ -122,6 +138,10 @@ void ScaLBL_Poisson::ReadParams(string filename){ //Re-calcualte model parameters if user updates input epsilon0_LB = epsilon0*(h*1.0e-6);//unit:[C/(V*lu)] epsilon_LB = epsilon0_LB*epsilonR;//electric permittivity + + /* restart string */ + sprintf(LocalRankString, "%05d", rank); + sprintf(LocalRestartFile, "%s%s", "Psi.", LocalRankString); if (rank==0) printf("***********************************************************************************\n"); if (rank==0) printf("LB-Poisson Solver: steady-state MaxTimeStep = %i; steady-state tolerance = %.3g \n", timestepMax,tolerance); @@ -136,6 +156,15 @@ void ScaLBL_Poisson::ReadParams(string filename){ else{ if (rank==0) printf("LB-Poisson Solver: tolerance_method=%s cannot be identified!\n",tolerance_method.c_str()); } + if (lattice_scheme.compare("D3Q7")==0){ + if (rank==0) printf("LB-Poisson Solver: Use D3Q7 lattice structure.\n"); + } + else if (lattice_scheme.compare("D3Q19")==0){ + if (rank==0) printf("LB-Poisson Solver: Use D3Q19 lattice structure.\n"); + } + else{ + if (rank==0) printf("LB-Poisson Solver: lattice_scheme=%s cannot be identified!\n",lattice_scheme.c_str()); + } } void ScaLBL_Poisson::SetDomain(){ @@ -182,7 +211,7 @@ void ScaLBL_Poisson::ReadInput(){ sprintf(LocalRankString,"%05d",Dm->rank()); sprintf(LocalRankFilename,"%s%s","ID.",LocalRankString); - sprintf(LocalRestartFile,"%s%s","Restart.",LocalRankString); + sprintf(LocalRestartFile,"%s%s","Psi.",LocalRankString); if (domain_db->keyExists( "Filename" )){ @@ -330,7 +359,7 @@ void ScaLBL_Poisson::Create(){ if (rank==0) printf ("LB-Poisson Solver: Set up memory efficient layout \n"); Map.resize(Nx,Ny,Nz); Map.fill(-2); auto neighborList= new int[18*Npad]; - Np = ScaLBL_Comm->MemoryOptimizedLayoutAA(Map,neighborList,Mask->id.data(),Np,1); + Np = ScaLBL_Comm->MemoryOptimizedLayoutAA(Map,neighborList,Mask->id.data(),Npad,1); comm.barrier(); 
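/*
 * Note: the capacity argument passed to MemoryOptimizedLayoutAA is now Npad, the
 * padded bound used to size neighborList above, rather than the raw pore count Np;
 * the routine's return value still becomes the working Np. Reading Npad as a
 * buffer-capacity hint (matching the same change in IonModel::Create) is an
 * inference from the call sites, not stated in the source.
 */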
//........................................................................... @@ -345,11 +374,16 @@ void ScaLBL_Poisson::Create(){ ScaLBL_AllocateDeviceMemory((void **) &NeighborList, neighborSize); ScaLBL_AllocateDeviceMemory((void **) &dvcMap, sizeof(int)*Np); //ScaLBL_AllocateDeviceMemory((void **) &dvcID, sizeof(signed char)*Nx*Ny*Nz); - ScaLBL_AllocateDeviceMemory((void **) &fq, 7*dist_mem_size); ScaLBL_AllocateDeviceMemory((void **) &Psi, sizeof(double)*Nx*Ny*Nz); ScaLBL_AllocateDeviceMemory((void **) &Psi_BCLabel, sizeof(int)*Nx*Ny*Nz); ScaLBL_AllocateDeviceMemory((void **) &ElectricField, 3*sizeof(double)*Np); ScaLBL_AllocateDeviceMemory((void **) &ResidualError, sizeof(double)*Np); + if (lattice_scheme.compare("D3Q7")==0){ + ScaLBL_AllocateDeviceMemory((void **) &fq, 7*dist_mem_size); + } + else if (lattice_scheme.compare("D3Q19")==0){ + ScaLBL_AllocateDeviceMemory((void **) &fq, 19*dist_mem_size); + } //........................................................................... // Update GPU data structures @@ -366,6 +400,8 @@ void ScaLBL_Poisson::Create(){ } } } + comm.barrier(); + if (rank==0) printf (" .... LB-Poisson Solver: check neighbor list \n"); // check that TmpMap is valid for (int idx=0; idxLastExterior(); idx++){ auto n = TmpMap[idx]; @@ -381,6 +417,8 @@ void ScaLBL_Poisson::Create(){ TmpMap[idx] = Nx*Ny*Nz-1; } } + comm.barrier(); + if (rank==0) printf (" .... LB-Poisson Solver: copy neighbor list to GPU \n"); ScaLBL_CopyToDevice(dvcMap, TmpMap, sizeof(int)*Np); ScaLBL_Comm->Barrier(); delete [] TmpMap; @@ -394,6 +432,7 @@ void ScaLBL_Poisson::Create(){ //ScaLBL_Comm->Barrier(); //Initialize solid boundary for electric potential + // DON'T USE WITH CELLULAR SYSTEM (NO SOLID -- NEED Membrane SOLUTION) ScaLBL_Comm->SetupBounceBackList(Map, Mask->id.data(), Np); comm.barrier(); } @@ -407,7 +446,57 @@ void ScaLBL_Poisson::Potential_Init(double *psi_init){ Vin = 1.0; //Boundary-z (inlet) electric potential Vout = 1.0; //Boundary-Z (outlet) electric potential - if (BoundaryConditionInlet>0){ + if (BoundaryConditionInlet==0 && BoundaryConditionOutlet==0){ + + signed char VALUE=0; + double AFFINITY=0.f; + + auto LabelList = electric_db->getVector( "InitialValueLabels" ); + auto AffinityList = electric_db->getVector( "InitialValues" ); + + size_t NLABELS = LabelList.size(); + if (NLABELS != AffinityList.size()){ + ERROR("Error: LB-Poisson Solver: InitialValueLabels and InitialValues must be of the same length! 
\n"); + } + + std::vector label_count( NLABELS, 0.0 ); + std::vector label_count_global( NLABELS, 0.0 ); + + for (int k=0;kid[n]; + AFFINITY=0.f; + // Assign the affinity from the paired list + for (unsigned int idx=0; idx < NLABELS; idx++){ + if (VALUE == LabelList[idx]){ + AFFINITY=AffinityList[idx]; + label_count[idx] += 1.0; + idx = NLABELS; + } + } + int idx=Map(i,j,k); + if (!(idx<0)) psi_init[n] = AFFINITY; + } + } + } + + for (size_t idx=0; idxComm.sumReduce( label_count[idx]); + + if (rank==0){ + printf("LB-Poisson Solver: number of Poisson initial-value labels: %lu \n",NLABELS); + for (unsigned int idx=0; idx0 && BoundaryConditionOutlet>0){ + //read input parameters for inlet switch (BoundaryConditionInlet){ case 1: if (electric_db->keyExists( "Vin" )){ @@ -431,8 +520,7 @@ void ScaLBL_Poisson::Potential_Init(double *psi_init){ } break; } - } - if (BoundaryConditionOutlet>0){ + //read input parameters for outlet switch (BoundaryConditionOutlet){ case 1: if (electric_db->keyExists( "Vout" )){ @@ -456,31 +544,51 @@ void ScaLBL_Poisson::Potential_Init(double *psi_init){ } break; } - } - //By default only periodic BC is applied and Vin=Vout=1.0, i.e. there is no potential gradient along Z-axis - if (BoundaryConditionInlet==2) Vin = getBoundaryVoltagefromPeriodicBC(Vin0,freqIn,PhaseShift_In,0); - if (BoundaryConditionOutlet==2) Vout = getBoundaryVoltagefromPeriodicBC(Vout0,freqOut,PhaseShift_Out,0); - double slope = (Vout-Vin)/(Nz-2); - double psi_linearized; - for (int k=0;kid[n]>0){ - psi_init[n] = psi_linearized; + //calcualte inlet/outlet voltage for the case of BCInlet/Outlet=2 + if (BoundaryConditionInlet==2) Vin = getBoundaryVoltagefromPeriodicBC(Vin0,freqIn,PhaseShift_In,0); + if (BoundaryConditionOutlet==2) Vout = getBoundaryVoltagefromPeriodicBC(Vout0,freqOut,PhaseShift_Out,0); + + //initialize a linear electrical potential between inlet and outlet + double slope = (Vout-Vin)/(Nz-2); + double psi_linearized; + for (int k=0;kid[n]>0){ + psi_init[n] = psi_linearized; + } } } } } + else{//mixed periodic and non-periodic BCs are not supported! + ERROR("Error: check the type of inlet and outlet boundary condition! Mixed periodic and non-periodic BCs are found!\n"); + } + /** RESTART **/ + if (Restart == true) { + if (rank == 0) { + printf(" POISSON MODEL: Reading restart file! \n"); + } + ifstream File(LocalRestartFile, ios::binary); + double value; + // Read the distributions + for (int n = 0; n < Nx*Ny*Nz; n++) { + File.read((char *)&value, sizeof(value)); + psi_init[n] = value; + } + File.close(); + } + /** END RESTART **/ } double ScaLBL_Poisson::getBoundaryVoltagefromPeriodicBC(double V0, double freq, double phase_shift, int time_step){ @@ -493,7 +601,12 @@ void ScaLBL_Poisson::Initialize(double time_conv_from_Study){ * "time_conv_from_Study" is the phys to LB time conversion factor, unit=[sec/lt] * which is used for periodic voltage input for inlet and outlet boundaries */ - if (rank==0) printf ("LB-Poisson Solver: initializing D3Q7 distributions\n"); + if (lattice_scheme.compare("D3Q7")==0){ + if (rank==0) printf ("LB-Poisson Solver: initializing D3Q7 distributions\n"); + } + else if (lattice_scheme.compare("D3Q19")==0){ + if (rank==0) printf ("LB-Poisson Solver: initializing D3Q19 distributions\n"); + } //NOTE the initialization involves two steps: //1. assign solid boundary value (surface potential or surface change density) //2. 
Initialize electric potential for pore nodes @@ -507,8 +620,15 @@ void ScaLBL_Poisson::Initialize(double time_conv_from_Study){ ScaLBL_CopyToDevice(Psi, psi_host, Nx*Ny*Nz*sizeof(double)); ScaLBL_CopyToDevice(Psi_BCLabel, psi_BCLabel_host, Nx*Ny*Nz*sizeof(int)); ScaLBL_Comm->Barrier(); - ScaLBL_D3Q7_Poisson_Init(dvcMap, fq, Psi, ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); - ScaLBL_D3Q7_Poisson_Init(dvcMap, fq, Psi, 0, ScaLBL_Comm->LastExterior(), Np); + if (lattice_scheme.compare("D3Q7")==0){ + ScaLBL_D3Q7_Poisson_Init(dvcMap, fq, Psi, ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); + ScaLBL_D3Q7_Poisson_Init(dvcMap, fq, Psi, 0, ScaLBL_Comm->LastExterior(), Np); + } + else if (lattice_scheme.compare("D3Q19")==0){ + /* switch to d3Q19 model */ + ScaLBL_D3Q19_Poisson_Init(dvcMap, fq, Psi, ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); + ScaLBL_D3Q19_Poisson_Init(dvcMap, fq, Psi, 0, ScaLBL_Comm->LastExterior(), Np); + } delete [] psi_host; delete [] psi_BCLabel_host; @@ -523,136 +643,226 @@ void ScaLBL_Poisson::Initialize(double time_conv_from_Study){ //} } +//void ScaLBL_Poisson::Run(double *ChargeDensity, bool UseSlippingVelBC, int timestep_from_Study){ +// +// //.......create and start timer............ +// //double starttime,stoptime,cputime; +// //comm.barrier(); +// //auto t1 = std::chrono::system_clock::now(); +// double *host_Error; +// host_Error = new double [Np]; +// +// timestep=0; +// double error = 1.0; +// while (timestep < timestepMax && error > tolerance) { +// //************************************************************************/ +// // *************ODD TIMESTEP*************// +// timestep++; +// //SolveElectricPotentialAAodd(timestep_from_Study,ChargeDensity, UseSlippingVelBC);//update electric potential +// SolvePoissonAAodd(ChargeDensity, UseSlippingVelBC);//perform collision +// ScaLBL_Comm->Barrier(); comm.barrier(); +// +// // *************EVEN TIMESTEP*************// +// timestep++; +// //SolveElectricPotentialAAeven(timestep_from_Study,ChargeDensity, UseSlippingVelBC);//update electric potential +// SolvePoissonAAeven(ChargeDensity, UseSlippingVelBC);//perform collision +// ScaLBL_Comm->Barrier(); comm.barrier(); +// //************************************************************************/ +// +// +// // Check convergence of steady-state solution +// if (timestep==2){ +// //save electric potential for convergence check +// } +// if (timestep%analysis_interval==0){ +// /* get the elecric potential */ +// ScaLBL_CopyToHost(Psi_host.data(),Psi,sizeof(double)*Nx*Ny*Nz); +// if (rank==0) printf(" ... 
getting Poisson solver error \n"); +// double err = 0.0; +// double max_error = 0.0; +// ScaLBL_CopyToHost(host_Error,ResidualError,sizeof(double)*Np); +// for (int idx=0; idx max_error ){ +// max_error = err; +// } +// } +// error=Dm->Comm.maxReduce(max_error); +// +// /* compute the eletric field */ +// //ScaLBL_D3Q19_Poisson_getElectricField(fq, ElectricField, tau, Np); +// +// } +// } +// if(WriteLog==true){ +// getConvergenceLog(timestep,error); +// } +// +// //************************************************************************/ +// ////if (rank==0) printf("LB-Poission Solver: a steady-state solution is obtained\n"); +// ////if (rank==0) printf("---------------------------------------------------------------------------\n"); +// //// Compute the walltime per timestep +// //auto t2 = std::chrono::system_clock::now(); +// //double cputime = std::chrono::duration( t2 - t1 ).count() / timestep; +// //// Performance obtained from each node +// //double MLUPS = double(Np)/cputime/1000000; +// +// //if (rank==0) printf("******************* LB-Poisson Solver Statistics ********************\n"); +// //if (rank==0) printf("CPU time = %f \n", cputime); +// //if (rank==0) printf("Lattice update rate (per core)= %f MLUPS \n", MLUPS); +// //MLUPS *= nprocs; +// //if (rank==0) printf("Lattice update rate (total)= %f MLUPS \n", MLUPS); +// //if (rank==0) printf("*********************************************************************\n"); +// +//} + void ScaLBL_Poisson::Run(double *ChargeDensity, bool UseSlippingVelBC, int timestep_from_Study){ - //.......create and start timer............ - //double starttime,stoptime,cputime; - //comm.barrier(); - //auto t1 = std::chrono::system_clock::now(); + double error = 1.0; + if (lattice_scheme.compare("D3Q7")==0){ + timestep=0; + while (timestep < timestepMax && error > tolerance) { + //************************************************************************/ + // *************ODD TIMESTEP*************// + timestep++; + SolveElectricPotentialAAodd(timestep_from_Study);//update electric potential + SolvePoissonAAodd(ChargeDensity, UseSlippingVelBC);//perform collision + ScaLBL_Comm->Barrier(); comm.barrier(); - timestep=0; - double error = 1.0; - while (timestep < timestepMax && error > tolerance) { - //************************************************************************/ - // *************ODD TIMESTEP*************// - timestep++; - SolveElectricPotentialAAodd(timestep_from_Study);//update electric potential - SolvePoissonAAodd(ChargeDensity, UseSlippingVelBC);//perform collision - ScaLBL_Comm->Barrier(); comm.barrier(); + // *************EVEN TIMESTEP*************// + timestep++; + SolveElectricPotentialAAeven(timestep_from_Study);//update electric potential + SolvePoissonAAeven(ChargeDensity, UseSlippingVelBC);//perform collision + ScaLBL_Comm->Barrier(); comm.barrier(); + //************************************************************************/ - // *************EVEN TIMESTEP*************// - timestep++; - SolveElectricPotentialAAeven(timestep_from_Study);//update electric potential - SolvePoissonAAeven(ChargeDensity, UseSlippingVelBC);//perform collision - ScaLBL_Comm->Barrier(); comm.barrier(); - //************************************************************************/ - - // Check convergence of steady-state solution - if (timestep==2){ - //save electric potential for convergence check - ScaLBL_CopyToHost(Psi_previous.data(),Psi,sizeof(double)*Nx*Ny*Nz); - } - if (timestep%analysis_interval==0){ - if (tolerance_method.compare("MSE")==0){ - 
double count_loc=0; - double count; - double MSE_loc=0.0; - ScaLBL_CopyToHost(Psi_host.data(),Psi,sizeof(double)*Nx*Ny*Nz); - for (int k=1; k 0){ - MSE_loc += (Psi_host(i,j,k) - Psi_previous(i,j,k))*(Psi_host(i,j,k) - Psi_previous(i,j,k)); - count_loc+=1.0; + // Check convergence of steady-state solution + if (timestep==2){ + //save electric potential for convergence check + ScaLBL_CopyToHost(Psi_previous.data(),Psi,sizeof(double)*Nx*Ny*Nz); + } + if (timestep%analysis_interval==0){ + if (tolerance_method.compare("MSE")==0){ + double count_loc=0; + double count; + double MSE_loc=0.0; + ScaLBL_CopyToHost(Psi_host.data(),Psi,sizeof(double)*Nx*Ny*Nz); + for (int k=1; k 0){ + MSE_loc += (Psi_host(i,j,k) - Psi_previous(i,j,k))*(Psi_host(i,j,k) - Psi_previous(i,j,k)); + count_loc+=1.0; + } } } } + error=Dm->Comm.sumReduce(MSE_loc); + count=Dm->Comm.sumReduce(count_loc); + error /= count; } - error=Dm->Comm.sumReduce(MSE_loc); - count=Dm->Comm.sumReduce(count_loc); - error /= count; - } - else if (tolerance_method.compare("MSE_max")==0){ - vectorMSE_loc; - double MSE_loc_max; - ScaLBL_CopyToHost(Psi_host.data(),Psi,sizeof(double)*Nx*Ny*Nz); - for (int k=1; k 0){ - MSE_loc.push_back((Psi_host(i,j,k) - Psi_previous(i,j,k))*(Psi_host(i,j,k) - Psi_previous(i,j,k))); + else if (tolerance_method.compare("MSE_max")==0){ + vectorMSE_loc; + double MSE_loc_max; + ScaLBL_CopyToHost(Psi_host.data(),Psi,sizeof(double)*Nx*Ny*Nz); + for (int k=1; k 0){ + MSE_loc.push_back((Psi_host(i,j,k) - Psi_previous(i,j,k))*(Psi_host(i,j,k) - Psi_previous(i,j,k))); + } } } } + vector::iterator it_max = max_element(MSE_loc.begin(),MSE_loc.end()); + unsigned int idx_max=distance(MSE_loc.begin(),it_max); + MSE_loc_max=MSE_loc[idx_max]; + error=Dm->Comm.maxReduce(MSE_loc_max); } - vector::iterator it_max = max_element(MSE_loc.begin(),MSE_loc.end()); - unsigned int idx_max=distance(MSE_loc.begin(),it_max); - MSE_loc_max=MSE_loc[idx_max]; - error=Dm->Comm.maxReduce(MSE_loc_max); + else{ + ERROR("Error: user-specified tolerance_method cannot be identified; check you input database! \n"); + } + ScaLBL_CopyToHost(Psi_previous.data(),Psi,sizeof(double)*Nx*Ny*Nz); } - else{ - ERROR("Error: user-specified tolerance_method cannot be identified; check you input database! 
\n"); - } - ScaLBL_CopyToHost(Psi_previous.data(),Psi,sizeof(double)*Nx*Ny*Nz); - - - - - - //legacy code that tried to use residual to check convergence - //ScaLBL_D3Q7_PoissonResidualError(NeighborList,dvcMap,ResidualError,Psi,ChargeDensity,epsilon_LB,Nx,Nx*Ny,ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior()); - //ScaLBL_D3Q7_PoissonResidualError(NeighborList,dvcMap,ResidualError,Psi,ChargeDensity,epsilon_LB,Nx,Nx*Ny,0, ScaLBL_Comm->LastExterior()); - //ScaLBL_Comm->Barrier(); comm.barrier(); - - //vector ResidualError_host(Np); - //double error_loc_max; - ////calculate the maximum residual error - //ScaLBL_CopyToHost(&ResidualError_host[0],ResidualError,sizeof(double)*Np); - - //vector::iterator it_temp1,it_temp2; - //it_temp1=ResidualError_host.begin(); - //advance(it_temp1,ScaLBL_Comm->LastExterior()); - //vector::iterator it_max = max_element(ResidualError_host.begin(),it_temp1); - //unsigned int idx_max1 = distance(ResidualError_host.begin(),it_max); - - //it_temp1=ResidualError_host.begin(); - //it_temp2=ResidualError_host.begin(); - //advance(it_temp1,ScaLBL_Comm->FirstInterior()); - //advance(it_temp2,ScaLBL_Comm->LastInterior()); - //it_max = max_element(it_temp1,it_temp2); - //unsigned int idx_max2 = distance(ResidualError_host.begin(),it_max); - //if (ResidualError_host[idx_max1]>ResidualError_host[idx_max2]){ - // error_loc_max=ResidualError_host[idx_max1]; - //} - //else{ - // error_loc_max=ResidualError_host[idx_max2]; - //} - //error = Dm->Comm.maxReduce(error_loc_max); } - } + } + else if (lattice_scheme.compare("D3Q19")==0){ + + double *host_Error; + host_Error = new double [Np]; + + timestep=0; + auto t1 = std::chrono::system_clock::now(); + while (timestep < timestepMax && error > tolerance) { + //************************************************************************/ + // *************ODD TIMESTEP*************// + timestep++; + //SolveElectricPotentialAAodd(timestep_from_Study,ChargeDensity, UseSlippingVelBC);//update electric potential + SolvePoissonAAodd(ChargeDensity, UseSlippingVelBC);//perform collision + ScaLBL_Comm->Barrier(); comm.barrier(); + + // *************EVEN TIMESTEP*************// + timestep++; + //SolveElectricPotentialAAeven(timestep_from_Study,ChargeDensity, UseSlippingVelBC);//update electric potential + SolvePoissonAAeven(ChargeDensity, UseSlippingVelBC);//perform collision + ScaLBL_Comm->Barrier(); comm.barrier(); + //************************************************************************/ + + + // Check convergence of steady-state solution + if (timestep==2){ + //save electric potential for convergence check + } + if (timestep%analysis_interval==0){ + /* get the elecric potential */ + ScaLBL_CopyToHost(Psi_host.data(),Psi,sizeof(double)*Nx*Ny*Nz); + if (rank==0) printf(" ... 
getting Poisson solver error \n"); + double err = 0.0; + double max_error = 0.0; + ScaLBL_CopyToHost(host_Error,ResidualError,sizeof(double)*Np); + for (int idx=0; idx max_error ){ + max_error = err; + } + } + error=Dm->Comm.maxReduce(max_error); + + /* compute the eletric field */ + //ScaLBL_D3Q19_Poisson_getElectricField(fq, ElectricField, tau, Np); + + } + } + if (rank == 0) + printf("---------------------------------------------------------------" + "----\n"); + // Compute the walltime per timestep + auto t2 = std::chrono::system_clock::now(); + double cputime = std::chrono::duration(t2 - t1).count() / timestep; + // Performance obtained from each node + double MLUPS = double(Np) / cputime / 1000000; + + if (rank == 0) + printf("********************************************************\n"); + if (rank == 0) + printf("CPU time = %f \n", cputime); + if (rank == 0) + printf("Lattice update rate (per core)= %f MLUPS \n", MLUPS); + MLUPS *= nprocs; + if (rank == 0) + printf("Lattice update rate (total)= %f MLUPS \n", MLUPS); + if (rank == 0) + printf("********************************************************\n"); + delete [] host_Error; + + } + //************************************************************************/ + if(WriteLog==true){ getConvergenceLog(timestep,error); } - - //************************************************************************/ - ////if (rank==0) printf("LB-Poission Solver: a steady-state solution is obtained\n"); - ////if (rank==0) printf("---------------------------------------------------------------------------\n"); - //// Compute the walltime per timestep - //auto t2 = std::chrono::system_clock::now(); - //double cputime = std::chrono::duration( t2 - t1 ).count() / timestep; - //// Performance obtained from each node - //double MLUPS = double(Np)/cputime/1000000; - - //if (rank==0) printf("******************* LB-Poisson Solver Statistics ********************\n"); - //if (rank==0) printf("CPU time = %f \n", cputime); - //if (rank==0) printf("Lattice update rate (per core)= %f MLUPS \n", MLUPS); - //MLUPS *= nprocs; - //if (rank==0) printf("Lattice update rate (total)= %f MLUPS \n", MLUPS); - //if (rank==0) printf("*********************************************************************\n"); - } - void ScaLBL_Poisson::getConvergenceLog(int timestep,double error){ if ( rank == 0 ) { fprintf(TIMELOG,"%i %.5g\n",timestep,error); @@ -661,94 +871,217 @@ void ScaLBL_Poisson::getConvergenceLog(int timestep,double error){ } void ScaLBL_Poisson::SolveElectricPotentialAAodd(int timestep_from_Study){ - ScaLBL_Comm->SendD3Q7AA(fq, 0); //READ FROM NORMAL - ScaLBL_D3Q7_AAodd_Poisson_ElectricPotential(NeighborList, dvcMap, fq, Psi, ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); - ScaLBL_Comm->RecvD3Q7AA(fq, 0); //WRITE INTO OPPOSITE - ScaLBL_Comm->Barrier(); - // Set boundary conditions - if (BoundaryConditionInlet > 0){ - switch (BoundaryConditionInlet){ - case 1: - ScaLBL_Comm->D3Q7_Poisson_Potential_BC_z(NeighborList, fq, Vin, timestep); - break; - case 2: - Vin = getBoundaryVoltagefromPeriodicBC(Vin0,freqIn,PhaseShift_In,timestep_from_Study); - ScaLBL_Comm->D3Q7_Poisson_Potential_BC_z(NeighborList, fq, Vin, timestep); - break; + + if (lattice_scheme.compare("D3Q7")==0){ + ScaLBL_Comm->SendD3Q7AA(fq, 0); //READ FROM NORMAL + ScaLBL_D3Q7_AAodd_Poisson_ElectricPotential(NeighborList, dvcMap, fq, Psi, ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); + ScaLBL_Comm->RecvD3Q7AA(fq, 0); //WRITE INTO OPPOSITE + ScaLBL_Comm->Barrier(); + // Set boundary 
conditions + if (BoundaryConditionInlet > 0){ + switch (BoundaryConditionInlet){ + case 1: + ScaLBL_Comm->D3Q7_Poisson_Potential_BC_z(NeighborList, fq, Vin, timestep); + break; + case 2: + Vin = getBoundaryVoltagefromPeriodicBC(Vin0,freqIn,PhaseShift_In,timestep_from_Study); + ScaLBL_Comm->D3Q7_Poisson_Potential_BC_z(NeighborList, fq, Vin, timestep); + break; + } } - } - if (BoundaryConditionOutlet > 0){ - switch (BoundaryConditionOutlet){ - case 1: - ScaLBL_Comm->D3Q7_Poisson_Potential_BC_Z(NeighborList, fq, Vout, timestep); - break; - case 2: - Vout = getBoundaryVoltagefromPeriodicBC(Vout0,freqOut,PhaseShift_Out,timestep_from_Study); - ScaLBL_Comm->D3Q7_Poisson_Potential_BC_Z(NeighborList, fq, Vout, timestep); - break; + if (BoundaryConditionOutlet > 0){ + switch (BoundaryConditionOutlet){ + case 1: + ScaLBL_Comm->D3Q7_Poisson_Potential_BC_Z(NeighborList, fq, Vout, timestep); + break; + case 2: + Vout = getBoundaryVoltagefromPeriodicBC(Vout0,freqOut,PhaseShift_Out,timestep_from_Study); + ScaLBL_Comm->D3Q7_Poisson_Potential_BC_Z(NeighborList, fq, Vout, timestep); + break; + } } - } - //-------------------------// - ScaLBL_D3Q7_AAodd_Poisson_ElectricPotential(NeighborList, dvcMap, fq, Psi, 0, ScaLBL_Comm->LastExterior(), Np); + //-------------------------// + ScaLBL_D3Q7_AAodd_Poisson_ElectricPotential(NeighborList, dvcMap, fq, Psi, 0, ScaLBL_Comm->LastExterior(), Np); + } + else if (lattice_scheme.compare("D3Q19")==0){ + ScaLBL_Comm->SendD3Q19AA(fq); //READ FROM NORMAL + //ScaLBL_D3Q19_AAodd_Poisson_ElectricPotential(NeighborList, dvcMap, fq, ChargeDensity, Psi, epsilon_LB, UseSlippingVelBC, ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); + ScaLBL_Comm->RecvD3Q19AA(fq); //WRITE INTO OPPOSITE + ScaLBL_Comm->Barrier(); + + // Set boundary conditions + if (BoundaryConditionInlet > 0){ + switch (BoundaryConditionInlet){ + case 1: + ScaLBL_Comm->D3Q19_Pressure_BC_z(NeighborList, fq, Vin, timestep); + break; + case 2: + Vin = getBoundaryVoltagefromPeriodicBC(Vin0,freqIn,PhaseShift_In,timestep_from_Study); + ScaLBL_Comm->D3Q19_Pressure_BC_z(NeighborList, fq, Vin, timestep); + break; + } + } + if (BoundaryConditionOutlet > 0){ + switch (BoundaryConditionOutlet){ + case 1: + ScaLBL_Comm->D3Q19_Pressure_BC_Z(NeighborList, fq, Vout, timestep); + break; + case 2: + Vout = getBoundaryVoltagefromPeriodicBC(Vout0,freqOut,PhaseShift_Out,timestep_from_Study); + ScaLBL_Comm->D3Q19_Pressure_BC_Z(NeighborList, fq, Vout, timestep); + break; + } + } + //-------------------------// + + //ScaLBL_D3Q19_AAodd_Poisson_ElectricPotential(NeighborList, dvcMap, fq, ChargeDensity, Psi, epsilon_LB, UseSlippingVelBC, 0, ScaLBL_Comm->LastExterior(), Np); + } } void ScaLBL_Poisson::SolveElectricPotentialAAeven(int timestep_from_Study){ - ScaLBL_Comm->SendD3Q7AA(fq, 0); //READ FORM NORMAL - ScaLBL_D3Q7_AAeven_Poisson_ElectricPotential(dvcMap, fq, Psi, ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); - ScaLBL_Comm->RecvD3Q7AA(fq, 0); //WRITE INTO OPPOSITE - ScaLBL_Comm->Barrier(); - // Set boundary conditions - if (BoundaryConditionInlet > 0){ - switch (BoundaryConditionInlet){ - case 1: - ScaLBL_Comm->D3Q7_Poisson_Potential_BC_z(NeighborList, fq, Vin, timestep); - break; - case 2: - Vin = getBoundaryVoltagefromPeriodicBC(Vin0,freqIn,PhaseShift_In,timestep_from_Study); - ScaLBL_Comm->D3Q7_Poisson_Potential_BC_z(NeighborList, fq, Vin, timestep); - break; + + if (lattice_scheme.compare("D3Q7")==0){ + ScaLBL_Comm->SendD3Q7AA(fq, 0); //READ FORM NORMAL + 
ScaLBL_D3Q7_AAeven_Poisson_ElectricPotential(dvcMap, fq, Psi, ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); + ScaLBL_Comm->RecvD3Q7AA(fq, 0); //WRITE INTO OPPOSITE + ScaLBL_Comm->Barrier(); + // Set boundary conditions + if (BoundaryConditionInlet > 0){ + switch (BoundaryConditionInlet){ + case 1: + ScaLBL_Comm->D3Q7_Poisson_Potential_BC_z(NeighborList, fq, Vin, timestep); + break; + case 2: + Vin = getBoundaryVoltagefromPeriodicBC(Vin0,freqIn,PhaseShift_In,timestep_from_Study); + ScaLBL_Comm->D3Q7_Poisson_Potential_BC_z(NeighborList, fq, Vin, timestep); + break; + } } - } - if (BoundaryConditionOutlet > 0){ - switch (BoundaryConditionOutlet){ - case 1: - ScaLBL_Comm->D3Q7_Poisson_Potential_BC_Z(NeighborList, fq, Vout, timestep); - break; - case 2: - Vout = getBoundaryVoltagefromPeriodicBC(Vout0,freqOut,PhaseShift_Out,timestep_from_Study); - ScaLBL_Comm->D3Q7_Poisson_Potential_BC_Z(NeighborList, fq, Vout, timestep); - break; + if (BoundaryConditionOutlet > 0){ + switch (BoundaryConditionOutlet){ + case 1: + ScaLBL_Comm->D3Q7_Poisson_Potential_BC_Z(NeighborList, fq, Vout, timestep); + break; + case 2: + Vout = getBoundaryVoltagefromPeriodicBC(Vout0,freqOut,PhaseShift_Out,timestep_from_Study); + ScaLBL_Comm->D3Q7_Poisson_Potential_BC_Z(NeighborList, fq, Vout, timestep); + break; + } } - } - //-------------------------// - ScaLBL_D3Q7_AAeven_Poisson_ElectricPotential(dvcMap, fq, Psi, 0, ScaLBL_Comm->LastExterior(), Np); + //-------------------------// + ScaLBL_D3Q7_AAeven_Poisson_ElectricPotential(dvcMap, fq, Psi, 0, ScaLBL_Comm->LastExterior(), Np); + } + else if (lattice_scheme.compare("D3Q19")==0){ + ScaLBL_Comm->SendD3Q19AA(fq); //READ FORM NORMAL + //ScaLBL_D3Q19_AAeven_Poisson_ElectricPotential(dvcMap, fq, ChargeDensity, Psi, epsilon_LB, UseSlippingVelBC, + // ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); + ScaLBL_Comm->RecvD3Q19AA(fq); //WRITE INTO OPPOSITE + ScaLBL_Comm->Barrier(); + + + // Set boundary conditions + if (BoundaryConditionInlet > 0){ + switch (BoundaryConditionInlet){ + case 1: + ScaLBL_Comm->D3Q19_Pressure_BC_z(NeighborList, fq, Vin, timestep); + break; + case 2: + Vin = getBoundaryVoltagefromPeriodicBC(Vin0,freqIn,PhaseShift_In,timestep_from_Study); + ScaLBL_Comm->D3Q19_Pressure_BC_z(NeighborList, fq, Vin, timestep); + break; + } + } + if (BoundaryConditionOutlet > 0){ + switch (BoundaryConditionOutlet){ + case 1: + ScaLBL_Comm->D3Q19_Pressure_BC_Z(NeighborList, fq, Vout, timestep); + break; + case 2: + Vout = getBoundaryVoltagefromPeriodicBC(Vout0,freqOut,PhaseShift_Out,timestep_from_Study); + ScaLBL_Comm->D3Q19_Pressure_BC_Z(NeighborList, fq, Vout, timestep); + break; + } + } + + //-------------------------// + //ScaLBL_D3Q19_AAeven_Poisson_ElectricPotential(dvcMap, fq, ChargeDensity, Psi, epsilon_LB, UseSlippingVelBC, 0, ScaLBL_Comm->LastExterior(), Np); + } } void ScaLBL_Poisson::SolvePoissonAAodd(double *ChargeDensity, bool UseSlippingVelBC){ - ScaLBL_D3Q7_AAodd_Poisson(NeighborList, dvcMap, fq, ChargeDensity, Psi, ElectricField, tau, epsilon_LB, UseSlippingVelBC, ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); - ScaLBL_D3Q7_AAodd_Poisson(NeighborList, dvcMap, fq, ChargeDensity, Psi, ElectricField, tau, epsilon_LB, UseSlippingVelBC, 0, ScaLBL_Comm->LastExterior(), Np); - //TODO: perhaps add another ScaLBL_Comm routine to update Psi values on solid boundary nodes. 
- //something like: - //ScaLBL_Comm->SolidDirichletBoundaryUpdates(Psi, Psi_BCLabel, timestep); - ScaLBL_Comm->SolidDirichletAndNeumannD3Q7(fq, Psi, Psi_BCLabel); - //if (BoundaryConditionSolid==1){ - // ScaLBL_Comm->SolidDirichletD3Q7(fq, Psi); - //} - //else if (BoundaryConditionSolid==2){ - // ScaLBL_Comm->SolidNeumannD3Q7(fq, Psi); - //} + + if (lattice_scheme.compare("D3Q7")==0){ + ScaLBL_D3Q7_AAodd_Poisson(NeighborList, dvcMap, fq, ChargeDensity, Psi, ElectricField, tau, epsilon_LB, UseSlippingVelBC, ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); + ScaLBL_D3Q7_AAodd_Poisson(NeighborList, dvcMap, fq, ChargeDensity, Psi, ElectricField, tau, epsilon_LB, UseSlippingVelBC, 0, ScaLBL_Comm->LastExterior(), Np); + //TODO: perhaps add another ScaLBL_Comm routine to update Psi values on solid boundary nodes. + //something like: + //ScaLBL_Comm->SolidDirichletBoundaryUpdates(Psi, Psi_BCLabel, timestep); + ScaLBL_Comm->SolidDirichletAndNeumannD3Q7(fq, Psi, Psi_BCLabel); + //if (BoundaryConditionSolid==1){ + // ScaLBL_Comm->SolidDirichletD3Q7(fq, Psi); + //} + //else if (BoundaryConditionSolid==2){ + // ScaLBL_Comm->SolidNeumannD3Q7(fq, Psi); + //} + } + else if (lattice_scheme.compare("D3Q19")==0){ + ScaLBL_Comm->SendD3Q19AA(fq); //READ FROM NORMAL + ScaLBL_D3Q19_AAodd_Poisson(NeighborList, dvcMap, fq, ChargeDensity, Psi, ElectricField, tau, epsilon_LB, UseSlippingVelBC, ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); + //ScaLBL_Comm->RecvD3Q19AA(fq); //WRITE INTO OPPOSITE + ScaLBL_Comm->RecvD3Q19AA(fq); //WRITE INTO OPPOSITE + + ScaLBL_D3Q19_AAodd_Poisson(NeighborList, dvcMap, fq, ChargeDensity, Psi, ElectricField, tau, epsilon_LB, UseSlippingVelBC, 0, ScaLBL_Comm->LastExterior(), Np); + ScaLBL_Comm->Barrier(); + //TODO: perhaps add another ScaLBL_Comm routine to update Psi values on solid boundary nodes. 
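For context on the ordering in the D3Q19 branch above: the halo exchange is started, interior sites are updated while messages are in flight, the exchange is completed, and only then are the exterior (halo-adjacent) sites updated before the barrier. Below is a minimal stand-alone sketch of that pattern, with std::async standing in for the non-blocking MPI exchange that ScaLBL_Communicator hides; the names, ranges, and kernel are illustrative, not the ScaLBL API.

```cpp
// Sketch of the communicate/compute overlap used by SolvePoissonAAodd above (illustrative only).
#include <cstdio>
#include <future>
#include <vector>

static void collisionKernel(std::vector<double> &f, int first, int last) {
    for (int i = first; i < last; i++)
        f[i] *= 0.5; // stand-in for the D3Q19 collision/streaming update
}

int main() {
    std::vector<double> fq(1000, 1.0);
    const int lastExterior = 100, firstInterior = 100, lastInterior = 1000;

    // 1. "SendD3Q19AA": start the halo exchange
    auto exchange = std::async(std::launch::async, [] { /* MPI_Isend / MPI_Irecv would run here */ });

    // 2. update interior sites, which do not depend on halo data
    collisionKernel(fq, firstInterior, lastInterior);

    // 3. "RecvD3Q19AA": the exchange must complete before halo-adjacent sites are touched
    exchange.wait();

    // 4. update exterior sites, then synchronize before the next half-step
    collisionKernel(fq, 0, lastExterior);
    std::printf("overlap sketch done, fq[0] = %f\n", fq[0]);
    return 0;
}
```

The essential point is that the exterior update cannot start before the receive completes, which is why RecvD3Q19AA sits between the two kernel calls in the code above.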
+ //something like: + //ScaLBL_Comm->SolidDirichletAndNeumannD3Q7(fq, Psi, Psi_BCLabel); + } } void ScaLBL_Poisson::SolvePoissonAAeven(double *ChargeDensity, bool UseSlippingVelBC){ - ScaLBL_D3Q7_AAeven_Poisson(dvcMap, fq, ChargeDensity, Psi, ElectricField, tau, epsilon_LB, UseSlippingVelBC, ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); - ScaLBL_D3Q7_AAeven_Poisson(dvcMap, fq, ChargeDensity, Psi, ElectricField, tau, epsilon_LB, UseSlippingVelBC, 0, ScaLBL_Comm->LastExterior(), Np); - ScaLBL_Comm->SolidDirichletAndNeumannD3Q7(fq, Psi, Psi_BCLabel); - //if (BoundaryConditionSolid==1){ - // ScaLBL_Comm->SolidDirichletD3Q7(fq, Psi); - //} - //else if (BoundaryConditionSolid==2){ - // ScaLBL_Comm->SolidNeumannD3Q7(fq, Psi); - //} + + if (lattice_scheme.compare("D3Q7")==0){ + ScaLBL_D3Q7_AAeven_Poisson(dvcMap, fq, ChargeDensity, Psi, ElectricField, tau, epsilon_LB, UseSlippingVelBC, ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); + ScaLBL_D3Q7_AAeven_Poisson(dvcMap, fq, ChargeDensity, Psi, ElectricField, tau, epsilon_LB, UseSlippingVelBC, 0, ScaLBL_Comm->LastExterior(), Np); + ScaLBL_Comm->SolidDirichletAndNeumannD3Q7(fq, Psi, Psi_BCLabel); + //if (BoundaryConditionSolid==1){ + // ScaLBL_Comm->SolidDirichletD3Q7(fq, Psi); + //} + //else if (BoundaryConditionSolid==2){ + // ScaLBL_Comm->SolidNeumannD3Q7(fq, Psi); + //} + } + else if (lattice_scheme.compare("D3Q19")==0){ + ScaLBL_Comm->SendD3Q19AA(fq); //READ FROM NORMAL + ScaLBL_D3Q19_AAeven_Poisson(dvcMap, fq, ChargeDensity, Psi, ElectricField, ResidualError, tau, epsilon_LB, UseSlippingVelBC, ScaLBL_Comm->FirstInterior(), ScaLBL_Comm->LastInterior(), Np); + ScaLBL_Comm->RecvD3Q19AA(fq); //WRITE INTO OPPOSITE + // ScaLBL_Comm->RecvD3Q19AA(fq); //WRITE INTO OPPOSITE + ScaLBL_D3Q19_AAeven_Poisson(dvcMap, fq, ChargeDensity, Psi, ElectricField, ResidualError, tau, epsilon_LB, UseSlippingVelBC, 0, ScaLBL_Comm->LastExterior(), Np); + ScaLBL_Comm->Barrier(); + + //ScaLBL_Comm->SolidDirichletAndNeumannD3Q7(fq, Psi, Psi_BCLabel); + } +} + +void ScaLBL_Poisson::Checkpoint(){ + + if (rank == 0) { + printf(" POISSON MODEL: Writing restart file! 
\n"); + } + double value; + double *cPsi; + cPsi = new double[Nx*Ny*Nz]; + ScaLBL_CopyToHost(cPsi, Psi, Nx*Ny*Nz *sizeof(double)); + + ofstream File(LocalRestartFile, ios::binary); + for (int n = 0; n < Nx*Ny*Nz; n++) { + value = cPsi[n]; + File.write((char *)&value, sizeof(value)); + } + + File.close(); + + delete[] cPsi; } void ScaLBL_Poisson::DummyChargeDensity(){ diff --git a/models/PoissonSolver.h b/models/PoissonSolver.h index ca97054d..8ab60140 100644 --- a/models/PoissonSolver.h +++ b/models/PoissonSolver.h @@ -40,9 +40,11 @@ public: void getElectricField(DoubleArray &Values_x, DoubleArray &Values_y, DoubleArray &Values_z); void getElectricField_debug(int timestep); + void Checkpoint(); + void DummyChargeDensity(); //for debugging - //bool Restart,pBC; + bool Restart; int timestep, timestepMax; int analysis_interval; int BoundaryConditionInlet; @@ -51,6 +53,7 @@ public: double tau; double tolerance; std::string tolerance_method; + std::string lattice_scheme; double k2_inv; double epsilon0, epsilon0_LB, epsilonR, epsilon_LB; double Vin, Vout; @@ -108,8 +111,8 @@ private: void AssignSolidBoundary(double *poisson_solid, int *poisson_solid_label); void Potential_Init(double *psi_init); void ElectricField_LB_to_Phys(DoubleArray &Efield_reg); - void SolveElectricPotentialAAodd(int timestep_from_Study); void SolveElectricPotentialAAeven(int timestep_from_Study); + void SolveElectricPotentialAAodd(int timestep_from_Study); //void SolveElectricField(); void SolvePoissonAAodd(double *ChargeDensity, bool UseSlippingVelBC); void SolvePoissonAAeven(double *ChargeDensity, bool UseSlippingVelBC); diff --git a/sample_scripts/configure_crusher_cpu b/sample_scripts/configure_crusher_cpu new file mode 100755 index 00000000..b6f7667a --- /dev/null +++ b/sample_scripts/configure_crusher_cpu @@ -0,0 +1,47 @@ +#module load cmake/3.21.3 +#module load PrgEnv-gnu +module load PrgEnv-amd +module load rocm/4.5.0 +module load cray-mpich +module load cray-hdf5-parallel +#module load craype-accel-amd-gfx908 + +## These must be set before compiling so the executable picks up GTL +export PE_MPICH_GTL_DIR_amd_gfx90a="-L${CRAY_MPICH_ROOTDIR}/gtl/lib" +export PE_MPICH_GTL_LIBS_amd_gfx90a="-lmpi_gtl_hsa" + +export LD_LIBRARY_PATH=${CRAY_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH} + +# Need a new version of cmake +export CMAKE_DIR=/gpfs/alpine/csc380/proj-shared/LBPM/cmake-3.21.0/bin + +#-I${MPICH_DIR}/include +#-L${MPICH_DIR}/lib -lmpi -L${CRAY_MPICH_ROOTDIR}/gtl/lib -lmpi_gtl_hsa + +HIPFLAGS = --amdgpu-target=gfx90a + +# configure +rm -rf CMake* +${CMAKE_DIR}/cmake \ + -D CMAKE_BUILD_TYPE:STRING=Release \ + -D CMAKE_C_COMPILER:PATH=cc \ + -D CMAKE_CXX_COMPILER:PATH=CC \ + -D CMAKE_CXX_STANDARD=14 \ + -D DISABLE_GOLD:BOOL=TRUE \ + -D DISABLE_LTO:BOOL=TRUE \ + -D CMAKE_C_FLAGS="-L${MPICH_DIR}/lib -lmpi -L${CRAY_MPICH_ROOTDIR}/gtl/lib -lmpi_gtl_hsa -I${HDF5_DIR}/include" \ + -D CMAKE_CXX_FLAGS="-L${MPICH_DIR}/lib -lmpi -L${CRAY_MPICH_ROOTDIR}/gtl/lib -lmpi_gtl_hsa -I${HDF5_DIR}/include" \ + -D LINK_LIBRARIES="${ROCM_PATH}/lib/libamdhip64.so;${CRAY_MPICH_ROOTDIR}/gtl/lib/libmpi_gtl_hsa.so" \ + -D USE_HIP=0 \ + -D CMAKE_HIP_COMPILER_TOOLKIT_ROOT=$ROCM_PATH/hip \ + -D USE_MPI=1 \ + -D MPI_SKIP_SEARCH=1 \ + -D MPIEXEC="srun" \ + -D USE_HDF5=1 \ + -D HDF5_DIRECTORY="${HDF5_DIR}" \ + -D USE_SILO=0 \ + -D USE_TIMER=0 \ + -D USE_DOXYGEN:BOOL=false \ + ~/LBPM-WIA + + diff --git a/sample_scripts/configure_crusher_hip b/sample_scripts/configure_crusher_hip new file mode 100755 index 00000000..5281e713 --- /dev/null +++ 
b/sample_scripts/configure_crusher_hip @@ -0,0 +1,52 @@ +#module load cmake/3.21.3 +#module load PrgEnv-gnu +module load PrgEnv-amd +module load rocm/4.5.0 +module load cray-mpich +module load cray-hdf5-parallel +module load craype-accel-amd-gfx908 + +## These must be set before compiling so the executable picks up GTL +export PE_MPICH_GTL_DIR_amd_gfx90a="-L${CRAY_MPICH_ROOTDIR}/gtl/lib" +export PE_MPICH_GTL_LIBS_amd_gfx90a="-lmpi_gtl_hsa" + +export LD_LIBRARY_PATH=${CRAY_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH} + +# Need a new version of cmake +export CMAKE_DIR=/gpfs/alpine/csc380/proj-shared/LBPM/cmake-3.21.0/bin + +#-I${MPICH_DIR}/include +#-L${MPICH_DIR}/lib -lmpi -L${CRAY_MPICH_ROOTDIR}/gtl/lib -lmpi_gtl_hsa + +#export HIPFLAGS="--amdgpu-target=gfx90a --save-temps" +#--amdgpu-spill-vgpr-to-agpr=0 + +#THIS IS HOW TO CHECK FOR SPILLS (example) +# hipcc -c -g -ggdb --save-temps Color.hip + +# -munsafe-fp-atomics +# configure +rm -rf CMake* +${CMAKE_DIR}/cmake \ + -D CMAKE_BUILD_TYPE:STRING=Release \ + -D CMAKE_C_COMPILER:PATH=cc \ + -D CMAKE_CXX_COMPILER:PATH=CC \ + -D CMAKE_CXX_STANDARD=14 \ + -D DISABLE_GOLD:BOOL=TRUE \ + -D DISABLE_LTO:BOOL=TRUE \ + -D CMAKE_C_FLAGS="-L${MPICH_DIR}/lib -lmpi -L${CRAY_MPICH_ROOTDIR}/gtl/lib -lmpi_gtl_hsa -I${HDF5_DIR}/include" \ + -D CMAKE_CXX_FLAGS="-L${MPICH_DIR}/lib -lmpi -L${CRAY_MPICH_ROOTDIR}/gtl/lib -lmpi_gtl_hsa -I${HDF5_DIR}/include" \ + -D LINK_LIBRARIES="${ROCM_PATH}/lib/libamdhip64.so;${CRAY_MPICH_ROOTDIR}/gtl/lib/libmpi_gtl_hsa.so" \ + -D USE_HIP=1 \ + -D CMAKE_HIP_COMPILER_TOOLKIT_ROOT=$ROCM_PATH/hip \ + -D USE_MPI=1 \ + -D MPI_SKIP_SEARCH=1 \ + -D MPIEXEC="srun" \ + -D USE_HDF5=1 \ + -D HDF5_DIRECTORY="${HDF5_DIR}" \ + -D USE_SILO=0 \ + -D USE_TIMER=0 \ + -D USE_DOXYGEN:BOOL=false \ + ~/LBPM-WIA + + diff --git a/sample_scripts/configure_spock_TPLs b/sample_scripts/configure_spock_TPLs index e5293e8a..1290e3b2 100755 --- a/sample_scripts/configure_spock_TPLs +++ b/sample_scripts/configure_spock_TPLs @@ -1,14 +1,13 @@ -export TPL_ROOT=/ccs/home/mbt/repos -export TPL_BUILDER=/ccs/home/mbt/repos/TPL-builder +export TPL_ROOT=/ccs/proj/csc380/mcclurej/spock +export TPL_BUILDER=/ccs/home/mcclurej/tpl-builder export TPL_WEBPAGE=http://bitbucket.org/AdvancedMultiPhysics/tpl-builder/downloads -export INSTALL_DIR=/ccs/home/mbt/spock/install +export INSTALL_DIR=/ccs/proj/csc380/mcclurej/spock/install module load cmake module load llvm-amdgpu module load hip - cmake \ -D CMAKE_BUILD_TYPE=Release \ -D CXX_STD=14 \ @@ -24,7 +23,7 @@ cmake \ -D ENABLE_SHARED:BOOL=OFF \ -D PROCS_INSTALL=8 \ -D TPL_LIST:STRING="TIMER;ZLIB;HDF5;SILO" \ - -D TIMER_URL="${TPL_ROOT}/TimerUtility" \ + -D TIMER_URL="${TPL_ROOT}/timerutility" \ -D ZLIB_URL="http://zlib.net/zlib-1.2.11.tar.gz" \ -D HDF5_URL="${TPL_ROOT}/hdf5-1.8.12.tar.gz" \ -D BUILD_TYPE=x86_64 \ diff --git a/sample_scripts/configure_spock_hip b/sample_scripts/configure_spock_hip index 19eb208f..64af556c 100755 --- a/sample_scripts/configure_spock_hip +++ b/sample_scripts/configure_spock_hip @@ -1,30 +1,40 @@ -module load cmake -module load llvm-amdgpu -module load hip +module load PrgEnv-gnu +module load rocm/4.2.0 +module load cray-mpich +module load cray-hdf5-parallel +#module load craype-accel-amd-gfx908 -export TPL_DIR=/gpfs/alpine/stf006/proj-shared/mbt/spock/install +## These must be set before compiling so the executable picks up GTL +export PE_MPICH_GTL_DIR_amd_gfx908="-L${CRAY_MPICH_ROOTDIR}/gtl/lib" + +export PE_MPICH_GTL_LIBS_amd_gfx908="-lmpi_gtl_hsa" + + +# Need a new version of cmake +export 
CMAKE_DIR=/gpfs/alpine/csc380/proj-shared/LBPM/cmake-3.21.0/bin # configure rm -rf CMake* -cmake \ +${CMAKE_DIR}/cmake \ -D CMAKE_BUILD_TYPE:STRING=Release \ - -D CMAKE_C_COMPILER:PATH=cc \ + -D CMAKE_C_COMPILER:PATH=cc \ -D CMAKE_CXX_COMPILER:PATH=CC \ -D CMAKE_CXX_STANDARD=14 \ + -D DISABLE_GOLD:BOOL=TRUE \ + -D DISABLE_LTO:BOOL=TRUE \ + -D LINK_LIBRARIES="${ROCM_PATH}/lib/libamdhip64.so;${CRAY_MPICH_ROOTDIR}/gtl/lib/libmpi_gtl_hsa.so" \ -D USE_HIP=1 \ - -D LINK_LIBRARIES=${HIP_PATH}/lib/libamdhip64.so \ - -D USE_CUDA=0 \ - -D CMAKE_CUDA_FLAGS="-arch sm_70 -Xptxas=-v -Xptxas -dlcm=cg -lineinfo" \ - -D CMAKE_CUDA_HOST_COMPILER="gcc" \ - -D USE_MPI=0 \ + -D CMAKE_HIP_COMPILER_TOOLKIT_ROOT=$ROCM_PATH/hip \ + -D USE_MPI=1 \ + -D MPI_SKIP_SEARCH=1 \ + -D MPIEXEC="srun" \ -D USE_HDF5=1 \ - -D HDF5_DIRECTORY="${TPL_DIR}/hdf5" \ - -D USE_SILO=0 \ - -D SILO_DIRECTORY="${TPL_DIR}/silo" \ - -D USE_DOXYGEN:BOOL=false \ + -D HDF5_DIRECTORY="${HDF5_DIR}" \ + -D USE_SILO=0 \ -D USE_TIMER=0 \ - ~/repos/LBPM-WIA + -D USE_DOXYGEN:BOOL=false \ + ~/LBPM-WIA diff --git a/sample_scripts/configure_spock_hip_mark b/sample_scripts/configure_spock_hip_mark new file mode 100755 index 00000000..b8cebe40 --- /dev/null +++ b/sample_scripts/configure_spock_hip_mark @@ -0,0 +1,39 @@ +## Load the desired modules +module load PrgEnv-gcc +module load rocm/4.3.0 +module load cray-mpich +module load cray-hdf5-parallel + +## These must be set before compiling so the executable picks up GTL +export PE_MPICH_GTL_DIR_amd_gfx908="-L${CRAY_MPICH_ROOTDIR}/gtl/lib" +export PE_MPICH_GTL_LIBS_amd_gfx908="-lmpi_gtl_hsa" + +## These must be set before running +export MPIR_CVAR_GPU_EAGER_DEVICE_MEM=0 +export MPICH_GPU_SUPPORT_ENABLED=1 +export MPICH_SMP_SINGLE_COPY_MODE=CMA + +#export CMAKE_DIR=/gpfs/alpine/csc380/proj-shared/LBPM/cmake-3.21.3/bin +export CMAKE_DIR=/ccs/home/mbt/spock/cmake-3.21.3/bin + +# configure +rm -rf CMake* +${CMAKE_DIR}/cmake \ + -D CMAKE_BUILD_TYPE:STRING=Release \ + -D CMAKE_CXX_COMPILER:PATH=CC \ + -D CMAKE_CXX_STANDARD=14 \ + -D DISABLE_GOLD:BOOL=TRUE \ + -D DISABLE_LTO:BOOL=TRUE \ + -D USE_HIP=1 \ + -D LINK_LIBRARIES="${ROCM_PATH}/lib/libamdhip64.so;${CRAY_MPICH_ROOTDIR}/gtl/lib/libmpi_gtl_hsa.so" \ + -D USE_MPI=1 \ + -D MPI_SKIP_SEARCH=1 \ + -D MPIEXEC="srun" \ + -D USE_HDF5=1 \ + -D HDF5_DIRECTORY="${HDF5_DIR}" \ + -D USE_SILO=0 \ + -D USE_TIMER=0 \ + -D USE_DOXYGEN:BOOL=false \ + ~/repos/LBPM-WIA + + diff --git a/sample_scripts/configure_summit b/sample_scripts/configure_summit index d5c5b52b..ddc3848b 100755 --- a/sample_scripts/configure_summit +++ b/sample_scripts/configure_summit @@ -1,19 +1,28 @@ -# load the module for cmake -#module load cmake +module load cmake + +# gcc/7.5.0 + +module load gcc/7.5.0 +module load cuda/10.2.89 +module load hdf5/1.10.7 -#source /gpfs/gpfs_stage1/b6p315aa/setup/setup-mpi.sh -module load cmake gcc/7.5.0 -module load cuda -module load hdf5 #/ccs/proj/csc380/mcclurej + #export HDF5_DIR=/ccs/proj/csc380/mcclurej/install/hdf5/1.8.12/ + #export SILO_DIR=/ccs/proj/csc380/mcclurej/install/silo/4.10.2/ + #export NETCDF_DIR=/ccs/proj/geo136/install/netcdf/4.6.1 + export HDF5_DIR="$OLCF_HDF5_ROOT" + # configure + + rm -rf CMake* + cmake \ -D CMAKE_BUILD_TYPE:STRING=Release \ -D CMAKE_C_COMPILER:PATH=mpicc \ @@ -22,7 +31,7 @@ cmake \ -D CMAKE_CXX_STANDARD=14 \ -D USE_CUDA=1 \ -D CMAKE_CUDA_FLAGS="-arch sm_70 -Xptxas=-v -Xptxas -dlcm=cg -lineinfo" \ - -D CMAKE_CUDA_HOST_COMPILER="/sw/summit/gcc/6.4.0/bin/gcc" \ + -D CMAKE_CUDA_HOST_COMPILER="/sw/summit/gcc/7.5.0-2/bin/gcc" \ -D 
USE_MPI=1 \ -D MPIEXEC=mpirun \ -D USE_EXT_MPI_FOR_SERIAL_TESTS:BOOL=TRUE \ @@ -38,4 +47,6 @@ cmake \ -D USE_TIMER=0 \ ~/LBPM-WIA -make VERBOSE=1 -j1 && make install + + +make VERBOSE=1 -j8 && make install diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt index 5e4441e3..8ceafff4 100755 --- a/tests/CMakeLists.txt +++ b/tests/CMakeLists.txt @@ -6,9 +6,12 @@ ADD_LBPM_EXECUTABLE( lbpm_permeability_simulator ) ADD_LBPM_EXECUTABLE( lbpm_greyscale_simulator ) ADD_LBPM_EXECUTABLE( lbpm_greyscaleColor_simulator ) ADD_LBPM_EXECUTABLE( lbpm_electrokinetic_SingleFluid_simulator ) +ADD_LBPM_EXECUTABLE( lbpm_nernst_planck_simulator ) +ADD_LBPM_EXECUTABLE( lbpm_nernst_planck_cell_simulator ) +ADD_LBPM_EXECUTABLE( lbpm_cell_simulator ) ADD_LBPM_EXECUTABLE( lbpm_freelee_simulator ) ADD_LBPM_EXECUTABLE( lbpm_freelee_SingleFluidBGK_simulator ) -#ADD_LBPM_EXECUTABLE( lbpm_BGK_simulator ) +ADD_LBPM_EXECUTABLE( lbpm_BGK_simulator ) #ADD_LBPM_EXECUTABLE( lbpm_color_macro_simulator ) ADD_LBPM_EXECUTABLE( lbpm_dfh_simulator ) #ADD_LBPM_EXECUTABLE( lbpm_sphere_pp ) @@ -47,7 +50,6 @@ ADD_LBPM_EXECUTABLE( TestPNP_Stokes ) ADD_LBPM_EXECUTABLE( TestMixedGrad ) - CONFIGURE_FILE( ${CMAKE_CURRENT_SOURCE_DIR}/cylindertest ${CMAKE_CURRENT_BINARY_DIR}/cylindertest COPYONLY ) # Add the tests @@ -59,6 +61,7 @@ ADD_LBPM_TEST( TestTopo3D ) ADD_LBPM_TEST( TestFluxBC ) ADD_LBPM_TEST( TestFlowAdaptor ) ADD_LBPM_TEST( TestMap ) +ADD_LBPM_TEST( TestMembrane ) #ADD_LBPM_TEST( TestMRT ) #ADD_LBPM_TEST( TestColorGrad ) ADD_LBPM_TEST( TestWideHalo ) diff --git a/tests/TestCommD3Q19.cpp b/tests/TestCommD3Q19.cpp index 79b80ba9..c432e93a 100644 --- a/tests/TestCommD3Q19.cpp +++ b/tests/TestCommD3Q19.cpp @@ -183,11 +183,12 @@ int main(int argc, char **argv) int i,j,k; // Load inputs - auto db = loadInputs( nprocs ); - /* auto filename = argv[1]; - auto input_db = std::make_shared( filename ); - auto db = input_db->getDatabase( "Domain" ); - */ + auto filename = argv[1]; + auto input_db = std::make_shared( filename ); + auto db = input_db->getDatabase( "Domain" ); + //else { + // auto db = loadInputs( nprocs ); + //} int Nx = db->getVector( "n" )[0]; int Ny = db->getVector( "n" )[1]; int Nz = db->getVector( "n" )[2]; @@ -269,17 +270,19 @@ int main(int argc, char **argv) //....................................................................... //........................................................................... - comm.barrier(); + //comm.barrier(); if (rank == 0) cout << "Domain set." << endl; //........................................................................... - + cout << flush; //........................................................................... if (rank==0) printf ("Create ScaLBL_Communicator \n"); + cout << flush; // Create a communicator for the device (will use optimized layout) ScaLBL_Communicator ScaLBL_Comm(Dm); int Npad=(Np/16 + 2)*16; if (rank==0) printf ("Set up memory efficient layout, %i | %i | %i \n", Np, Npad, N); + cout << flush; auto neighborList= new int[18*Npad]; IntArray Map(Nx,Ny,Nz); Map.fill(-2); @@ -290,7 +293,8 @@ int main(int argc, char **argv) //......................device distributions................................. 
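On the TestCommD3Q19 changes above: the added `cout << flush` calls push rank-0 progress messages out of the output buffer immediately, so they remain visible even if a later collective never returns. A trivial stand-alone sketch of the same idea follows; `fflush(stdout)` is used here because it flushes the buffer that `printf` writes to, and `rank` is a stand-in for `comm.getRank()`.

```cpp
#include <cstdio>

int main() {
    const int rank = 0; // stand-in for comm.getRank()
    if (rank == 0) std::printf("Create ScaLBL_Communicator \n");
    std::fflush(stdout); // make the message visible even if the next call hangs
    /* ... communicator setup that could hang on a misconfigured system would go here ... */
    if (rank == 0) std::printf("Setting the distributions \n");
    std::fflush(stdout);
    return 0;
}
```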
dist_mem_size = Np*sizeof(double); if (rank==0) printf ("Allocating distributions \n"); - + cout << flush; + int *NeighborList; int *dvcMap; double *fq; @@ -320,6 +324,9 @@ int main(int argc, char **argv) ScaLBL_DeviceBarrier(); delete [] TmpMap; + if (rank==0) printf("Map is copied to GPU \n"); + cout << flush; + //........................................................................... /* // Write the communcation structure into a file for debugging @@ -351,11 +358,13 @@ int main(int argc, char **argv) fclose(CommFile); */ if (rank==0) printf("Setting the distributions, size = : %i\n", Np); + cout << flush; + //........................................................................... GlobalFlipScaLBL_D3Q19_Init(fq_host, Map, Np, Nx-2, Ny-2, Nz-2, iproc,jproc,kproc,nprocx,nprocy,nprocz); ScaLBL_CopyToDevice(fq, fq_host, 19*dist_mem_size); ScaLBL_DeviceBarrier(); - comm.barrier(); + //comm.barrier(); //************************************************************************* // First timestep ScaLBL_Comm.SendD3Q19AA(fq); //READ FROM NORMAL @@ -375,6 +384,7 @@ int main(int argc, char **argv) int timestep = 0; if (rank==0) printf("********************************************************\n"); if (rank==0) printf("No. of timesteps for timing: %i \n", 100); + cout << flush; //.......create and start timer............ double starttime,stoptime,cputime; @@ -420,13 +430,16 @@ int main(int argc, char **argv) // 18 reads and 18 writes for each lattice site double MemoryRefs = double(Np)*36; // number of memory references for the swap algorithm - GigaBytes / second - if (rank==0) printf("DRAM bandwidth (per process)= %f GB/sec \n",MemoryRefs*8*double(timestep)*1e-9); + if (rank==0) printf("DRAM bandwidth (per process)= %f GB/sec \n",MemoryRefs*8*double(timestep)/cputime*1e-9); // Report bandwidth in Gigabits per second // communication bandwidth includes both send and recieve - if (rank==0) printf("Communication bandwidth (per process)= %f Gbit/sec \n",ScaLBL_Comm.CommunicationCount*64*timestep/1e9); - if (rank==0) printf("Aggregated communication bandwidth = %f Gbit/sec \n",nprocs*ScaLBL_Comm.CommunicationCount*64*timestep/1e9); + if (rank==0) printf("Communication bandwidth (per process)= %f Gbit/sec \n",ScaLBL_Comm.CommunicationCount*64*timestep/cputime*1e-9); + if (rank==0) printf("Aggregated communication bandwidth = %f Gbit/sec \n",nprocs*ScaLBL_Comm.CommunicationCount*64*timestep/cputime*1e-9); + cout << flush; + } // **************************************************** + //cout << fflush; comm.barrier(); Utilities::shutdown(); // **************************************************** diff --git a/tests/TestMembrane.cpp b/tests/TestMembrane.cpp new file mode 100644 index 00000000..51b0d0fb --- /dev/null +++ b/tests/TestMembrane.cpp @@ -0,0 +1,348 @@ + +//************************************************************************* +// Lattice Boltzmann Simulator for Single Phase Flow in Porous Media +// James E. 
McCLure +//************************************************************************* +#include +#include +#include +#include "common/MPI.h" +#include "common/Membrane.h" +#include "common/ScaLBL.h" + +using namespace std; + +std::shared_ptr loadInputs( int nprocs ) +{ + //auto db = std::make_shared( "Domain.in" ); + auto db = std::make_shared(); + db->putScalar( "BC", 0 ); + db->putVector( "nproc", { 1, 1, 1 } ); + db->putVector( "n", { 32, 32, 32 } ); + db->putScalar( "nspheres", 1 ); + db->putVector( "L", { 1, 1, 1 } ); + return db; +} + +//*************************************************************************************** +int main(int argc, char **argv) +{ + // Initialize MPI + Utilities::startup( argc, argv ); + Utilities::MPI comm( MPI_COMM_WORLD ); + int check=0; + { + + int i,j,k,n; + + int rank = comm.getRank(); + if (rank == 0){ + printf("********************************************************\n"); + printf("Running unit test: TestMembrane \n"); + printf("********************************************************\n"); + } + + // Load inputs + auto db = loadInputs( comm.getSize() ); + int Nx = db->getVector( "n" )[0]; + int Ny = db->getVector( "n" )[1]; + int Nz = db->getVector( "n" )[2]; + auto Dm = std::make_shared(db,comm); + + Nx += 2; + Ny += 2; + Nz += 2; + int N = Nx*Ny*Nz; + //....................................................................... + int Np = 0; + double distance,radius; + DoubleArray Distance(Nx,Ny,Nz); + for (k=0;kid[n] = 1; + radius = double(Nx)/4; + distance = sqrt(double((i-0.5*Nx)*(i-0.5*Nx)+ (j-0.5*Ny)*(j-0.5*Ny)+ (k-0.5*Nz)*(k-0.5*Nz)))-radius; + if (distance < 0.0 ){ + Dm->id[n] = 1; + } + Distance(i,j,k) = distance; + Np++; + } + } + } + Dm->CommInit(); + + // Create a communicator for the device (will use optimized layout) + std::shared_ptr ScaLBL_Comm(new ScaLBL_Communicator(Dm)); + //Create a second communicator based on the regular data layout + std::shared_ptr ScaLBL_Comm_Regular(new ScaLBL_Communicator(Dm)); + + if (rank==0){ + printf("Total domain size = %i \n",N); + printf("Reduced domain size = %i \n",Np); + } + + // LBM variables + if (rank==0) printf ("Set up the neighborlist \n"); + int Npad=Np+32; + int neighborSize=18*Npad*sizeof(int); + int *neighborList; + IntArray Map(Nx,Ny,Nz); + neighborList= new int[18*Npad]; + + //......................device distributions................................. + int *NeighborList; + int *dvcMap; + //........................................................................... 
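TestMembrane builds its geometry analytically rather than reading an image: each site stores the signed distance to a sphere of radius Nx/4 centered in the sub-domain, and sites with negative distance lie inside the membrane. A self-contained sketch of that construction is given below; the sizes are hard-coded for illustration, while the flat index and distance formula mirror the loop above.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int Nx = 34, Ny = 34, Nz = 34; // 32^3 sub-domain plus a halo of width one, as in the test
    std::vector<double> Distance(Nx * Ny * Nz);
    const double radius = double(Nx) / 4.0;

    int inside = 0;
    for (int k = 0; k < Nz; k++) {
        for (int j = 0; j < Ny; j++) {
            for (int i = 0; i < Nx; i++) {
                int n = k * Nx * Ny + j * Nx + i;
                // signed distance to a sphere centered at (Nx/2, Ny/2, Nz/2)
                double d = std::sqrt((i - 0.5 * Nx) * (i - 0.5 * Nx) +
                                     (j - 0.5 * Ny) * (j - 0.5 * Ny) +
                                     (k - 0.5 * Nz) * (k - 0.5 * Nz)) - radius;
                Distance[n] = d;
                if (d < 0.0) inside++; // negative distance = inside the membrane
            }
        }
    }
    std::printf("sites inside the sphere: %i of %i\n", inside, Nx * Ny * Nz);
    return 0;
}
```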
+ ScaLBL_AllocateDeviceMemory((void **) &NeighborList, neighborSize); + ScaLBL_AllocateDeviceMemory((void **) &dvcMap, sizeof(int)*Npad); + + Np = ScaLBL_Comm->MemoryOptimizedLayoutAA(Map,neighborList,Dm->id.data(),Np,1); + comm.barrier(); + ScaLBL_CopyToDevice(NeighborList, neighborList, 18*Np*sizeof(int)); + + double *dist; + dist = new double [19*Np]; + + // Check the neighborlist + printf("Check neighborlist: exterior %i, first interior %i last interior %i \n",ScaLBL_Comm->LastExterior(),ScaLBL_Comm->FirstInterior(),ScaLBL_Comm->LastInterior()); + for (int idx=0; idxLastExterior(); idx++){ + for (int q=0; q<18; q++){ + int nn = neighborList[q*Np+idx]%Np; + if (nn>Np) printf("neighborlist error (exterior) at q=%i, idx=%i \n",q,idx); + dist[q*Np + idx] = 0.0; + } + } + for (int idx=ScaLBL_Comm->FirstInterior(); idxLastInterior(); idx++){ + for (int q=0; q<18; q++){ + int nn = neighborList[q*Np+idx]%Np; + if (nn>Np) printf("neighborlist error (exterior) at q=%i, idx=%i \n",q,idx); + dist[q*Np + idx] = 0.0; + } + } + + /* create a membrane data structure */ + Membrane M(ScaLBL_Comm, NeighborList, Np); + + int MembraneCount = M.Create(Distance, Map); + if (rank==0) printf (" Number of membrane links: %i \n", MembraneCount); + + /* create a tagged array to show where the mebrane is*/ + double *MembraneLinks; + MembraneLinks = new double [Nx*Ny*Nz]; + for (int n=0; n 0.f){ + Dm->id[n] = 127; + } + if (sum < 0.f){ + Dm->id[n] = 64; + } + } + } + } + if (argc > 1) + Dm->AggregateLabels("membrane.raw"); + + + /* create a pair of distributions to test membrane mass transport routine */ + double *fq, *gq, *Ci, *Cj, *Psi, *Ci_host; + Ci_host = new double [Np]; + + ScaLBL_AllocateDeviceMemory((void **)&fq, 19 * sizeof(double) * Np); + ScaLBL_AllocateDeviceMemory((void **)&gq, 19 * sizeof(double) * Np); + ScaLBL_AllocateDeviceMemory((void **)&Ci, sizeof(double) * Np); + ScaLBL_AllocateDeviceMemory((void **)&Cj, sizeof(double) * Np); + ScaLBL_AllocateDeviceMemory((void **)&Psi, sizeof(double) * Np); + + /* initialize concentration inside membrane */ + for (k=1;k 0.0) + Ci_host[idx] = 1.0; + else + Ci_host[idx] = 0.0; + } + } + } + ScaLBL_CopyToDevice(Ci, Ci_host, sizeof(double) * Np); + + /* initialize the distributions */ + ScaLBL_D3Q7_Ion_Init_FromFile(fq, Ci, Np); + ScaLBL_D3Q7_Ion_Init_FromFile(gq, Ci, Np); + + /* Streaming with the usual neighborlist */ + ScaLBL_D3Q19_AAodd_Compact(NeighborList, fq, Np); + + /* Streaming with the membrane neighborlist*/ + ScaLBL_D3Q19_AAodd_Compact(M.NeighborList, gq, Np); + + /* explicit mass transfer step with the membrane*/ + M.AssignCoefficients(dvcMap, Psi, 0.0, 1.0, 1.0, 1.0, 1.0); + M.IonTransport(gq, Cj); + ScaLBL_CopyToHost(Ci_host, Cj, sizeof(double) * Np); + + double ionError = 0.0; + for (int n=0; n 1e-12) { + printf(" Failed error tolerance in membrane ion transport routine! 
\n"); + check = 2; + } + + DoubleArray Ions(Nx,Ny,Nz); + ScaLBL_Comm->RegularLayout(Map, Cj, Ions); + if (argc > 1) + Dm->AggregateLabels("membrane2.raw",Ions); + + /* now compare streaming */ + ScaLBL_D3Q7_Ion_Init_FromFile(gq, Ci, Np); + M.IonTransport(gq, Cj); + ScaLBL_D3Q19_AAodd_Compact(M.NeighborList, gq, Np); + M.IonTransport(gq, Cj); + + /* now check that the two results agree*/ + double *fq_h, *gq_h; + fq_h = new double [7*Np]; + gq_h = new double [7*Np]; + ScaLBL_CopyToHost(fq_h, fq, 7*sizeof(double) * Np); + ScaLBL_CopyToHost(gq_h, gq, 7*sizeof(double) * Np); + for (int n = 0; nAggregateLabels("membrane3.raw",MembraneErrors); + + + //........................................................................... + // Update GPU data structures + if (rank==0) printf ("Setting up device map and neighbor list \n"); + int *TmpMap; + TmpMap=new int[Np*sizeof(int)]; + for (k=1; kLastExterior(), + Np); + DoubleArray Result(Nx,Ny,Nz); + + ScaLBL_Comm->RegularLayout(Map, Ci, Result); + +/* for (k=1; kReduce(); Object->MeasureObject(); + //Object->MeasureConnectedPathway(); double Vi = Object->V(); double Ai = Object->A(); double Hi = Object->H(); diff --git a/tests/lbpm_BGK_simulator.cpp b/tests/lbpm_BGK_simulator.cpp index 46002886..6b262582 100644 --- a/tests/lbpm_BGK_simulator.cpp +++ b/tests/lbpm_BGK_simulator.cpp @@ -9,8 +9,8 @@ #include "common/ScaLBL.h" #include "common/Communication.h" #include "analysis/TwoPhase.h" -#include "common/MPI_Helpers.h" - +#include "common/MPI.h" +#include "models/BGKModel.h" //#define WRITE_SURFACES /* @@ -23,414 +23,33 @@ using namespace std; int main(int argc, char **argv) { - //***************************************** - // ***** MPI STUFF **************** - //***************************************** // Initialize MPI - int rank,nprocs; - Utilities::startup( argc, argv ); - Utilities::MPI comm( MPI_COMM_WORLD ); - int rank = comm.getRank(); - int nprocs = comm.getSize(); + Utilities::startup( argc, argv ); + Utilities::MPI comm( MPI_COMM_WORLD ); + int rank = comm.getRank(); + int nprocs = comm.getSize(); { - // parallel domain size (# of sub-domains) - int nprocx,nprocy,nprocz; - if (rank == 0){ printf("********************************************************\n"); printf("Running Single Phase Permeability Calculation \n"); printf("********************************************************\n"); } - - // Variables that specify the computational domain - string FILENAME; - int Nx,Ny,Nz; // local sub-domain size - int nspheres; // number of spheres in the packing - double Lx,Ly,Lz; // Domain length - double D = 1.0; // reference length for non-dimensionalization - // Color Model parameters - int timestepMax, interval; - double tau,Fx,Fy,Fz,tol,err; - double din,dout; - bool pBC,Restart; - int i,j,k,n; - - int RESTART_INTERVAL=20000; - - if (rank==0){ - //............................................................. - // READ SIMULATION PARMAETERS FROM INPUT FILE - //............................................................. - ifstream input("Permeability.in"); - // Line 1: model parameters (tau, alpha, beta, das, dbs) - input >> tau; // Viscosity parameter - // Line 2: External force components (Fx,Fy, Fz) - input >> Fx; - input >> Fy; - input >> Fz; - // Line 3: Pressure Boundary conditions - input >> Restart; - input >> pBC; - input >> din; - input >> dout; - // Line 4: time-stepping criteria - input >> timestepMax; // max no. 
of timesteps - input >> interval; // restart interval - input >> tol; // error tolerance - //............................................................. - - //....................................................................... - // Reading the domain information file - //....................................................................... - ifstream domain("Domain.in"); - domain >> nprocx; - domain >> nprocy; - domain >> nprocz; - domain >> Nx; - domain >> Ny; - domain >> Nz; - //domain >> nspheres; - domain >> Lx; - domain >> Ly; - domain >> Lz; - //....................................................................... - - } - // ************************************************************** - // Broadcast simulation parameters from rank 0 to all other procs - MPI_Barrier(comm); - //................................................. - MPI_Bcast(&tau,1,MPI_DOUBLE,0,comm); - //MPI_Bcast(&pBC,1,MPI_LOGICAL,0,comm); - // MPI_Bcast(&Restart,1,MPI_LOGICAL,0,comm); - MPI_Bcast(&din,1,MPI_DOUBLE,0,comm); - MPI_Bcast(&dout,1,MPI_DOUBLE,0,comm); - MPI_Bcast(&Fx,1,MPI_DOUBLE,0,comm); - MPI_Bcast(&Fy,1,MPI_DOUBLE,0,comm); - MPI_Bcast(&Fz,1,MPI_DOUBLE,0,comm); - MPI_Bcast(×tepMax,1,MPI_INT,0,comm); - MPI_Bcast(&interval,1,MPI_INT,0,comm); - MPI_Bcast(&tol,1,MPI_DOUBLE,0,comm); - // Computational domain - MPI_Bcast(&Nx,1,MPI_INT,0,comm); - MPI_Bcast(&Ny,1,MPI_INT,0,comm); - MPI_Bcast(&Nz,1,MPI_INT,0,comm); - MPI_Bcast(&nprocx,1,MPI_INT,0,comm); - MPI_Bcast(&nprocy,1,MPI_INT,0,comm); - MPI_Bcast(&nprocz,1,MPI_INT,0,comm); - //MPI_Bcast(&nspheres,1,MPI_INT,0,comm); - MPI_Bcast(&Lx,1,MPI_DOUBLE,0,comm); - MPI_Bcast(&Ly,1,MPI_DOUBLE,0,comm); - MPI_Bcast(&Lz,1,MPI_DOUBLE,0,comm); - //................................................. - MPI_Barrier(comm); - - RESTART_INTERVAL=interval; - // ************************************************************** - // ************************************************************** - double rlx = 1.f/tau; - - if (nprocs != nprocx*nprocy*nprocz){ - printf("nprocx = %i \n",nprocx); - printf("nprocy = %i \n",nprocy); - printf("nprocz = %i \n",nprocz); - INSIST(nprocs == nprocx*nprocy*nprocz,"Fatal error in processor count!"); - } - - if (rank==0){ - printf("********************************************************\n"); - printf("tau = %f \n", tau); - printf("Force(x) = %.5g \n", Fx); - printf("Force(y) = %.5g \n", Fy); - printf("Force(z) = %.5g \n", Fz); - printf("Sub-domain size = %i x %i x %i\n",Nx,Ny,Nz); - printf("Process grid = %i x %i x %i\n",nprocx,nprocy,nprocz); - printf("********************************************************\n"); - } - - double viscosity=(tau-0.5)/3.0; - // Initialized domain and averaging framework for Two-Phase Flow - int BC=pBC; - Domain Dm(Nx,Ny,Nz,rank,nprocx,nprocy,nprocz,Lx,Ly,Lz,BC); - for (i=0; i 0) iVol_global = 1.0/(1.0*(Nx-2)*nprocx*(Ny-2)*nprocy*((Nz-2)*nprocz-6)); - double porosity, pore_vol; - //........................................................................... - if (rank == 0) cout << "Reading in domain from signed distance function..." << endl; - - //....................................................................... - // Read the signed distance - sprintf(LocalRankString,"%05d",rank); - sprintf(LocalRankFilename,"%s%s","SignDist.",LocalRankString); - ReadBinaryFile(LocalRankFilename, Averages.SDs.data(), N); - MPI_Barrier(comm); - if (rank == 0) cout << "Domain set." << endl; - - //....................................................................... 
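Earlier in this removed section, the inputs are read on rank 0 and every scalar is then broadcast with its own MPI_Bcast; in the rewritten driver further down, all of that moves into ScaLBL_BGKModel::ReadParams(). For comparison, here is a hedged sketch of collapsing the hand-rolled version into a single broadcast of a plain-old-data struct; the Params struct and its values are hypothetical stand-ins for what Permeability.in supplied.

```cpp
#include <mpi.h>
#include <cstdio>

struct Params { double tau, Fx, Fy, Fz, din, dout, tol; };

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    Params p = {};
    if (rank == 0) {
        // rank 0 would read the input file; values are hard-coded here for illustration
        p = {1.0, 1.0e-5, 0.0, 0.0, 1.001, 0.999, 1.0e-6};
    }
    // one broadcast of the POD block replaces the per-scalar MPI_Bcast calls above
    MPI_Bcast(&p, sizeof(Params), MPI_BYTE, 0, MPI_COMM_WORLD);

    if (rank == 1) std::printf("tau = %f received on rank 1\n", p.tau);
    MPI_Finalize();
    return 0;
}
```

Broadcasting the struct as MPI_BYTE assumes a homogeneous cluster; the per-scalar version above avoids that assumption at the cost of verbosity.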
- // Assign the phase ID field based on the signed distance - //....................................................................... - - for (k=0;k 0.0){ - id[n] = 2; - } - // compute the porosity (actual interface location used) - if (Averages.SDs(n) > 0.0){ - sum++; - } - } - } - } - - if (rank==0) printf("Initialize from segmented data: solid=0, NWP=1, WP=2 \n"); - sprintf(LocalRankFilename,"ID.%05i",rank); - size_t readID; - FILE *IDFILE = fopen(LocalRankFilename,"rb"); - if (IDFILE==NULL) ERROR("lbpm_permeability_simulator: Error opening file: ID.xxxxx"); - readID=fread(id,1,N,IDFILE); - if (readID != size_t(N)) printf("lbpm_permeability_simulator: Error reading ID (rank=%i) \n",rank); - fclose(IDFILE); - - //....................................................................... - // Compute the media porosity, assign phase labels and solid composition - //....................................................................... - sum_local=0.0; - int Np=0; // number of local pore nodes - //....................................................................... - for (k=1;k 0){ - sum_local+=1.0; - Np++; - } - } - } - } - MPI_Allreduce(&sum_local,&sum,1,MPI_DOUBLE,MPI_SUM,comm); - porosity = sum*iVol_global; - if (rank==0) printf("Media porosity = %f \n",porosity); - - //......................................................... - // don't perform computations at the eight corners - id[0] = id[Nx-1] = id[(Ny-1)*Nx] = id[(Ny-1)*Nx + Nx-1] = 0; - id[(Nz-1)*Nx*Ny] = id[(Nz-1)*Nx*Ny+Nx-1] = id[(Nz-1)*Nx*Ny+(Ny-1)*Nx] = id[(Nz-1)*Nx*Ny+(Ny-1)*Nx + Nx-1] = 0; - //......................................................... - MPI_Barrier(comm); - - // Initialize communication structures in averaging domain - for (i=0; i tol ){ - - timestep++; - ScaLBL_Comm.SendD3Q19AA(dist); //READ FROM NORMAL - ScaLBL_D3Q19_AAodd_BGK(NeighborList, dist, ScaLBL_Comm.first_interior, ScaLBL_Comm.last_interior, Np, rlx, Fx, Fy, Fz); - ScaLBL_Comm.RecvD3Q19AA(dist); //WRITE INTO OPPOSITE - ScaLBL_D3Q19_AAodd_BGK(NeighborList, dist, 0, ScaLBL_Comm.next, Np, rlx, Fx, Fy, Fz); - ScaLBL_DeviceBarrier(); MPI_Barrier(comm); - - timestep++; - ScaLBL_Comm.SendD3Q19AA(dist); //READ FORM NORMAL - ScaLBL_D3Q19_AAeven_BGK(dist, ScaLBL_Comm.first_interior, ScaLBL_Comm.last_interior, Np, rlx, Fx, Fy, Fz); - ScaLBL_Comm.RecvD3Q19AA(dist); //WRITE INTO OPPOSITE - ScaLBL_D3Q19_AAeven_BGK(dist, 0, ScaLBL_Comm.next, Np, rlx, Fx, Fy, Fz); - ScaLBL_DeviceBarrier(); MPI_Barrier(comm); - //************************************************************************/ - - if (timestep%500 == 0){ - //........................................................................... - // Copy the data for for the analysis timestep - //........................................................................... - // Copy the phase from the GPU -> CPU - //........................................................................... - ScaLBL_DeviceBarrier(); - ScaLBL_D3Q19_Pressure(dist,Pressure,Np); - ScaLBL_D3Q19_Momentum(dist,Velocity,Np); - - ScaLBL_Comm.RegularLayout(Map,Pressure,Averages.Press); - ScaLBL_Comm.RegularLayout(Map,&Velocity[0],Averages.Vel_x); - ScaLBL_Comm.RegularLayout(Map,&Velocity[Np],Averages.Vel_y); - ScaLBL_Comm.RegularLayout(Map,&Velocity[2*Np],Averages.Vel_z); - - // Way more work than necessary -- this is just to get the solid interfacial area!! 
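The removed code above obtains the media porosity by counting pore sites (id > 0) in the interior of each sub-domain and reducing the sum across ranks; presumably the same bookkeeping now happens inside ScaLBL_BGKModel. A serial sketch of the calculation follows, with the halo of width one excluded and one artificial solid site inserted so the result is not exactly one.

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int Nx = 16, Ny = 16, Nz = 16; // interior sizes; the halo is skipped below
    std::vector<char> id((Nx + 2) * (Ny + 2) * (Nz + 2), 1); // label 0 = solid, >0 = pore
    auto index = [&](int i, int j, int k) { return k * (Nx + 2) * (Ny + 2) + j * (Nx + 2) + i; };
    id[index(3, 3, 3)] = 0; // mark one solid site for illustration

    double pore = 0.0;
    for (int k = 1; k < Nz + 1; k++)
        for (int j = 1; j < Ny + 1; j++)
            for (int i = 1; i < Nx + 1; i++)
                if (id[index(i, j, k)] > 0) pore += 1.0;

    // porosity = pore volume / interior volume; the MPI version sums "pore" over all ranks first
    double porosity = pore / double(Nx * Ny * Nz);
    std::printf("Media porosity = %f \n", porosity);
    return 0;
}
```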
- Averages.Initialize(); - Averages.UpdateMeshValues(); - Averages.ComputeLocal(); - Averages.Reduce(); - - double vawx = Averages.vaw_global(0); - double vawy = Averages.vaw_global(1); - double vawz = Averages.vaw_global(2); - if (rank==0){ - // ************* DIMENSIONLESS FORCHEIMER EQUATION ************************* - // Dye, A.L., McClure, J.E., Gray, W.G. and C.T. Miller - // Description of Non-Darcy Flows in Porous Medium Systems - // Physical Review E 87 (3), 033012 - // Fo := density*D32^3*(density*force) / (viscosity^2) - // Re := density*D32*velocity / viscosity - // Fo = a*Re + b*Re^2 - // ************************************************************************* - //viscosity = (tau-0.5)*0.333333333333333333; - D32 = 6.0*(Dm.Volume-Averages.vol_w_global)/Averages.As_global; - printf("Sauter Mean Diameter = %f \n",D32); - mag_force = sqrt(Fx*Fx+Fy*Fy+Fz*Fz); - Fo = D32*D32*D32*mag_force/viscosity/viscosity; - // .... 1-D flow should be aligned with force ... - velocity = vawx*Fx/mag_force + vawy*Fy/mag_force + vawz*Fz/mag_force; - err1D = fabs(velocity-sqrt(vawx*vawx+vawy*vawy+vawz*vawz))/velocity; - //.......... Computation of the Reynolds number Re .............. - Re = D32*velocity/viscosity; - printf("Force: %.5g,%.5g,%.5g \n",Fx,Fy,Fz); - printf("Velocity: %.5g,%.5g,%.5g \n",vawx,vawy,vawz); - printf("Relative error for 1D representation: %.5g \n",err1D); - printf("Dimensionless force: %5g \n", Fo); - printf("Reynolds number: %.5g \n", Re); - printf("Dimensionless Permeability (k/D^2): %.5g \n", Re/Fo); - } - } - } - //************************************************************************/ + // Initialize compute device + int device=ScaLBL_SetDevice(rank); + NULL_USE( device ); ScaLBL_DeviceBarrier(); - MPI_Barrier(comm); - stoptime = MPI_Wtime(); - if (rank==0) printf("-------------------------------------------------------------------\n"); - // Compute the walltime per timestep - cputime = (stoptime - starttime)/timestep; - // Performance obtained from each node - double MLUPS = double(Np)/cputime/1000000; - - if (rank==0) printf("********************************************************\n"); - if (rank==0) printf("CPU time = %f \n", cputime); - if (rank==0) printf("Lattice update rate (per core)= %f MLUPS \n", MLUPS); - MLUPS *= nprocs; - if (rank==0) printf("Lattice update rate (total)= %f MLUPS \n", MLUPS); - if (rank==0) printf("********************************************************\n"); - - NULL_USE(RESTART_INTERVAL); + comm.barrier(); + + ScaLBL_BGKModel BGK(rank,nprocs,comm); + auto filename = argv[1]; + BGK.ReadParams(filename); + BGK.SetDomain(); // this reads in the domain + BGK.ReadInput(); + BGK.Create(); // creating the model will create data structure to match the pore structure and allocate variables + BGK.Initialize(); // initializing the model will set initial conditions for variables + BGK.Run(); + BGK.VelocityField(); + cout << flush; } - // **************************************************** - comm.barrier(); - Utilities::shutdown(); - // **************************************************** + Utilities::shutdown(); } diff --git a/tests/lbpm_cell_simulator.cpp b/tests/lbpm_cell_simulator.cpp new file mode 100644 index 00000000..dd0b0d52 --- /dev/null +++ b/tests/lbpm_cell_simulator.cpp @@ -0,0 +1,159 @@ +#include +#include +#include +#include +#include +#include +#include +#include + +#include "models/IonModel.h" +#include "models/StokesModel.h" +#include "models/PoissonSolver.h" +#include "models/MultiPhysController.h" +#include "common/Utilities.h" 
diff --git a/tests/lbpm_cell_simulator.cpp b/tests/lbpm_cell_simulator.cpp
new file mode 100644
index 00000000..dd0b0d52
--- /dev/null
+++ b/tests/lbpm_cell_simulator.cpp
@@ -0,0 +1,159 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "models/IonModel.h"
+#include "models/StokesModel.h"
+#include "models/PoissonSolver.h"
+#include "models/MultiPhysController.h"
+#include "common/Utilities.h"
+#include "analysis/ElectroChemistry.h"
+
+using namespace std;
+
+//***************************************************************************
+// Test lattice-Boltzmann Ion Model coupled with Poisson equation
+//***************************************************************************
+
+int main(int argc, char **argv)
+{
+    // Initialize MPI and error handlers
+    Utilities::startup( argc, argv );
+    Utilities::MPI comm( MPI_COMM_WORLD );
+    int rank = comm.getRank();
+    int nprocs = comm.getSize();
+
+    { // Limit scope so variables that contain communicators will free before MPI_Finalize
+
+        if (rank == 0){
+            printf("********************************************************\n");
+            printf("Running LBPM electrokinetic single-fluid solver \n");
+            printf("********************************************************\n");
+        }
+        // Initialize compute device
+        int device=ScaLBL_SetDevice(rank);
+        NULL_USE( device );
+        ScaLBL_DeviceBarrier();
+        comm.barrier();
+
+        PROFILE_ENABLE(1);
+        //PROFILE_ENABLE_TRACE();
+        //PROFILE_ENABLE_MEMORY();
+        PROFILE_SYNCHRONIZE();
+        PROFILE_START("Main");
+        Utilities::setErrorHandlers();
+
+        auto filename = argv[1];
+        ScaLBL_StokesModel StokesModel(rank,nprocs,comm);
+        ScaLBL_IonModel IonModel(rank,nprocs,comm);
+        ScaLBL_Poisson PoissonSolver(rank,nprocs,comm);
+        ScaLBL_Multiphys_Controller Study(rank,nprocs,comm);//multiphysics controller coordinating multi-model coupling
+
+        bool SlipBC = false;
+
+        // Load controller information
+        Study.ReadParams(filename);
+
+        // Load user input database files for Navier-Stokes and Ion solvers
+        StokesModel.ReadParams(filename);
+
+        // Setup other model specific structures
+        StokesModel.SetDomain();
+        StokesModel.ReadInput();
+        StokesModel.Create();     // creating the model will create data structure to match the pore structure and allocate variables
+        comm.barrier();
+        if (rank == 0) printf("Stokes model setup complete\n");
+
+        IonModel.ReadParams(filename);
+        IonModel.SetDomain();
+        IonModel.ReadInput();
+        IonModel.Create();
+        IonModel.SetMembrane();
+        comm.barrier();
+        if (rank == 0) printf("Ion model setup complete\n");
+        fflush(stdout);
+
+        // Create analysis object
+        ElectroChemistryAnalyzer Analysis(IonModel.Dm);
+
+        // Get internal iteration number
+        StokesModel.timestepMax = Study.getStokesNumIter_PNP_coupling(StokesModel.time_conv,IonModel.time_conv);
+        StokesModel.Initialize();   // initializing the model will set initial conditions for variables
+        comm.barrier();
+        if (rank == 0) printf("Stokes model initialized \n");
+        fflush(stdout);
+
+        IonModel.timestepMax = Study.getIonNumIter_PNP_coupling(StokesModel.time_conv,IonModel.time_conv);
+        IonModel.Initialize();
+        IonModel.DummyFluidVelocity();
+        comm.barrier();
+        if (rank == 0) printf("Ion model initialized \n");
+        // Get maximal time conversion factor based on Stokes and Ion solvers
+        Study.getTimeConvMax_PNP_coupling(StokesModel.time_conv,IonModel.time_conv);
+
+        // Initialize LB-Poisson model
+        PoissonSolver.ReadParams(filename);
+        PoissonSolver.SetDomain();
+        PoissonSolver.ReadInput();
+        PoissonSolver.Create();
+        comm.barrier();
+        if (rank == 0) printf("Poisson solver created \n");
+        fflush(stdout);
+        PoissonSolver.Initialize(Study.time_conv_max);
+        comm.barrier();
+        if (rank == 0) printf("Poisson solver initialized \n");
+        fflush(stdout);
+
+        int timestep=0;
+        while (timestep < Study.timestepMax){
+
+            timestep++;
+            PoissonSolver.Run(IonModel.ChargeDensity,SlipBC,timestep);//solve Poisson equation to get steady-state electrical potential
+            comm.barrier();
+            //if (rank == 0) printf(" Poisson step %i \n",timestep);
+            //StokesModel.Run_Lite(IonModel.ChargeDensity, PoissonSolver.ElectricField);// Solve the N-S equations to get velocity
+            //fflush(stdout);
+
+            IonModel.RunMembrane(IonModel.FluidVelocityDummy,PoissonSolver.ElectricField,PoissonSolver.Psi); //solve for ion transport with membrane
+            comm.barrier();
+            //if (rank == 0) printf(" Membrane step %i \n",timestep);
+            //fflush(stdout);
+
+            //timestep++;//AA operations
+
+            if (timestep%Study.analysis_interval==0){
+                //Analysis.Basic(IonModel,PoissonSolver,StokesModel,timestep);
+            }
+            if (timestep%Study.visualization_interval==0){
+                //Analysis.WriteVis(IonModel,PoissonSolver,StokesModel,Study.db,timestep);
+                // PoissonSolver.getElectricPotential(timestep);
+                //PoissonSolver.getElectricField(timestep);
+                //IonModel.getIonConcentration(timestep);
+                //StokesModel.getVelocity(timestep);
+                PoissonSolver.getElectricPotential_debug(timestep);
+                PoissonSolver.getElectricField_debug(timestep);
+                IonModel.getIonConcentration_debug(timestep);
+            }
+        }
+
+        if (rank==0) printf("Save simulation raw data at maximum timestep\n");
+        Analysis.WriteVis(IonModel,PoissonSolver,StokesModel,Study.db,timestep);
+
+        if (rank==0) printf("Maximum timestep is reached and the simulation is completed\n");
+        if (rank==0) printf("*************************************************************\n");
+
+        PROFILE_STOP("Main");
+        PROFILE_SAVE("lbpm_electrokinetic_SingleFluid_simulator",1);
+        // ****************************************************
+
+    } // Limit scope so variables that contain communicators will free before MPI_Finalize
+
+    Utilities::shutdown();
+}
+
+
diff --git a/tests/lbpm_color_simulator.cpp b/tests/lbpm_color_simulator.cpp
index b9353c00..679e5f73 100644
--- a/tests/lbpm_color_simulator.cpp
+++ b/tests/lbpm_color_simulator.cpp
@@ -23,7 +23,7 @@ int main( int argc, char **argv )
 {
     // Initialize
-    Utilities::startup( argc, argv );
+    Utilities::startup( argc, argv, true );
 
     { // Limit scope so variables that contain communicators will free before MPI_Finialize
@@ -198,7 +198,7 @@ int main( int argc, char **argv )
 
     } // Limit scope so variables that contain communicators will free before MPI_Finialize
-
+    cout << flush;
     Utilities::shutdown();
     return 0;
 }
diff --git a/tests/lbpm_electrokinetic_SingleFluid_simulator.cpp b/tests/lbpm_electrokinetic_SingleFluid_simulator.cpp
index e814f915..a15af979 100644
--- a/tests/lbpm_electrokinetic_SingleFluid_simulator.cpp
+++ b/tests/lbpm_electrokinetic_SingleFluid_simulator.cpp
@@ -79,17 +79,29 @@ int main(int argc, char **argv)
 
     IonModel.timestepMax = Study.getIonNumIter_PNP_coupling(StokesModel.time_conv,IonModel.time_conv);
     IonModel.Initialize();
+
     // Get maximal time converting factor based on Sotkes and Ion solvers
-    Study.getTimeConvMax_PNP_coupling(StokesModel.time_conv,IonModel.time_conv);
+    //Study.getTimeConvMax_PNP_coupling(StokesModel.time_conv,IonModel.time_conv);
+    // Get time conversion factor for the main iteration loop in electrokinetic single fluid simulator
+    Study.time_conv_MainLoop = StokesModel.timestepMax*StokesModel.time_conv;
 
     // Initialize LB-Poisson model
     PoissonSolver.ReadParams(filename);
     PoissonSolver.SetDomain();
     PoissonSolver.ReadInput();
     PoissonSolver.Create();
-    PoissonSolver.Initialize(Study.time_conv_max);
+    PoissonSolver.Initialize(Study.time_conv_MainLoop);
+    if (rank == 0){
+        printf("********************************************************\n");
+        printf("Key Summary of LBPM electrokinetic single-fluid solver \n");
+        printf(" 1. Max LB Timestep: %i [lt]\n", Study.timestepMax);
+        printf(" 2. Time conversion factor per LB Timestep: %.6g [sec/lt]\n",Study.time_conv_MainLoop);
+        printf(" 3. Max Physical Time: %.6g [sec]\n",Study.timestepMax*Study.time_conv_MainLoop);
+        printf("********************************************************\n");
+    }
+
     int timestep=0;
     while (timestep < Study.timestepMax){
 
@@ -98,7 +110,6 @@ int main(int argc, char **argv)
         StokesModel.Run_Lite(IonModel.ChargeDensity, PoissonSolver.ElectricField);// Solve the N-S equations to get velocity
 
         IonModel.Run(StokesModel.Velocity,PoissonSolver.ElectricField); //solve for ion transport and electric potential
-        timestep++;//AA operations
 
         if (timestep%Study.analysis_interval==0){
             Analysis.Basic(IonModel,PoissonSolver,StokesModel,timestep);
@@ -116,7 +127,7 @@ int main(int argc, char **argv)
     if (rank==0) printf("Save simulation raw data at maximum timestep\n");
     Analysis.WriteVis(IonModel,PoissonSolver,StokesModel,Study.db,timestep);
 
-    if (rank==0) printf("Maximum timestep is reached and the simulation is completed\n");
+    if (rank==0) printf("Maximum LB timestep = %i is reached and the simulation is completed\n",Study.timestepMax);
     if (rank==0) printf("*************************************************************\n");
 
     PROFILE_STOP("Main");
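The hunk above swaps the PNP-coupling time scale for one derived from the Stokes sub-cycling: the physical time advanced per coupled LB timestep is the number of internal Stokes steps multiplied by the seconds per Stokes step, and the maximum physical time is that factor times Study.timestepMax. A minimal sketch of the arithmetic reported in the Key Summary block; the numbers below are hypothetical and only illustrate the bookkeeping:

#include <cstdio>

int main()
{
    // Hypothetical inputs, for illustration only (not taken from the patch or any input database)
    int    stokes_steps_per_coupling = 10;      // corresponds to StokesModel.timestepMax
    double stokes_time_conv          = 2.0e-7;  // seconds of physical time per Stokes LB step
    int    coupled_steps             = 2500;    // corresponds to Study.timestepMax

    double time_conv_MainLoop = stokes_steps_per_coupling * stokes_time_conv;  // [sec/lt]
    double max_physical_time  = coupled_steps * time_conv_MainLoop;            // [sec]
    std::printf( "Time conversion factor per LB Timestep: %.6g [sec/lt]\n", time_conv_MainLoop );
    std::printf( "Max Physical Time: %.6g [sec]\n", max_physical_time );
    return 0;
}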
diff --git a/tests/lbpm_nernst_planck_cell_simulator.cpp b/tests/lbpm_nernst_planck_cell_simulator.cpp
new file mode 100644
index 00000000..3ac2065c
--- /dev/null
+++ b/tests/lbpm_nernst_planck_cell_simulator.cpp
@@ -0,0 +1,174 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "models/IonModel.h"
+#include "models/StokesModel.h"
+#include "models/PoissonSolver.h"
+#include "models/MultiPhysController.h"
+#include "common/Utilities.h"
+#include "analysis/ElectroChemistry.h"
+
+using namespace std;
+
+//***************************************************************************
+// Test lattice-Boltzmann Ion Model coupled with Poisson equation
+//***************************************************************************
+
+int main(int argc, char **argv)
+{
+    // Initialize MPI and error handlers
+    //Utilities::startup( argc, argv );
+    Utilities::startup( argc, argv, true );
+
+    Utilities::MPI comm( MPI_COMM_WORLD );
+    int rank = comm.getRank();
+    int nprocs = comm.getSize();
+
+    { // Limit scope so variables that contain communicators will free before MPI_Finalize
+
+        if (rank == 0){
+            printf("********************************************************\n");
+            printf("Running LBPM Nernst-Planck Membrane solver \n");
+            printf("********************************************************\n");
+        }
+        // Initialize compute device
+        int device=ScaLBL_SetDevice(rank);
+        NULL_USE( device );
+        ScaLBL_DeviceBarrier();
+        comm.barrier();
+
+        PROFILE_ENABLE(1);
+        //PROFILE_ENABLE_TRACE();
+        //PROFILE_ENABLE_MEMORY();
+        PROFILE_SYNCHRONIZE();
+        PROFILE_START("Main");
+        Utilities::setErrorHandlers();
+
+        auto filename = argv[1];
+        //ScaLBL_StokesModel StokesModel(rank,nprocs,comm);
+        ScaLBL_IonModel IonModel(rank,nprocs,comm);
+        ScaLBL_Poisson PoissonSolver(rank,nprocs,comm);
+        ScaLBL_Multiphys_Controller Study(rank,nprocs,comm);//multiphysics controller coordinating multi-model coupling
+
+        bool SlipBC = false;
+
+        // Load controller information
+        Study.ReadParams(filename);
+
+        // Load user input database files for Navier-Stokes and Ion solvers
+        //StokesModel.ReadParams(filename);
+
+        // Setup other model specific structures
+        //StokesModel.SetDomain();
+        //StokesModel.ReadInput();
+        //StokesModel.Create(); // creating the model will create data structure to match the pore structure and allocate variables
+        //comm.barrier();
+        //if (rank == 0) printf("Stokes model setup complete\n");
+
+        IonModel.ReadParams(filename);
+        IonModel.SetDomain();
+        IonModel.ReadInput();
+        IonModel.Create();
+        IonModel.SetMembrane();
+        comm.barrier();
+        if (rank == 0) printf("Ion model setup complete\n");
+        fflush(stdout);
+
+        // Create analysis object
+        ElectroChemistryAnalyzer Analysis(IonModel);
+
+        // Get internal iteration number
+        //StokesModel.timestepMax = Study.getStokesNumIter_PNP_coupling(StokesModel.time_conv,IonModel.time_conv);
+        //StokesModel.Initialize(); // initializing the model will set initial conditions for variables
+        //comm.barrier();
+        //if (rank == 0) printf("Stokes model initialized \n");
+        //fflush(stdout);
+
+        //IonModel.timestepMax = Study.getIonNumIter_PNP_coupling(StokesModel.time_conv,IonModel.time_conv);
+        IonModel.timestepMax = Study.getIonNumIter_NernstPlanck_coupling(IonModel.time_conv);
+        IonModel.Initialize();
+        IonModel.DummyFluidVelocity();
+        comm.barrier();
+        if (rank == 0) printf("Ion model initialized \n");
+        // Get maximal time conversion factor based on Stokes and Ion solvers
+        //Study.getTimeConvMax_PNP_coupling(StokesModel.time_conv,IonModel.time_conv);
+        Study.time_conv_MainLoop = IonModel.timestepMax[0]*IonModel.time_conv[0];
+
+        //----------------------------------- print out for debugging ------------------------------------------//
+        if (rank==0){
+            for (size_t i=0;i
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "models/IonModel.h"
+#include "models/PoissonSolver.h"
+#include "models/MultiPhysController.h"
+#include "common/Utilities.h"
+#include "analysis/ElectroChemistry.h"
+
+using namespace std;
+
+int main(int argc, char **argv)
+{
+    // Initialize MPI and error handlers
+    Utilities::startup( argc, argv );
+    Utilities::MPI comm( MPI_COMM_WORLD );
+    int rank = comm.getRank();
+    int nprocs = comm.getSize();
+
+    { // Limit scope so variables that contain communicators will free before MPI_Finalize
+
+        if (rank == 0){
+            printf("*************************************************\n");
+            printf("Running LBPM Nernst-Planck solver \n");
+            printf("*************************************************\n");
+        }
+        // Initialize compute device
+        int device=ScaLBL_SetDevice(rank);
+        NULL_USE( device );
+        ScaLBL_DeviceBarrier();
+        comm.barrier();
+
+        PROFILE_ENABLE(1);
+        //PROFILE_ENABLE_TRACE();
+        //PROFILE_ENABLE_MEMORY();
+        PROFILE_SYNCHRONIZE();
+        PROFILE_START("Main");
+        Utilities::setErrorHandlers();
+
+        auto filename = argv[1];
+        ScaLBL_IonModel IonModel(rank,nprocs,comm);
+        ScaLBL_Poisson PoissonSolver(rank,nprocs,comm);
+        ScaLBL_Multiphys_Controller Study(rank,nprocs,comm);//multiphysics controller coordinating multi-model coupling
+
+        // Load controller information
+        Study.ReadParams(filename);
+
+        // Load user input database files for Ion solvers
+        IonModel.ReadParams(filename);
+        IonModel.SetDomain();
+        IonModel.ReadInput();
+        IonModel.Create();
+
+        // Create analysis object
+        //ElectroChemistryAnalyzer Analysis(IonModel.Dm);
+
+
+        //IonModel.timestepMax = Study.getIonNumIter_PNP_coupling(StokesModel.time_conv,IonModel.time_conv);
+        IonModel.timestepMax = Study.getIonNumIter_NernstPlanck_coupling(IonModel.time_conv);
+        IonModel.Initialize();
+        IonModel.DummyFluidVelocity();
+
+        // Get maximal time conversion factor based on Stokes and Ion solvers
+        //Study.getTimeConvMax_PNP_coupling(StokesModel.time_conv,IonModel.time_conv);
+        // Get time conversion factor for the main iteration loop in electrokinetic single fluid simulator
+        Study.time_conv_MainLoop = IonModel.timestepMax[0]*IonModel.time_conv[0];
+
+        //----------------------------------- print out for debugging ------------------------------------------//
+        if (rank==0){
+            for (size_t i=0;i
diff --git a/tests/test_MPI.cpp b/tests/test_MPI.cpp
index 5ebbd882..43814309 100644
--- a/tests/test_MPI.cpp
+++ b/tests/test_MPI.cpp
@@ -44,7 +44,7 @@
 #include 
 #endif
-
 #undef MPI_CLASS
 #define MPI_CLASS Utilities::MPI
 #define MPI_ASSERT ASSERT
@@ -1195,6 +1195,144 @@ void testCommDup( UnitTest *ut )
 #endif
 }
+class gpuWrapper{
+public:
+    gpuWrapper(MPI_CLASS MPI_COMM, int MSG_SIZE);
+    ~gpuWrapper();
+    void Send(double *values);
+    void Recv(double *values);
+    double *sendbuf, *recvbuf;
+private:
+    MPI_Request req1[1],req2[1];
+    MPI_CLASS comm;
+    int sendCount;
+    int recvCount;
+    int rank, rank_x, rank_X, nprocs;
+    int sendtag, recvtag;
+};
+
+gpuWrapper::gpuWrapper(MPI_CLASS MPI_COMM, int MSG_SIZE){
+    comm = MPI_COMM.dup();
+    rank = comm.getRank();
+    nprocs = comm.getSize();
+    sendCount=MSG_SIZE;
+    recvCount=MSG_SIZE;
+    ScaLBL_AllocateZeroCopy((void **) &sendbuf, sendCount*sizeof(double));   // Allocate device memory
+    ScaLBL_AllocateZeroCopy((void **) &recvbuf, sendCount*sizeof(double));   // Allocate device memory
+    rank_X = rank+1;
+    rank_x = rank-1;
+    if (rank_x < 0) rank_x = nprocs-1;
+    if (!(rank_X < nprocs)) rank_X = 0;
+}
+
+gpuWrapper::~gpuWrapper(){
+    ScaLBL_FreeDeviceMemory(sendbuf);
+    ScaLBL_FreeDeviceMemory(recvbuf);
+}
+
+void gpuWrapper::Send(double *values){
+    sendtag = recvtag = 130;
+    ScaLBL_CopyToDevice(sendbuf,values,sendCount*sizeof(double));
+    req1[0] = comm.Isend(sendbuf,sendCount,rank_x,sendtag+0);
+    req2[0] = comm.Irecv(recvbuf,recvCount,rank_X,recvtag+0);
+}
+
+void gpuWrapper::Recv(double *values){
+    comm.waitAll(1,req1);
+    comm.waitAll(1,req2);
+    ScaLBL_DeviceBarrier();
+    ScaLBL_CopyToHost(values,recvbuf,recvCount*sizeof(double));
+}
+
+// Test GPU aware MPI
+void test_GPU_aware( UnitTest *ut )
+{
+    constexpr size_t N = 1024*1024;
+    constexpr size_t N_msg = 64;
+    bool test = true;
+    // Get the comm to use
+    MPI_CLASS comm( MPI_COMM_WORLD );
+    int rank = comm.getRank();
+    int size = comm.getSize();
+    try {
+        // Initialize the device
+        int device = ScaLBL_SetDevice(rank);
+        NULL_USE( device );
+        // create wrapper for communications
+        gpuWrapper gpuComm(comm, N);
+        // Allocate and initialize the buffers
+        size_t bytes = N*sizeof(double);
+        double *device_send[N_msg] = { nullptr };
+        double *device_recv[N_msg] = { nullptr };
+        double *host_send[N_msg] = { nullptr };
+        double *host_recv[N_msg] = { nullptr };
+        for ( size_t k=0; kpasses("GPU aware MPI" );
+    } else {
+        std::cout << "MPI is NOT GPU aware" << std::endl;
+        ut->failure("GPU aware MPI" );
+    }
+}
+
 // This test will test the MPI class
 int main( int argc, char *argv[] )
@@ -1513,6 +1651,9 @@
         std::cout << std::endl;
     }
 
+    // Test GPU aware MPI
+    test_GPU_aware( &ut );
+
     } // Limit the scope so objects are destroyed
 
     // Finished testing, report the results
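A possible usage sketch for the gpuWrapper ring exchange added above: each rank stages its payload in a device-visible buffer, sends to rank-1, and receives from rank+1 with periodic wrap-around. This assumes the gpuWrapper class and MPI_CLASS alias defined in tests/test_MPI.cpp; the function name is illustrative and error handling and the N_msg batching used by test_GPU_aware are omitted:

#include <vector>

// Minimal sketch; gpuWrapper and MPI_CLASS come from tests/test_MPI.cpp above.
void ring_exchange_example( MPI_CLASS comm )
{
    const int N = 1024;
    gpuWrapper ring( comm, N );                              // allocates device-visible send/recv buffers
    std::vector<double> out( N, double( comm.getRank() ) );  // payload to pass around the ring
    std::vector<double> in( N, 0.0 );
    ring.Send( out.data() );  // copy to the device buffer, post Isend to rank-1 and Irecv from rank+1
    // ... independent work can overlap with communication here ...
    ring.Recv( in.data() );   // wait on both requests, then copy the received message back to the host
}

Splitting Send and Recv into separate calls lets the caller overlap independent work between posting and completing the exchange.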