Instead of unconditionally issuing MPI_Abort when we encounter a fatal
exception, we now test whether all processes have experienced this
exception and, if so, terminate normally with an exit code that
signals an error. We still use MPI_Abort if not all processes get an
exception, as this is the only way to make sure that the program
aborts.
This approach also works around issues in some MPI implementations
that might not correctly return the error.
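A blocking collective such as MPI_Allreduce is not suitable for this check, because it hangs as soon as one rank never reaches it (for example, a rank that did not throw). A minimal sketch of a bounded wait on rank 0 follows. It is not the opm-simulators code: to stay runnable without an MPI installation, each rank's "I failed too" message is modeled as a std::future that rank 0 waits on until a shared deadline. In an MPI build these would presumably be MPI_Irecv requests on rank 0 (matched by MPI_Isend on the other ranks) polled with MPI_Test. The names Shutdown and decideShutdown are hypothetical.

```cpp
#include <chrono>
#include <future>
#include <vector>

// Hypothetical outcome of the shutdown check on rank 0.
enum class Shutdown {
    ExitWithError, // every other rank reported the fatal exception: return EXIT_FAILURE
    Abort          // at least one rank stayed silent: fall back to MPI_Abort
};

// Rank 0 waits until `deadline` for a report from every other rank.
// Each std::future stands in for a pending non-blocking receive
// (an assumption made so the sketch runs without MPI).
Shutdown decideShutdown(std::vector<std::future<void>>& reports,
                        std::chrono::steady_clock::time_point deadline)
{
    for (auto& report : reports) {
        // One shared deadline bounds the total waiting time, so a rank
        // that never reports cannot block the shutdown indefinitely.
        if (report.wait_until(deadline) != std::future_status::ready) {
            return Shutdown::Abort;
        }
    }
    return Shutdown::ExitWithError;
}
```

If every report arrives before the deadline, the caller can return EXIT_FAILURE normally; if any rank stays silent, the sketch returns Shutdown::Abort and the caller would invoke MPI_Abort as before.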
Multiple messages like this are gone now:
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
[] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
But we still see something like this:
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[35057,1],0]
Exit code: 1
Program threw an exception: [/home/mblatt/src/dune/opm/opm-simulators/opm/simulators/timestepping/AdaptiveTimeSteppingEbos.hpp:586] Solver failed to converge after cutting timestep 11 times.
Simulation aborted: Solver failed to converge after cutting timestep 11 times.
This seems more user-friendly.
There is a strange interaction when using MPI and OpenMP on some
hardware/MPI implementations. In a serial run omp_get_num_procs() would
return the number of processors, but when started under mpirun it would
always return 1.
With this change, users can now use any number of threads.
While we reported that we used the number of threads that was passed
on the command line, we never actually used it for OpenMP but always
stuck to two unless the environment variable OMP_NUM_THREADS was set.
Note that the ThreadManager in opm-models always used the command-line
option, and hence the linearizer already used that number of threads.
Please note that the only use of OpenMP in opm-common (volume
calculation in EclipseGrid) is not affected by this, as it happens
before we set the number of OpenMP threads.
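The thread selection described above can be sketched as follows. This is a hypothetical helper, not the opm-simulators code: threadsToSet, configureOpenMP, and the fallback to 2 threads for a non-positive request are assumptions. omp_set_num_threads and the OMP_NUM_THREADS variable are standard OpenMP; the OpenMP call is compiled only when _OPENMP is defined, so the sketch also builds without OpenMP.

```cpp
#include <cstdlib>
#include <optional>

#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: decide how many OpenMP threads to request.
// Returns std::nullopt when OMP_NUM_THREADS is set, leaving the
// OpenMP runtime's own setting untouched.
std::optional<int> threadsToSet(const char* ompNumThreadsEnv, int requestedOnCommandLine)
{
    if (ompNumThreadsEnv != nullptr) {
        return std::nullopt; // respect the environment variable
    }
    // Assumed fallback: use 2 threads if the command-line value is not positive.
    const int threads = requestedOnCommandLine > 0 ? requestedOnCommandLine : 2;
    // Deliberately not clamped with omp_get_num_procs(): under mpirun
    // that can report 1 even on a multi-core node.
    return threads;
}

void configureOpenMP(int requestedOnCommandLine)
{
#ifdef _OPENMP
    if (const auto n = threadsToSet(std::getenv("OMP_NUM_THREADS"), requestedOnCommandLine)) {
        omp_set_num_threads(*n);
    }
#else
    (void)requestedOnCommandLine; // built without OpenMP: nothing to configure
#endif
}
```

Any OpenMP region that runs before configureOpenMP is called (such as the EclipseGrid volume calculation mentioned above) still uses the runtime's default thread count.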