No blocking collective communication may occur inside a parallel
try-catch clause if an exception might be thrown before that
communication. The communication must be reached either by all
processes or by none.
Although not declared as such, prepareTimeStep seems to be an internal
function (despite its use in a test), and hence error control can be
done in the code calling it.
There was the following problem with the try-catch approach taken:
The calling site `BlackoilWellModel::assemble` looked like this:
```
OPM_BEGIN_PARALLEL_TRY_CATCH();
{
    if (iterationIdx == 0) {
        calculateExplicitQuantities(local_deferredLogger); // no parallel try-catch
        prepareTimeStep(local_deferredLogger); // includes parallel try-catch
    }
    updateWellControls(local_deferredLogger, /* check group controls */ true);
    // Set the well primary variables based on the value of well solutions
    initPrimaryVariablesEvaluation();
    maybeDoGasLiftOptimize(local_deferredLogger);
    assembleWellEq(dt, local_deferredLogger);
}
OPM_END_PARALLEL_TRY_CATCH_LOG(local_deferredLogger, "assemble() failed: ",
                               terminal_output_);
```
calculateExplicitQuantities had no parallel try-catch clause inside,
but prepareTimeStep had one.
Unfortunately, calculateExplicitQuantities might throw (on some
processes). In that case the non-throwing processes will trigger a
collective communication (to check for errors) in prepareTimeStep,
while the throwing one will jump to the
OPM_END_PARALLEL_TRY_CATCH_LOG macro at the end and trigger a different
collective communication there. Boom, we have a deadlock.
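
The mismatch described above can be reproduced in a toy model without MPI. This is only a sketch under stated assumptions: `Trace`, `runRank`, `innerWithCheck` and `collectivesMatch` are hypothetical names, and a "collective" is modelled by recording its label instead of communicating. If the recorded sequences differ between ranks, a real run would hang.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in: the list of collectives a rank would enter, in order.
using Trace = std::vector<std::string>;

// Inner function with its own error check (like the old prepareTimeStep).
void innerWithCheck(Trace& trace)
{
    trace.push_back("inner-check");
}

// Outer region, loosely mimicking the old assemble(): a rank may throw
// before the inner check is reached.
Trace runRank(bool throwsEarly, bool innerHasCheck)
{
    Trace trace;
    try {
        if (throwsEarly) {
            throw std::runtime_error("failure before the inner check");
        }
        if (innerHasCheck) {
            innerWithCheck(trace);
        }
    } catch (const std::exception&) {
        // Fall through to the outer check, skipping the inner one.
    }
    trace.push_back("outer-check");
    return trace;
}

// All ranks must enter the same collectives in the same order.
bool collectivesMatch(const std::vector<Trace>& traces)
{
    assert(!traces.empty());
    for (const Trace& t : traces) {
        if (t != traces.front()) {
            return false;
        }
    }
    return true;
}
```

With a nested check, a throwing rank records only `outer-check` while the others record `inner-check` first, so `collectivesMatch` returns false. Without the nested check, every rank records the same single entry.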
With this patch there is no nested parallel try-catch clause in the
functions called. (If an exception is thrown in prepareTimeStep, it
will be logged as an assemble failure.)
The other option would have been to add parallel try-catch clauses
to all functions called. That would have created a lot more
synchronization points, limiting scalability even further.
Not a big fan of macros, but here at least they seem to be the only
option. The problem is that the catch clauses must all catch the same
exceptions (those that have an entry in ExceptionType), because they
might be nested. In addition, we did not have a catch-all clause; it is
added now and is needed in case a called method throws an unexpected
exception.
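
A minimal sketch of the idea, not the actual OPM macros: `SketchExceptionType`, `SKETCH_BEGIN_TRY_CATCH`, `SKETCH_END_CATCH`, `guardedCall` and `globalError` are hypothetical names. Every region expands to the same list of catch clauses plus a catch-all, and the collective error check is replaced here by a plain maximum over per-rank codes.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical error codes ordered by severity; not the real ExceptionType.
enum SketchExceptionType { NONE = 0, RUNTIME_ERROR = 1, LOGIC_ERROR = 2, UNKNOWN = 3 };

// Hypothetical macros: because the catch clauses live in one macro, every
// region catches exactly the same exceptions and ends with a catch-all.
#define SKETCH_BEGIN_TRY_CATCH() \
    int sketchType = NONE;       \
    std::string sketchMsg;       \
    try {

#define SKETCH_END_CATCH()                                                                \
    }                                                                                     \
    catch (const std::logic_error& e) { sketchType = LOGIC_ERROR; sketchMsg = e.what(); } \
    catch (const std::runtime_error& e) { sketchType = RUNTIME_ERROR; sketchMsg = e.what(); } \
    catch (...) { sketchType = UNKNOWN; sketchMsg = "unknown exception"; }

// Runs `work` inside a guarded region and returns the local error code.
template <class F>
int guardedCall(F&& work)
{
    SKETCH_BEGIN_TRY_CATCH();
    {
        work();
    }
    SKETCH_END_CATCH();
    return sketchType;
}

// Stand-in for the collective check: all ranks agree on the worst code.
int globalError(const std::vector<int>& perRank)
{
    return perRank.empty() ? NONE : *std::max_element(perRank.begin(), perRank.end());
}
```

Without the catch-all, something like `throw 42;` would escape the region on one rank, and that rank would never reach the collective check.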
Expect shared pointer arguments by value instead of by reference to
shared pointer. This makes it clear to the caller that the called
function makes a copy of the pointer for its own use and does not try
to modify the caller's original pointer.
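
A short sketch of the by-value convention, using the hypothetical types `SketchData` and `Consumer` (neither appears in the code base):

```cpp
#include <memory>
#include <utility>

// Hypothetical payload type, used only for illustration.
struct SketchData
{
    int value = 0;
};

class Consumer
{
public:
    // By-value parameter: the signature itself says "I keep my own handle".
    // The caller's pointer cannot be reseated from in here.
    explicit Consumer(std::shared_ptr<SketchData> data)
        : data_(std::move(data))
    {
    }

    // Number of shared_ptr handles currently sharing the object.
    long owners() const
    {
        return data_.use_count();
    }

private:
    std::shared_ptr<SketchData> data_;
};
```

A caller that still needs the object passes its pointer as is, which copies the handle. A caller that is done with it can pass `std::move(ptr)` and hand over its handle without touching the reference count.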
For problems with 10 million cells, my compute server (with 128 GB RAM)
starts to swap when I use debugging tools in parallel runs. I assume
that this might become an issue for others, too.
Now we consistently use unordered_map for the mapping.
Adds a new constructor to Main.hpp that takes shared pointers to Deck,
EclipseState, Schedule, and SummaryConfig. This makes it possible to
share these variables with Python without worrying about lifetime issues
of the underlying C++ objects. For example, a Python script can first
create an opm.io.schedule.Schedule object which is modified from Python.
Then, assume the same Python script creates an
opm.simulators.BlackOilSimulator which is initialized with the same
schedule object. Since the underlying C++ object is held by a shared
pointer, the Schedule object in Python may go out of scope (get deleted
by Python) without the C++ schedule object being deleted. The Python
BlackOilSimulator may then continue to be used after the Python Schedule
object has been deleted, since it still holds a valid C++ schedule object.
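
The lifetime argument can be illustrated in plain C++. This is a sketch only: `SketchSchedule`, `SketchSimulator` and `scriptScenario` are hypothetical stand-ins, and resetting the first shared pointer plays the role of Python deleting its Schedule wrapper.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for the C++ schedule object.
struct SketchSchedule
{
    int numWells = 0;
};

// Hypothetical stand-in for the simulator; it keeps its own handle.
class SketchSimulator
{
public:
    explicit SketchSimulator(std::shared_ptr<SketchSchedule> schedule)
        : schedule_(std::move(schedule))
    {
    }

    int numWells() const
    {
        return schedule_->numWells;
    }

private:
    std::shared_ptr<SketchSchedule> schedule_;
};

// Plays the role of the Python script. `observer` lets the caller check
// afterwards whether the schedule object has been destroyed.
int scriptScenario(std::weak_ptr<SketchSchedule>& observer)
{
    auto pySchedule = std::make_shared<SketchSchedule>();
    pySchedule->numWells = 3;  // "modified from Python"
    observer = pySchedule;

    SketchSimulator sim(pySchedule);
    pySchedule.reset();        // the Python wrapper goes away

    // The simulator still holds a valid schedule object.
    return sim.numWells();
}
```

The schedule is destroyed only when the last handle, here the one inside the simulator, goes out of scope.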