initial checkin -- error reporting architecture

git-svn-id: svn+ssh://svn.gnucash.org/repo/gnucash/trunk@6417 57a11ea4-9604-0410-9ed3-97b8803252fd
Linas Vepstas 2001-12-29 17:53:51 +00:00
parent da601a847d
commit 0bfd17442e

src/doc/backend-errors.txt Normal file

@ -0,0 +1,189 @@
Handling Backend Communications Errors
--------------------------------------
Architectural Discussion
December 2001
Proposed/Reviewed, Linas Vepstas, Dave Peticolas
Problem:
--------
What to do if a serious error occurs in a backend while
GnuCash is being used? For example, what happens if the connection
to the SQL server is lost, because the SQL server has died, and/or
because there is a network problem (unplugged ethernet cable, etc.)?
Discussion:
-----------
There is a set of macros in the Postgres backend that check for
a Postgres error, and completely shut down the connection to the
Postgres server whenever even a minor error occurs. This is
excessively harsh. How to do better?
The "Handle it Automatically in the Backend" idea:
--------------------------------------------------
Detect the error in the backend, and do something 'intelligent'
in the backend, trying to recover from it. What one does depends on
the actual context (depending on what is going on in the code at that
point.) In other words, implement automatic session-reconnection in
the backend.
To do this, you can't just handle the errors in the macros (SEND_QUERY,
FINISH_QUERY, etc) since it depends on the context and how much work
you've sent to the postgres process so far. One error that would
be nice to be able to recover from is a simple loss of connection (the
postmaster gets killed and restarted). This might require one to
'replay' the last few queries.
The "Generic Handler, Report it to the User" idea:
--------------------------------------------------
There's a simple, direct thing we should get working first:
Go ahead and close the connection, but then return to the engine
in some nice way, let the engine report the error by GUI, and then
allow the user to initiate a new session (or maybe try to do it
automatically): and do all this without deleting all the accounts
and transactions.
It's a fair amount of work just to untangle the flow of control
for this case, and leave gnucash in a usable state without having
an open session.
I like this for several reasons:
-- it's generic; it can handle any backend error anywhere in the code.
You don't have to second-guess based on whether some recent query
may or may not have completed.
-- I believe that reconnect will be quicker, because you won't need
to reload piles of accounts and transactions.
-- If the user can't reconnect, then they can always save to a file.
This can be a double bonus if done right: e.g. user works on laptop,
saves to file, takes laptop to airport, works off-line, and then
syncs her changes back up when she goes on-line again.
Discussion:
-----------
> Should the backend try reconnecting first, or just go ahead and
> return an error condition immediately? If the latter, then the
> current backend error-handling can just stay as it is and the gui
> codes need to add checks in several places, right?
The backend can try reconnecting automatically. But let's think through
what this implies, and we'll see it's not that good an idea:
It will need to remember the user's password to reconnect (It currently
drops the passwd as a security precaution). I don't have an opinion
as to whether it should log the reconnect in the gncSession table.
I don't know if it should try to do a streamlined reconnect -- e.g.
skip checking the version numbers ... but maybe the SQL server was
rebooted (or at least, all users were kicked) precisely because the
version numbers changed ??
The problem with automatic reconnect from within the backend is that you
don't know quite where to restart... or rather, you have trouble getting
to the right place to restart. Take for example
pgendStoreTransaction (PGBackend *be, Transaction *trans)
{
   /* lock it up so that we store atomically */
   bufp = "BEGIN;\n"
          "LOCK TABLE gncTransaction IN EXCLUSIVE MODE;\n"
          "LOCK TABLE gncEntry IN EXCLUSIVE MODE;\n";
   SEND_QUERY (be,bufp, );
   FINISH_QUERY(be->connection);
   pgendStoreTransactionNoLock (be, trans, TRUE);
   bufp = "COMMIT;\n"
          "NOTIFY gncTransaction;";
   SEND_QUERY (be,bufp, );
   FINISH_QUERY(be->connection);    // << network error occurs here!!!
Well, you can't just re-login, and reissue the commit. You really need
to rewind to the beginning of the subroutine. How can you do this?
Alternative 1) wrap this routine:
pgendStoreTransaction (PGBackend *be, Transaction *trans)
{
   do {
      pgendIfNotLoggedInThenReLogin(be);
      pgendStoreTransactionOnceOnly(be, trans);
   } while (NO_ERROR != pgendGetError());
}
Well, maybe not an infinite loop; maybe three retries or something.
Alternative 2) throw an error, let some much higher layer catch it.
Well, approach 1) seems reasonable... until you think about what happens
if three retries don't cut it: then you have to throw an error
anyway, and hope the higher layer deals with it. So even if you
implement 1), you *still* have to implement 2) anyway.
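To make the point concrete, here is a minimal sketch of alternative 1
with a bounded retry count, showing that it still ends in alternative 2
when the retries run out. Every name in it (Backend, store_once,
relogin_if_needed, MAX_RETRIES, the error codes) is a hypothetical
stand-in, not taken from the Postgres backend, and the "flaky link" is
simulated with a simple counter.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the
 * GnuCash sources. */
typedef enum { ERR_NONE = 0, ERR_CONN_LOST } BackendError;

typedef struct {
    bool connected;
    int  failures_left;      /* simulated flaky link: fail this many times */
    BackendError last_error; /* what the higher layer will see */
} Backend;

#define MAX_RETRIES 3

/* Pretend re-login; always succeeds in this sketch. */
static void relogin_if_needed(Backend *be)
{
    if (!be->connected)
        be->connected = true;
}

/* One complete BEGIN ... COMMIT attempt, treated as all-or-nothing. */
static BackendError store_once(Backend *be)
{
    if (be->failures_left > 0) {
        be->failures_left--;
        be->connected = false;   /* connection is closed on error */
        return ERR_CONN_LOST;
    }
    return ERR_NONE;
}

/* Retry the whole routine a bounded number of times, then hand
 * the error to the caller: alternative 1 falling back to 2. */
BackendError store_with_retry(Backend *be)
{
    BackendError err = ERR_CONN_LOST;
    for (int attempt = 0; attempt < MAX_RETRIES && err != ERR_NONE; attempt++) {
        relogin_if_needed(be);
        err = store_once(be);
    }
    be->last_error = err;   /* still an error if every retry failed */
    return err;
}
```

With two simulated failures the third attempt succeeds; with more than
MAX_RETRIES failures the caller gets ERR_CONN_LOST back and has to deal
with it anyway.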
So my attitude is to skip doing 1 for now (maybe we can add it later)
and just make sure that when we "throw" the error, it really does behave
like a throw should behave, and short-cuts its way up to where it's
caught. The catchers should probably sit at a few strategic places in the
GUI, like wherever a xaccQuery() is issued, and wherever an
xaccTransCommitEdit() is issued (which is hopefully not a lot of
places ?).
What's the point of doing 2 cleanly? Because I suspect that most
network errors won't be automatically recoverable. Most likely,
either someone tripped over an ethernet cable, or the server crashed,
and you gotta call the sysadmin on the phone, etc. The goal is not
to crash the client when the network is down, but rather let the user
continue to work off-line (rather than a forced coffee break).
Alternately, user might take a forced coffee break, and 10 minutes
later, manually reconnects and resumes work ... without having to
stop & restart gnucash, without having to close and reopen a register,
re-run a report window, etc. Because it's the re-opening of the
app that is the major pain in the butt.
How to Report Errors to the GUI
-------------------------------
> How would the engine->GUI error reporting happen? A direct callback?
> Or having the GUI always check for session errors?
We should use the session error mechanism for reporting these errors.
Note that the API allows a simple 'try-throw-catch' style error
handling in C. Because we don't/can't unwind the stack as a true
'throw' would, we need to make sure that when we "throw" the error,
it emulates this as best it can: it short-cuts its way up and out of
the engine, to where it's caught in the GUI. The catchers should probably
sit at a few strategic places in the GUI, like wherever a xaccQuery() is
issued, and wherever an xaccTransCommitEdit() is issued.
Unfortunately, there are a *lot* of places where these calls are
issued, and therefore, it's a lot of work to modify all of these places
to check for an error condition. It would simplify things if there
was also a callback mechanism.
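The 'try-throw-catch' style described above can be sketched roughly as
follows. This is only an illustration under assumptions: the Session
struct and the session_push_error/session_get_error functions are
hypothetical stand-ins modeled loosely on the session error mechanism,
with made-up signatures, and backend_commit, engine_commit_edit and
gui_commit are invented for the example.

```c
/* Hypothetical stand-ins; names and signatures are assumptions,
 * not the real session API. */
typedef enum { SESS_ERR_NONE = 0, SESS_ERR_CONN_LOST } SessionError;

typedef struct {
    SessionError pending;    /* last error "thrown", or SESS_ERR_NONE */
    int connection_up;
} Session;

/* "throw": record the error on the session. */
static void session_push_error(Session *s, SessionError err)
{
    s->pending = err;
}

/* "catch": return the pending error and clear it. */
static SessionError session_get_error(Session *s)
{
    SessionError err = s->pending;
    s->pending = SESS_ERR_NONE;
    return err;
}

/* Backend layer: detects the failure and throws. */
static int backend_commit(Session *s)
{
    if (!s->connection_up) {
        session_push_error(s, SESS_ERR_CONN_LOST);
        return -1;
    }
    return 0;
}

/* Engine layer: short-cuts out without touching state after a throw. */
static int engine_commit_edit(Session *s, int *stored, int new_value)
{
    if (backend_commit(s) != 0)
        return -1;
    *stored = new_value;
    return 0;
}

/* GUI layer: the catcher, checking the session after the call. */
const char *gui_commit(Session *s, int *stored, int new_value)
{
    (void) engine_commit_edit(s, stored, new_value);
    switch (session_get_error(s)) {
    case SESS_ERR_NONE:      return "saved";
    case SESS_ERR_CONN_LOST: return "connection lost; working off-line";
    }
    return "unknown error";
}
```

Note that every GUI call site needs its own copy of the check in
gui_commit, which is exactly the burden described above.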
Propose:
Maybe gnc-event.h should be extended to generate events for errors
as well ...
How about this idea:
change gnc_session_push_error() so that it calls
gnc_engine_generate_event (GUID_of_session, GNC_EVENT_ERROR)
The GUI would register a handler; the handler would call
gnc_session_get_error() to find out the details of the error; and
maybe put a popup on the screen, maybe set some flags so that the
GUI starts working differently...
This would save a *lot* of trouble of having to check the error code
in the zillion places where CommitEdit is called. Of course, if the
error occurs, then all the code that executes following the CommitEdit
is 'suspect', and is potentially buggy/non-robust in the face of that
error. Alligators lie here ...
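A rough sketch of the event idea, to show the shape of it. This is not
the gnc-event.h API: EvSession, ev_session_set_handler and the other
names are hypothetical, and the "event" is reduced to a single
registered callback. The point is only that pushing the error invokes
the handler, and the handler pulls the details with a get-error call.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the real session or event API. */
typedef enum { EV_ERR_NONE = 0, EV_ERR_CONN_LOST } EvError;

struct EvSession;
typedef void (*ErrorHandler)(struct EvSession *s, void *user_data);

typedef struct EvSession {
    EvError      pending;     /* last error pushed, or EV_ERR_NONE */
    ErrorHandler handler;     /* registered by the GUI; may be NULL */
    void        *user_data;
} EvSession;

/* The GUI registers its handler once, at startup. */
void ev_session_set_handler(EvSession *s, ErrorHandler h, void *user_data)
{
    s->handler = h;
    s->user_data = user_data;
}

/* Pushing an error also generates the error event. */
void ev_session_push_error(EvSession *s, EvError err)
{
    s->pending = err;
    if (s->handler != NULL)
        s->handler(s, s->user_data);
}

/* Return the pending error and clear it. */
EvError ev_session_get_error(EvSession *s)
{
    EvError err = s->pending;
    s->pending = EV_ERR_NONE;
    return err;
}

/* GUI side: flags that change how the GUI behaves afterwards. */
typedef struct {
    int offline;
    int popups_shown;
} GuiState;

/* The handler: find out what happened, pop up, switch modes. */
void gui_on_error(EvSession *s, void *user_data)
{
    GuiState *gui = user_data;
    if (ev_session_get_error(s) == EV_ERR_CONN_LOST) {
        gui->offline = 1;
        gui->popups_shown++;
    }
}
```

The handler runs before the failing call returns to its caller, so the
code following the CommitEdit still executes afterwards; the handler
can set flags, but it cannot stop that code from running.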
============================== END OF DOCUMENT =====================