[Nsi-wg] ServiceException needs further details
Henrik Thostrup Jensen
htj at nordu.net
Wed Dec 21 05:44:27 EST 2011
On Thu, 15 Dec 2011, John MacAuley wrote:
> I took an action to start the error handling discussion so that we, as a group, can document
> the error messages and behaviors. I would like to start it off with when an NSIServiceException
> is returned as a SOAP fault to a request, and when it is returned in a specific failed response
> message.
>
> OpenDRAC ...
[snip]
So a lot of these are more policy than mechanism, and could be subject to
change. Let's focus on the error codes.
What does the SVC prefix stand for? (And why a prefix at all, and why not
"ERR" or "ERROR", which would be somewhat more intuitive?)
> Here is a list of error messages currently implemented in OpenDRAC. The list continues to
> expand. I have kept the text generic with the specific error values being returned in the
> associated attribute list. We will also need to agree on the format of the message/errorId.
I think we also need a plan for the error codes and their classification.
Do we provide the errors in order to tell a user what went wrong (in which
case a string will suffice), do we provide error codes so a client can
intelligently handle some cases, or both?
The answer is probably both, with some semantics for errorId, which
can enable the client to automatically classify and potentially recover
from the error. The distinction between text and variables is somewhat
artificial and only makes sense for missing or invalid parameters. If we
assume that the error string is for humans only, the distinction between:
text: Missing parameters: Start time, Dest STP.
and
text: Missing parameters
variables: ["Start time", "Dest STP"]
is just unneeded complexity. If a client is missing a parameter, it
probably won't be able to change the request and fill it in automatically
by looking at the error response. The client should just be fixed to send
the parameter in the first place.
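To make the comparison concrete, here is a minimal Python sketch of the two representations. It assumes a hypothetical `ServiceError` record with `error_id`, `text` and `variables` fields; none of these names are part of any agreed NSI schema. It shows that, once rendered for a human, the structured variant carries nothing the flat string does not.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceError:
    """Hypothetical error record; field names are illustrative only."""
    error_id: str
    text: str
    variables: List[str] = field(default_factory=list)

    def human_message(self) -> str:
        # A human only ever sees the rendered string, so folding the
        # variables into the text loses nothing for that reader.
        if self.variables:
            return f"{self.text}: {', '.join(self.variables)}."
        return self.text


# Variant 1: everything in the text.
flat = ServiceError("SVC0001", "Missing parameters: Start time, Dest STP.")

# Variant 2: generic text plus a separate variables list.
structured = ServiceError("SVC0001", "Missing parameters", ["Start time", "Dest STP"])

print(structured.human_message())  # prints: Missing parameters: Start time, Dest STP.
```

Both variants produce the same message; the extra list only pays off if a client could act on it, which is unlikely for missing parameters.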
What could make sense is that the NSI agent replies that it understands
the request, but could for some reason not fulfill it, e.g., a path could
not be found. In this case the client could retry the request elsewhere.
However, the client should not care about why the request could not be
fulfilled, as it is highly unlikely that this would be useful anyway (it
would, however, be useful to report back to the user if a second or third
request fails consecutively).
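Client-side, this retry behaviour could look roughly like the following. This is a minimal sketch under stated assumptions: `reserve` stands in for whatever call sends a reservation to a single NSA, `RequestNotFulfilled` models the "understood but could not fulfil" class (e.g. no path found), `RequestRejected` models a request that is itself wrong, and the threshold of two failures before telling the user is arbitrary. None of these names come from the NSI specification.

```python
from typing import Callable, List, Optional, Tuple


class RequestNotFulfilled(Exception):
    """Hypothetical: the NSA understood the request but could not fulfil it."""


class RequestRejected(Exception):
    """Hypothetical: the request itself is bad; retrying elsewhere won't help."""


def reserve_anywhere(
    nsas: List[str], reserve: Callable[[str], str]
) -> Tuple[Optional[str], List[str]]:
    """Try each NSA in turn; return (connection id or None, failure reasons).

    RequestNotFulfilled just moves on to the next NSA; the reason is kept
    only so it can be shown to a human later. RequestRejected is not caught
    and propagates to the caller, because the request must be fixed first.
    """
    reasons: List[str] = []
    for nsa in nsas:
        try:
            return reserve(nsa), reasons
        except RequestNotFulfilled as err:
            reasons.append(f"{nsa}: {err}")
    return None, reasons


# Usage with a stand-in reserve function: nsa-a has no path, nsa-b succeeds.
def fake_reserve(nsa: str) -> str:
    if nsa == "nsa-b":
        return "conn-42"
    raise RequestNotFulfilled("no path found")


conn, reasons = reserve_anywhere(["nsa-a", "nsa-b"], fake_reserve)
if len(reasons) >= 2:  # arbitrary threshold: tell the user after repeated failures
    print("Reservation attempts failed:", "; ".join(reasons))
```

The client never branches on why an NSA could not fulfil the request; the reasons are only collected for a human.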
> MISSING_PARAMETER, "SVC0001", "Invalid or missing parameter"
> UNSUPPORTED_OPTION, "SVC0002", "Parameter provided contains an unsupported value which MUST be
> processed"
How is this different from "invalid" in the previous one? Is this an "I know
this should be supported, but it isn't"?
Both of these would probably be equivalent to HTTP 400 (BAD_REQUEST), in
which case a request should not be retried without being modified. While I
can see the distinction between missing, invalid, and unsupported, the
end result is the same: human intervention is needed.
In the case where the service knows the semantics, but hasn't implemented
it, the HTTP 501 (NOT_IMPLEMENTED) would be suitable.
> ALREADY_EXISTS, "SVC0003", "Schedule already exists for connectionId"
Maybe "CONNECTION_EXISTS" or "CONNECTION_CONFLICT" as name.
This would be equivalent to HTTP 409 (CONFLICT).
> DOES_NOT_EXIST, "SVC0004", "Schedule does not exists for connectionId"
Maybe "CONNECTION_NONEXISTENT".
This would be equivalent to HTTP 404 (NOT_FOUND)
> MISSING_SECURITY, "SVC0005", "Invalid or missing user credentials"
The term "Missing security" is highly misleading. I strongly suggest
something else, perhaps "UNAUTHORIZED".
Would be equivalent to HTTP 401 (UNAUTHORIZED).
> TOPOLOGY_RESOLUTION_STP, "SVC0006", "Could not resolve STP in Topology database"
> TOPOLOGY_RESOLUTION_STP_NSA, "SVC0007", "Could not resolve STP to managing NSA"
Three or four consecutive nouns make a rather poor error name IMHO. Also,
topology and NSI are so interwoven that we don't really need the word
"topology". How about "UNKNOWN_STP"?
Do we expect the latter error to ever come up (we know the STP, but not
the NSA for it)? I would call this a topology description error.
These would correspond to HTTP 422 (UNPROCESSABLE_ENTITY).
> PATH_COMPUTATION_NO_PATH, "SVC0008", "Path computation failed to resolve route for reservation
Do we really need to say path twice? How about "NO_PATH_FOUND".
For http this would probably also be 422, though this one does not have a
clear fit.
> INVALID_STATE, "SVC0009", "Connection state machine is in invalid state for received message"
"Invalid state" has a bad ring to it. How about "INVALID_TRANSITION"?
For HTTP this would be 422, though 405/406 could be misused for it.
> INTERNAL_ERROR, "SVC0010", "An internal error has caused a message processing failure"
Would correspond to 500.
> INTERNAL_NRM_ERROR, "SVC0011", "An internal NRM error has caused a message processing failure"
The distinction between NSA and NRM is an artificial one, and in some
cases they are the same (e.g., OpenNSA can speak directly to JunOS boxes).
For the client, the result is the same: "The thing on the other end didn't
work". For humans/operators the distinction is important, but I would say
the error code is for clients, and the error string for humans.
> STP_ALREADY_IN_USE, "SVC0012", "Specified STP already in use"
I would call this "STP_UNAVAILABLE", as we are dealing with a time span for
the reservation. "In use" reflects the current situation, which is rarely
what matters for us. The message should be something like "Specified
STP not available in specified time span".
In HTTP this one is a bit tricky, but 422 is probably the best fitting.
> BANDWIDTH_NOT_AVAILABLE, "SVC0013", "Insufficient bandwidth available for reservation"
>
> Would people like to add to the list?
Maybe something for a connection which used to exist, but is now
terminated or no longer available. I know this could fall under
"DOES_NOT_EXIST", or "INVALID_STATE", but none of these actually capture
what happened.
This would be equivalent to 410 (GONE).
Furthermore, something stating that the resource is not available for the
specified user could be appropriate (corresponding to 401 (UNAUTHORIZED) in
HTTP).
I've mapped the error codes to HTTP status codes. Most mappings are
straightforward, but a couple are edge cases and can be discussed. While
not perfect, HTTP codes are well understood by many developers, have clear
semantics for request retry and modification, and have been well tested
over a significant amount of time. Why do we need to invent our own? Of
course, we would only adopt the 400/500 class codes, as the other classes
do not make sense with our current protocol model.
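For reference, the mappings proposed above can be collected into one table. This is a minimal Python sketch using the standard library's `http.HTTPStatus`. The keys are the OpenDRAC ids and the names are the renames suggested in this mail; the SVC0011 mapping to 500 is an assumption (it follows from treating NRM errors like internal errors), and `needs_modified_request` is a hypothetical helper, not part of any NSI API.

```python
from http import HTTPStatus
from typing import Dict, Tuple

# Proposed (not agreed) mapping: OpenDRAC error id -> (suggested name, HTTP status).
NSI_ERRORS: Dict[str, Tuple[str, HTTPStatus]] = {
    "SVC0001": ("MISSING_PARAMETER", HTTPStatus.BAD_REQUEST),
    "SVC0002": ("UNSUPPORTED_OPTION", HTTPStatus.BAD_REQUEST),  # NOT_IMPLEMENTED if known but unimplemented
    "SVC0003": ("CONNECTION_EXISTS", HTTPStatus.CONFLICT),
    "SVC0004": ("CONNECTION_NONEXISTENT", HTTPStatus.NOT_FOUND),
    "SVC0005": ("UNAUTHORIZED", HTTPStatus.UNAUTHORIZED),
    "SVC0006": ("UNKNOWN_STP", HTTPStatus.UNPROCESSABLE_ENTITY),
    "SVC0007": ("TOPOLOGY_RESOLUTION_STP_NSA", HTTPStatus.UNPROCESSABLE_ENTITY),
    "SVC0008": ("NO_PATH_FOUND", HTTPStatus.UNPROCESSABLE_ENTITY),
    "SVC0009": ("INVALID_TRANSITION", HTTPStatus.UNPROCESSABLE_ENTITY),
    "SVC0010": ("INTERNAL_ERROR", HTTPStatus.INTERNAL_SERVER_ERROR),
    "SVC0011": ("INTERNAL_NRM_ERROR", HTTPStatus.INTERNAL_SERVER_ERROR),  # assumed
    "SVC0012": ("STP_UNAVAILABLE", HTTPStatus.UNPROCESSABLE_ENTITY),
    # SVC0013 (BANDWIDTH_NOT_AVAILABLE) has no proposed mapping yet.
    # A terminated connection could map to HTTPStatus.GONE (410); no id assigned.
}


def needs_modified_request(error_id: str) -> bool:
    """True for 4xx: the same request should not be resent unmodified to
    the same agent (it may still be sent elsewhere, e.g. NO_PATH_FOUND)."""
    _, status = NSI_ERRORS[error_id]
    return 400 <= status < 500
```

A client then only needs the status class to decide between fixing the request and retrying later, while the error string stays for humans.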
Best regards, Henrik
Henrik Thostrup Jensen <htj at ndgf.org>
NORDUnet / Nordic Data Grid Facility.