[DFDL-WG] String literal syntax for hexBinary ?? - Re: String literals - various usage patterns thereof
Tim Kimber
KIMBERT at uk.ibm.com
Fri Apr 20 04:25:43 EDT 2012
Mike,
Nice write-up. We're in broad agreement on this - and I agree that my
wording could have been better on that first point.
Case X1 : Agreed.
Case X2 : Unusual, but not really controversial. All that matters is that
the returned xs:string conforms to the DFDL String Literal syntax,
regardless of how it was constructed.
Case X2.5 : I don't see any new problems here. In general, DFDL String
Literals will need to be pre-processed before being compared against the
data stream.
Case X4: I agree that we need to be clear on this point. I am equally
clear that DFDL String Literals should not be processed if they ( appear
to ) occur in a DFDL expression. So we need another rule:
Rule 4 : The syntax of a DFDL expression is the same as the syntax of an
XPath 2.0 expression, and does not include the DFDL entities.
Case X4.5: The dfdl:property function is an interesting one. There will be
situations where the return value is undefined ( e.g. because the DFDL
expression refers to parts of the info set that do not exist yet ). In
cases where the property is static ( not a DFDL expression ) or the
expression is resolvable, it should return the lexical value of the string
literal - not the sequence of bytes that it would match ( that would
depend on the value of dfdl:encoding on the element/group in question ) .
Case X5 : Agreed.
Rule 2: The rule needs to be a lot broader. There are many usages of DFDL
expressions that do not require the result to be interpreted as a DFDL
String Literal. In fact, unless the specification specifically states that
the result is a DFDL String Literal ( or a list thereof ) the DFDL
processor should treat the result as a logical value of the type returned
by the DFDL (XPath 2.0 ) expression.
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert at uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: Tim Kimber/UK/IBM at IBMGB
Cc: dfdl-wg at ogf.org
Date: 19/04/2012 16:01
Subject: Re: [DFDL-WG] String literal syntax for hexBinary ?? - Re:
String literals - various usage patterns thereof
On Thu, Apr 19, 2012 at 4:35 AM, Tim Kimber <KIMBERT at uk.ibm.com> wrote:
> - DFDL expressions must not *contain* DFDL String Literals. They must be
> valid XPath 2.0 expressions except that the list of allowable function
> names includes the DFDL extension functions.
I'm pretty sure the above statement isn't right, or doesn't mean to me
what you intended.
Some expressions return string literals, and so their component parts must
be able to contain string literal syntax or fragments thereof. What we
don't want is for the semantics to require that those string literal
syntaxes be interpreted by the xpath processor.
Let me analyze this by cases. Below are what I think are the right
behaviors.
Case X1:
Appearing in dfdl:initiator="{ '%#234;' }". The result for the initiator is
one character, exactly as if one had written dfdl:initiator="%#234;". That
is, the return value of the expression is subsequently treated as a
string literal. So I could also return a whitespace-separated list of
initiators if I wanted to.
The implications of this are that a few things one might want to return
from an expression will cause issues. Ex: suppose dfdl:separator="{...}"
and the expression wants to return a space character. In that case one
must check for that and return "%SP;" instead. Whitespace generally will
cause issues. Similarly "%" has to be "%%". This is a headache, but I
feel it is preferable to having different sets of rules for expression and
non-expression cases. Doing this escapifying does require a replace
function on strings, as has been pointed out elsewhere. Just a basic
replace might not be sufficient. We might want a dfdl:escapify(...)
function to deal with the all-varieties-of-whitespace issue.
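To make the escapifying concrete, here is a minimal sketch in Python (used
only for illustration, since DFDL expressions themselves are XPath). The
helper name escapify and its small entity table are assumptions for this
sketch, not a proposed definition of dfdl:escapify(...); a real version
would have to cover every variety of whitespace.

```python
# Hypothetical helper: maps characters that are significant in DFDL string
# literal syntax to an escaped form. Only a handful of characters are covered.
_ENTITY_FOR_CHAR = {
    " ": "%SP;",
    "\t": "%HT;",
    "\n": "%LF;",
    "\r": "%CR;",
    "\x00": "%NUL;",
}


def escapify(raw: str) -> str:
    """Return raw rewritten so it can be returned as a DFDL string literal."""
    out = []
    for ch in raw:
        if ch == "%":
            out.append("%%")  # a literal percent sign must be doubled
        else:
            out.append(_ENTITY_FOR_CHAR.get(ch, ch))
    return "".join(out)


print(escapify(" "))     # prints "%SP;"
print(escapify("100%"))  # prints "100%%"
```

Doing this character by character, rather than with a chain of replace
calls, avoids re-escaping the "%" that an earlier replacement introduced.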
Case X2:
Appearing in dfdl:initiator="{ fn:concat('%#23', '4;') }" also represents
one character, as it is the result of the xpath evaluation that we analyze
to see what it means.
I'm expecting this to be controversial. But again it is the result of the
expression that is a string 'literal'.
Case X2.5:
Suppose I have a header field. If the value is N, it means terminator is
ASCII null. So I want to write
dfdl:terminator="{ if (headerIndicator = 'N') then '%NUL;' else ';' }"
In that case I really do need to post-process the result of the expression
to find the %NUL; and convert it to a zero codepoint value. I can't see any
other way to get the zero codepoint into the terminator expression in this
case. Case X2.5 doesn't introduce anything new; it just amplifies the point
of case X2.
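The post-processing step can be sketched as follows, again in Python purely
for illustration. The names expand_entities and terminator_for are
hypothetical, and only %%, decimal and hex numeric entities, and a few
named entities are handled. Whitespace-separated lists and the other entity
forms in the DFDL string literal syntax are deliberately left out.

```python
import re

# Assumed subset of named entities, for this sketch only.
_NAMED = {"NUL": "\x00", "HT": "\t", "LF": "\n", "CR": "\r", "SP": " "}

# "%%" is listed first so an escaped percent is never read as an entity start.
_ENTITY = re.compile(r"%%|%#x([0-9A-Fa-f]+);|%#([0-9]+);|%([A-Z]+);")


def expand_entities(literal: str) -> str:
    """Replace the DFDL entities in an expression result with characters."""
    def repl(match):
        if match.group(0) == "%%":
            return "%"
        if match.group(1) is not None:   # %#xHH; hexadecimal codepoint
            return chr(int(match.group(1), 16))
        if match.group(2) is not None:   # %#DDD; decimal codepoint
            return chr(int(match.group(2)))
        name = match.group(3)            # named entity such as %NUL;
        if name not in _NAMED:
            raise ValueError("entity not covered by this sketch: " + name)
        return _NAMED[name]
    return _ENTITY.sub(repl, literal)


def terminator_for(header_indicator: str) -> str:
    # Mirrors: if (headerIndicator = 'N') then '%NUL;' else ';'
    result = "%NUL;" if header_indicator == "N" else ";"
    # Only the final result of the expression is examined for entities.
    return expand_entities(result)
```

With this, terminator_for('N') yields the single zero codepoint, and any
other header value yields ';' unchanged.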
Case X3:
Appearing in <element name="foo" type="xs:string" dfdl:inputValueCalc="{
'%#234;' }"/>
I am pretty sure this is 6 characters. It's a string value. There is
nothing said about string literals here.
Case X4:
Appearing in <sequence dfdl:separator="{ if ('%#x2c;' = ',') then ';' else
'!' }">....</sequence>
The above would appear to need to interpret the dfdl string literals as
soon as they are created down within the expression. That is the right
thing, but I suggest we could live without this.
We need to be very clear if we want to say only the result of an Xpath is
ever interpreted for dfdl entities and then only for certain properties.
Case X4.5:
Ouch, check this out:
<sequence dfdl:initiator="{ '%#x2c;' }" dfdl:terminator=","
dfdl:separator="{ if (dfdl:property('initiator') =
dfdl:property('terminator')) then ';' else '!' }"> .... </sequence>
Does dfdl:property return the value after or before entities have been
replaced?
I'm assuming here it returns the "value" of the property, i.e., any
expressions have been evaluated. But has the entity substitution been
done?
I believe the right answer here is that the value of the property is the
value before DFDL entities have been replaced. That prevents a referential
transparency gap, and a bunch of totally bizarre stuff like people using
delimiters just to get the entities substitution done, asking for the
value of them with dfdl:property(...), and then redefining the delimiter
back to say "". (Basically, we want to avoid exposing the implementation's
entity processing behavior as a user-visible behavior.)
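The difference between the two possible answers can be seen in a small
Python sketch. Everything here is a stand-in for illustration: expand
handles only %#xHH; entities, and the properties dictionary simply holds
the values the two properties in the example would have after expression
evaluation.

```python
import re


def expand(literal: str) -> str:
    """Tiny stand-in for entity replacement: handles only %#xHH; entities."""
    return re.sub(r"%#x([0-9A-Fa-f]+);",
                  lambda m: chr(int(m.group(1), 16)), literal)


# Property values as the expressions return them, before entity replacement.
properties = {"initiator": "%#x2c;", "terminator": ","}


def separator(lookup) -> str:
    # Mirrors: if (dfdl:property('initiator') = dfdl:property('terminator'))
    #          then ';' else '!'
    return ";" if lookup("initiator") == lookup("terminator") else "!"


before = separator(lambda name: properties[name])         # lexical values
after = separator(lambda name: expand(properties[name]))  # entities replaced

print(before, after)  # prints "! ;"
```

Returning the value before replacement gives '!' here, because the six
characters %#x2c; are not equal to a comma.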
Case X5:
Appearing in <element name="bar" type="xs:string" default="{ '%#234;' }"/>
it's 12 characters, because it's not even an expression when it appears in
XSD string literal context.
I'm not expecting any controversy here. This seems weird, but it is part
of being embedded properly in XSD.
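The character counts for cases X3 and X5 can be checked directly. This is
plain Python arithmetic on the two strings, with no DFDL processing
involved.

```python
# Case X3: the XPath string literal '%#234;' evaluates to this string value.
x3_value = "%#234;"

# Case X5: the XSD default attribute is taken verbatim, braces and quotes
# included, because it is never treated as an expression.
x5_value = "{ '%#234;' }"

print(len(x3_value), len(x5_value))  # prints "6 12"
```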
Summary:
I think there are rules we need to articulate.
Rule 1: if a DFDL property takes an expression in addition to other
literal syntax (enum, or string literal of some kind), then the expression
can return a string containing the same syntax as the enum or string
literal that the property accepts, and it is interpreted the same way.
Unfortunately we already have one exception to this: we don't allow an
expression to return "" for delimiters (which would dynamically shut off
the use of the delimiter).
(Side note: I no longer require this restriction. I asked for this, and I
still think it's probably a good idea, but my concern when I asked for
this restriction was based on implementation concerns. Much more
implementation thought has gone into this now, and the planned
implementation technique can handle this, so I don't see a requirement
here anymore. Apologies for flip-flopping on this issue.)
Rule 2: in a DFDL xpath expression that returns a string value
(inputValueCalc - is this the only case?) the value is not examined for
DFDL entities.
Rule 3: dfdl:property returns the value of a property before any DFDL
entities replacements have been done.
So dfdl:textStandardDecimalSeparator="{ fn:concat('%#x2', 'c;') }" works:
it creates %#x2c;, which I believe is the codepoint for a comma.
But in dfdl:textStandardDecimalSeparator="{ if (fn:concat('%#x2', 'c;')
= ',') then ',' else '%SP;' }" the predicate fails because the
intermediate result of the concat is not examined for DFDL entities, so
the result is %SP;. That entity is, however, interpreted correctly as a
space character because the final result of the expression IS examined for
entities.
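The two-stage behaviour in that last example can be traced with a short
Python sketch. The function names are hypothetical, and expand is a
deliberately minimal stand-in that knows only %#xHH; and %SP;.

```python
import re


def expand(literal: str) -> str:
    """Minimal stand-in for DFDL entity processing: %#xHH; and %SP; only."""
    literal = re.sub(r"%#x([0-9A-Fa-f]+);",
                     lambda m: chr(int(m.group(1), 16)), literal)
    return literal.replace("%SP;", " ")


def evaluate_expression() -> str:
    # fn:concat('%#x2', 'c;') yields six ordinary characters.
    intermediate = "%#x2" + "c;"
    # Inside the expression no entity processing happens, so this is false.
    return "," if intermediate == "," else "%SP;"


raw_result = evaluate_expression()   # what the XPath evaluation returns
final_value = expand(raw_result)     # what the property ends up meaning

print(repr(raw_result), repr(final_value))  # prints "'%SP;' ' '"
```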
> - A DFDL expression is sometimes allowed to *return* a DFDL String
> Literal. In this case, the returned value is an xs:string that conforms to
> the DFDL String Literal syntax. But that does not apply to your example
> because the dfdl:inputValueCalc must return a value ( an XML value ) that
> is valid for the type of the element.
Agreed. I had to argue myself into it, but I do think this is right now.
> I think that corresponds to your answer a) ; 'DEADBEEF' is a valid
> xs:hexBinary lexical value.
This issue seems orthogonal to me now. I do agree that if XSD allows
"DEADBEEF" as a literal for the default value of a hexBinary, then DFDL
expressions should do the same.
...mikeb