[DFDL-WG] A selection of example data formats
Steve Hanson
smh at uk.ibm.com
Wed Jun 15 08:52:23 CDT 2011
Hi Mike
More replies, but this time I'll keep them together here, as the Word doc
would get hard to read.
Tim and I have been thinking along similar lines to your "have enough
properties to determine that the length is zero". In addition to your
examples there are also:
- lengthKind="prefixed" and prefix length is 0
- lengthKind="explicit" and the length expression evaluates to 0
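As a rough illustration of these two cases, here is a minimal sketch in Python. It is not spec text: the function name and its parameters are hypothetical, and it assumes the prefix length and the evaluated length expression are already available as integers.

```python
from typing import Optional


def length_known_zero(length_kind: str,
                      prefix_length: Optional[int] = None,
                      explicit_length: Optional[int] = None) -> bool:
    """Return True when the properties alone show the length is zero.

    Hypothetical helper; it covers only the two cases listed above.
    """
    if length_kind == "prefixed":
        # The length prefix was read from the data and its value is 0.
        return prefix_length == 0
    if length_kind == "explicit":
        # The length expression was evaluated and came out as 0.
        return explicit_length == 0
    # Other length kinds need different evidence (delimiters, patterns, ...).
    return False
```

For example, `length_known_zero("prefixed", prefix_length=0)` is true, while an explicit length of 4 is not a zero-length case.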
Using the same sectioning as the document...
-------------------------------------------------
a) Fixed length, no delimiters
We agree that there should be no defaulting when the length is > 0.
We need to decide whether the length = 0 case implies defaulting. We think
it does, as the property determines that the length is zero.
b) Fixed length, only parent has delimiters
This boils down to whether we need to detect early termination. The spec
and you are both clear that scanning is off when parsing fixed length. I'd
like to hear what Steph has to say on this.
c) Fixed length, initiators
You want to treat this the same as un-initiated fixed length. OK, but more
on this later under i).
e) Delimited, separators required
We agree that defaults should be applied when adjacent separators are
encountered.
f) Delimited, separators suppressed at end
We agree that defaults should be applied when adjacent separators are
encountered and at the end.
g) Delimited, initiators, separators required
We agree that defaults should be applied when adjacent separators are
encountered.
i) Delimited, initiators, separators suppressed
You want defaults to be applied when an element is entirely absent (B in
the example).
Tim and I struggle to differentiate this case from c). At the start of B
processing, there is nothing in the data to indicate B and the next thing
is C's initiator. So why is the defaulting rule different?
Take this one step further - my data is fixed length, initiated and the
parent has a suppressed separator - so which of c) and i) applies?
How does the parser know when a group has ended?
One of Tim's rules was when an enclosing delimiter is found. That is not
always the case. Tim suggested that if the immediate parent had
lengthKind="implicit" then we would not be looking for delimiters. I
believe your YES was agreeing with that? We would say it is also true if
the immediate parent had lengthKind="explicit" or "pattern".
What is the algorithm for selecting the next occurrence?
Tim and I discussed this, and there is no issue here. The
occursCountKind always tells you the number to expect (which might be
'don't know' if occursCountKind="parsed", in which case we just
speculatively parse).
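To make that concrete, here is a minimal sketch of how occursCountKind could drive the number of occurrences to expect. The function name and arguments are hypothetical. Only three occursCountKind values are handled, and the treatment of "fixed" (use maxOccurs) and "expression" (use an already evaluated count) is my own assumption rather than something stated above. `None` stands for 'don't know'.

```python
from typing import Optional


def expected_occurrences(occurs_count_kind: str,
                         max_occurs: int,
                         occurs_count: Optional[int] = None) -> Optional[int]:
    """Return the number of occurrences to expect, or None for 'don't know'.

    Hypothetical helper; a sketch, not the spec's algorithm.
    """
    if occurs_count_kind == "fixed":
        # Assumption: a fixed array always has maxOccurs occurrences.
        return max_occurs
    if occurs_count_kind == "expression":
        # Assumption: the count expression has already been evaluated.
        return occurs_count
    if occurs_count_kind == "parsed":
        # Unknown in advance: the parser speculatively parses occurrences.
        return None
    raise ValueError(f"unhandled occursCountKind: {occurs_count_kind!r}")
```

A caller would loop exactly that many times when a number comes back, and fall back to speculative parsing when it gets `None`.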
When parsing a group with separatorPolicy=suppressed, is every group
member a 'point of uncertainty'?
Agree with your statement.
----------------------------------
Other things to discuss:
Defaulting complex elements when parsing
The spec says that if zero length content is obtained for a complex
element then it is defaulted, which means the element's complex type is
walked and default values are sent to the infoset for required elements.
It is an error if any required elements do not have a default value. A
simpler alternative is to create just the element in the infoset with no
children, but this would fail validation if validation is switched on.
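The type walk described above could look roughly like this. It is a minimal sketch under stated assumptions: `ElementDecl` is a hypothetical stand-in for a schema declaration, the infoset is modelled as nested dicts, and arrays and choices are ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class ElementDecl:
    """Hypothetical stand-in for an element declaration."""
    name: str
    required: bool = True
    default: Optional[str] = None
    children: List["ElementDecl"] = field(default_factory=list)


def default_element(decl: ElementDecl) -> Union[str, dict]:
    """Build the defaulted infoset content for a zero-length element."""
    if decl.children:
        # Complex element: walk the type, defaulting required children only.
        return {child.name: default_element(child)
                for child in decl.children if child.required}
    if decl.default is None:
        # A required element without a default value is an error.
        raise ValueError(f"required element {decl.name!r} has no default value")
    return decl.default
```

Optional children are simply left out of the result, while a required child with no default stops the walk with an error.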
Separator position
Any rules that we agree on must take into account infix vs. prefix vs.
postfix. In practice this determines how an element is 'bound' to a
separator. With prefix it is bound to the beginning; with postfix it is
bound to the end; with infix it is bound to the beginning except for the
first element (need to check with Steph if that is how WTX does it).
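One way to picture the binding is the sketch below. The function name is hypothetical, and the infix behaviour follows the description above, which is still to be confirmed.

```python
from typing import List, Tuple


def bind_separators(values: List[str], sep: str,
                    position: str) -> List[Tuple[str, str, str]]:
    """Return (leading_separator, value, trailing_separator) per element.

    Hypothetical helper showing which separator each element 'owns'.
    """
    bound = []
    for index, value in enumerate(values):
        if position == "prefix":
            bound.append((sep, value, ""))
        elif position == "postfix":
            bound.append(("", value, sep))
        elif position == "infix":
            # The first element has no separator bound to it.
            bound.append(("" if index == 0 else sep, value, ""))
        else:
            raise ValueError(f"unknown separator position: {position!r}")
    return bound
```

Joining the tuples for values A, B, C with a comma separator gives `,A,B,C` for prefix, `A,B,C,` for postfix and `A,B,C` for infix.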
Regards
Steve Hanson
Architect, Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
From: "Mike Beckerle" <mbeckerle.dfdl at gmail.com>
To: Tim Kimber/UK/IBM at IBMGB
Cc: Steve Hanson/UK/IBM at IBMGB
Date: 11/06/2011 01:56
Subject: RE: A selection of example data formats
My comments on your examples. I had to turn it into a Word doc so that I
could reasonably put my commentary inline.
I think the spec needs the concept of an element declaration being
classified as one of:
· Can be defaulted from nothing
· Can be defaulted from empty content (but requires some framing to
  determine that the content is empty)
· Cannot be defaulted (requires at least some content bits, possibly also
  some framing)
If the element can be defaulted from nothing, and it is required, and we
have nothing (i.e., no bits, meaning that we have enough properties to
determine that the length is zero), then we default it to get the infoset
value. If it’s optional, then we don’t default it, and nothing goes into
the infoset.
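The rule in the previous paragraph is small enough to write down directly. This is only a sketch: the function name is hypothetical, it assumes the element has already been classified as "can be defaulted from nothing", and it assumes a default value is available.

```python
from typing import Optional


def infoset_value_when_nothing(required: bool, default: str) -> Optional[str]:
    """Return the value to put in the infoset, or None to add nothing.

    Hypothetical helper; applies only when zero bits were found for an
    element that can be defaulted from nothing.
    """
    if required:
        # Required and nothing in the data: default it.
        return default
    # Optional and nothing in the data: no infoset item at all.
    return None
```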
This raises the question of what “have enough properties to determine that
the length is zero” means.
Examples of this: end of data, end of parent, this element has no
delimiters but lengthKind="delimited" and a parent delimiter was
immediately encountered, which terminates the element after zero bits.
Or lengthKind="pattern", lengthPattern="a*", and the data has no "a"
characters, so the length comes out as zero and no bits are consumed.
Recursively, for a group to have zero length, the same properties must
hold inductively and for the group itself.
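The pattern case can be tried out with an ordinary regular expression engine. The sketch below uses Python's `re` module as a stand-in, on the assumption that it behaves the same as a DFDL processor's regex engine for a pattern this simple; the function name is hypothetical.

```python
import re
from typing import Optional


def pattern_length(length_pattern: str, data: str) -> Optional[int]:
    """Length of the match of length_pattern anchored at the start of data.

    Returns None when the pattern does not match at all; how a DFDL
    processor treats that case is deliberately left out of this sketch.
    """
    match = re.match(length_pattern, data)
    if match is None:
        return None
    return len(match.group(0))
```

With `a*` the pattern always matches, so data containing no leading "a" gives a length of 0 and consumes nothing, while data starting with "aa" gives 2.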
I’m not sure I’ve got all the cases here, but it’s something like this.
That’s all for my brain on DFDL today...
From: Tim Kimber [mailto:KIMBERT at uk.ibm.com]
Sent: Thursday, June 02, 2011 4:41 PM
To: mbeckerle.dfdl at gmail.com
Cc: Steve Hanson
Subject: A selection of example data formats
Mike,
Steve asked me to forward this text file that I have put together. I put
it together as background material for our discussions about the parsing
of DFDL elements and groups.
Key issues:
- The specification uses the terms 'empty', 'missing' and 'known not to
exist' in reference to elements. We need to work out what these terms mean
so that the spec can be made clearer.
- In my opinion, the terms 'missing' and 'known not to exist' should not
have different meanings - it invites criticism. If 'missing' means
something different from 'known not to exist' then we need a different
word or phrase.
- The application of default values for missing required elements in the
parser is problematic. I think Steve may have sent you an email about
this, so I won't outline the issues here (Steve, please can you forward
your email to me).
Disclaimer: This set of data formats does not highlight all of the
unresolved questions around the parsing of groups - only the ones that
were in play at the time I produced the document.
regards,
Tim Kimber, Common Transformation Team,
Hursley, UK
Internet: kimbert at uk.ibm.com
Tel. 01962-816742
Internal tel. 246742
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Formats to consider.doc
Type: application/octet-stream
Size: 50688 bytes
Desc: not available
Url : http://www.ogf.org/pipermail/dfdl-wg/attachments/20110615/0fe5354d/attachment-0001.obj