[DFDL-WG] outputValueCalc and unparse example
Steve Hanson
smh at uk.ibm.com
Tue Jun 9 05:42:16 CDT 2009
Mike
I'd like to restate the use cases we identified yesterday. There were
three.
Use case 1
Element "val" is fixed length, length known at design time and provided by
dfdl:length="x" on input and output.
On output, the infoset data for "val" is padded to the length.
Use case 2
Element "val" is fixed length, length known at runtime and provided by
dfdl:length="{../len}" on input and output.
On output the infoset provides the value for element "len".
On output, the infoset data for "val" is padded to the length according to
the rules for dfdl:lengthKind='explicit'/'implicit'.
Use case 3
Element "val" is genuinely variable length: its length is only known once the
data is serialised, and it is provided by dfdl:length="{../len}" on input.
On output the value of element "len" is set only once the length of "val"
is known.
On output, the infoset data for "val" is not padded to the length.
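As a rough illustration of the output-side difference between use cases 1/2 and use case 3, here is a minimal Python sketch. It assumes UTF-8 text, byte length units, left justification and NUL (%#r0;) as the pad byte. The function names are hypothetical. Raising an error when the value does not fit is also an assumption, since the use cases above do not say what should happen in that case.

```python
# Hypothetical helpers for illustration only; not part of any DFDL API.

def unparse_fixed(value: str, length: int, pad: bytes = b"\x00") -> bytes:
    """Use cases 1 and 2: the box length is known (from the schema or from
    the infoset value of "len"), so the value is padded out to it."""
    data = value.encode("utf-8")
    if len(data) > length:
        # Assumption: treat an over-long value as an error rather than truncating.
        raise ValueError(f"value needs {len(data)} bytes, box holds {length}")
    return data + pad * (length - len(data))


def unparse_variable(value: str) -> tuple[int, bytes]:
    """Use case 3: the length is only known once the value is serialised,
    so it is reported back (to become "len") and no padding is added."""
    data = value.encode("utf-8")
    return len(data), data


print(unparse_fixed("abc", 6))     # b'abc\x00\x00\x00'
print(unparse_variable("héllo"))   # (6, b'h\xc3\xa9llo')
```

The second call shows why use case 3 is measured after encoding: "héllo" is five characters but six UTF-8 bytes.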
You've added a variation to use case 3 in your example, where there is a
need to add some padding. Let's call it use case 4.
Alan and I have explored an alternative in which the dfdl:length property is
used for all the use cases. The difference for use cases 3 and 4 is that the
value of element "len" is only set during the processing of "val". Instead of
using a flag, with an accompanying output length property, to signal use
cases 3 and 4, we add an extra parameter to the dfdl:length() function that
says whether or not to include padding when dfdl:lengthKind="explicit"/
"implicit". Note that any escape scheme must and will be taken into account
(to answer your question).
For use case 3, where no padding is needed, your example simplifies to the
following. When "len" is encountered, it has an outputValueCalc that
references "val", so the unparser defers setting the value of "len". When it
reaches "val", it knows it must work out the unpadded length of "val" and set
that in "len" before doing any length-related processing for "val".
<sequence>
  <element name="len" type="int"
           dfdl:outputValueCalc=
             "{
               dfdl:length(../val, false()) (: false() => no padding :)
             }" />
  ... many elements in between ...
  <element name="val" type="string"
           dfdl:encoding="utf-8"
           dfdl:lengthKind="explicit"
           dfdl:lengthUnits="bytes"
           dfdl:length="{ ../len }"
           dfdl:textTrimKind="padChar"
           dfdl:textStringJustification="left"
           dfdl:textPadCharacter="%#r0;"
  />
</sequence>
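The deferral described above can be sketched in Python as follows. This is a toy model under stated assumptions, not a DFDL processor. The record is reduced to "len", an opaque run of in-between bytes, and "val". "len" is assumed to be written as a 4-byte big-endian binary integer, because the schema fragment does not say how it is represented. `calc_len` is a hypothetical stand-in for the outputValueCalc expression.

```python
import struct
from typing import Callable


def unparse_record(val: str, between: bytes,
                   calc_len: Callable[[int], int],
                   pad: bytes = b"\x00") -> bytes:
    """Toy unparser (hypothetical) for: len, <many elements>, val."""
    out: list[bytes] = []

    # "len" has an outputValueCalc that references "val", so its value
    # cannot be computed yet: reserve its slot and carry on.
    len_slot = len(out)
    out.append(b"")

    # "... many elements in between ...": written normally.
    out.append(between)

    # Reaching "val": work out its unpadded length first ...
    data = val.encode("utf-8")
    box = calc_len(len(data))
    # ... resolve the deferred "len" (assumed 4-byte big-endian int) ...
    out[len_slot] = struct.pack(">i", box)
    # ... and only then do the length processing for "val" against
    # dfdl:length="{ ../len }", padding if the box exceeds the data.
    if box < len(data):
        raise ValueError("outputValueCalc gave a length shorter than the data")
    out.append(data + pad * (box - len(data)))
    return b"".join(out)


# Use case 3: "len" is just the unpadded length, so nothing is padded.
print(unparse_record("hello", b"..", lambda n: n))
# b'\x00\x00\x00\x05..hello'
```

Passing a different `calc_len`, for example `lambda n: (n + 3) // 4 * 4`, makes the same sketch pad "val" to whatever length the expression produces.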
For use case 4, where some padding might be needed, your example simplifies
to the following. When the unparser starts to process "val", it works out
the unpadded length, uses it in the expression and generates the value for
"len". When it does the length processing for "val", it pads to the value
of "len".
<sequence>
  <element name="len" type="int"
           dfdl:outputValueCalc=
             "{
               fn:ceiling(dfdl:length(../val, false()) div 4) * 4
             }" />
  ... many elements in between ...
  <element name="val" type="string"
           dfdl:encoding="utf-8"
           dfdl:lengthKind="explicit"
           dfdl:lengthUnits="bytes"
           dfdl:length="{ ../len }"
           dfdl:textTrimKind="padChar"
           dfdl:textStringJustification="left"
           dfdl:textPadCharacter="%#r0;"
  />
</sequence>
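The length expression in this use case, `fn:ceiling(x div 4) * 4`, rounds the unpadded length up to the next multiple of 4. Below is a minimal Python equivalent. It assumes non-negative byte lengths, UTF-8 text and NUL padding, as in the schema fragment, and the function names are hypothetical.

```python
# Hypothetical helpers for illustration only.

def round_up_to_4(n: int) -> int:
    """Integer form of fn:ceiling(n div 4) * 4, valid for n >= 0."""
    return (n + 3) // 4 * 4


def unparse_val(value: str, pad: bytes = b"\x00") -> tuple[int, bytes]:
    """Return the value to store in "len" and the padded bytes for "val"."""
    data = value.encode("utf-8")
    box = round_up_to_4(len(data))
    return box, data + pad * (box - len(data))


print([round_up_to_4(n) for n in (0, 1, 4, 5, 8)])  # [0, 4, 4, 8, 8]
print(unparse_val("hello"))                          # (8, b'hello\x00\x00\x00')
```

The integer form avoids floating-point division. A value whose length is already a multiple of 4 gets no padding at all.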
A variation on use case 4 is when we need to pad to a minimum length.
<sequence>
  <element name="len" type="int"
           dfdl:outputValueCalc=
             "{
               fn:max((dfdl:length(../val, false()), 20))
             }" />
  ... many elements in between ...
  <element name="val" type="string"
           dfdl:encoding="utf-8"
           dfdl:lengthKind="explicit"
           dfdl:lengthUnits="bytes"
           dfdl:length="{ ../len }"
           dfdl:textTrimKind="padChar"
           dfdl:textStringJustification="left"
           dfdl:textPadCharacter="%#r0;"
  />
</sequence>
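When padding to a minimum length, the target length has to be the larger of the unpadded length and the minimum. Short values are then padded up to 20 bytes, and longer values keep their full length. Here is a small Python sketch of that rule. The minimum of 20 comes from the example, and the function name is hypothetical.

```python
def min_padded_length(unpadded: int, minimum: int = 20) -> int:
    """Hypothetical helper: length for "len" when "val" must occupy at
    least `minimum` bytes but may be longer."""
    return max(unpadded, minimum)


print(min_padded_length(5))    # 20: short value, padded up to the minimum
print(min_padded_length(32))   # 32: longer value keeps its own length
```

Taking the smaller of the two instead would cap the box at 20 bytes, which is too short for any value longer than that.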
You might ask why the minimum is added explicitly. It is because, as
currently specified, the xs:minLength facet (and dfdl:outputMinLength for
non-strings) is not used when dfdl:lengthKind="explicit". We could change
this, but it would make the padding rules more complicated, so we opted to
keep the padding rules simple.
Yesterday we also discussed whether implicit/explicit needed to change.
With the above scheme we think no change is necessary.
Regards
Steve Hanson
Programming Model Architect
WebSphere Message Brokers
Hursley, UK
Internet: smh at uk.ibm.com
Phone (+44)/(0) 1962-815848
"Mike Beckerle" <mbeckerle.dfdl at gmail.com>
Sent by: dfdl-wg-bounces at ogf.org
09/06/2009 04:30
Please respond to: mbeckerle.dfdl at gmail.com
To: <dfdl-wg at ogf.org>
cc:
Subject: [DFDL-WG] outputValueCalc and unparse example
I did not get as far as I wanted to on this issue. I would like to discuss
this example:
<sequence>
  <element name="len" type="int"
           dfdl:fillByte="%#r0;"
           dfdl:outputValueCalc=
             "{
               dfdl:representation-output-length(../val)
             }" />
  ... many elements in between ...
  <element name="val" type="string"
           dfdl:encoding="utf-8"
           dfdl:lengthKind="explicit"
           dfdl:lengthUnits="bytes"
           dfdl:useLengthForOutput="false"
           dfdl:length="{ ../len }"
           dfdl:outputLength="{
             fn:ceiling(
               dfdl:representation-inherent-length(.) div 4
             ) * 4
           }"
           dfdl:textTrimKind="padChar"
           dfdl:textStringJustification="left"
           dfdl:textPadCharacter="%#r0;"
  />
</sequence>
You will notice that I added a dfdl:outputLength property and two functions,
dfdl:representation-output-length() and
dfdl:representation-inherent-length().
I am open to suggestions for better names for these properties and
functions. We need to distinguish these three concepts:
1) inherent length – the length of the infoset item without reference to any
facets, and without regard to escape sequences, padding or truncation.
(TBD: think about escape sequences. Is this right?)
2) output target length – the length of the box we’re filling in with the
data value representation. The box can be bigger or smaller than the
inherent length, which implies the use of padding/filling, or truncation.
3) input length – the length of the box we get when parsing. The inherent
length of the value after parsing can be smaller than the length of the
box, due to the removal of escape characters and the trimming of padding.
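The three lengths can be illustrated with a toy Python model. Everything in it is an assumption made for illustration. It uses a hypothetical escape character `\` that protects a hypothetical delimiter `;` (and itself), together with UTF-8 text, byte units and NUL padding. The list leaves the treatment of escape sequences as TBD, so this sketch simply excludes them from the inherent length. It does not model truncation.

```python
ESC = b"\\"   # hypothetical escape character
SEP = b";"    # hypothetical delimiter that must be escaped in the data
PAD = b"\x00"


def inherent_length(value: str) -> int:
    """1) Length of the value itself: no escapes, padding or truncation."""
    return len(value.encode("utf-8"))


def unparse(value: str, target_length: int) -> bytes:
    """2) Fill a box of target_length bytes: escape first, then pad."""
    data = value.encode("utf-8").replace(ESC, ESC + ESC).replace(SEP, ESC + SEP)
    if len(data) > target_length:
        raise ValueError("box too small; truncation is not modelled here")
    return data + PAD * (target_length - len(data))


def parse(box: bytes) -> str:
    """3) Given the input box, trim padding and then remove escapes."""
    data = box.rstrip(PAD)
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i:i + 1] == ESC and i + 1 < len(data):
            out += data[i + 1:i + 2]   # keep the escaped byte, drop the escape
            i += 2
        else:
            out += data[i:i + 1]
            i += 1
    return out.decode("utf-8")


box = unparse("a;b", 8)
print(inherent_length("a;b"))                 # 3
print(box)                                    # b'a\\;b\x00\x00\x00\x00'
print(len(box), inherent_length(parse(box)))  # 8 3
```

In this model the inherent length (3), the escaped representation (4 bytes) and the box (8 bytes) are all different. That is why the three concepts need separate names.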
--
dfdl-wg mailing list
dfdl-wg at ogf.org
http://www.ogf.org/mailman/listinfo/dfdl-wg