[DFDL-WG] lengthUnits bits not allowed for strings, binary floats, hexBinary
Steve Hanson
smh at uk.ibm.com
Fri Jul 25 06:13:00 EDT 2014
Please bear in mind that if this proposal is accepted and an erratum
created, IBM can give no timescale as to when IBM DFDL will be updated to
support it. So you are advised to continue specifying lengths of text
elements in units of characters, accompanied by a suitable comment, for
interoperability.
Regards
Steve Hanson
Architect, IBM DFDL
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: "dfdl-wg at ogf.org" <dfdl-wg at ogf.org>,
Date: 23/07/2014 23:35
Subject: [DFDL-WG] lengthUnits bits not allowed for strings, binary
floats, hexBinary
Sent by: dfdl-wg-bounces at ogf.org
DFDL spec currently says this w.r.t. lengthUnits property:
'bits' may only be used for xs:boolean, xs:byte, xs:short, xs:int,
xs:long, xs:unsignedByte, xs:unsignedShort, xs:unsignedInt, and
xs:unsignedLong simple types with binary representation.
This feels like a holdover from when we only had strings made up of 8-bit
byte code units. Now that we have 7-bit and 6-bit characters, this
restriction seems unnecessary, and is in fact awkward, because many specs
specify the lengths of strings in bits (bits being used universally for the
length of everything in these formats). This is a real concern. In many
cases DFDL schemas will be generated from other specifications by
programs. Having to conditionally convert the length as specified into
different units for strings is just one more place to have to test, one
more way the DFDL schema doesn't obviously match the specification from
which it was derived, etc.
Similarly:
'bytes' must be used for type xs:hexBinary.
'bytes' must be used for types xs:float and xs:double with binary
representation.
These are to prevent the user from misunderstanding the limitations of
these types, i.e., that we don't support hexBinary that is not a multiple
of 8 bits in size, or float and double that are not exactly 4 and 8 bytes
respectively.
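As a rough illustration, the underlying restrictions described above can be
checked directly against a length given in bits, without requiring the
length units to be bytes. This is only a sketch: the function name
bit_length_is_valid and the FIXED_BITS table are hypothetical and are not
part of DFDL or of any DFDL implementation.

```python
# Hypothetical sketch: validate a bit length against the type limitations
# described above. Names here are illustrative, not part of any DFDL tool.
FIXED_BITS = {"xs:float": 32, "xs:double": 64}


def bit_length_is_valid(xs_type: str, length_bits: int) -> bool:
    """Return True if length_bits is acceptable for the given type."""
    if length_bits < 0:
        return False
    if xs_type == "xs:hexBinary":
        # hexBinary must occupy a whole number of bytes.
        return length_bits % 8 == 0
    if xs_type in FIXED_BITS:
        # Binary float and double have exactly one legal size each.
        return length_bits == FIXED_BITS[xs_type]
    # Types outside this sketch are not checked here.
    return True
```

For example, bit_length_is_valid("xs:hexBinary", 448) holds because 448 is
a multiple of 8, while a 450-bit hexBinary would be rejected.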
But now this restriction just seems annoying. If my data format
specification base has all these values in bits, then it is painful when
creating a DFDL schema to have to transform the values for just those
element declarations that are of these types.
I'm not suggesting we lift the actual restrictions. I'm good with
hexBinary requiring whole bytes, and with float and double being exactly
32 bits and 64 bits respectively. I just think having to use bytes as the
length units is arbitrary. We thought it would prevent people
from making mistakes, but in fact it is likely to have the opposite
effect, forcing them to interpret the length differently based on
a type that might not even be defined in the same file where they see the
dfdl:length property. Consider:
<element name="x" type="foo:xType" dfdl:length="448"/>
Is that 448 correct? It depends on the definition of foo:xType. If it's a
simple type derived from string, then the length units have to be characters
or bytes, but in all the formats where I see these 448s, they are measured
in bits. This is 56 bytes, holding 64 7-bit characters. But when I write out
this element, I don't have the information right there to know whether to
divide by 8, by 7, or not at all; that requires knowledge of the type.
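The arithmetic behind the 448 example can be sketched as follows. This is
only an illustration of the conversion a schema author or generator is
forced to do today; the helper name length_in_units is hypothetical, and
7-bit characters are assumed for the string case.

```python
# Hypothetical helper: convert a length stated in bits into whole units
# of a given width, refusing lengths that leave a partial unit.
def length_in_units(length_bits: int, bits_per_unit: int) -> int:
    if bits_per_unit <= 0:
        raise ValueError("bits_per_unit must be positive")
    if length_bits % bits_per_unit != 0:
        raise ValueError(
            f"{length_bits} bits is not a whole number of "
            f"{bits_per_unit}-bit units"
        )
    return length_bits // bits_per_unit


length_bits = 448
print(length_in_units(length_bits, 8))  # length in bytes: 56
print(length_in_units(length_bits, 7))  # length in 7-bit characters: 64
```

Which divisor applies depends entirely on the type and its encoding, which
is exactly the information that is not visible at the element declaration.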
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy