[DFDL-WG] Fw: Action 205: whitespace in DFDL annotations
Steve Hanson
smh at uk.ibm.com
Wed Jul 10 05:06:09 EDT 2013
Suman
Please could you review the proposal below? We would like to close on this
on the next WG call.
We are particularly interested in why you chose xs:string for DFDL string
literal but xs:token for List of DFDL string literal, as that means the
same DFDL string literal gets different whitespace behaviour depending on
which property it is used in, which does not sound right.
<xsd:simpleType name="DFDLStringLiteral">
  <xsd:restriction base="xsd:string">
  </xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="ListOfDFDLStringLiteral">
  <xsd:list itemType="xsd:token"/>
</xsd:simpleType>
I was expecting to see:
<xsd:simpleType name="DFDLStringLiteral">
  <xsd:restriction base="xsd:token">
  </xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="ListOfDFDLStringLiteral">
  <xsd:list itemType="dfdl:DFDLStringLiteral"/>
</xsd:simpleType>
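To illustrate the inconsistency, here is a minimal Python sketch (not taken from any DFDL implementation) contrasting a value typed as xs:string with one typed as an xs:list. It assumes only the XML Schema rules that xs:string preserves whitespace and that a list value is always collapsed and then split on spaces into items; the function names and sample values are hypothetical.

```python
def collapse(value: str) -> str:
    """XML Schema whiteSpace="collapse", using XML whitespace only."""
    # Tab, line feed and carriage return each become a space ...
    replaced = value.translate(str.maketrans("\t\n\r", "   "))
    # ... then runs of spaces shrink to one and the ends are trimmed.
    return " ".join(part for part in replaced.split(" ") if part)


def as_single_string_literal(value: str) -> str:
    """xs:string (whiteSpace="preserve"): the value is kept exactly."""
    return value


def as_list_of_literals(value: str) -> list[str]:
    """xs:list: collapse, then split into space-separated items."""
    collapsed = collapse(value)
    return collapsed.split(" ") if collapsed else []


literal = " %NL; "
print(repr(as_single_string_literal(literal)))  # ' %NL; '
print(as_list_of_literals(literal))             # ['%NL;']
print(as_list_of_literals("a b"))               # ['a', 'b']
```

The last line shows that a list item can never contain a space, so a string literal with embedded whitespace cannot appear as a single item of a list type whichever item type is chosen.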
Regards
Steve Hanson
Architect, IBM Data Format Description Language (DFDL)
Co-Chair, OGF DFDL Working Group
IBM SWG, Hursley, UK
smh at uk.ibm.com
tel:+44-1962-815848
----- Forwarded by Steve Hanson/UK/IBM on 10/07/2013 09:49 -----
From: Steve Hanson/UK/IBM
To: dfdl-wg at ogf.org,
Date: 09/07/2013 12:02
Subject: [DFDL-WG] Action 205: whitespace in DFDL annotations
For discussion on today's WG call. Action 205 was raised to ensure that
DFDL 'property types' are declared with XML Schema types that provide the
correct whitespace handling behaviour. The XML Schema types of the various
DFDL 'property types' are given in Part 1 of IBM's Schemas-for-DFDL.
The question boils down to whether a 'property type' should be an
xs:string or an xs:token. The former preserves whitespace; the latter
collapses it (tabs, line feeds and carriage returns become spaces, runs of
spaces become a single space, and leading/trailing spaces are removed).
(Note that xs:NMTOKEN is intended for attributes only, so it should not be
used for DFDL properties, which can be expressed in attribute or element
form.)
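For reference, a minimal Python sketch of the three XML Schema whiteSpace facet values: xs:string uses "preserve", xs:normalizedString "replace", and xs:token "collapse". The function names are hypothetical, and only the four XML whitespace characters are treated as whitespace.

```python
def ws_preserve(value: str) -> str:
    # xs:string: no normalization at all.
    return value


def ws_replace(value: str) -> str:
    # xs:normalizedString: each tab, line feed and carriage return becomes a space.
    return value.translate(str.maketrans("\t\n\r", "   "))


def ws_collapse(value: str) -> str:
    # xs:token: "replace", then runs of spaces become one space
    # and leading/trailing spaces are removed.
    replaced = ws_replace(value)
    return " ".join(part for part in replaced.split(" ") if part)


raw = "  littleEndian\n"
print(repr(ws_preserve(raw)))       # '  littleEndian\n'
print(repr(ws_replace(" a\tb ")))   # ' a b '
print(repr(ws_collapse(raw)))       # 'littleEndian'
```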
My recommendation is:
- Enumeration changed from xs:string to xs:token (reason: to match XSDL
enums and trim leading/trailing whitespace)
- DFDL regular expression stays as xs:string (reason: regex may contain
literal white space)
- DFDL string literal changed from xs:string to xs:token (reason:
currently inconsistent with List of DFDL string literal)
- List of DFDL string literal stays as list of xs:token
- DFDL expression changed from xs:token to xs:string (reason: XPath may
contain non-ignorable whitespace)
Further:
- DFDL regular expression should not trim leading/trailing whitespace
- DFDL expression should trim leading whitespace before { and trailing
whitespace after }
- The enum of DFDL property names should be based on xs:token
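A minimal Python sketch of the proposed expression rule, assuming that stripping the four XML whitespace characters from both ends is an adequate model of "trim before { and after }" for a value holding a single braced expression. The path ../rec/name and the function names are hypothetical. The collapse call shows why xs:token is unsuitable for expressions: it also rewrites whitespace inside the expression's string literal.

```python
XML_WS = " \t\r\n"  # the four XML whitespace characters


def trim_dfdl_expression(value: str) -> str:
    """Remove whitespace before the opening { and after the closing }."""
    # Whitespace inside the braces, including in string literals, is untouched.
    return value.strip(XML_WS)


def collapse(value: str) -> str:
    """xs:token-style collapse, for comparison."""
    parts = value.translate(str.maketrans("\t\n\r", "   ")).split(" ")
    return " ".join(p for p in parts if p)


expr = "\n   { ../rec/name eq 'a  b' }   \n"
print(repr(trim_dfdl_expression(expr)))  # "{ ../rec/name eq 'a  b' }"
print(repr(collapse(expr)))              # "{ ../rec/name eq 'a b' }"
```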
The xs:unions for DFDL properties that can take two or more of the above
types may (and probably will) need their member ordering reviewed.
Example:
<xsd:simpleType name="BinaryFloatRepEnum_Or_DFDLExpression">
  <xsd:union>
    <xsd:simpleType>
      <xsd:restriction base="dfdl:DFDLExpression"/>
    </xsd:simpleType>
    <xsd:simpleType>
      <xsd:restriction base="dfdl:BinaryFloatRepEnum"/>
    </xsd:simpleType>
  </xsd:union>
</xsd:simpleType>
Usually in a union the most restrictive member is placed first, because
members are tried in declaration order and the first one that accepts the
value is used. With the current types, the above has xs:token followed by
xs:string, in accordance with this practice. But the recommendation changes
the types of both members, so the above would become xs:string followed by
xs:token.
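A minimal Python sketch of why ordering matters, modelling the XML Schema rule that union members are tried in declaration order and the first one that validates wins. It assumes, for illustration only, that DFDLExpression (as xs:string) has no pattern facet and that BinaryFloatRepEnum (as xs:token) allows just "ieee" and "ibm390Hex"; Member and resolve are hypothetical names.

```python
from typing import Callable, NamedTuple, Optional


def collapse(value: str) -> str:
    """xs:token-style whiteSpace="collapse"."""
    parts = value.translate(str.maketrans("\t\n\r", "   ")).split(" ")
    return " ".join(p for p in parts if p)


class Member(NamedTuple):
    name: str
    normalize: Callable[[str], str]  # whitespace rule of the member's base type
    accepts: Callable[[str], bool]   # the member's remaining facets


def resolve(members: list[Member], raw: str) -> Optional[tuple[str, str]]:
    """Return (member name, normalized value) for the first member that accepts raw."""
    for member in members:
        value = member.normalize(raw)
        if member.accepts(value):
            return member.name, value
    return None


# Assumed member definitions under the recommended types.
expression = Member("DFDLExpression", lambda v: v, lambda v: True)
enum = Member("BinaryFloatRepEnum", collapse, lambda v: v in {"ieee", "ibm390Hex"})

print(resolve([enum, expression], " ieee "))  # ('BinaryFloatRepEnum', 'ieee')
print(resolve([expression, enum], " ieee "))  # ('DFDLExpression', ' ieee ')
```

With the permissive xs:string member first, the enum member is never reached; whether that happens in practice depends on the facets DFDLExpression really carries.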
Regards
Steve Hanson
From: Steve Hanson/UK/IBM
To: Suman Kalia/Toronto/IBM at IBMCA,
Cc: dfdl-wg at ogf.org, Mike Beckerle <mbeckerle.dfdl at gmail.com>
Date: 27/03/2013 16:38
Subject: Re: [DFDL-WG] whitespace in DFDL annotations: right now
regex is xs:string, expression is xs:token
Suman
Looking at the XML schema-for-schemas, and doing a test in the XSD editor
in Eclipse, XSD enumeration facets are modelled as xs:NMTOKEN, not as
xs:string as DFDL enums currently are. XSD is perfectly happy to
strip/collapse whitespace, so I think we should do the same for DFDL enum
properties. I don't see any harm in this: an enum is a contiguous sequence
of non-whitespace characters anyway, so stripping any leading/trailing
whitespace cannot change its meaning.
It looks like the XSD pattern facet is modelled as xs:string, preserving
whitespace. We should do the same for DFDL regex properties.
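A minimal Python sketch of why collapsing a regex property would be harmful. Python's re module stands in for the DFDL regex engine here, and the two pattern values are hypothetical examples whose spaces are significant.

```python
import re


def collapse(value: str) -> str:
    """xs:token-style whiteSpace="collapse" (XML whitespace only)."""
    parts = value.translate(str.maketrans("\t\n\r", "   ")).split(" ")
    return " ".join(p for p in parts if p)


def compiles(pattern: str) -> bool:
    """True if the pattern is a valid Python regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


two_spaces = "x  y"  # literal double space
spaces_run = " +"    # one or more spaces

print(re.fullmatch(two_spaces, "x  y") is not None)            # True
print(re.fullmatch(collapse(two_spaces), "x  y") is not None)  # False
print(compiles(spaces_run), compiles(collapse(spaces_run)))    # True False
```

Collapsing changes what the first pattern matches, and trimming the second leaves "+", which is not a valid regex at all.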
Regards
Steve Hanson
From: Steve Hanson/UK/IBM
To: Suman Kalia <kalia at ca.ibm.com>,
Cc: dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org, Mike Beckerle
<mbeckerle.dfdl at gmail.com>
Date: 19/03/2013 17:41
Subject: Re: [DFDL-WG] whitespace in DFDL annotations: right now
regex is xs:string, expression is xs:token
The type called DFDLExpressionOrPatternOrNothing only makes sense in one
place: the element value of an assert or discriminator.
Regards
Steve Hanson
From: Suman Kalia <kalia at ca.ibm.com>
To: Mike Beckerle <mbeckerle.dfdl at gmail.com>,
Cc: dfdl-wg at ogf.org, dfdl-wg-bounces at ogf.org
Date: 19/03/2013 17:29
Subject: Re: [DFDL-WG] whitespace in DFDL annotations: right now
regex is xs:string, expression is xs:token
Sent by: dfdl-wg-bounces at ogf.org
Mike - I am not sure, but my gut feeling is that it would start with the
most restrictive one first, i.e. an empty string (assuming it has a length
facet of 0) would match Nothing, then xsd:token, which is a restricted form
of xs:string. I think you are going to get a string with whitespace
collapsed (xsd:token) if it is not an empty string. You could run a few
tests to see the behavior.
Suman Kalia
IBM Canada Lab
WMB Toolkit Architect and Development Lead
Tel: 905-413-3923 T/L 313-3923
Email: kalia at ca.ibm.com
For info on Message broker
http://www.ibm.com/developerworks/websphere/zones/businessintegration/wmb.html
From: Mike Beckerle <mbeckerle.dfdl at gmail.com>
To: dfdl-wg at ogf.org,
Date: 03/19/2013 12:43 PM
Subject: [DFDL-WG] whitespace in DFDL annotations: right now regex
is xs:string, expression is xs:token
Sent by: dfdl-wg-bounces at ogf.org
This came up on the call. The schemas I have for DFDL annotations have
DFDLRegularExpression as an xs:string, and DFDLExpression as an xs:token.
I have no clue how a union of these types behaves. But we have a union
called DFDLExpressionOrPatternOrNothing, which is a 3-way union of
DFDLExpression, DFDLRegularExpression, and EmptyString (which is also
derived from xs:string, but with a length facet of 0).
--
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
--
dfdl-wg mailing list
dfdl-wg at ogf.org
https://www.ogf.org/mailman/listinfo/dfdl-wg
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU