[DFDL-WG] clarification needed - ambiguity about empty string and optional element
Mike Beckerle
mbeckerle.dfdl at gmail.com
Wed Aug 1 14:42:07 EDT 2018
I omitted that dfdl:emptyValueDelimiterPolicy is 'both' here, though neither
dfdl:initiator nor dfdl:terminator is defined.
Mike Beckerle | OGF DFDL Workgroup Co-Chair | Tresys Technology |
www.tresys.com
Please note: Contributions to the DFDL Workgroup's email discussions are
subject to the OGF Intellectual Property Policy
<http://www.ogf.org/About/abt_policies.php>
On Wed, Jul 11, 2018 at 8:16 AM, Mike Beckerle <mbeckerle.dfdl at gmail.com>
wrote:
> Consider this data of 4 characters:
>
>
> foo;
>
>
> Consider this schema where the default format is the basic general set of
> text-oriented defaults.
>
>
> <xs:element name="ex_infix" dfdl:lengthKind="implicit">
>   <xs:complexType>
>     <xs:sequence dfdl:separator=";"
>                  dfdl:separatorSuppressionPolicy="anyEmpty"
>                  dfdl:separatorPosition="infix">
>       <xs:element name="x" type="xs:string" dfdl:lengthKind="delimited"/>
>       <xs:element name="y" type="xs:string" minOccurs="0"
>                   dfdl:lengthKind="delimited"
>                   dfdl:occursCountKind="implicit"/>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
>
>
>
> This is in a current Daffodil unit test, and produces this infoset:
>
>
> <ex_infix><x>foo</x><y/></ex_infix>
>
>
> That is, an empty string element is created for element 'y'.
>
>
> I'd like to know what IBM DFDL produces as the infoset for this example.
>
>
> I believe the DFDL spec is actually self-contradictory and so ambiguous
> here about what is the right behavior.
>
>
>
> - DFDL Spec 14.2.1 description of anyEmpty: "...any occurrences that
>   have zero length representation MAY be omitted from the data, along with
>   their associated separator."
>    - Note that it says "may", not "must be". So anyEmpty is "lax": it does
>      not insist that the zero-length elements be absent.
>    - This doesn't clarify anything for us. But it admits the
>      possibility that the ";" separator appears even if the 'y' element
>      occurrence is determined not to exist.
>
>
>
> - DFDL Spec 9.3.1.1 says an element is known to exist if it has the
>   nil, empty, or normal representation.
>    - In the example, element 'y' is zero-length, which is either the empty
>      or the normal representation, since a string can have "" (empty string)
>      as a value.
>    - Since the 'y' element declaration does not specify an XSD default
>      value, the concept of 'empty' and defaulting doesn't apply here, so a
>      zero-length string is a normal representation, and according to this
>      section, it is known-to-exist.
>    - This contradicts 9.4.2.2 below.
>
>
>
> - DFDL Spec 9.3.1.3 says "Note: based on the above, when processing a
>   sequence for which a separator is defined, the presence of a match in the
>   data for the separator is not sufficient to cause the parser to determine
>   that an associated component is known-to-exist." It then refers you to
>   14.2.1.
>    - I don't think this changes anything. Again it just admits that
>      the separator ";" can appear even without the following element. I.e., I
>      think it just allows for lax processing of excess separators.
>
>
>
> - DFDL Spec 9.4.2 Element Defaults When Parsing - Subsection 9.4.2.2
>   Simple element (xs:string or xs:hexBinary) (emphasis below is mine)
>    - Here's the excerpted text:
>
>      "Required occurrence: *If the element has a default value*
>      then an item is added to the infoset using the default value, otherwise an
>      item is added to the Infoset using empty string (type xs:string) or empty
>      hexBinary (type xs:hexBinary) as the value. Optional occurrence:
>      If dfdl:emptyValueDelimiterPolicy is not 'none'[12]
>      <http://daffodil.apache.org/docs/dfdl/#_ftn12> then an item is
>      added to the Infoset using empty string (type xs:string) or empty hexBinary
>      (type xs:hexBinary) as the value, *otherwise nothing is added to
>      the Infoset.*
>
>      Note: *To prevent unwanted empty strings* or empty hexBinary values
>      from being added to the Infoset, use XSD minLength > '0' and a dfdl:assert
>      that uses the dfdl:checkConstraints() function, to raise a processing
>      error."
>    - Note that the language states "if the element has a default
>      value", which implies that the section deals with both defaultable
>      and non-defaultable elements, and does not exclusively discuss defaultable
>      elements as the title of 9.4.2 would suggest.
>    - The second statement is about optional occurrences, and it does
>      not qualify whether it applies only to defaultable elements. Hence, I
>      read "nothing is added to the infoset" as applying whether or not the
>      element is defaultable. So a zero-length (ZL) string is never going to
>      create an empty-string value for an optional element.
>    - However, this contradicts the note about preventing unwanted
>      empty strings. That note is only sensible if optional elements of
>      zero length will get added to the infoset and extra steps are required to
>      force a facet check to prevent them.
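
To make the branching in the 9.4.2.2 excerpt easier to compare against the
readings above, here is a minimal Python sketch of the excerpt taken
literally. The function name, its parameters, and the use of None to mean
"nothing is added to the infoset" are assumptions made for illustration only;
this is not code from Daffodil or IBM DFDL, and it does not model whether
dfdl:emptyValueDelimiterPolicy is meaningful when no initiator or terminator
is defined.

```python
from typing import Optional


def zero_length_string_item(required: bool,
                            default: Optional[str],
                            empty_value_delimiter_policy: str) -> Optional[str]:
    """Literal reading of the quoted 9.4.2.2 text for a zero-length xs:string.

    Returns the value added to the infoset, or None when nothing is added.
    All names here are hypothetical.
    """
    if required:
        # Required occurrence: default value if there is one, else "".
        return default if default is not None else ""
    # Optional occurrence: the excerpt keys only on emptyValueDelimiterPolicy.
    if empty_value_delimiter_policy != "none":
        return ""
    return None


# Element 'y' from the example: optional, no default, policy 'both'.
print(repr(zero_length_string_item(False, None, "both")))
# Same element if the policy were 'none'.
print(repr(zero_length_string_item(False, None, "none")))
```

Under this literal reading the outcome for an optional element depends only
on the policy value, which is why the policy setting matters for the example.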
>
>
> Unless I'm missing another place in the DFDL spec that clarifies this, I
> think we need to revise this area to make things clearer.
>
>
> But first we have to decide which semantics is intended. In the example
> above, which infoset is the one we want?
>
>
> <ex_infix><x>foo</x><y/></ex_infix> (empty string as normal
> representation takes priority over optionality)
>
> or
>
> <ex_infix><x>foo</x></ex_infix> (optionality takes priority over
> empty string as normal representation)
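
The two candidate infosets can be reproduced with a toy Python sketch. It is
not a DFDL processor: the function name and the keep_empty_optional flag are
hypothetical, and splitting on ";" merely stands in for infix separator
handling of the two-element sequence in the example.

```python
def parse_ex_infix(data: str, keep_empty_optional: bool) -> str:
    """Toy model of the ex_infix example; every name here is an assumption.

    keep_empty_optional=True  : empty string as normal representation wins.
    keep_empty_optional=False : optionality wins, zero-length 'y' is dropped.
    """
    # Infix separator ';' : "foo;" splits into ["foo", ""].
    parts = data.split(";")
    body = f"<x>{parts[0]}</x>"
    if len(parts) > 1:
        y = parts[1]
        if y != "" or keep_empty_optional:
            # A zero-length 'y' is rendered as an empty element.
            body += f"<y>{y}</y>" if y else "<y/>"
    return f"<ex_infix>{body}</ex_infix>"


print(parse_ex_infix("foo;", keep_empty_optional=True))
print(parse_ex_infix("foo;", keep_empty_optional=False))
```

The only difference between the two outcomes is a single decision about a
zero-length optional occurrence, which is the decision the spec text needs to
pin down.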
>
>
> Either way I think this change is needed:
>
> - Section 9.4.2 - change section title to "Element Defaults and
> Optionality When Parsing"
>
> But a bunch of other clarifications are also needed.
>
> Today Daffodil 2.1.0 implements the first behavior:
> <ex_infix><x>foo</x><y/></ex_infix>, with the empty 'y' element.
>
> What does IBM DFDL do?