[glue-wg] When is data stale?
Paul Millar
paul.millar at desy.de
Tue Apr 21 11:17:49 EDT 2015
Hi Florido,
Thanks for your reply; my comments below.
On 21/04/15 11:53, Florido Paganelli wrote:
> I also have the feeling the discussion is becoming a bit sterile. We can
> make the GLUE2 spec better, but I hardly understand how Paul's definitions,
> without using the actual terms we want to define, could help.
Sorry, it was meant only as an aide towards writing good descriptions.
It's certainly not a requirement.
> On 2015-04-20 19:46, Paul Millar wrote:
>> Put another way, the concept of 'information being created' is too loose
>> a term: it could mean almost anything, so defines nothing.
>>
>
> Well, this is a rhetorical game and not a scientific discussion anymore
> IMHO. I understand you want a definition out of the practical
> implementation, and since you seem to like riddles, I will avoid the
> words creation and time (at this point a mere exercise of wording)
> here it is:
>
> The CreationTime is the number of seconds elapsed since the Epoch
> (00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970),
> formatted as described in the GLUE2 document, captured when BOTH of the
> following are true:
> 1) the GLUE2 record for a GLUE2 entity is being generated
> 2) the data contained in the record, that is, the data that describes
> the entity the record refers to, is being collected.
Great, thanks for taking the time to define this.
> I see no fallacy nor circularity. It's a definition. It does
> NOT require the knowledge of provider, resource- whatever-BDII
Yes, absolutely.
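For concreteness, a provider following this definition might stamp CreationTime like the sketch below, i.e. while the record is being generated and its data collected. The ISO 8601 "Z" rendering is my assumption about how DateTime_t is formatted; the function and record names are illustrative, not from the spec.

```python
# Sketch of a provider stamping CreationTime under the definition above:
# the timestamp is taken while the record is being generated AND its
# data is being collected (conditions 1 and 2).
# Assumption: GLUE2 DateTime_t renders as ISO 8601 UTC with a "Z" suffix.
from datetime import datetime, timezone

def creation_time() -> str:
    """Seconds since the Epoch, rendered as a GLUE2-style UTC timestamp."""
    now = datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%SZ")

def make_record(entity_data: dict) -> dict:
    """Build a record; CreationTime is set as the entity data is collected."""
    record = dict(entity_data)            # "collecting" the entity's data
    record["CreationTime"] = creation_time()
    return record
```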
> Of course, if you want to be really picky there is a time drift between
> 1) and 2) because a Turing machine is sequential. But we can avoid this
> discussion I hope...
Certainly, despite evidence to the contrary, I don't want to nitpick.
Now, I believe your definition also applies to a site-level BDII. When
it refreshes information, it generates a new record and populates this
with information it collects from the resource-level BDII. Conditions
1) and 2) are satisfied, so the site-level BDII may set CreationTime.
There's a (translational?) symmetry between a site-level BDII fetching
information from resource-level BDIIs, and a resource-level BDII
fetching information from info-providers.
Having said that, the problem only appears in hierarchical systems, like
BDII. So, perhaps having a hierarchical profile document would be a
better way of solving this.
> I can provide a similar definition for Validity if you like... but I
> will shift to Stephen's suggestion that this is community-driven, but
> it's not because of the model, it's because what is "Valid" is community
> driven, and by experience I can tell it will be even if you try to
> define it otherwise!
I guess it's unclear to me what should happen if CreationTime+Validity
is in the past. From what others have said, it seems we make no claims
about what this means; the client must decide.
My naïve thinking was that, if information is updated periodically and
CreationTime+Validity is in the past, then the data should be considered
"stale", as it should have been updated by now.
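Under that reading, the client-side check is a one-liner; here is a sketch of the semantics I had in mind (the function name and the epoch-seconds inputs are mine, not anything the spec mandates):

```python
import time

def is_stale(creation_time, validity, now=None):
    """Hypothetical "periodic update" reading: creation_time is epoch
    seconds, validity a lifetime in seconds; if their sum is in the
    past, the record should already have been refreshed."""
    if now is None:
        now = time.time()
    return now > creation_time + validity
```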
> Maybe the only real outcome of this discussion is Jens' comment that
> 'Validity' was a bad name! :D
Yeah, I think that's true!
[..]
>> To me, this points to a deficiency in GLUE 2.
>
> I do not see the need to describe it in the model. One describes that
> in an implementation of a hierarchical information system (today only
> BDII, and maybe EMIR, which nobody uses)
>
> Otherwise we need a model that takes into account hierarchical
> propagation of information (as mentioned before, an aggregation model)
>
> But for me, having the above in the GLUE2 model sounds as if physicists
> should describe the Standard Model in terms of the pieces of paper,
> emails, research papers, people, and historical events needed to describe
> the physics in it...
:-D
OK, perhaps this could be in a separate document (a profile?) that
describes a hierarchical GLUE system? That could refine concepts like
CreationTime, describe how aggregation happens, etc.
This would avoid "polluting" the GLUE-2 base document with these
hierarchy-specific issues.
>>> [...]
>>> I don't see that this is different to any attribute - what you
>>> publish needs to be driven by the use cases. It wouldn't be
>>> especially difficult to publish a different Validity for each object
>>> type, or even for e.g. different batch systems, but unless you have
>>> something to specify the use there's nothing to motivate such a
>>> varying choice.
>>
>> My use-case was what you might expect: allowing detection of a
>> particular failure mode. Specifically, the information publishing "got
>> stuck" at one site. The details don't matter, but the result was old
>> ("stale") data continued to be re-published.
>>
>
> In ARC, we decided a long time ago that the information system should NOT
> be used as a monitor for the information system itself. If one does that,
> one does it at one's own risk; the reason is that the
> information system is more like a business card. It presents services to
> users. It might fake some of the information to please the
> users'/communities' needs, or to hide faults
> in the system in a way that the overall system still works (and this is
> what actually happens!)
>
> Using the information system as a monitoring tool requires a different
> approach, namely, the information system itself must be able to
> self-diagnose. Apart from the philosophical question of whether this is
> even possible, for ARC this is difficult because the information system is
> part of/triggered by other parts of the middleware: if the middleware
> dies, the infosys dies with it. This is not up to GLUE2 to define, and is
> not part of most current architectures, and to me it indicates that
> proper monitoring should be done with third-party tools. As a matter of
> fact, that claim applies to most software.
>
> So if you want to know whether the information publishing "got stuck",
> you'd better be a good sysadmin and use a decent process-monitoring tool,
> be it Nagios or a simple cronjob that sends emails...
As with all things: hindsight is 20/20, and failure modes often choose the
gaps in monitoring.
In this particular case, the "mechanical" refresh process was working
correctly, with the site-level BDII fetching data correctly. Direct
monitoring of BDII/LDAP object creation time (the built-in
'createTimestamp' attribute) would not have revealed any problem.
Publishing CreationTime and Validity (with the semantics of
now()>CreationTime+Validity => problem) would have allowed a script to
detect the problem.
This isn't to say this is the only way of achieving this, nor that it is
necessarily the best way; however, it did seem to fit with the idea of
CreationTime and Validity.
Publishing just the CreationTime allows a script to detect the problem,
provided it happens to know the refresh period. Although this is less
ideal, it's probably the best I can do, given that everyone else feels
Validity has a different meaning.
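For example, a monitoring script that knows the refresh period out of band might flag a stuck publisher like this (the period and slack values are illustrative, not anything GLUE2 defines):

```python
import time

REFRESH_PERIOD = 300   # expected refresh cycle, known out of band (illustrative)
SLACK = 120            # tolerance for jitter in the cycle (illustrative)

def looks_stuck(creation_time, now=None):
    """Flag a record whose CreationTime (epoch seconds) is older than
    one refresh period plus slack: the publisher likely "got stuck"."""
    if now is None:
        now = time.time()
    return now - creation_time > REFRESH_PERIOD + SLACK
```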
Cheers,
Paul.