[Nsi-wg] Pathfinding, Labels, and Topology ([still] a bit long)

Wed Nov 30 17:44:39 CST 2011

Rebuttals in line:-)

On 11/30/11 5:30 AM, Jeroen van der Ham wrote:
> Hello,
>
> I think it's most important to identify the requirements that we have for the topology, and work from there:
>
> - We need to have some distribution method for topology
> - This method must be maintainable for changes in the network, so it should allow updates.
> - It must be possible to request a connection from port A with VLAN X to port B with VLAN X.
> - It must be possible to request a connection from port A with VLAN X to port B with VLAN Y.
The first two I am in agreement with.  The second two I will argue are 
not real requirements of topology - they reflect some conventional 
notions of traditional signaling protocols and assume specific 
technology.  Try to remember that the objective of NSI is to build 
*connections* - not VLANs per se.  In NSI we have an abstracted model of 
a "connection" as a conduit for transporting payload data between two 
endpoints.   These connections simply ride atop the infrastructure 
whatever it is.   So the VLAN itself is not critical to NSI.

I assert the following:
- In an *ethernet* environment with traditional protocols you might 
expect this to be necessary, but it's not a broad-based requirement for 
NSI.  We need to generalize the sentiment in order to keep the abstractions.
- We *can* reserve a connection from a specific port and VLAN by 
associating the switch, port, VLAN, etc. with an STP.  As long as the RA 
somehow maps the VLAN to the STP, the RA simply names that STP as the 
endpoint.  Simple.  The CS protocol can do this.
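
As a rough illustration of that mapping, here is a minimal Python sketch. 
The lookup table, the STP names, and the helpers `stp_for` and 
`build_request` are all hypothetical and are not part of the NSI CS 
protocol; the only point is that the switch/port/VLAN detail stays on the 
RA side and the request carries nothing but symbolic STPs.

```python
# Hypothetical RA-side lookup table: technology detail -> opaque STP name.
# The names are made up for illustration and carry no meaning themselves.
STP_TABLE = {
    ("switch-1", "port-A", 1780): "urn:example:net-x:stp-0017",
    ("switch-1", "port-A", 1781): "urn:example:net-x:stp-0018",
    ("switch-2", "port-B", 1780): "urn:example:net-x:stp-0042",
}


def stp_for(switch, port, vlan):
    """Return the STP the RA has associated with this switch/port/VLAN."""
    return STP_TABLE[(switch, port, vlan)]


def build_request(src, dst):
    """Build a reservation request that carries only symbolic STPs."""
    return {"source_stp": stp_for(*src), "dest_stp": stp_for(*dst)}


request = build_request(("switch-1", "port-A", 1780),
                        ("switch-2", "port-B", 1780))
print(request)
```

The request is VLAN specific in intent, but nothing in it requires the 
protocol to understand VLANs.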

Further, since NSI addresses the inter-domain problem - where external 
agents do not pry into local affairs but ask politely for services to be 
provided - we have specified NSA/NRMs to deal with local pathfinding and 
resource allocation.   The remote pathfinder may not have sufficient 
information available to make a VLAN selection...indeed probably will 
not (VLAN selection is not simply finding an available VLAN id.)   I 
would argue that the RA either knows a priori which STPs are associated 
with the VLANs it wants, or it doesn't care.  Either way, the PA doesn't care.

The question is: does a remote pathfinder have access to the technology 
specific details and the current state information from the local 
network?  I.e. you must know both in order to short-circuit the 
exhaustive search issue.   If it does, the remote pathfinder discerns 
the STP it needs from that topology information and makes a reservation 
request using that STP.   It is a VLAN-specific request, but it does 
not make the CS protocol VLAN-sensitive.

Even if the remote [RA] pathfinder knows which VLAN it wants and knows 
the associated STP, without internal state information about the STPs 
it must still guess.

So the real issue is not whether we can request a specific VLAN, but 
knowing which STP represents a specific VLAN, on a particular port, on a 
particular switch, in a particular network _/and its state/_.  This is 
starting to be a lot of detailed information that is all internal to a 
foreign network.

>
> Nice to haves:
> - Dynamic availability information for both links and labels.
As stated above, unless you have this "availability" information for 
your labels (or *any* termination point), you will end up guessing at 
their availability - which means you still have not solved the 
exhaustive search problem.

But more generally, availability is in fact "state" information.   This 
is a *real* scaling challenge as state is myriad and changes often.  The 
different state values associated with a topological object as simple as 
a VLAN might be: a) is it operationally up or down? b) is it allocated 
or available? c) how much of the resource is available?    Flooding this 
information for 4000 different vlans on every port is impractical, let 
alone a whole network making this information available to any/every 
other curious agent.

I think we can find a middle ground and say we want to update *some* 
minimal state such as "operational availability", at some topological 
aggregation level, but even this is non-trivial given the related 
aspects across labels, ports, and/or other groupings.  And since this is 
proprietary information, you must be prepared to not have *any* such 
state associated with the topology you know about.

In one respect, learning about topology is itself a state 
update, e.g. do you flood/broadcast/publish topology updates 
when a link goes up/down? or just when it is permanently added or 
removed from the infrastructure?    Topology and State distribution are 
two heads of the same detail+coherency monster and pose a multitude of 
serious scaling challenges.(!!)   So (IMO) this is useful to explore and 
we should consider this, but with an eye to the significant scaling 
issues in a large global multi-domain network.
>
> Do we also want to include a connection from port A to port B where you don't care about the label?
This suggestion is not really a topology issue but a CS Protocol issue - 
it breaks the "specific endpoint" (point to point) semantics of the 
Connection request.   Do you want a raw unlabeled connection? or a 
labeled connection but where you don't care which label?  Will a stacked 
label be acceptable (e.g. QinQ)? What if the port is not a basic 
ethernet port, e.g. what if the port is a WDM port carrying many 
different colors each with different framing? What would you specify as 
the "endpoint" for a "Connection" request?    If you don't care, then 
why can't the RA take the responsibility to simply select one regardless 
of the underlying labeling?  And thus you leave it to the local NRM to 
engineer the connection internally to its network between the two 
*specific* endpoints requested by the RA.   The easy answer here is 
for the RA not to use tree segmentation at all, but to specify a 
downstream endpoint and a chain request and let the local PF decide the 
local egress point.

If we are going to break the basic pt-to-pt "Connection" abstraction 
that requires specific endpoints such that the Termination points are no 
longer fixed but a set of acceptable [constrained] components of a 
connection, perhaps we should generalize it to treat such sets as 
/constraints/ on the connection rather than fundamental components of a 
connection.   This could actually work. (This is called /anycast/ in the 
literature, or pt-to-anypt.)  We would need to review the CS protocol.  
This model would still present an abstracted "connection", but the 
abstraction gets a bit weirder:  it results in an ordered set of 
resources whose only requirement is adjacency.   This is an interesting 
prospect, but I would place it as a potential feature of version 3+, 
possibly along with pt-to-mp, negotiation, and volume requests.

What I think is required in the immediate term is the following:

- A *concise* means of expressing STPs that does not change their 
semantics.  They are still "Service Termination Points" and still map to 
particular internal topological constructs, but we find a more efficient 
representation that also integrates well with the adjacency 
advertisements (SDPs).  The recursive nature of various framing 
technologies and label stacking would tend to make labels (where they 
exist) good candidates for child nodes in a topology tree, but 
perhaps there is a mechanism that would allow us to create "summarized" 
children (?) in the topology representation that is a more efficient means 
of expressing the semantics but leaves the abstraction alone.  (For 
instance, if we expand STP resources to describe a set of canonically 
[sequential] tags, we can maintain the current elegance of the model, 
while at the same time providing a label functionality.)   Thoughts?

- We also need a means of expressing basic abstract topological objects 
(e.g. nodes, ports, links, agents) that allows for recursive relational 
descriptions and maintains the strict technology agnostic abstractions 
NSI Framework requires, yet allows us to complement this basic ontology 
with enhanced detail - under local advertisement control of course.   
Remote PF implementations can leverage whatever detail is available to 
optimize the PF/Reservation process.  Thoughts?
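
To make the "summarized children" idea in the first item a little more 
concrete, here is one possible shape for it in Python. This is only a 
sketch: the `StpRange` class, its fields, and the example names are 
hypothetical, and nothing like this exists in the current topology model. 
One descriptor stands for a run of sequential tags under a single parent 
termination point, and each member is still an individual STP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StpRange:
    """Hypothetical compact descriptor: one entry standing for many STPs."""
    base: str        # symbolic name of the parent termination point
    first_tag: int   # first tag in the sequential run (inclusive)
    last_tag: int    # last tag in the sequential run (inclusive)

    def __contains__(self, tag):
        return self.first_tag <= tag <= self.last_tag

    def __len__(self):
        return self.last_tag - self.first_tag + 1

    def stp(self, tag):
        """Return the individual STP (parent, tag) for one tag in the run."""
        if tag not in self:
            raise ValueError(
                f"tag {tag} is outside {self.first_tag}-{self.last_tag}")
        return (self.base, tag)


summary = StpRange("urn:example:net-x:port-A", 1780, 1783)
print(len(summary), summary.stp(1781))
```

A single advertisement of this kind replaces four enumerated STPs here, 
and would replace thousands on a fully populated port, without changing 
what an STP means.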

>
> On 30 Nov 2011, at 11:01, Jeroen van der Ham wrote:
>
>> Hello,
>>
>> On 29 Nov 2011, at 03:47, Jerry Sobieski wrote:
>>> Hi everyone - this is long (sorry) but this is a long held response to the so called "label" issue.   We need to engage on this...
>>>
>>> Our topology model works just fine for path finding.  Period.  As is.   It is critical for people to understand this. It works for flat VLANs, it works for swapping, it works for essentially any network connection service.  Try to get away from conventional signaling ways of thinking and look at how NSI is positioned to do this and the power it brings.   NSI is about *connections* - not VLANs, not waves, not LSPs...  The abstraction it presents to the user is a connection model that works regardless of underlying technologies.
>> Our current topology model works only if a 1 in 4 chance of getting the right VLAN across a network is acceptable. However, we're still using only 4 VLANs; once we go to 4096, we get to a 1 in 4096 chance.
In most cases you could look at this as actually more likely to work.   
In the SC topo, if we had one VLAN in use (25% utilization), we had a 75% 
chance of a successful second choice. In the scenario above, if we have 
one VLAN in use, we have about a 99.98% chance (4095 in 4096) of a correct 
hit for the second VLAN (!). And we would still have better chances for 
the first 1000 VLANs we randomly choose!!!!   And if we have 1000 VLANs 
in service (roughly 25% utilization) we still have about a 75% chance of 
a successful choice.
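
The arithmetic is easy to check. The following Python sketch assumes the 
pick is uniformly random over all labels and that 4096 labels are usable 
(the figure used in this thread); the function name is just for 
illustration.

```python
def chance_of_free_pick(total_vlans, in_use):
    """Probability that one uniformly random VLAN pick is not in use."""
    return (total_vlans - in_use) / total_vlans


# (total labels, labels already in use): the SC case, then two 4096 cases.
for total, used in [(4, 1), (4096, 1), (4096, 1024)]:
    chance = chance_of_free_pick(total, used)
    print(f"{used} of {total} in use -> {chance:.2%}")
```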
>>
>> Pathfinding currently is done by the Aggregator NSA handling the request. He looks at the current topology, sees thousands of parallel SDPs and for crossing several domain boundaries, he just has to pick one randomly. I don't know about you, but I don't consider that "working just fine".
Sure it does. Why not?   What would you do?  The (implicit) Transit 
Function of the networks in the path was "We can connect any port to any 
other port if resources are available."   That is a very powerful 
statement.   If that network cannot actually do that, then it's not a 
failure of the topology or the protocol.

A remote pathfinder will never have a crystal ball and ultimately still 
must consult the local authoritative Resource Manager...So the remote PF 
(the aggregator in your example above) will *always* be subject to the 
local NRM rejecting the request no matter how specific or well informed 
you are.   Labels don't solve this basic problem: the remote PF is not 
authoritative - it *must* consult local agents and is dependent upon 
them confirming the path...or it fails.
>>
>> In the demonstration at SC we relied on the human to make requests from one endpoint to another endpoint, using the same VLAN. I have not seen any requests made using different VLAN labels.
>> Also, I have seen and heard that NSA implementations used the last part of the ID to figure out the correct label and use that in their pathfinding algorithms.
>> I do not think that that is a desirable solution.
While I am afraid and _/literally appalled (!)/_ that some NSAs may have 
indeed parsed the STP name for a vlan hint, this was incorrect and is 
easily broken.  It makes totally incorrect use of the topo information 
and is a really REALLY BAD assumption.  (I put that vlan info in the STP 
tag to make it easier for developers to debug things - not as a shortcut 
for anything...rest assured the next topo file will have no such human 
readability.)   This is like parsing an IP Hostname (www.google.com) to 
recover its IP address...it doesn't work.  I can easily create a 
topology that describes the same SC layout that breaks those 
implementations.  Would you trust other networks to be so exacting?   
STPs are symbolic references - they do not contain any technology 
specific information themselves.

We relied on humans to simply *optimize* the selection order - to 
"choose wisely".    If any arbitrary pair of STPs are requested, the PA 
should either reserve it or reject it.   And if the path finder in the 
remote NSA "chooses poorly" and is not robust enough to try another 
possible path, then that is a very weak implementation - not a flaw in 
either the NSI CS protocol or the Topology we used for SC.   Further, 
even the human end point educated guess could fail in many cases.   A 
reasonable pathfinder *must* be prepared to try alternate paths in the 
case of blocking conditions or take responsibility for not doing 
so...it's not the standard's responsibility to make sure resources are 
available in every network.
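
A pathfinder that "tries another path" does not need to be complicated. 
The Python sketch below is one way to do it, under stated assumptions: 
`try_reserve` is a hypothetical stand-in for sending a reservation 
request and getting a confirm or reject back from the PA, and the 
candidate list is whatever the topology offers as parallel choices.

```python
import random


def reserve_with_retries(candidates, try_reserve, max_attempts=None,
                         rng=random):
    """Try candidate paths in random order until the PA confirms one.

    `try_reserve(path)` is a hypothetical callable standing in for a
    reservation request: True if the PA confirms, False if it rejects.
    Returns (confirmed_path_or_None, number_of_attempts_made).
    """
    order = list(candidates)
    rng.shuffle(order)
    if max_attempts is not None:
        order = order[:max_attempts]
    for attempts, path in enumerate(order, start=1):
        if try_reserve(path):
            return path, attempts
    return None, len(order)


# Toy provider: only one of four parallel STP pairs is currently free.
free = {("A-3", "B-3")}
paths = [(f"A-{i}", f"B-{i}") for i in range(1, 5)]
path, attempts = reserve_with_retries(paths, lambda p: p in free)
print(path, attempts)
```

The caller either gets a confirmed path or an explicit failure after a 
bounded number of tries, which is the "take responsibility" part.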

Indeed, we could have redefined the topology slightly to reflect the 
separate VLAN planes at, say, StarLight - by defining separate NSI 
Networks for each VLAN.   This would have made explicit in the topology 
the constraint that certain STPs cannot be cross-connected to other 
STPs.   Just as I get grief about the STPs enumeration, I also got 
seriously flamed for this approach as well.  BUT BOTH WORK!  All you 
need is a fundamentally simple pathfinder.  And this latter separated 
vlan planes approach works better than we had at SC because it expresses 
more topological constraints than the topology we actually used - and I 
bet most of the pathfinders would have eaten it up just fine.
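
Here is a small Python sketch of why a simple pathfinder is enough for 
the separated-planes approach. The topology is hypothetical (it is not 
the SC topology), and the plane number appears in the node names only so 
a human can read the example; the search never looks inside a name.

```python
from collections import deque


def find_path(adjacency, src, dst):
    """Plain breadth-first search over an adjacency map of NSI networks."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in adjacency.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical topology: the exchange in the middle is advertised as one
# NSI network per VLAN plane, so the two planes are never joined.
adjacency = {
    "net-a/1780": ["exchange/1780"],
    "net-a/1781": ["exchange/1781"],
    "exchange/1780": ["net-a/1780", "net-b/1780"],
    "exchange/1781": ["net-a/1781"],
    "net-b/1780": ["exchange/1780"],
}
print(find_path(adjacency, "net-a/1780", "net-b/1780"))
print(find_path(adjacency, "net-a/1781", "net-b/1780"))
```

The second search finds nothing because the topology itself says those 
two STPs cannot be cross-connected; no label awareness is needed in the 
pathfinder.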

So I don't want to hear that seriously flawed implementations and 
weak pathfinders are the excuse for changing the topology model 
or the abstractions of the architecture.
>>
>> Let me reiterate:
>> The current NSI implementation is completely unaware of labels. This makes it near impossible to make informed decisions about paths crossing several domains. For each domain a path crosses the chance of finding the right path decreases exponentially.
What do you mean by an "informed" decision?  Even if you knew all about 
the labels there is no guarantee that the other constraints on the 
connection can be met, i.e. the endpoint (labeled or otherwise) is 
just one constraint that must be satisfied for success.

The chance of finding a successful path is a function of the number of 
labels, the diameter of the network, *AND* the availability of those 
labels, *AND* the algorithm for selecting the trial order by the RA, 
*AND* most importantly the availability of the other transit resources.  
Yes the worst case is exponential...but the *likelihood* of the worst 
case is of equal importance.    The easiest way to reduce the likelihood 
of a worst case exhaustive search is to provide *MORE* STPs and do a 
random trial order.  This would make the likelihood of a hit 
comparatively much higher.   Of course a better solution would be to 
have access to all topology state...but that poses equally exponentially 
complex issues and is not going to happen either.
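
A back-of-the-envelope Python sketch of the "more STPs" point follows. It 
assumes, purely for illustration, that a flat label must be free in every 
domain on the path and that label usage in each domain is independent of 
the others; real networks will not strictly satisfy either assumption, 
and the other transit resources are ignored entirely.

```python
import math


def end_to_end_chance(free_fraction_per_domain):
    """Chance that one flat label is free in every domain on the path.

    Assumes independent label usage per domain (an idealization).
    """
    return math.prod(free_fraction_per_domain)


# Three domains, one label in use per domain, two different label spaces.
few_labels = end_to_end_chance([3 / 4] * 3)         # 4 labels per domain
many_labels = end_to_end_chance([4095 / 4096] * 3)  # 4096 labels per domain
print(f"{few_labels:.1%} vs {many_labels:.1%}")
```

The product still shrinks with every domain added, but with a large label 
space and light usage it shrinks very slowly.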
>>
>> The only way to make label unaware pathfinding work is by making 4096 versions of each of the different domains in the global network.
While this would work, it's not the *only* way that works.  Proof:  It 
worked for SC.
>>   The connections between those different networks will then depend on the label-swapping capabilities of those networks.
Sigh.  Let's face it: The reason VLANs pose a problem is that they block 
easily.  The better networks will implement label swapping switching 
technologies.     Flat vlans just don't scale well on a global basis.  
Particularly with existing conventional ethernet hardware.  For 
instance: Even if you knew VLAN 1780 was available between StarLight and 
NetherLight and also available between StarLight and ESnet, if 1780 was 
in use on the port facing JGNX it would be unavailable to any other 
crossconnect.   It would be blocked for your use between NL and ESnet.  
Which means you would have to select a different egress VLAN at NL *and* 
at ESnet.    So just knowing that a VLAN is available on one port does 
not tell you if it is available internally or the likelihood that it 
might be.    It's a crap shoot.  A guess.   A shot in the dark.  
Conventional Ethernet sucks for global provisioning.  Accept this my 
child and enlightenment will open your eyes. (:-)

Seriously, Label Swapping was designed to avoid this issue.   LS makes 
all labels "link local."  The label assignment in an LS network is *not* 
based upon label availability within the network but on the link alone.  
Label swapping can be performed extremely fast and scales well - thus 
the success of MPLS.  If we had ethernet hardware that did VLAN 
swapping *and* per-port VLAN scoping, we would have label swapping.  
Flat VLANs would become just a bad dream.  802.1ah (PBB) addresses this 
issue and others.
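
The difference between the two regimes can be sketched in a few lines of 
Python. This is a simplified model with made-up numbers: each element of 
`free_per_link` is the set of labels currently free on one link of the 
path, and the function names are hypothetical.

```python
def flat_vlan_choices(free_per_link):
    """Flat VLANs: the same ID must be free on every link of the path."""
    sets = [set(free) for free in free_per_link]
    return set.intersection(*sets) if sets else set()


def swapped_label_choices(free_per_link):
    """Label swapping: each link picks any locally free label.

    Returns one label per link, or None if some link has nothing free.
    """
    picks = [min(free) if free else None for free in free_per_link]
    return picks if all(p is not None for p in picks) else None


# Three links, each with two free labels, but no label common to all.
path = [{1780, 1781}, {1781, 1790}, {1790, 1795}]
print(flat_vlan_choices(path))      # no single ID is free end to end
print(swapped_label_choices(path))  # one locally chosen label per link
```

With flat VLANs the path blocks even though every link has spare labels; 
with link-local labels it only blocks when some link is actually full.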
>> Even with that solution, it is still hard, due to availability, and correlations between paths (if I use 10gb on one label, I can't use it again on a different label).
Exactly.  At some point you will realize that pathfinding is not 
deterministic in an active network - you can optimize the process, but 
you cannot predict it or find an optimal path unless the global network 
is static and you know *ALL* the state details.   Anything short of this 
omniscience means that we have to accept the fact that we may encounter 
blocking in the network for any number of reasons and all we can do is 
try another path, or fail.

Path reservation is a two pass process: A high level candidate path 
selection followed by a low level confirmation pass...unless the 
confirmation process completes you cannot use it.   And there is no 
practical way to know a priori which paths will work.  You have to try.   
This is pathfinding.
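
A minimal Python sketch of the two passes, under stated assumptions: 
`reserve` and `release` are hypothetical callables standing in for 
requests to each network's authoritative NRM, and the candidate paths are 
assumed to come from whatever high-level selection the pathfinder does.

```python
def reserve_path(segments, reserve, release):
    """Confirmation pass: confirm every segment of one candidate, or undo.

    `reserve(segment)` returns True on confirm and False on reject;
    `release(segment)` gives back a segment that was already confirmed.
    """
    held = []
    for segment in segments:
        if not reserve(segment):
            for done in reversed(held):
                release(done)
            return False
        held.append(segment)
    return True


def find_and_reserve(candidate_paths, reserve, release):
    """Selection pass supplies candidates; the first confirmed one wins."""
    for path in candidate_paths:
        if reserve_path(path, reserve, release):
            return path
    return None


# Toy NRM state: net-b is blocked, so the first candidate must be undone.
blocked = {"net-b"}
held_now = set()


def toy_reserve(segment):
    if segment in blocked:
        return False
    held_now.add(segment)
    return True


def toy_release(segment):
    held_now.discard(segment)


candidates = [["net-a", "net-b", "net-c"], ["net-a", "net-d", "net-c"]]
chosen = find_and_reserve(candidates, toy_reserve, toy_release)
print(chosen, sorted(held_now))
```

Nothing in the first pass could have told the pathfinder that net-b would 
reject; only the confirmation pass reveals it.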
>> Note also that the number of domain descriptions will increase exponentially as soon as we start considering multi-layer networks.
I am not sure I agree with this.  Topology hiding and transfer functions 
make this a far simpler problem.  The overall complexity is not reduced, 
but we delegate responsibility to agents who have the detailed 
information and authority to allocate the resources.  The more 
topology and state you try to express, the harder the problem becomes.   
At some point we have to accept that summarization is the only way we 
can hope to make this scale and that pathfinding will be a 
non-deterministic process - based on probabilities of success, but 
guesses nonetheless.   We want to always "choose wisely" but 
understand that we won't always be so lucky.

Thanks for your dedication to this issue, Jeroen.   I appreciate your 
intensity.

Best regards
Jerry
>>
>> Jeroen.