[Nsi-wg] Pathfinding, Labels, and Topology ([still] a bit long)
Jerry Sobieski
jerry at nordu.net
Wed Nov 30 17:44:39 CST 2011
Rebuttals in line:-)
On 11/30/11 5:30 AM, Jeroen van der Ham wrote:
> Hello,
>
> I think it's most important to identify the requirements that we have for the topology, and work from there:
>
> - We need to have some distribution method for topology
> - This method must be maintainable for changes in the network, so it should allow updates.
> - It must be possible to request a connection from port A with VLAN X to port B with VLAN X.
> - It must be possible to request a connection from port A with VLAN X to port B with VLAN Y.
The first two I am in agreement with. The second two I will argue are
not real requirements of topology - they reflect some conventional
notions of traditional signaling protocols and assume specific
technology. Try to remember that the objective of NSI is to build
*connections* - not VLANs per se. In NSI we have an abstracted model of
a "connection" as a conduit for transporting payload data between two
endpoints. These connections simply ride atop the infrastructure
whatever it is. So the VLAN itself is not critical to NSI.
I assert the following:
- In an *ethernet* environment with traditional protocols you might
expect this to be necessary, but it's not a broad-based requirement for
NSI. We need to generalize the sentiment in order to keep the abstractions.
- We *can* reserve a connection from a specific port and VLAN by
associating the switch, port, VLAN, etc. with an STP. As long as the RA
somehow maps the VLAN to the STP, the RA places that STP as the
endpoint. Simple. The CS protocol can do this.
Further, since NSI addresses the inter-domain problem - where external
agents do not pry into local affairs but ask politely for services to be
provided - we have specified NSA/NRMs to deal with local pathfinding and
resource allocation. The remote pathfinder may not have sufficient
information available to make a VLAN selection...indeed probably will
not (VLAN selection is not simply finding an available VLAN id.) I
would argue that the RA either knows a priori which STPs are associated
with the VLANs it wants, or it doesn't care. But the PA doesn't care.
The question is: does a remote pathfinder have access to the technology
specific details and the current state information from the local
network? I.e. you must know both in order to short circuit the
exhaustive search issue. If it does, the remote pathfinder discerns
the STP it needs from that topology information and makes a reservation
request using that STP. It is a VLAN-specific request. But it does
not make the CS protocol VLAN sensitive.
Even if the remote [RA] pathfinder knows which VLAN it wants and knows
the associated STP but has no internal state information about the STPs,
then it must still guess.
So the real issue is not whether we can request a specific VLAN, but
knowing which STP represents a specific VLAN, on a particular port, on a
particular switch, in a particular network _/and its state/_. This is
starting to be a lot of detailed information that is all internal to a
foreign network.
>
> Nice to haves:
> - Dynamic availability information for both links and labels.
As stated above, unless you have this "availability" information for
your labels (or *any* termination point), you will end up guessing at
their availability - which means you still have not solved the
exhaustive search problem.
But more generally, availability is in fact "state" information. This
is a *real* scaling challenge as state is myriad and changes often. The
different state values associated with a topological object as simple as
a VLAN might be: a) is it operationally up or down? b) is it allocated
or available? c) how much of the resource is available? Flooding this
information for 4000 different vlans on every port is impractical, let
alone a whole network making this information available to any/every
other curious agent.
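To put a rough number on that, here is a back-of-the-envelope sketch in
Python. The function name and the port count are made up for
illustration; only the 4000 labels per port and the three state
questions (a, b, c) come from the paragraph above.

```python
def state_records(ports: int, labels_per_port: int, attributes_per_label: int) -> int:
    """Count the per-label state values one network would have to advertise."""
    return ports * labels_per_port * attributes_per_label


# Illustrative inputs only: 48 ports is an assumed figure, not a measurement.
# 4000 labels per port and 3 attributes (up/down, allocated/available,
# remaining capacity) follow the text above.
per_network = state_records(ports=48, labels_per_port=4000, attributes_per_label=3)
print(f"state values to keep coherent for one network: {per_network}")
```

Every one of those values can change at any time, and every other
network would be advertising its own set.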
I think we can find a middle ground and say we want to update *some*
minimal state such as "operational availability", at some topological
aggregation level, but even this is non-trivial given the related
aspects across labels, ports, and/or other groupings. And since this is
proprietary information, you must be prepared to not have *any* such
state associated with the topology you know about.
In one respect, learning/knowing about topology is itself a state
update, e.g. do you flood/broadcast/publish topology updates
when a link goes up/down? Or just when it is permanently added or
removed from the infrastructure? Topology and State distribution are
two heads of the same detail+coherency monster and pose a multitude of
serious scaling challenges (!!). So (IMO) this is useful to explore and
we should consider this, but with an eye to the significant scaling
issues in a large global multi-domain network.
>
> Do we also want to include a connection from port A to port B where you don't care about the label?
This suggestion is not really a topology issue but a CS Protocol issue -
it breaks the "specific endpoint" (point to point) semantics of the
Connection request. Do you want a raw unlabeled connection? Or a
labeled connection but where you don't care which label? Will a stacked
label be acceptable (e.g. QinQ)? What if the port is not a basic
ethernet port, e.g. what if the port is a WDM port carrying many
different colors each with different framing? What would you specify as
the "endpoint" for a "Connection" request? If you don't care, then
why can't the RA take the responsibility to simply select one regardless
of the underlying labeling? And thus you leave it to the local NRM to
engineer the connection internally to its network between the two
*specific* endpoints requested by the RA. The easy answer here is
for the RA to not use tree segmentation at all but to specify a
downstream endpoint and a chain request and let the local PF decide the
local egress point.
If we are going to break the basic pt-to-pt "Connection" abstraction
that requires specific endpoints such that the Termination points are no
longer fixed but a set of acceptable [constrained] components of a
connection, perhaps we should generalize it to treat such sets as
/constraints/ on the connection rather than fundamental components of a
connection. This could actually work. (This is called /anycast/ in the
literature or pt-to-anypt.) We would need to review the CS protocol,
and this model would still pose an abstracted "connection", though the
abstraction gets a bit weirder: It results in an ordered set of
resources whose only requirement is adjacency. This is an interesting
prospect, but I would place it as a potential feature of version 3+,
possibly along with pt-to-mp, negotiation, and volume requests.
What I think is required in the immediate term is the following:
- A *concise* means of expressing STPs that does not change their
semantics. They are still "Service Termination Points" and still map to
particular internal topological constructs, but we find a more efficient
representation that also integrates well with the adjacency
advertisements (SDPs). The recursive nature of various framing
technologies and label stacking would tend to make labels (where they
exist) good candidates for child nodes in a topology tree, but
perhaps there is a mechanism that would allow us to create "summarized"
children (?) in the topology representation that is a more efficient means
of expressing the semantics but leaves the abstraction alone. (For
instance, if we expand STP resources to describe a set of canonically
[sequential] tags, we can maintain the current elegance of the model,
while at the same time providing a label functionality.) Thoughts?
- We also need a means of expressing basic abstract topological objects
(e.g. nodes, ports, links, agents) that allows for recursive relational
descriptions and maintains the strict technology-agnostic abstractions
the NSI Framework requires, yet allows us to complement this basic ontology
with enhanced detail - under local advertisement control of course.
Remote PF implementations can leverage whatever detail is available to
optimize the PF/Reservation process. Thoughts?
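To make the "summarized children" idea in the first item a bit more
concrete, here is a minimal Python sketch. Everything in it is
hypothetical: the StpGroup name, the "prefix:tag" identifier layout,
and the example URN are invented for illustration and are not part of
any NSI document. The tags are plain ordinals within the group, not
VLAN ids.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class StpGroup:
    """One topology entry standing in for a run of sequentially tagged STPs."""
    prefix: str     # hypothetical group identifier
    first_tag: int  # first ordinal in the run (inclusive)
    last_tag: int   # last ordinal in the run (inclusive)

    def __len__(self) -> int:
        return self.last_tag - self.first_tag + 1

    def stp_ids(self) -> Iterator[str]:
        """Expand the summary into individual STP identifiers on demand."""
        for tag in range(self.first_tag, self.last_tag + 1):
            yield f"{self.prefix}:{tag}"

    def contains(self, stp_id: str) -> bool:
        """True if stp_id is one of the STPs this summary stands for."""
        prefix, sep, tag = stp_id.rpartition(":")
        if sep != ":" or prefix != self.prefix or not tag.isdecimal():
            return False
        return self.first_tag <= int(tag) <= self.last_tag


group = StpGroup("urn:example:netA:edge7", 1, 4000)
print(len(group), group.contains("urn:example:netA:edge7:1780"))
```

One record replaces 4000 enumerated entries, and each member is still
an ordinary STP as far as the CS protocol is concerned.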
>
> On 30 Nov 2011, at 11:01, Jeroen van der Ham wrote:
>
>> Hello,
>>
>> On 29 Nov 2011, at 03:47, Jerry Sobieski wrote:
>>> Hi everyone - this is long (sorry) but this is a long held response to the so called "label" issue. We need to engage on this...
>>>
>>> Our topology model works just fine for path finding. Period. As is. It is critical for people to understand this. It works for flat VLANs, it works for swapping, it works for essentially any network connection service. Try to get away from conventional signaling ways of thinking and look at how NSI is positioned to do this and the power it brings. NSI is about *connections* - not VLANs, not waves, not LSPs... The abstraction it presents to the user is a connection model that works regardless of underlying technologies.
>> Our current topology model works only if a 1 in 4 chance of getting the right VLAN across a network is acceptable. However, we're still using only 4 VLANs; once we go to 4096, we get a 1 in 4096 chance.
In most cases you could look at this as actually more likely to work.
In the SC topo, if we had one VLAN in use (25% utilization), we had a 75%
chance of a successful second choice. In the scenario above, if we have
one VLAN in use, we have a 99.98% chance (4095 in 4096) of a correct hit
for the second VLAN (!). And we would still have better chances for the
first 1000 VLANs we randomly choose!!!! And if we have 1000 VLANs in
service (roughly 25% utilization) we still have about a 75% chance of a
successful choice.
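The arithmetic is easy to check. A small Python sketch, assuming a
single uniformly random pick among all labels on one link (the function
name is mine, not anything from an NSA implementation):

```python
from fractions import Fraction


def chance_of_free_pick(total_labels: int, labels_in_use: int) -> Fraction:
    """Probability that one uniformly random label choice lands on a free label."""
    if not 0 <= labels_in_use <= total_labels:
        raise ValueError("labels_in_use must be between 0 and total_labels")
    return Fraction(total_labels - labels_in_use, total_labels)


# The three cases discussed above: SC demo, 4096 labels with one in use,
# and 4096 labels with 1000 in use.
for total, in_use in [(4, 1), (4096, 1), (4096, 1000)]:
    p = chance_of_free_pick(total, in_use)
    print(f"{in_use} of {total} in use: {float(p):.4%} chance of a free pick")
```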
>>
>> Pathfinding currently is done by the Aggregator NSA handling the request. He looks at the current topology, sees thousands of parallel SDPs and for crossing several domain boundaries, he just has to pick one randomly. I don't know about you, but I don't consider that "working just fine".
Sure it does. Why not? What would you do? The (implicit) Transit
Function of the networks in the path was "We can connect any port to any
other port if resources are available." That is a very powerful
statement. If that network cannot actually do that, then it's not a
failure of the topology or the protocol.
A remote pathfinder will never have a crystal ball and ultimately still
must consult the local authoritative Resource Manager...So the remote PF
(the aggregator in your example above) will *always* be subject to the
local NRM rejecting the request no matter how specific or well informed
you are. Labels don't solve this basic problem that the Remote PF is not
authoritative - it *must* consult local agents and is dependent upon
them confirming the path...or it fails.
>>
>> In the demonstration at SC we relied on the human to make requests from one endpoint to another endpoint, using the same VLAN. I have not seen any requests made using different VLAN labels.
>> Also, I have seen and heard that NSA implementations used the last part of the ID to figure out the correct label and use that in their pathfinding algorithms.
>> I do not think that that is a desirable solution.
While I am afraid and _/literally appalled (!)/_ that some NSAs may have
indeed parsed the STP name for a vlan hint, this was incorrect and is
easily broken. It makes totally incorrect use of the topo information
and is a really REALLY BAD assumption. (I put that vlan info in the STP
tag to make it easier for developers to debug things - not as a shortcut
for anything...rest assured the next topo file will have no such human
readability.) This is like parsing an IP Hostname (www.google.com) to
recover its IP address...it doesn't work. I can easily create a
topology that describes the same SC layout that breaks those
implementations. Would you trust other networks to be so exacting?
STPs are symbolic references - they do not contain any technology
specific information themselves.
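Here is roughly how the name-parsing shortcut breaks, as a Python
sketch. The STP identifiers, the Attachment record, and both function
names are invented for illustration; the table is the kind of mapping
the owning network's NRM would hold internally, not something a remote
agent sees.

```python
import re
from typing import Dict, NamedTuple, Optional


class Attachment(NamedTuple):
    switch: str
    port: str
    vlan: int


# Hypothetical NRM-internal table: three STP names for the same
# switch/port/VLAN. Only the first name happens to echo the VLAN id.
TOPOLOGY: Dict[str, Attachment] = {
    "urn:example:netA:stp-1780": Attachment("sw1", "eth1/1", 1780),
    "urn:example:netA:stp-7f3a": Attachment("sw1", "eth1/1", 1780),
    "urn:example:netA:stp-0042": Attachment("sw1", "eth1/1", 1780),
}


def vlan_by_parsing(stp_id: str) -> Optional[int]:
    """Fragile shortcut: guess the VLAN from trailing digits in the STP name."""
    match = re.search(r"(\d+)$", stp_id)
    return int(match.group(1)) if match else None


def vlan_by_lookup(stp_id: str, topology: Dict[str, Attachment]) -> int:
    """What the owning NRM does: treat the STP as an opaque key."""
    return topology[stp_id].vlan


for stp in TOPOLOGY:
    print(stp, vlan_by_parsing(stp), vlan_by_lookup(stp, TOPOLOGY))
```

The parser is right for the first name only by coincidence; it finds
nothing in the second and returns a wrong VLAN for the third.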
We relied on humans to simply *optimize* the selection order - to
"choose wisely". If any arbitrary pair of STPs is requested, the PA
should either reserve it or reject it. And if the path finder in the
remote NSA "chooses poorly" and is not robust enough to try another
possible path, then that is a very weak implementation - not a flaw in
either the NSI CS protocol or the Topology we used for SC. Further,
even the human end point educated guess could fail in many cases. A
reasonable pathfinder *must* be prepared to try alternate paths in the
case of blocking conditions or take responsibility for not doing
so...it's not the standard's responsibility to make sure resources are
available in every network.
Indeed, we could have redefined the topology slightly to reflect the
separate VLAN planes at, say, StarLight - by defining separate NSI
Networks for each VLAN. This would have made explicit in the topology
the constraint that certain STPs cannot be cross-connected to other
STPs. Just as I get grief about the STPs enumeration, I also got
seriously flamed for this approach as well. BUT BOTH WORK! All you
need is a fundamentally simple pathfinder. And this latter separated
vlan planes approach works better than we had at SC because it expresses
more topological constraints than the topology we actually used - and I
bet most of the pathfinders would have eaten it up just fine.
So I don't want to hear that seriously flawed implementations and
weak pathfinders are the excuse for changing the topology model
or the abstractions of the architecture.
>>
>> Let me reiterate:
>> The current NSI implementation is completely unaware of labels. This makes it near impossible to make informed decisions about paths crossing several domains. For each domain a path crosses the chance of finding the right path decreases exponentially.
What do you mean by an "informed" decision? Even if you knew all about
the labels there is no guarantee that the other constraints on the
connection can be met. I.e. the endpoint (labeled or otherwise) is
just one constraint that must be met for success.
The chance of finding a successful path is a function of the number of
labels, the diameter of the network, *AND* the availability of those
labels, *AND* the algorithm for selecting the trial order by the RA,
*AND* most importantly the availability of the other transit resources.
Yes the worst case is exponential...but the *likelihood* of the worst
case is of equal importance. The easiest way to reduce the likelihood
of a worst case exhaustive search is to provide *MORE* STPs and do a
random trial order. This would make the likelihood of a hit
comparatively much higher. Of course a better solution would be to
have access to all topology state...but that poses equally exponentially
complex issues and is not going to happen either.
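A crude way to see how the diameter and the availability interact, as a
Python sketch. It rests on two simplifying assumptions of mine:
availability in each domain is independent of the others, and every
trial is a fresh random pick (with replacement). A pathfinder that
never retries a failed label would do somewhat better than this.

```python
import math
from typing import Sequence


def path_hit_probability(free_fraction_per_domain: Sequence[float]) -> float:
    """Chance that one randomly chosen label is free in every domain on the path."""
    for fraction in free_fraction_per_domain:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("each free fraction must be between 0 and 1")
    return math.prod(free_fraction_per_domain)


def expected_trials(hit_probability: float) -> float:
    """Mean number of independent random trials until the first hit."""
    if hit_probability <= 0.0:
        return math.inf
    return 1.0 / hit_probability


# Illustrative path: three domains, each with 75% of its labels free.
p = path_hit_probability([0.75, 0.75, 0.75])
print(f"hit probability {p:.4f}, expected trials {expected_trials(p):.2f}")
```

The decay with path length is real, but with most labels free the
expected number of trials stays small.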
>>
>> The only way to make label unaware pathfinding work is by making 4096 versions of each of the different domains in the global network.
While this would work, it's not the *only* way that works. Proof: It
worked for SC.
>> The connections between those different networks will then depend on the label-swapping capabilities of those networks.
Sigh. Let's face it: The reason VLANs pose a problem is that they block
easily. The better networks will implement label swapping switching
technologies. Flat VLANs just don't scale well on a global basis,
particularly with existing conventional ethernet hardware. For
instance: Even if you knew VLAN 1780 was available between StarLight and
NetherLight and also available between StarLight and ESnet, if 1780 was
in use on the port facing JGNX it would be unavailable to any other
crossconnect. It would be blocked for your use between NL and ESnet.
Which means you would have to select a different egress VLAN at NL *and*
at ESnet. So just knowing which VLANs are available on one port does
not tell you if a VLAN is available internally or the likelihood that it
might be. It's a crap shoot. A guess. A shot in the dark.
Conventional Ethernet sucks for global provisioning. Accept this my
child and enlightenment will open your eyes. (:-)
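The 1780 example above, as a tiny Python sketch. It assumes the flat
model described there: a VLAN id in use on any port of the switch is
unavailable to every other crossconnect. The port names and the
function name are invented for illustration.

```python
from typing import Dict, Iterable, Set


def usable_flat_vlans(in_use_by_port: Dict[str, Set[int]],
                      candidates: Iterable[int]) -> Set[int]:
    """Candidates that are not in use on ANY port of a flat-VLAN switch."""
    busy: Set[int] = set().union(*in_use_by_port.values())
    return set(candidates) - busy


# Hypothetical StarLight view: 1780 is idle on the two ports we care
# about but already in use on the port facing JGNX.
starlight = {
    "to-NetherLight": set(),
    "to-ESnet": set(),
    "to-JGNX": {1780},
}
print(usable_flat_vlans(starlight, [1780, 1781]))
```

A remote agent that only sees the NetherLight and ESnet ports would
conclude 1780 is free; only the local NRM can see that it is blocked.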
Seriously, Label Swapping was designed to avoid this issue. LS makes
all labels "link local." The label assignment in an LS network is *not*
based upon label availability within the network but on the link alone.
Label swapping can be performed extremely fast and scales well - thus
the success of MPLS. If we had ethernet hardware that did VLAN
swapping *and* per-port VLAN scoping, we would have label swapping.
Flat VLANs would become just a bad dream. 802.1ah (PBB) addresses this issue
and others.
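The difference between the two models fits in a few lines of Python.
This is only a sketch under simple assumptions: each link on the path
reports its set of free labels, a flat VLAN needs one label free on
every link, and label swapping picks per link. Function names and label
values are made up.

```python
from typing import List, Optional, Set


def pick_flat_label(free_per_link: List[Set[int]]) -> Optional[int]:
    """Flat VLAN: one label must be free on every link of the path."""
    if not free_per_link:
        return None
    common = set.intersection(*free_per_link)
    return min(common) if common else None


def pick_swapped_labels(free_per_link: List[Set[int]]) -> Optional[List[int]]:
    """Label swapping: labels are link local, so choose one per link."""
    if any(not free for free in free_per_link):
        return None
    return [min(free) for free in free_per_link]


# Three links; no single label is free on all of them.
links = [{1780, 1781}, {1782}, {1780}]
print(pick_flat_label(links))      # no common label exists
print(pick_swapped_labels(links))  # one label chosen independently per link
```

Taking the lowest free label is an arbitrary choice here; any free
label on the link would do.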
>> Even with that solution, it is still hard, due to availability, and correlations between paths (if I use 10gb on one label, I can't use it again on a different label).
Exactly. At some point you will realize that pathfinding is not
deterministic in an active network - you can optimize the process, but
you cannot predict it or find an optimal path unless the global network
is static and you know *ALL* the state details. Anything short of this
omniscience means that we have to accept the fact that we may encounter
blocking in the network for any number of reasons and all we can do is
try another path, or fail.
Path reservation is a two pass process: A high level candidate path
selection followed by a low level confirmation pass...unless the
confirmation process completes you cannot use it. And there is no
practical way to know a priori which paths will work. You have to try.
This is pathfinding.
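In code the two passes look something like the Python sketch below. The
confirm callback stands in for whatever the local NRMs along the path
answer; it, the network names, and the function name are hypothetical
and say nothing about how the CS protocol messages are actually
exchanged.

```python
from typing import Callable, Iterable, List, Optional

Path = List[str]


def reserve_first_confirmed(candidates: Iterable[Path],
                            confirm: Callable[[Path], bool]) -> Optional[Path]:
    """Pass 1 supplies candidate paths in trial order; pass 2 asks for
    confirmation and moves on to the next candidate when one is rejected."""
    for path in candidates:
        if confirm(path):
            return path
    return None  # every candidate was blocked: the request fails


def demo_confirm(path: Path) -> bool:
    """Stand-in for the local NRMs: netB has no resources available."""
    blocked = {"netB"}
    return not blocked.intersection(path)


candidates = [["netA", "netB", "netD"], ["netA", "netC", "netD"]]
print(reserve_first_confirmed(candidates, demo_confirm))
```

The remote pathfinder cannot know that netB is blocked until it asks;
the only choices it controls are the trial order and whether to keep
trying.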
>> Note also that the number of domain descriptions will increase exponentially as soon as we start considering multi-layer networks.
I am not sure I agree with this. Topology hiding and transfer functions
make this a far simpler problem. The overall complexity is not reduced,
but we delegate responsibility to agents who have the detailed
information and authority to allocate the resources. So the more
topology and state you try to express the harder the problem becomes.
At some point we have to accept that summarization is the only way we
can hope to make this scale and that pathfinding will be a
non-deterministic process - based on probabilities of success, but
guesses nonetheless. We want to always "choose wisely" but
understand that we won't always be so lucky.
Thanks for your dedication to this issue, Jeroen. I appreciate your
intensity.
Best regards
Jerry
>>
>> Jeroen.