S/A - Intels Thunderbolt pointless? USB3 design upward of 25 Gbps!


itsmydamnation

Diamond Member
Feb 6, 2011
3,086
3,929
136
Well, most of those interconnects are anything but copper nowadays (are there any copper links over 10 Gbit available right now?). They've been fiber for years now, for range reasons alone, but even the cost of high-speed copper connections (CX4 or something) is quite prohibitive. I dimly remember asking what the InfiniBand interconnect for our small cluster cost, and the number wasn't pretty.

Technically we could have fitted every PC for the last few years with an interconnect that has three times the bandwidth Intel promises for TB; the real problem is, as always, cost. So while Intel surely could sell TB at 100 Gbit today, who'd buy it at the prices they'd target?

And TB wasn't engineered to solve the problems of on-chip communication, so I don't see what that has to do with this discussion? Intel and others are researching that area quite heavily as well, for obvious reasons (optics would offer high-speed, low-energy communication without those pesky analog problems like parasitic inductance), but I'd think the similarities end there.

Completely missed my point. Take a Nexus 7018: 16 slots that can each have 230 gigs (10 x 23 Gbps channels) of switching bandwidth in a full mesh, so ((16x15)/2) x 230, or 55 Tbit a second. The copper interconnect traces are designed to handle 3 times that, or around 76 Gbps per channel, or 165 Tbps of backplane interconnect.

So that shows that fibre doesn't really benefit throughput right now over short connections. The second point is that creating an ASIC that can forward that kind of data is extremely hard, especially if you have big lookup tables and rewriting to do. So creating a link that has performance X is one thing; creating the chips that forward data at that speed is another thing altogether. To go back to the previous example: it has 55 Tbps of interconnect but right now has a max throughput of 3.6 Tbps, or 6%, and that is limited by ASIC performance.
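For anyone who wants to sanity-check those numbers, here is a rough back-of-the-envelope sketch. The full-duplex factor is an assumption added to make the per-link math line up with the 55 Tbit figure quoted above; it isn't stated in the post.

```python
# Back-of-the-envelope check of the Nexus 7018 figures quoted above.
# Assumption (not stated in the post): the 55 Tbit figure counts both
# directions of every slot-to-slot link, i.e. full duplex.

slots = 16
per_slot_gbps = 230                     # 10 channels x 23 Gbps per slot
links = slots * (slots - 1) // 2        # full mesh: 120 point-to-point links

mesh_gbps = links * per_slot_gbps * 2   # x2 assuming full duplex
print(f"full-mesh fabric bandwidth ~ {mesh_gbps / 1000:.1f} Tbit/s")   # ~55.2

trace_gbps = mesh_gbps * 3              # traces rated for ~3x the fabric rate
print(f"trace headroom ~ {trace_gbps / 1000:.1f} Tbit/s")              # ~165.6

actual_tbps = 3.6                       # forwarding capacity quoted above
print(f"ASIC-limited utilisation ~ {actual_tbps / (mesh_gbps / 1000):.1%}")  # ~6.5%
```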
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Completely missed my point. Take a Nexus 7018: 16 slots that can each have 230 gigs (10 x 23 Gbps channels) of switching bandwidth in a full mesh, so ((16x15)/2) x 230, or 55 Tbit a second. The copper interconnect traces are designed to handle 3 times that, or around 76 Gbps per channel, or 165 Tbps of backplane interconnect.
Yeah, so you can stuff several copper lanes together to get a large bandwidth; I didn't dispute that. But the cost for those things is still prohibitive.

The second point is that creating an ASIC that can forward that kind of data is extremely hard, especially if you have big lookup tables and rewriting to do. So creating a link that has performance X is one thing; creating the chips that forward data at that speed is another thing altogether. To go back to the previous example: it has 55 Tbps of interconnect but right now has a max throughput of 3.6 Tbps, or 6%, and that is limited by ASIC performance.
Wait, when did we start talking about that? Sure, that's a problem, especially for routers that span networks, although you could argue that BGP and small packet sizes are the larger part of the problem there. Also, that's essentially a memory speed and cost problem again (you can only put the FIB into SRAM); the CPU would be perfectly fine with many more packets if it didn't have to wait for slow DRAM. So that's quite a specialized problem... there are many other problems where you wouldn't have to look up data in gigantic tables for every packet.
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,086
3,929
136
Yeah, so you can stuff several copper lanes together to get a large bandwidth; I didn't dispute that. But the cost for those things is still prohibitive.
Those copper traces can run at 78 Gbps, and it isn't that expensive once you factor in that there are 1200 point-to-point connections in each fabric module.

Wait, when did we start talking about that?
When I was pointing out that fibre/Cu doesn't matter at this distance; the ASIC has to be able to forward at that rate either way.

Sure, that's a problem, especially for routers that span networks, although you could argue that BGP and small packet sizes are the larger part of the problem there.
Well, my example was a switch, and I was talking optimal packet size and only switching performance (only MAC lookup/rewrite, no L3), because an ASIC can only forward so many PPS regardless of packet size.

Also, that's essentially a memory speed and cost problem again (you can only put the FIB into SRAM); the CPU would be perfectly fine with many more packets if it didn't have to wait for slow DRAM.
No forwarding is done in software; the RIB is compiled into hardware. What the latency for a lookup is, I don't know, but I doubt it's much, as a 250k-entry FIB has the same performance as a 1-million-entry FIB; if the FIB lookup were the bottleneck, it wouldn't grow to four times the size without a performance impact. The fact that it is also cut-through switched instead of store-and-forward would lead you to think this lookup is quick.

So that's quite a specialized problem... there are many other problems where you wouldn't have to look up data in gigantic tables for every packet.
You're assuming that's the bottleneck. The funny thing is, I know someone who will know the answer to this; I also know he probably won't be able to give an answer :sneaky:.

But now we have gone far more off topic than I initially intended.....
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
No forwarding is done in software; the RIB is compiled into hardware. What the latency for a lookup is, I don't know, but I doubt it's much, as a 250k-entry FIB has the same performance as a 1-million-entry FIB; if the FIB lookup were the bottleneck, it wouldn't grow to four times the size without a performance impact. The fact that it is also cut-through switched instead of store-and-forward would lead you to think this lookup is quick.
What do you mean by 'the RIB is compiled into hardware'? The RIB is held in DRAM, which is exactly the bottleneck I'm talking about - its size, together with the fact that accesses are quite unpredictable, makes it hard to cache, and DRAM itself isn't especially fast.

I mean, I can cite RFC 4984 to support my claim that DRAM is the real bottleneck here if you want ;)
DRAM speed, however, only grows about 10% per year, or 1.2x every 2 years [DRAM] [Molinero]. This is an issue because BGP convergence time is limited by DRAM access speeds. [...] As a result, BGP convergence time degrades at the routing table growth rate, divided by the speed improvement rate of DRAM. In the long run, this is likely to become a significant issue.
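To put rough numbers on that argument, here is a minimal sketch. The 10%-per-year DRAM figure comes from the excerpt above; the routing-table growth rate is a made-up placeholder for illustration, not a number from the RFC.

```python
# Illustration of the RFC 4984 argument with assumed growth rates.
dram_speedup_per_year = 1.10   # ~10%/year, from the excerpt quoted above
table_growth_per_year = 1.20   # hypothetical routing-table growth, illustration only

years = 10
relative_convergence = (table_growth_per_year / dram_speedup_per_year) ** years
print(f"After {years} years, BGP convergence time would be ~{relative_convergence:.1f}x longer")
# -> roughly 2.4x with these assumed rates
```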

And @FIB: As I said, it's held in SRAM (at least in higher-end routers), so the difference in size doesn't matter from a performance point of view (much; as always, larger = slower), although it's extremely cost sensitive.

But yep, we got really off topic there ;)
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,086
3,929
136
What do you mean by 'the RIB is compiled into hardware'? The RIB is held in DRAM, which is exactly the bottleneck I'm talking about - its size, together with the fact that accesses are quite unpredictable, makes it hard to cache, and DRAM itself isn't especially fast.
There are two different types of forwarding: hardware and software.

Software forwarding is done by a CPU, with both the RIB and the FIB held in DRAM. The CPU does all the lookups and possibly the rewrites as well. Routers that use this forwarding method include Cisco ISRs, Cisco ASRs, 7200s, low-end Juniper (E series?), etc.

With hardware-based forwarding, the forwarding of a packet is generally done by the ingress hardware (forwarding engine). The CPU maintains the control/management plane. The CPU builds the RIB in software/DRAM; it then builds the FIB and forwards a copy to each forwarding engine, where it is placed in memory (regardless of what type they actually use) on the forwarding ASIC itself.

Now, I used to know how it actually built the FIB, but I have been informed by Cisco that it no longer works the way it used to, and no, they won't tell me :(. The FIB on these ASICs looks completely different from the FIB on a software-based forwarder. So going back to the N7018 example, assuming all M1 line cards, there are 128 copies of the FIB on the system at any one time. As a packet ingresses, the forwarding engine does a lookup on its local FIB, does all the rewrites and then sends it towards the egress port. Switches and routers that use distributed hardware forwarding are the Cisco 6500 with dCEF, Nexus switches, Cisco 7600s, Juniper M, Juniper T, etc.
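As a very rough sketch of that split (all names and types below are made up for illustration; real forwarding engines are ASICs, not Python objects), the control plane builds the RIB, compiles a FIB, and pushes a copy to every line card, which then forwards using only its local copy:

```python
# Rough illustration of the control-plane / data-plane split described above.
# Everything here is a made-up toy model, not any vendor's actual design.

class ControlPlane:
    """Runs on the supervisor CPU: builds the RIB, compiles it into a FIB."""
    def __init__(self):
        self.rib = {}                    # prefix -> next hop, held in DRAM

    def learn_route(self, prefix, next_hop):
        self.rib[prefix] = next_hop

    def compile_fib(self):
        # In a real box this is a compact hardware lookup structure; here a dict.
        return dict(self.rib)

    def push_fib(self, line_cards):
        fib = self.compile_fib()
        for card in line_cards:
            card.install_fib(fib)        # every forwarding engine gets its own copy


class ForwardingEngine:
    """Per-line-card engine: forwards packets using only its local FIB copy."""
    def __init__(self):
        self.fib = {}

    def install_fib(self, fib):
        self.fib = fib

    def forward(self, dest_prefix, packet):
        # Lookup and rewrite happen at ingress; the CPU and RIB are not involved.
        next_hop = self.fib.get(dest_prefix)
        return (next_hop, packet)


cards = [ForwardingEngine() for _ in range(16)]   # one per slot
cp = ControlPlane()
cp.learn_route("10.0.0.0/8", "slot3/port1")
cp.push_fib(cards)
print(cards[0].forward("10.0.0.0/8", b"payload"))
```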


And @FIB: As I said, it's held in SRAM (at least in higher-end routers), so the difference in size doesn't matter from a performance point of view (much; as always, larger = slower), although it's extremely cost sensitive.

Interesting fact: the cost of an M1 line card with 256k-entry FIBs is $X. The cost of an M1 line card with 1024k-entry FIBs is $X, i.e. the exact same price. The forwarding rate in PPS doesn't change between the two different-sized FIBs and they have the same cost, so it seems logical to assume that, at least for tables under 1 million entries, the FIB lookup isn't the bottleneck. There are 100G layer-3 hardware-based forwarding interfaces around for a few platforms; I don't know how they work, but logically the FIB lookup would have to be 10 times faster.
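One way to see why table size might not show up in the forwarding rate (purely illustrative; real hardware uses TCAM or hardware trie structures, nothing like the toy below): in a longest-prefix-match trie the lookup walks at most one step per address bit, so the cost is bounded by the address width rather than by the number of installed prefixes.

```python
# Toy longest-prefix-match trie: lookup cost is bounded by the address width
# (at most 32 steps for IPv4), not by how many prefixes are installed.

class FibTrie:
    def __init__(self):
        self.root = {}

    def insert(self, prefix_bits, next_hop):
        node = self.root
        for bit in prefix_bits:
            node = node.setdefault(bit, {})
        node["nh"] = next_hop

    def lookup(self, addr_bits):
        node, best = self.root, None
        for bit in addr_bits:            # at most one step per address bit
            if "nh" in node:
                best = node["nh"]        # remember the longest match seen so far
            if bit not in node:
                return best
            node = node[bit]
        return node.get("nh", best)


fib = FibTrie()
fib.insert("00001010", "slot3/port1")                  # 10.0.0.0/8 as bits
fib.insert("0000101000000001", "slot5/port2")          # 10.1.0.0/16 as bits
print(fib.lookup("00001010000000010000000000000001"))  # 10.1.0.1 -> slot5/port2
```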
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Meh, I personally don't own 6 external hard drives in RAID, nor do I wish to.
I already have a cable that goes from my graphics card to the monitor.

I don't really care if you can daisy-chain them.

Would I pay $15 extra to have Thunderbolt on a motherboard? Not as it stands ATM; I have no need for it.

I suspect it's this way for A LOT of users.

Which is my point. I don't doubt it's a shiny new tech that's slightly better than USB3 currently.... the question is whether it's worth the cost? Not for me: I don't want to daisy-chain 4 PC monitors together, nor do I own 6 external HDDs in RAID to make use of its speed (which USB3 could match, if they made chips for it).

Ya. Never said you would enjoy it, but a large video array for home entertainment - this would be pretty cool. A movie and music video library, ya, I see this as very useful. Not so much for myself, but ya, I see this as cool.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Software forwarding is done by a CPU, with both the RIB and the FIB held in DRAM. The CPU does all the lookups and possibly the rewrites as well. Routers that use this forwarding method include Cisco ISRs, Cisco ASRs, 7200s, low-end Juniper (E series?), etc.

With hardware-based forwarding, the forwarding of a packet is generally done by the ingress hardware (forwarding engine). The CPU maintains the control/management plane. The CPU builds the RIB in software/DRAM; it then builds the FIB and forwards a copy to each forwarding engine, where it is placed in memory (regardless of what type they actually use) on the forwarding ASIC itself.
Good to know, but by your own definition both of those approaches have one thing in common: They store the RIB in DRAM and have to access it regularly (and I don't see any reason why the cache behavior should be different). And that's where the bottleneck comes into play.

Interesting fact: the cost of an M1 line card with 256k-entry FIBs is $X. The cost of an M1 line card with 1024k-entry FIBs is $X, i.e. the exact same price.
And Intel's Extreme CPUs really do cost them 10 times more than their other products ;) Sure, they don't necessarily have to pass on the cost - or it could just be a market segmentation scheme with the same built-in hw.

The forwarding rate in PPS doesn't change between the two different-sized FIBs and they have the same cost, so it seems logical to assume that, at least for tables under 1 million entries, the FIB lookup isn't the bottleneck.
Yes, the FIB lookup is still not the bottleneck because it's still held in SRAM; the problem is the DRAM (SRAM is an order of magnitude faster than DRAM; that's cache vs. memory in a normal CPU). Using several local FIBs also makes sense because the smaller the SRAM, the faster it can be (no linear relation), e.g. compare L1 to L3 caches.
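A rough sense of scale for that point (the latency numbers below are ballpark assumptions, not vendor specs): at 10 Gbps with minimum-size frames there are only about 67 ns per packet, which is roughly one DRAM random access but many SRAM accesses.

```python
# Why the FIB wants to live in SRAM: per-packet time budget at line rate.
# Latency figures are ballpark assumptions for illustration only.

line_rate_bps = 10e9
min_frame_bits = (64 + 20) * 8            # 64B frame + preamble/IFG on the wire
pps = line_rate_bps / min_frame_bits
budget_ns = 1e9 / pps
print(f"{pps/1e6:.2f} Mpps -> {budget_ns:.1f} ns per packet")   # ~14.88 Mpps, ~67 ns

dram_access_ns = 60                        # assumed random-access latency
sram_access_ns = 4                         # assumed
print(f"DRAM lookups per packet: {budget_ns / dram_access_ns:.1f}")   # ~1
print(f"SRAM lookups per packet: {budget_ns / sram_access_ns:.1f}")   # ~17
```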
 

Lonbjerg

Diamond Member
Dec 6, 2009
4,419
0
0
Interesting fact: the cost of an M1 line card with 256k-entry FIBs is $X. The cost of an M1 line card with 1024k-entry FIBs is $X, i.e. the exact same price. The forwarding rate in PPS doesn't change between the two different-sized FIBs and they have the same cost, so it seems logical to assume that, at least for tables under 1 million entries, the FIB lookup isn't the bottleneck. There are 100G layer-3 hardware-based forwarding interfaces around for a few platforms; I don't know how they work, but logically the FIB lookup would have to be 10 times faster.

I hope you are not basing this on list prices, because they are wildly inaccurate.

E.g. the list price for a Juniper MX80 is $100,000+... but the price we paid was more like $13,000 per unit.
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,086
3,929
136
I hope you are not basing this on list prices, because they are wildly inaccurate.

E.g. the list price for a Juniper MX80 is $100,000+... but the price we paid was more like $13,000 per unit.
It doesn't matter if I base it off list, 50% off, or not-for-resale, because everything is relative (I was keeping the example within a single Cisco product line).

Good to know, but by your own definition both of those approaches have one thing in common: They store the RIB in DRAM and have to access it regularly (and I don't see any reason why the cache behavior should be different). And that's where the bottleneck comes into play.
Not really. How often does a routing table change, especially for a switch that is unlikely to be holding an internet BGP table? Months :D. No lookups are done against the RIB. Also, the FIB doesn't become invalid because the RIB has changed; once the RIB is up to date, it's pushed out to the FIBs (the same way the RIB doesn't become invalid because OSPF saw a new database descriptor).

http://saloperie.com/docs/BRKARC-3470.pdf
Have a look around slide 21. I have PowerPoints of these with more detail that are newer; I just don't think I'm allowed to publish them..... but the point is the forwarding path has nothing to do with the CPUs, the RIB, or anything in DRAM.
 

JimKiler

Diamond Member
Oct 10, 2002
3,561
206
106
So Intel is touting TB as the uber-cable to rule them all, but it is not compatible with USB? Then this will take a long time to gain mass acceptance, since all current phones are USB.

Hopefully, unlike USB, they can agree to stick with one or two connectors at the max and prevent proprietary connectors.
 