Internal loopback bandwidth

Goi

Diamond Member
Oct 10, 1999
Hi,
I'm wondering what the internal loopback bandwidth of a system is, and whether it depends on the PCI bus, the network adapter, or the system speed?

I ask because I'm sending and receiving UDP packets on the same system (sending on localhost and receiving on localhost as well). My NIC is an integrated 100Mbps part on my NF3 mobo, and I'm getting a speed of about 30MB/s, which is higher than 100Mbps but not as fast as I expected. I'm starting a timer on receipt of the first packet and stopping it after the entire transfer completes, and I'm using relatively large files (30+MB).

What kinda max transfer speeds should I be expecting?
 

Goi

Diamond Member
Oct 10, 1999
I don't think it's the hard drive, because the timer is started upon receipt of the 1st packet on the receiver side and stopped after the transfer completes, also on the receiver side. The hard drive isn't accessed between the start and stop of the timer.
 

spidey07

No Lifer
Aug 4, 2000
well you're talking bandwidth here....

If the info never leaves the computer then the constraint is in the computer.

Think about the amount of code/drivers/etc. that you are working through. I'm talking from a computer-system point of view, where the slowest component is the hard drive.
 

Goi

Diamond Member
Oct 10, 1999
OK, I see your point, but let's say that in this case the HDD was taken out of the picture. What else might limit the bandwidth to ~30MB/s? The code that I'm working with simply sends 1KB UDP packets, 512 at a time, receives an ACK, then sends another 512 packets. The server receives these packets 512 at a time, sends an ACK, then waits for the next 512 packets. All this keeps happening till the data transfer completes.

I guess what I'm asking is, is it possible to conclude anything from the measured bandwidth about where the bottleneck is? (the transfer software? the CPU? the memory? the drivers?)
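For concreteness, here's a minimal sketch of the send loop described above - not Goi's actual code, just the stop-and-wait window scheme as stated. POSIX sockets are assumed (on WinXP you'd add the usual WSAStartup() boilerplate), and the port number is made up:

/* Sketch of the windowed sender: blast 512 1KB packets, then block
   until the receiver ACKs the whole window. */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

#define PKT_SIZE 1024   /* 1KB payload per packet            */
#define WINDOW   512    /* packets sent before expecting ACK */

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst = {0};
    dst.sin_family = AF_INET;
    dst.sin_port = htons(5000);                  /* assumed port */
    dst.sin_addr.s_addr = inet_addr("127.0.0.1");

    char buf[PKT_SIZE] = {0};
    char ack;
    long packets_left = 30L * 1024;              /* ~30MB file */

    while (packets_left > 0) {
        long burst = packets_left < WINDOW ? packets_left : WINDOW;
        for (long i = 0; i < burst; i++)         /* blast one window */
            sendto(s, buf, PKT_SIZE, 0,
                   (struct sockaddr *)&dst, sizeof dst);
        recv(s, &ack, 1, 0);   /* block until the window is ACKed */
        packets_left -= burst;
    }
    close(s);
    return 0;
}

Note that every 1KB packet costs a sendto() syscall plus kernel processing, so the per-packet overhead, not raw copy bandwidth, is one plausible bottleneck.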
 

spidey07

No Lifer
Aug 4, 2000
well you've got a pretty poor way of moving data.

one packet, wait for ack
one packet, wait for ack
one packet, wait for ack

So all that waiting adds up. To test overall bandwidth, one-way transmission without ACKing would be preferred, or some kind of window mechanism where you send dozens or hundreds of packets before needing an ACK.
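A one-way blast test like this is only a few lines - a sketch under the same assumptions as the sender above (POSIX sockets, made-up port), with the timing done on the receive side:

/* Blast UDP packets at localhost with no ACKs; the receiver times the
   transfer from first packet to last. */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst = {0};
    dst.sin_family = AF_INET;
    dst.sin_port = htons(5000);                /* assumed port */
    dst.sin_addr.s_addr = inet_addr("127.0.0.1");
    char buf[1024] = {0};
    for (long i = 0; i < 30L * 1024; i++)      /* ~30MB, no waiting */
        sendto(s, buf, sizeof buf, 0,
               (struct sockaddr *)&dst, sizeof dst);
    close(s);
    return 0;
}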
 

eigen

Diamond Member
Nov 19, 2003
Originally posted by: spidey07
well you've got a pretty poor way of moving data.

one packet, wait for ack
one packet, wait for ack
one packet, wait for ack

So all that waiting adds up. To test overall bandwidth, one-way transmission without ACKing would be preferred, or some kind of window mechanism where you send dozens or hundreds of packets before needing an ACK.

That's what he is doing.
 

Goi

Diamond Member
Oct 10, 1999
haha yeah, I'm sending 512 packets, each 1KB in size, then waiting for an ACK. If I were sending just one 1KB packet and waiting for an ACK, then perhaps the bandwidth would make sense...
 

randal

Golden Member
Jun 3, 2001
Do keep in mind that you are theoretically consuming 2x the bandwidth on the PCI bus - UDP packets go out across the bus, then those same packets take up bandwidth again as they re-cross the bus on their way back in ... all of that is -if- it's actually leaving driver land and hitting the bus. If that *is* true, then if you're doing ~30MB/sec, you're actually doing right around 60MB/sec, which is 480Mbps. Considering that the PCI bus (32-bit @ 33MHz) has a theoretical capacity of 1056Mbps, you're hitting nearly 50% utilization just doing transfers. The rest of the bus is probably consumed by things like processing that data, moving memory around, HD access, video?, everything else on your machine.

That's all if it's actually hitting the bus. If it's not, then you should be seeing 1000Mbps easily. $.02
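The arithmetic checks out; as a quick sanity check (the 2x factor and the 32-bit/33MHz PCI figures are randal's):

/* Back-of-the-envelope check of the PCI utilization estimate. */
#include <stdio.h>

int main(void)
{
    double observed_MBps = 30.0;            /* measured rate          */
    double bus_MBps = observed_MBps * 2.0;  /* crosses the bus twice  */
    double bus_Mbps = bus_MBps * 8.0;       /* = 480 Mbps             */
    double pci_Mbps = 32.0 * 33.0;          /* 32-bit x 33MHz = 1056  */
    printf("utilization: %.0f%%\n",
           100.0 * bus_Mbps / pci_Mbps);    /* prints ~45% */
    return 0;
}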
 

Goi

Diamond Member
Oct 10, 1999
That's the kind of explanation that I'd like to see. How would I know if it's actually hitting the bus though? Does an internal loopback test on localhost (127.0.0.1) actually reach the PCI bus?
 

Crusty

Lifer
Sep 30, 2001
Originally posted by: Goi
That's the kind of explanation that I'd like to see. How would I know if it's actually hitting the bus though? Does an internal loopback test on localhost (127.0.0.1) actually reach the PCI bus?

You would have to talk to someone who knows more about the actual driver...

What platform are you running on?

If you are using Linux and the built-in loopback device for ethernet, then you could just look at the code for the loopback driver to figure it out.
 

Goi

Diamond Member
Oct 10, 1999
I'm running this on WinXP on a Chaintech VNF3-250 motherboard based on the NF3-250 chipset, with the integrated Realtek RTL8100C PCI NIC. My software is a simple client-server model that sends and receives UDP packets on 127.0.0.1.
 

randal

Golden Member
Jun 3, 2001
OH, if it's on localhost, then Windows should be handling that at the kernel level; it won't hit the PCI (southbridge) bus. Weird ... I would expect a lot more throughput than that. Have you tried tuning your packet size to see if that makes a difference? Also, what kind of CPU utilization are you seeing while the test runs?
 

Goi

Diamond Member
Oct 10, 1999
Well, the server is in a while(1) loop waiting on a select(), so CPU utilization is close to 100%. I'm not sure how to get it down though. I'm only doing a recvfrom() when the read set shows that something has been received.
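One possible culprit for the pegged CPU, offered as a guess: if select() is being given a zero timeout, the while(1) loop busy-polls. Passing NULL for the timeout makes select() block until data arrives, which should drop CPU use to near zero. A sketch with POSIX sockets (Winsock's select() takes the same arguments):

/* Server loop that sleeps in select() instead of spinning. */
#include <sys/select.h>
#include <sys/socket.h>

void serve(int s)
{
    char buf[65536];
    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(s, &rfds);
        /* NULL timeout: block until the socket is readable. A zeroed
           struct timeval here would poll and burn CPU instead. */
        if (select(s + 1, &rfds, NULL, NULL, NULL) > 0)
            recvfrom(s, buf, sizeof buf, 0, NULL, NULL);
    }
}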
 

Goi

Diamond Member
Oct 10, 1999
OK, after some more simulations, I have some rather weird results...

I tried varying my window size (the 512 packets) and my packet size (the 1KB). Keeping the packet size at 1KB, I've found that a window size of 512 is about as large as I want it; beyond that, the extra dropped packets do more harm than good. Bringing it down to a really small value like 8 hurts performance though.

However, here's the kicker. I then kept the window size at a constant 512 and increased the packet size from 1024 to 1280 bytes. I got a healthy increase from the original 23MB/s to ~28MB/s. Encouraged, I increased it to 1440 bytes and got slightly over 30MB/s. I thought that would probably be the max, since I thought the MTU was something like 1.5KB including header information. However, I went a bit crazy and tried a 10KB packet size and got a transfer rate of something like 90MB/s! Then I went really crazy and tried a 40KB packet size and got over 130MB/s.

Can anyone tell me why such large packet sizes are good? Don't packets have MTUs? I don't quite get my results...

Thanks!

p.s. in case you're wondering why the 30MB/s in my first post became 23MB/s, well, I don't know either. The original 30MB/s figure was from a couple of months back. Perhaps I was measuring something else, but now with 1KB packets and a 512-packet window I'm only getting 23MB/s.
 

spidey07

No Lifer
Aug 4, 2000
well packet size and MTU are only loosely related.

MTU is the maximum for layer 3 and up (for ethernet and most interfaces this is 1500 bytes; add 14 bytes for the layer 2 header and you have the maximum ethernet frame of 1514 bytes, or 1518 with the 4-byte checksum).

If you send a packet of data larger than the MTU can handle (the IP layer, plus your payload, which could include TCP or other layer 4 info), then the IP layer will fragment the packet and just send the chunks of the packet.

So ideally a larger packet size is better (somewhat; to go into detail would be better suited by research), giving less overhead for layers 3 and 4.

This reassembly of fragmented packets is done in the driver and could result in better performance for you since you're using a loopback. The loopback may not have any real MTU because it is an internal interface. MTU is defined by the physical interface (ethernet, FDDI, WAN, token ring, etc.).
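To put rough numbers on the fragmentation point (standard IPv4/UDP header sizes assumed; the 40KB datagram is Goi's test case):

/* How many fragments a large UDP datagram becomes at a 1500-byte MTU. */
#include <stdio.h>

int main(void)
{
    int mtu = 1500;
    int ip_hdr = 20, udp_hdr = 8;
    int per_frag = mtu - ip_hdr;     /* 1480 payload bytes per fragment */
    int payload = 40 * 1024;         /* a 40KB datagram                 */
    int total = payload + udp_hdr;   /* UDP header rides in fragment 1  */
    int frags = (total + per_frag - 1) / per_frag;
    printf("%d bytes -> %d fragments\n", total, frags); /* 28 fragments */
    return 0;
}

On a real wire each of those 28 fragments is a separate IP packet to build, send, and reassemble; on the loopback the driver may skip the split entirely, which would fit the numbers above.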
 

Goi

Diamond Member
Oct 10, 1999
I see. So if I actually connect 2 computers over ethernet, a larger packet size (specifically one larger than the MTU) might not be better? I don't get why there would be such a huge difference in bandwidth (over a factor of 4) between a 1KB packet and a 40KB packet in the loopback test though. It's not like I'm waiting for anything after a packet is sent... my sender is still sending packets as fast as it can... or is it? If I'm only getting 23MB/s on a loopback test with 1KB packets, wouldn't that mean that's the max bandwidth I can get out of any interface with 1KB packets? I.e. even with gigabit ethernet I wouldn't get more than 23MB/s with 1KB packets? That doesn't seem right to me...
 

spidey07

No Lifer
Aug 4, 2000
It depends.

Generally fragmenting an IP packet is a bad thing because it adds the overhead of another IP header. But in your case since you "may" not have a real MTU it could be a good thing since the packet may not be fragmented.

Make sense?

If we're talking real world here, going across a wire, you want to avoid fragmentation of a packet in transit.

Every IP packet needs processing, so if you had loads of IP packets totalling 60K of data vs. one 60K packet of data, then...

60K of data spread over 1000-byte packets = approx. 60 packets, and each and every packet must be processed and re-assembled.
60K of data spread over one 60K-byte packet = 1 packet and its associated processing.

See how this could cause unusual, even exponential results depending on how the driver/kernel handles it? The fact that you are pegging the CPU (as I suspected) means you are CPU bound, which leads me to believe the larger the packet you can send, the better.

Also there is the acknowledgement factor and whether you are using TCP or not, as that adds even more overhead/processing. You mentioned window size, which is generally a TCP parameter for how much data can be sent before an ACK is expected - this is to handle lost packets and allow the TCP algorithm to size the window accordingly. In your case you are going to a loopback, meaning you should never have any loss, which means you can use the maximum window size (a couple of megabytes, I believe... it used to max out at 2^16 bytes, but modern OSes use the TCP window scaling option to go much, much larger).
 

Goi

Diamond Member
Oct 10, 1999
Well, I'm not using TCP. I'm actually using UDP, but I'm implementing a window size to handle lost packets as well, since UDP is connectionless. My window size is a fixed 512 packets (no sliding window), and ACK/NAKs are for an entire window.

If there's so much processing associated with packets, how is GbE ever able to come close to its theoretical max of 125MB/s? Assuming packets are kept below 1.5KB, the max I'm able to get even with loopback is ~30MB/s. I guess I'm just confused as to why a 1KB packet size would be that slow. Surely the kernel/driver can't be that slow, could it?
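For completeness, a receiver-side sketch of the window/ACK scheme described above - hypothetical names, POSIX sockets, with sequence tracking and the NAK/retransmit path omitted:

/* Receive one full window of packets, then ACK the whole window. */
#include <sys/socket.h>
#include <netinet/in.h>

#define PKT_SIZE 1024
#define WINDOW   512

void receive_window(int s)
{
    char buf[PKT_SIZE];
    struct sockaddr_in peer;
    for (int i = 0; i < WINDOW; i++) {
        socklen_t plen = sizeof peer;
        recvfrom(s, buf, sizeof buf, 0,
                 (struct sockaddr *)&peer, &plen);
    }
    char ack = 1;
    sendto(s, &ack, 1, 0,
           (struct sockaddr *)&peer, sizeof peer); /* ACK entire window */
}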
 

spidey07

No Lifer
Aug 4, 2000
Originally posted by: Goi
Well, I'm not using TCP. I'm actually using UDP, but I'm implementing a window size to handle lost packets as well, since UDP is connectionless. My window size is a fixed 512 packets (no sliding window), and ACK/NAKs are for an entire window.

If there's so much processing associated with packets, how is GbE ever able to come close to its theoretical max of 125MB/s? Assuming packets are kept below 1.5KB, the max I'm able to get even with loopback is ~30MB/s. I guess I'm just confused as to why a 1KB packet size would be that slow. Surely the kernel/driver can't be that slow, could it?

All depends on how the kernel handles it.

Good to know that you are using UDP, as that takes the TCP processing/overhead out of the question.

I think you just answered your own question - you are using a fixed window size of 512 packets. Processing of IP packets takes time and processor cycles.

Gigabit ethernet is able to approach its speeds with raw processing power and with the help of jumbo frames. I forget the going rate, but to really fill a gigabit network card you need a 4-way box that isn't running Windows. Network throughput has never been an MS operating system's strong point, and quite frankly it cripples it.