Prioritising UDP packets with semi-real-time data


imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
We disagree here then.
I don't think collisions are important. Collisions are part of the Ethernet spec. They are intended. They are part of the mechanism. Over the years, people have tried to smack-talk Ethernet for marketing reasons. I know IBM salesmen did, because they wanted to sell their overpriced Token Ring interfaces. (See my remark about Token Acquire Time earlier in this thread.) It seems salespeople of Ethernet switches have done it all over again.

I disagree. And 500-2000 ms is not what you'd usually call time-sensitive on a LAN.

By "UDP driver" I mean the lines of code in the kernel that deal with anything UDP related. In this case it means accepting a bunch of bytes from a write(2) systemcall. Encapsulating the bytes in a UDP header, and heaving it over to the IP driver (the lines of code that deal with IP). Although inside Unix kernels this split doesn't exist so clearly.
One scenario that might be happening is that these lines of code drop the packets even before handing it over to the Ethernetcard.

I think this scenario is more viable than a problem with collisions. But it shows we disagree here. Think2 can examine the problem in more detail. (Look at counters: do you see collisions? Look at the wire: do packets make it out over the wire? Etc.) It would be cool if he reported back here once he finds out what actually happens.
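To make the quoted "UDP driver" description concrete, here is a minimal sketch, assuming an ordinary Unix host with Berkeley sockets rather than the CP2200's own stack, of the application-facing side of that path. The address and port are made up.

[CODE]
/* Minimal sketch (ordinary Unix host with Berkeley sockets, not the
 * CP2200's own stack): the program hands bytes to the kernel, which
 * wraps them in UDP and IP headers.  When local buffers are full, some
 * systems report ENOBUFS (or EAGAIN on a non-blocking socket); others
 * simply discard the datagram -- either way it never reaches the
 * Ethernet card.  The address and port below are hypothetical. */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    struct sockaddr_in dst = {0};
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(5000);                       /* hypothetical port */
    inet_pton(AF_INET, "192.168.1.50", &dst.sin_addr);  /* hypothetical peer */

    const char msg[] = "status update";
    if (sendto(s, msg, sizeof msg, 0,
               (struct sockaddr *)&dst, sizeof dst) < 0) {
        if (errno == ENOBUFS || errno == EAGAIN)
            fprintf(stderr, "dropped locally, before the NIC ever saw it\n");
        else
            perror("sendto");
    }
    close(s);
    return 0;
}
[/CODE]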

I can tell from your responses that you are not reading or understanding how this works.

Again: collisions only apply in half-duplex hub environments. They no longer apply in switched full-duplex environments; collision detection is turned off at the card level. It was even removed from the 10 Gigabit Ethernet spec because hubs are no longer supported. I do not care about Token Ring, nor Banyan Vines. It was common real-life experience that hubs, pushed beyond a certain utilization, incurred a collision penalty. This is simple math: the closer the wire is to 100%, the higher the chance that a frame will collide. Back in the early 1990s computers had a hard time filling a 10 Mbps pipe, so this wasn't an issue. When 10 Mbps switches appeared, the issue went away.

From the statement that "the issue only appears when the computer is uploading the "10 to 4MB files"", this would indicate link saturation, with frames colliding with traffic from the other members of the segment. That happened and was expected back in the 1990s. Today collisions are not expected and are a remnant of a now-antiquated technology.

I acknowledge that this might not be the main issue, but your complete disregard for it and telling him that "this can't be it" does not help him. I am not smack-talking Ethernet or whatever you are coming up with. I am smack-talking the fact that he is running 20-year-old gear. The 10 Mb link is likely also seeing tons of port floods, ARP broadcasts, IP broadcasts and other junk that is not helping the matter. There is a reason CSMA/CD was removed from the Ethernet spec.
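To illustrate the "simple math" point above, here is a toy model only: it assumes n stations each independently try to transmit in a given slot with probability p, which is far simpler than real CSMA/CD with carrier sense and backoff, but it shows how quickly collisions grow as offered load approaches 100%.

[CODE]
/* Toy slotted-access model (not the 802.3 spec): with n stations each
 * trying to transmit in a slot with probability p, the chance that a
 * slot carries a collision rises sharply as the load approaches 100%. */
#include <math.h>
#include <stdio.h>

static double p_collision(int n, double p)
{
    double idle = pow(1.0 - p, n);              /* nobody transmits      */
    double one  = n * p * pow(1.0 - p, n - 1);  /* exactly one transmits */
    return 1.0 - idle - one;                    /* two or more collide   */
}

int main(void)
{
    for (double p = 0.1; p <= 0.91; p += 0.2)
        printf("3 stations, per-slot load %.0f%%: collision probability %.2f\n",
               100 * p, p_collision(3, p));
    return 0;
}
[/CODE]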
 

Gryz

Golden Member
Aug 28, 2010
1,551
204
106
Anyway, after my last post, I realized that the custom protocol that Think2 is dealing with is probably a simple stop-and-wait protocol. So everything I wrote about flow-control is probably nonsense.

Also, from reading the OP's post again, I missed the fact that the files being copied are not transferred by the custom protocol, but probably by Windows file sharing or something. I must be blind. Everything I said about flow-control is not applicable at all.

Yes, so the problem might just be a saturated network. Collisions or not, if the network is saturated, then indeed packets with a 500-2000 ms response requirement might not make it, just because the wires are full. Even if you remove collisions, the network is still full, and some packets will not make it. The best solution would indeed be to 1) have a dedicated network for the application, or 2) have switches at the remote devices that can do QoS.

Because I can't keep my mouth shut. As far as I can remember, the number at which collisions become a problem is 90%+. This might sound bad. But you should realize that at 100%, you have a problem anyway. And even with a high number of collisions, the network still keeps functioning relatively well. I am sure you will disagree.
 
Last edited:

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
Because I can't keep my mouth shut.

Same here. Gets me in trouble at times. :<

I do agree that CSMA/CD does work. It just gets... wonky the farther out you get in the specs. I.e. too many segments and it goes wonky [i.e. the old 5-4-3 rule that no one remembers anymore]. Too many packets... wonky. Crappy "NE2000" generic NICs that think "random" = "always 100 ms"... wonky. I do agree, I think the controllers are likely doing a fire-and-forget (pray), so who knows.

CSMA is what wireless uses now (as CSMA/CA), so it isn't going away. It is just gone in the wired arena for the most part, if not completely in most shops.
 

think2

Senior member
Dec 29, 2009
223
2
81
Thanks for all the comments guys. I'm still investigating but it might take 2 or 3 more days as it's just a background task at the moment. I want to find out what's going on even though hubs and 10 Mbps links won't be used with our system, just in case it's a bug.

Our LAN at work has 3 or 4 "outer switches" feeding into one central switch that the file server is connected to. The devices involved in the testing I'm doing are all connected to the same outer switch so nothing goes via the central switch.

I don't know what you mean by network saturation, but let's say that the outer switch is called switch A. When 40 MB of data is sent from the file server through switch A and down the 10 Mbps link, switch A might become so full that it throws some of our packets away - is that what you mean by network saturation? If one of our devices wants to send a message towards switch A when there's lots of data coming the other way, it seems to me that our device should get to send its packet within 100 milliseconds - there are only three devices connected to the hub and they all have "equal access". Anyway, I'll let you know if I find out what's happening. The Ethernet chip we're using is a CP2200.
 

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
Thanks for all the comments guys. I'm still investigating but it might take 2 or 3 more days as it's just a background task at the moment. I want to find out what's going on even though hubs and 10 Mbps links won't be used with our system, just in case it's a bug.

Our LAN at work has 3 or 4 "outer switches" feeding into one central switch that the file server is connected to. The devices involved in the testing I'm doing are all connected to the same outer switch so nothing goes via the central switch.

I don't know what you mean by network saturation, but let's say that the outer switch is called switch A. When 40 MB of data is sent from the file server through switch A and down the 10 Mbps link, switch A might become so full that it throws some of our packets away - is that what you mean by network saturation? If one of our devices wants to send a message towards switch A when there's lots of data coming the other way, it seems to me that our device should get to send its packet within 100 milliseconds - there are only three devices connected to the hub and they all have "equal access". Anyway, I'll let you know if I find out what's happening. The Ethernet chip we're using is a CP2200.

Network saturation means exactly that (switch A becomes so full it starts to throw stuff away).

Can you clarify... are you building these controllers and custom-coding them? The CP2200 is a "MAC on a chip" and includes drivers and a TCP/IP stack.
 

think2

Senior member
Dec 29, 2009
223
2
81
Yes, we designed the controllers, we build them ourselves, and we wrote the software.

[Edit] - the guy who wrote the software has since left, so someone else is looking after it. I wasn't happy with the idea of telling customers they need to have no more than 0.1 percent packet loss to use our system (it's not a complete enough description), and I don't like mysteries, so I want to understand why we're getting discards - even though half duplex won't occur in the field.

We won't get saturation when the network is dedicated to our controllers, but on a shared network it seems that we might, so I also want to see if there's some way of handling the saturation better than what we do at the moment.
 
Last edited:

robmurphy

Senior member
Feb 16, 2007
376
0
0
With CSMA/CD, collisions become a problem way before 90%. Collisions cause retransmissions, and those cause more retransmissions.

The only reason I have used a hub in the last few years is to take traces.

It's also very difficult to buy new hubs these days as hardly anyone makes them. I haven't seen a new one on sale for about six years or more.

Rob.
 

Gryz

Golden Member
Aug 28, 2010
1,551
204
106
I wonder if people still keep telling each other you can't have more than 50 routers in an OSPF area?
 
Last edited:

think2

Senior member
Dec 29, 2009
223
2
81
Still investigating, but... with two of our devices (X and Y), a PC, and the switch in the central server room all connected to the hub, when a large amount of data is being sent from the server to the PC via the hub, the two devices X and Y have difficulty getting messages to each other. Which I guess is either because the central switch starts transmitting just slightly sooner than X and Y and so monopolizes the line, or because the switch overpowers our devices and doesn't detect collisions properly.
 

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
Still investigating, but... with two of our devices (X and Y), a PC, and the switch in the central server room all connected to the hub, when a large amount of data is being sent from the server to the PC via the hub, the two devices X and Y have difficulty getting messages to each other. Which I guess is either because the central switch starts transmitting just slightly sooner than X and Y and so monopolizes the line, or because the switch overpowers our devices and doesn't detect collisions properly.

Have you tried a full-duplex switch at all? Now that I know that you are rolling your own, I would think Wireshark traces might give you a better idea of what is going on.
 

think2

Senior member
Dec 29, 2009
223
2
81
Yep, we are going to try a full-duplex switch in the next couple of days, and we might do a Wireshark trace too.
 

robmurphy

Senior member
Feb 16, 2007
376
0
0
Remember that if you take a trace using one of the switch ports, all you will see is the broadcast packets. If the switch is managed you may be able to set up a monitor port for traces.

If you take traces on the present hub you also need to be aware that you will not see any collisions that occur. The reason for this is that a collision means the Ethernet frame will fail its FCS (checksum), and so will not be passed up by the NIC used for taking the trace.

Rob.
 

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
Remember that if you take a trace using one of the switch ports, all you will see is the broadcast packets. If the switch is managed you may be able to set up a monitor port for traces.

If you take traces on the present hub you also need to be aware that you will not see any collisions that occur. The reason for this is that a collision means the Ethernet frame will fail its FCS (checksum), and so will not be passed up by the NIC used for taking the trace.

Rob.

All NICs will see the jam signal. Some can be set to report it, and Wireshark will enable that if it can.

And just to put it out there: a failed FCS is not a collision, it is a corrupted frame that gets "silently" dropped at the NIC (it will increment the error counter). A collision is detected in half-duplex via a listening loop in the MAC. Basically the transmitting NIC also listens to the line; if it "hears" anything other than what it is transmitting, it emits the jam signal, backs off, listens, and retries after a random time. Remember that only half duplex uses CSMA/CD.
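For reference, here is a sketch of the retry rule that listening loop feeds into: truncated binary exponential backoff. The NIC does this in hardware, so this is only the arithmetic of the 802.3 rule, not driver code.

[CODE]
/* Truncated binary exponential backoff applied by a half-duplex 802.3
 * MAC after it detects a collision and jams: on attempt n, wait a
 * random number of slot times in [0, 2^min(n,10) - 1], and give up
 * after 16 attempts.  Chips do this in hardware; this is just the rule. */
#include <stdio.h>
#include <stdlib.h>

#define SLOT_TIME_US 51.2   /* one slot = 512 bit times = 51.2 us at 10 Mbps */

/* Backoff delay in microseconds for a 1-based attempt number,
 * or a negative value once the MAC gives up and drops the frame. */
double backoff_us(int attempt)
{
    if (attempt > 16)
        return -1.0;
    int k = attempt < 10 ? attempt : 10;    /* exponent is capped at 10 */
    long slots = rand() % (1L << k);        /* random r in [0, 2^k - 1] */
    return slots * SLOT_TIME_US;
}

int main(void)
{
    for (int attempt = 1; attempt <= 16; attempt++)
        printf("collision on attempt %2d: back off %.1f us\n",
               attempt, backoff_us(attempt));
    return 0;
}
[/CODE]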
 
Last edited:

think2

Senior member
Dec 29, 2009
223
2
81
The problem goes away completely when we use a switch:
http://www.dlink.com.au/products/?pid=DIR-120
With a 200 MB file transfer taking 25 seconds, our devices don't do any retries at all on the IP network. However, this doesn't make it any clearer to me how we can advise our customers on when they can use our devices on a shared network, as some switches may be better than others at handling saturation. We might have to specify that the system has to have no more than 0.1 percent packet loss under prolonged saturation loading anywhere on the network.
 

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
Have you done any tracing? Can you verify that the 10/half hub you were using is good? If it works on the switch I think that rules out the cabling. Honestly if you want it to "work", you need to use TCP. TCP will make sure it gets there. UDP is fire and forget / pray.
 

think2

Senior member
Dec 29, 2009
223
2
81
No, we haven't done any tracing yet with the hub. Do you think we would learn something useful if we did? The hub has a red "collision" LED that is mostly on (didn't actually watch too closely) while a large file is being transferred. Since our device doesn't need to work in half-duplex mode, I thought that even if there's some problem with the CP2200 chip that stops it from getting equal access to the bus during heavy load, there's nothing we can do about it.

I'm not sure whether switching to TCP is viable at this stage or how long it would take to implement. The rate of data sent between any two of our devices is quite low, so it doesn't need the "send ahead" and rate management that TCP provides. In default mode, our device sends each data packet to every other device on the network, e.g. 20 other devices means 20 packets to send. As far as I can tell, TCP detects a lost packet by the failure to receive an ack, after which it retries - which is the same as what we're doing with our UDP packets, so I'm not sure that TCP would handle congestion any better than what we do at the moment. I think we'll probably increase the number of retries that we do. I guess some network switches might discard UDP packets before TCP packets, which would be an advantage for TCP.
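For what it's worth, here is a rough sketch of that kind of UDP send/ack/retry loop as it would look on a host with Berkeley sockets; the 100 ms timeout, buffer sizes and function name are illustrative, not the actual controller code.

[CODE]
/* Rough sketch of a stop-and-wait retry loop over UDP: send a datagram,
 * wait briefly for an ack, retry a few times.  A real implementation
 * would also match sequence numbers so that a late ack for an earlier
 * message is not miscounted.  Timeout and names are illustrative. */
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Returns 0 if an ack arrived, -1 if every attempt timed out. */
int send_with_retry(int sock, const struct sockaddr_in *dst,
                    const void *msg, size_t len, int retries)
{
    struct timeval tv = { .tv_sec = 0, .tv_usec = 100 * 1000 };  /* 100 ms */
    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    char ack[64];
    for (int attempt = 0; attempt <= retries; attempt++) {
        sendto(sock, msg, len, 0,
               (const struct sockaddr *)dst, sizeof *dst);
        /* recvfrom returns -1 on timeout (EAGAIN/EWOULDBLOCK) */
        if (recvfrom(sock, ack, sizeof ack, 0, NULL, NULL) > 0)
            return 0;                  /* the peer acknowledged the message */
    }
    return -1;                         /* all attempts lost: report failure */
}
[/CODE]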

I'm trying to learn more about congestion management. I found these articles:
http://www.linktionary.com/c/congestion.html
http://www.ietf.org/rfc/rfc2914.txt

If TCP slows down when it detects congestion after a few seconds, that should hopefully allow our packets to get through too, unless the interval between our retries isn't short enough to get in during the non-congested periods. I don't know whether VoIP, video streaming, etc. can cause network congestion, but we can probably require that a dedicated network is used if the network is frequently congested.
 

theevilsharpie

Platinum Member
Nov 2, 2009
2,322
14
81
If your application is truly semi-realtime, UDP is a perfectly acceptable protocol. TCP makes sure your packets get to their destination in one piece, but the latency can be a bit unpredictable.

If you want to guarantee performance, junk the hubs and get switches that support 802.1p and can place 802.1p-tagged frames in the appropriate priority queues. Managed switches can automatically tag frames within a particular VLAN as having a higher priority. If there's congestion, the switch will send frames in higher priority queues first no matter when that frame was received, which will minimize the impact of congestion.

As for customer specifications, you've already described how your protocol works, as well as its timing and retry limits. Include that in your spec and let the customer work out how to achieve it.
 

think2

Senior member
Dec 29, 2009
223
2
81
If your application is truly semi-realtime, UDP is a perfectly acceptable protocol. TCP makes sure your packets get to their destination in one piece, but the latency can be a bit unpredictable.

If you want to guarantee performance, junk the hubs and get switches that support 802.1p and can place 802.1p-tagged frames in the appropriate priority queues. Managed switches can automatically tag frames within a particular VLAN as having a higher priority. If there's congestion, the switch will send frames in higher priority queues first no matter when that frame was received, which will minimize the impact of congestion.

As for customer specifications, you've already described how your protocol works, as well as its timing and retry limits. Include that in your spec and let the customer work out how to achieve it.

Thanks. I looked up what 802.1p is and found there's a 3-bit "priority code point" (PCP) field in the MAC layer that a layer 2 switch can set.


Basically, to answer the OP... this is how you do it. Drop the hubs, get switches that can do QoS at around layer 3, then make them bump the IP addresses of the industrial controllers to the top of the queue. This is exactly the same as VoIP, except you would use the IPs rather than DSCP.

Our devices don't (normally) have static IP addresses (I think) - we do allow it, though.

I read in this document that there's a TOS (type of service) field in the IP layer.

http://www.westermo.com/dman/Document.phx/Manuals/Manuals%20for%20UK%20only/ontime/Articles/VoIP%20Drives%20Realtime%20Ethernet.pdf

Any idea what we should set the TOS field to so that a layer 3 switch can/will prioritize our packets? The port number used at the UDP layer by our devices is field configurable. The CP2200 device we're using doesn't appear to allow us to set the PCP field in the MAC layer.
 

theevilsharpie

Platinum Member
Nov 2, 2009
2,322
14
81
The CP2200 device we're using doesn't appear to allow us to set the PCP field in the MAC layer.

The priority code and the corresponding queues are configured on the switches. Some host nodes can tag frames as well, but hosts that participate in a QoS scheme generally use higher-level schemes like DiffServ.
 

think2

Senior member
Dec 29, 2009
223
2
81
The priority code and the corresponding queues are configured on the switches. Some host nodes can tag frames as well, but hosts that participate in a QoS scheme generally use higher-level schemes like DiffServ.

Thanks. I just learnt that the TOS (type of service) field I mentioned has been replaced by DSCP (DiffServ). I'm thinking we might need to allow the 6-bit DSCP value to be field configurable.
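On a host with ordinary Berkeley sockets, the DSCP bits are usually set per-socket through the old IP_TOS option, so a field-configurable value is straightforward to apply there. A hedged sketch follows: the function name is made up, and whether the CP2200's stack exposes an equivalent setting would need to be checked in its documentation.

[CODE]
/* Sketch of marking outgoing UDP traffic with a DSCP value through the
 * classic IP_TOS socket option on a Berkeley-sockets host.  This only
 * shows how the bits line up: the 6-bit DSCP occupies the top of the
 * old TOS byte.  The CP2200 stack may need a different mechanism. */
#include <netinet/in.h>
#include <sys/socket.h>

int mark_socket_dscp(int sock, int dscp)    /* dscp in 0..63 */
{
    int tos = dscp << 2;    /* e.g. EF (46) becomes 0xB8 on the wire */
    return setsockopt(sock, IPPROTO_IP, IP_TOS, &tos, sizeof tos);
}
[/CODE]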