
serial transfers

Cogman

Lifer
So for my computer IO class, I am dealing with data transfer via the parallel port. For this assignment, we are to treat the parallel port as a serial port.

I currently have a data transfer method that works completely using only 2 pins (a command pin and a data pin for each computer). The method I am currently using will change the data on the line, then wait for the computer on the other side to say "Yes, I got the last bit".

This method works perfectly; however, I can't shake the feeling that I am going slower than I should be (we are supposed to try to get this going as fast as possible).

With this method, I get about 35,000 bps. When I disable verification to check and see how fast the output is able to switch, I see that the maximum speed is about 200,000 bps.

So here is what I've been thinking to get a faster transfer. One method is to have a timer in each of the computers measuring how long it should take for each bit to be sent. This requires the computer to time their reading speed and their writing speed, then communicate the worst case across the line (using the first implementation method). After that, I could set the control pin to high to signify a write is coming through, then wait for the other computer to respond with a high control pin. Once the response is received, I can drop the control pin and send the next eight bits across the line in the agreed intervals.

This should eliminate the need to do verifications across the board; however, it is much more complex than the current system and more prone to error. Would this system of communication be faster, though?

Other ideas are along the lines of having one computer control a 3rd pin as a clock, then reading the data on high pulses and changing it on low pulses. This would be less complex, though a reasonable clock speed would still have to be communicated somehow.

Any suggestions for a faster system of data transfer (No, parallel transfer isn't an option 😛 we already did that one)? At most I would like to have only 3 pins for each computer to communicate across.

*cliffs*

* Serial data transfer: which method is the fastest?
* I spout off what I am doing and what I think is a viable alternative.
 
Last edited:
Well there are many examples of serial communications protocols out there that it may be helpful to look at for ideas. I definitely agree that validating each bit will be too inefficient. I2C, for example, validates by bytes and only uses two wires, one for commands/clocking and one for data. I'd recommend taking a look:

http://en.wikipedia.org/wiki/I%C2%B2C

I'm sure there's some trade-off between errors and block size that would yield some optimum for your specific setup, but you'd have to experiment to find it.
 
Start off by finding out how fast your software can read from the port in the first place -- that will upper bound your bps.
 
Well there are many examples of serial communications protocols out there that it may be helpful to look at for ideas. I definitely agree that validating each bit will be too inefficient. I2C, for example, validates by bytes and only uses two wires, one for commands/clocking and one for data. I'd recommend taking a look:

http://en.wikipedia.org/wiki/I²C

I'm sure there's some trade-off between errors and block size that would yield some optimum for your specific setup, but you'd have to experiment to find it.

This seems simple and effective, a parity bit like dial-up modem software used back in the day. Reducing the checks to 1/8 the current amount might almost double your speed.
 
Start off by finding out how fast your software can read from the port in the first place -- that will upper bound your bps.

Well, I have a sneaking suspicion that reading isn't the only limiting factor. This isn't a straight-through parallel connection; rather, I am using two 8255s in between to create the different ports. I imagine that adds some lag time for writing as well.

BTW, when I say that I validate bits, I mean that I pulse a signal to say that I've recorded the bit and am ready for the next, not that I echo the bit back or anything.

Either way, I have some time to waste today, so that will probably be what I do (benchmarking different bit transfer methods). Thanks for the suggestions, guys.

*edit* The more I think about it, the more I think the teacher probably won't mind some "magic" numbers. This could vastly simplify the design. I'll probably just use the Windows hi-res timers to keep the timing, record how long a read/write takes, and after that I should be able to just say "read, wait, read" or "write, wait, write". Because of the way I'm set up, I'll still have to acknowledge that I am receiving, but that shouldn't be too bad; it's one check per byte vs. 8 checks per byte.
 
Last edited:
Could just send a parity bit after each byte, and then have the receiver ack if it matches?

not perfect, of course..
 
Parity is OK; but if you want to get complicated you might want to look at Hamming codes, which is what ECC RAM uses.
Particularly popular is the (72,64) code, a truncated (127,120) Hamming code plus an additional parity bit, which has the same space overhead as a (9,8) parity code.
 
Parity is OK; but if you want to get complicated you might want to look at Hamming codes, which is what ECC RAM uses.

Error correcting codes are fine on parallel channels with tons of bandwidth. If you're going for maximum performance on a bandwidth-constrained serial channel, profile your channel and find out how much loss to expect, and use retransmissions+detection for correction. Between two machines with a decent cable, I wouldn't be surprised if your loss rate was pretty low.

If thats the case, and since you're making your own protocol, there's no need to limit yourself to byte transfers. Transfer a few kilobytes at a time, and send a hash at the end instead of parity. Recompute hash in a separate thread at the other end to make sure you didn't take any bit errors along the way.

To decide how big to make your chunks, maximize your goodput. Your goodput is:
(1.0-P(fail)) * SendRate
Where both P(fail) and SendRate are functions of the chunk size. SendRate probably grows with chunk size, as does P(fail), so there is usually an optimal point at which your goodput is maximized on your channel.

As I said before, your limiting factor on speed will probably be determined by the interface to the port at some level. Parallel ports usually don't have high-speed SERDES attached to them (if yours does, I want your motherboard). If you're polling the port to read from it/write to it, figure out the round-trip time on each read/write call. Obviously, your peak throughput is the inverse of this time. If you're using interrupt-based signalling, there's an equivalent measurement to be made by logging the interrupts and measuring mean time between interrupts.
 
ahh, just noticed you mentioned you have 8255's in between... are they absolutely necessary?

Cause they're going to add a lot of overhead.
 
I suspect your limitation is at the CPU. Polling, spin locking, port IO to slow hardware, interrupts per bit, synchronous communication, etc. You're not going to be able to match the speed of a true hardware serial port without a hardware UART with FIFO buffers, shift registers, and interrupts built for true high speed asynchronous communication. The 8255 is designed for slow parallel communication, it's not going to handle high speed serial very well like a 16550.
 
Last edited:
I suspect your limitation is at the CPU. Polling, spin locking, port IO to slow hardware, interrupts per bit, synchronous communication, etc. You're not going to be able to match the speed of a true hardware serial port without a hardware UART with FIFO buffers, shift registers, and interrupts built for true high speed asynchronous communication. The 8255 is designed for slow parallel communication, it's not going to handle high speed serial very well like a 16550.

Dude, the CPU is NOT the limiting factor. In every test that I've performed, the CPU is always waiting for the ports on the other end to update.

As for the 8255 comment someone else had. Yes, the 8255 are absolutely necessary. They are part of the lab constraints.
 
Well you'll just have to accept that it's going to be slow then, because with the 8255 there, it will take three writes to flip a bit...

I don't quite understand the premise of the class... what's the fastest way of doing it wrong? 🙂
 
Well you'll just have to accept that it's going to be slow then, because with the 8255 there, it will take three writes to flip a bit...

I don't quite understand the premise of the class... what's the fastest way of doing it wrong? 🙂

🙂 lol, I think it is just to get us to experiment with the different methods of serial transfer. Some will be faster than others.
 
Ok... so what is causing the sender to take so long?

Could be a couple of things. My guess is that it is a combination of waiting for the parallel port to update, and waiting for the 8255 to refresh with the new data. A lot has to happen before the signal becomes stable.
 
Could be a couple of things. My guess is that it is a combination of waiting for the parallel port to update, and waiting for the 8255 to refresh with the new data. A lot has to happen before the signal becomes stable.

OK, then let's go back to basics. How long does it take to do a handshake operation?

Have machine Alice transition a pin from low to high. As soon as machine Bob sees the pin go high, have Bob transition a different pin from high to low. Have Alice record the time for the whole operation.

Repeat that a thousand times with opposite polarity for Alice and Bob.

Along the way, check both machines' voluntary context switch counts with getrusage(); they should be at or near zero.

Once we've figured out the minimum round trip time, we can approximate the one-way latency as half of that. 1/Latency is an upper bound on achievable BPS.

Now, to figure out where the bottlenecks are, we have to change some variables. Basically, the time in our experiment is divided like so:

T = Talice-cpu-send + Talice-parport-send + Tbob-parport-recv + Tbob-cpu-recv + Tbob-cpu-send + Tbob-parport-send + Talice-parport-recv + Talice-cpu-recv

We can't really change the time taken by the parallel port hardware without changing the hardware itself. We also can't easily measure that overhead. But we can measure CPU overhead. Therefore, the above formula has four unknowns: Talice-parport-send, Tbob-parport-recv, Tbob-parport-send, and Talice-parport-recv

Also, we can change Bob's behavior. So repeat the experiment with a configurable delay between Bob seeing Alice's message and Bob responding with his own.

AKA
T = Talice-cpu-send + Talice-parport-send + Tbob-parport-recv + Tbob-cpu-recv + Tbob-delay + Tbob-cpu-send + Tbob-parport-send + Talice-parport-recv + Talice-cpu-recv

In fact, repeat the experiment three times, with three configurable delays, D1, D2, and D3 inserted.

This will leave you with a total of four linear equations of four unknowns... which you can solve with linear algebra to find out how much overhead is coming from your hardware.
 
Dude, the CPU is NOT the limiting factor. In every test that I've performed, the CPU is always waiting for the ports on the other end to update.

Oops, of course not. I said that wrong; what I meant is that the CPU is probably doing absolutely nothing most of the time (see the stuff I said after that).

There isn't going to be much you can do about the hardware. You might look into some kind of entropy encoding to pack the data and minimize bit transfers; sending raw data literally bit by bit is going to be the slowest.

With 3 lines I'm thinking of something that looks like RZ encoding to cut the switching time in half on each individual line to allow more stabilization time at high speeds, or even a differential scheme.
 
Last edited:
After your profiling is done, if it turns out you have excess CPU time, take a feather from NetZero's hat and use your CPU time to compress both ends of the pipe before you send it. Voila! Instant bandwidth!
 
Are there performance requirements for this assignment?

Or is this just a curiosity at this point?

Just my curiosity is all. Sounds like a fun experiment to be doing!
 
Get Ralf Brown's interrupt list and write directly to the 8255's using assembly and interrupts.

... INT is a privileged instruction. Plus, I don't gain anything by using it; I basically have direct access to the port using the MDE drivers that we already have.

We've moved past the 8255's already; it turns out that even doing mine the dumb way, it was faster than everyone else's in the class. (Cause I'm cool like that 🙂)

We've started doing transfers using the USB port and the FT2232 USB development boards. (These things are PITAs; if you read, write, or just query the general status of the board too often, they go into an irreversible state where you literally have to unplug and replug the USB cable.)

We are doing a networked system now with multiple computers; for communication I've made a token ring network. I take advantage of the fact that reads are faster than writes and constantly monitor the port. Since we are just sending text messages, I use the uppermost bit as a clock bit. I then read the port as fast as I can, monitoring that clock bit. If it changes, I record the data and wait for the next change.

The biggest issue I'm now having is "How often is too often to read from the port?" They don't die instantly; they error out fairly randomly (which really kills a token ring network 🙂). Basically I'm just trying to slow down reads to a point where the thing doesn't error out on me (luckily I don't have to worry about writes).
 