Excellent read, cmetz; I love the history. I don't dispute anything you say, but I'd like to add the following counterpoint.
First, the impact of CPU utilization is going to be extremely different when you compare a production server with a home computer. On a production server, you probably need to optimize every cycle, and tossing them away on networking is going to hurt. On a home computer, you may also want to save CPU cycles, and I know some enthusiasts have cases where it really matters, but in many cases people won't see a significant difference. So on a production server it probably makes sense to study NICs and drivers in minute detail to get the best performance per CPU cycle, and not to worry about even hundreds of dollars; in a consumer setup, generally the reverse is true.
Second, CPU utilization is going to scale with load. If nothing's coming through the pipe, the CPU will be mostly idle. So in a case such as internet downloads at typical consumer rates, CPUs are going to be mostly idle. Even when you stress a 100 Mb/s line, a modern CPU probably won't see much load from networking.
With gigabit, things change further, but I contend that for the typical consumer, it's still not a huge issue, whereas the reverse is true with production servers.
I have on hand three budget consumer gigabit NICs, which I've measured. ("Dirt cheap with rebates" is a fairly accurate description. No slight is intended to the $20 Intel; if I'd seen one at the time, I'd probably have bought it.)
1. MachSpeed / SysKonnect SK-9521 using a Marvell chip.
2. TrendNet using a RealTek 8169_8000 family chip.
3. Built-in NVidia NForce 430 gigabit MAC with Marvell PHY.
Throughput measured using TTCP benchmark.
1. 89 MB/s
2. 98 MB/s
3. 113 MB/s
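TTCP itself is a standalone tool, but to make the measurement concrete, here's a rough Python sketch of the same idea: push bytes over a TCP connection and divide by elapsed time. (This runs over loopback, so it says nothing about any NIC; it only illustrates the method. Buffer and transfer sizes are arbitrary assumptions, not TTCP's defaults.)

```python
# Minimal TTCP-style throughput test over loopback (illustration only).
import socket
import threading
import time

CHUNK = 64 * 1024          # send buffer size (assumed, not TTCP's default)
TOTAL = 64 * 1024 * 1024   # total bytes to push (64 MB)

def receiver(server):
    # Accept one connection and drain it until the sender closes.
    conn, _ = server.accept()
    with conn:
        while conn.recv(CHUNK):
            pass

def measure_throughput():
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    t = threading.Thread(target=receiver, args=(server,))
    t.start()

    buf = b"\x00" * CHUNK
    sent = 0
    client = socket.socket()
    client.connect(("127.0.0.1", port))
    start = time.perf_counter()
    with client:
        while sent < TOTAL:
            client.sendall(buf)
            sent += CHUNK
    elapsed = time.perf_counter() - start
    t.join()
    server.close()
    return sent / elapsed / 1e6   # MB/s, decimal megabytes

if __name__ == "__main__":
    print(f"{measure_throughput():.0f} MB/s")
```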
CPU utilization for above transfer measured via PerfMon.
1. 19%
2. 52%
3. 38%
4.5 GB file transfer to RAID timed in script.
1. 53 MB/s
2. 60 MB/s
3. 62 MB/s
4.5 GB file transfer to single IDE timed in script.
1. 44 MB/s
2. 44 MB/s
3. 45 MB/s
(There is some thrashing in this case, from waiting for the drive and flushing the cache, I presume. There was some variability in the observed numbers; these are representative.)
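The "timed in script" figures reduce to simple arithmetic. Assuming 4.5 GB means 4.5 × 1024 MB (an assumption about how the rates were computed), the implied wall-clock times look like this:

```python
# Back out the implied transfer time from file size and measured rate.
# Assumes 1 GB = 1024 MB; the original post doesn't say which convention it used.
FILE_MB = 4.5 * 1024  # 4608 MB

for label, rate in [("RAID @ 53 MB/s", 53), ("single IDE @ 44 MB/s", 44)]:
    seconds = FILE_MB / rate
    print(f"{label}: ~{seconds:.0f} s")  # ~87 s for RAID, ~105 s for IDE
```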
So what do we conclude? From a CPU utilization / server perspective, the Marvell/SysKonnect card is best, the NVidia comes in second, and the RealTek/TrendNet last. (NVidia 100% worse, RealTek 174% worse than the Marvell.)
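Those relative percentages are just each card's CPU utilization measured against the Marvell baseline:

```python
# Relative CPU cost of each NIC versus the Marvell/SysKonnect baseline.
marvell, nvidia, realtek = 19, 38, 52  # measured CPU utilization, percent

def pct_worse(x, baseline=marvell):
    return (x - baseline) / baseline * 100

print(f"NVidia:  {pct_worse(nvidia):.0f}% worse")   # 100% worse
print(f"RealTek: {pct_worse(realtek):.0f}% worse")  # 174% worse
```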
From a throughput perspective, NVidia comes first, RealTek second, and the Marvell last.
NVidia comes first from one "balanced" perspective: highest throughput with middling CPU utilization. It's also first when CPU utilization doesn't matter, as is often the case for a consumer, e.g. during simple file transfers.
Now when you consider the IDE transfer case, which is more typical, all of the network cards come out roughly the same in throughput, and the CPU utilization impact will also be lowered due to the reduced throughput. In this case, the choice of cards doesn't matter at all, except for extra dimensions such as compatibility and, more importantly, potential saturation of the PCI bus in some circumstances.
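On the PCI point: a classic 32-bit, 33 MHz PCI bus tops out around 133 MB/s theoretical, shared among every device on the bus, so a gigabit NIC running at full rate leaves almost nothing for disk controllers or anything else sharing it. The arithmetic:

```python
# Why PCI saturation matters: 32-bit, 33 MHz shared bus vs. gigabit line rate.
pci_bw = 33_000_000 * 4 / 1e6      # 32 bits @ 33 MHz -> ~132 MB/s theoretical
gige   = 1_000_000_000 / 8 / 1e6   # gigabit Ethernet  -> 125 MB/s line rate

print(f"PCI bus:  {pci_bw:.0f} MB/s (theoretical, shared)")
print(f"GigE:     {gige:.0f} MB/s")
print(f"Headroom: {pci_bw - gige:.0f} MB/s for everything else on the bus")
```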
I don't present these numbers and conclusions as definitive. There are lots of other choices and variations out there (we haven't seen what the Intels can do, for example), and details of hardware, drivers, software, or test data might skew things differently.
However, in this case, the conclusions I draw are different from what you might advise from a server/commercial perspective. If I didn't have built-in gigabit and cared mostly about large file transfers, I'd take the RealTek over the Marvell, despite its poor CPU utilization. It feels strange to defend an apparently poor implementation, but that's where the logic takes me...
- with apologies to the OP if he finds this uninteresting.