Originally posted by: Peter
VirtualLarry, burst writes to AGP cards are _NOT_ faster than writes to system RAM.
They should be, in most cases, because the AGP port interface on the chipset includes a buffered write queue - it can accept the write cycle from the CPU immediately and return immediately, without having to wait for the DRAM memory controller or a PCI device. It also doesn't have to do any cache-snooping accesses. At worst, they would be the same speed, assuming equivalent write-buffering in the chipset. You claimed that they would be slower.
Originally posted by: Peter
Like it or not, when the CPU writes into the card's frame buffer, no accelerated AGP cycles are happening, but normal (66 MHz) PCI cycles. AGP cycles happen only when the GPU accesses system RAM. The _only_ exception is on systems that have "AGP Fast Writes" actually enabled and working. (Show me one.)
For a normal, correctly operating, AGP-capable system (AGP system board and AGP video card), having "Fast Writes" enabled is in fact the norm. My system works fine with it enabled, and so did my last system. Like I said, the only way that CPU writes over the AGP bus to video-card memory would be that slow is if you were running a V3, or had all of the performance features of AGP essentially disabled.
Read this thread to see the real-world effects of Fast Writes, and how many people have their AGP systems correctly configured rather than crippled.
Originally posted by: Peter
The immense speed of AGP cards' own RAM is only relevant when the local GPU is rendering; for CPU-rendered content like web pages it doesn't help zip.
Keep in mind your original assertion, which I am refuting here:
Originally posted by: Peter
VL, of course an ISA card is going to stink at that - for the same reason why shared-RAM graphics is faster than AGP graphics at 2D: BANDWIDTH.
My assertion is that this is completely wrong: shared-RAM integrated graphics do NOT have higher bandwidth than AGP. If they did, then why aren't all of the high-end gaming systems using integrated graphics instead of AGP cards? Why isn't AlienWare shipping systems with big stickers on the side, proudly proclaiming "Integrated shared-memory graphics inside!"?
Because it's not true.
In fact, 2D graphics take far less bandwidth than 3D, and most people understand that on nearly any video card today, 2D performance is a non-issue, because it requires so little bandwidth overall.
Originally posted by: Peter
This is where your theory falls apart. It's a well known fact, with equally well known effects like integrated-VGA chipsets being an order of magnitude faster in displaying HDTV video content.
I would like to see some benchmarks that prove that; I don't believe it. Integrated shared-memory graphics subsystems rob the entire system of bandwidth. Given that HDTV decoding (assuming compressed-video decoding on the order of the WMV-HD discussed in my linked thread - correct me if I'm wrong) is so CPU- and bandwidth-intensive in the first place, robbing X percent of system bandwidth right off the top would seem to place systems with shared-memory integrated graphics subsystems at a distinct disadvantage.
That is the one point that you have consistently refused to acknowledge: integrated shared-memory graphics steal overall system-memory bandwidth, right off the top. That's what makes them slower, period, for any system task, not just graphics. That is why the system I mentioned in my post ran UT slower - even though I wasn't using the integrated shared-memory graphics to play the game, it was still active in hardware, stealing memory-access cycles in the background.
Originally posted by: Peter
In a nutshell: Of course, system memory is slower than today's AGP card memory. However, _CPU_ access to system RAM is lots faster than CPU access to RAM on the AGP bus.
Again, I totally disagree. That's only true if you completely cripple your AGP system, in which case it might as well not be an AGP system at all, but rather a PCI one.
With AGP correctly functioning, you have a directly-accessible, write-buffered, non-cache-coherent port to a very high-speed pool of dedicated graphics memory on the AGP video card. The overall speed and bus width of that memory give peak theoretical bandwidth numbers high enough that even when the GPU is also accessing that memory, for both drawing/acceleration functions and refresh tasks, there is still enough bandwidth left over that CPU host write accesses are, for the most part, unhindered.
Compare that to accessing normal system DRAM, which has to contend with slower PCI bus devices that may want to access it, wait for the memory controller to open/close DRAM pages, perform bus-snooping cycles to ensure cache coherency, etc.
With integrated shared-memory graphics, it's even worse: system bandwidth is stolen for display-refresh tasks, and CPU accesses to that same memory have to compete with the refresh traffic, along with the rest of the system's devices, PCI, etc.
Originally posted by: Peter
This, as it happens, is one of the major reasons why GPU makers would switch to PCI-Express rather today than tomorrow: PCI-Express has the same bandwidth in both directions.
That is irrelevant to this discussion; I already mentioned that reads over the AGP bus are slow. Bi-directionality has nothing to do with write-bandwidth testing, which, as far as I can see, was the discussion here.
Originally posted by: Peter
Shall I pick further, being a BIOS engineer? PLE133T is absolutely identical to PLE133, apart from the CPU bus side supporting the lower signalling voltage of the Tualatin. No functional changes were made. Nada. Zip. Niente. "Blade3D" was the name of the discrete part; CyberBlade is the integrated version of the same graphics core.
That doesn't really surprise me at all; that's why I said "(Not that it makes a whole lot of difference in terms of performance though, really)", because I figured that they would be more or less the same part. As for the graphics core, I've seen chipset specs describing both "Blade" and "Blade3D" integrated graphics, so I assumed they were different models. I'm willing to concede that I may be incorrect about that, since I didn't do any background checking.
Originally posted by: Peter
Sure, you get 5 MB/s out of the ISA bus when you overclock it to almost twice the intended speed of 8 MHz. Does that negate my ~2 MB/s number? No. 32-bit writes will be broken down into consecutive 16-bit ISA writes by the system south bridge; busses further up the hierarchy will be less loaded if you send 32-bit writes. No surprise there.
"Classical" ISA bus speed was specced at ~8 MHz, but most ISA cards at the time were explicitly specified as supporting ISA bus speeds up to (usually) 12.5 MHz. (Remember, 486SX-25 systems with ISA busses were popular back then, say in Packard Bell machines and Compaqs and whatnot, so SYSCLK/2 was a common divisor to derive the ISA clock from. Likewise 33 or 40 MHz divided by 3, which yielded, of course, 11 MHz or 13.3 MHz.)
Being a low-level ASM/C/C++ programmer with professional game-dev experience, I'm quite aware that the memory controller breaks down writes. But it was quite interesting to me back in the day that even though the bus bandwidth should have been the constraining factor, the inefficiencies of the 386's architecture made it faster to use 32-bit writes than consecutive 16-bit writes, even on an un-cached CPU.
Originally posted by: Peter
I suggest you download a simple diagnostics program like ctcm7.exe (from ftp.heise.de), and run a "CTCM7 /VID" benchmark. That'll give you a very precise measurement of CPU-to-VGA frame buffer write performance. You'll be surprised.
I'd rather write my own, honestly. But I think you should consider comparing buffered write speeds, to both video and system memory, on systems both with and without integrated shared-memory graphics, and with and without those graphics actively being used for display. Make sure AGP is properly configured, Fast Writes enabled, etc.
PS. Don't forget those benchmarks showing that integrated shared-memory video is faster for HD video decoding than a properly-configured modern AGP video card.
