Web Page Rendering Speed with Low-End Graphics

imported_Brown

Junior Member
Jun 17, 2004
I was wondering just how much of a difference a low- versus high-end graphics chipset makes when browsing the web. I have always had a fierce aversion to integrated graphics, and I have noticed on a few systems with built-in graphics that web pages don't render well, especially during scrolling. I have a friend who wants me to build a cheap system for his parents. I was thinking we could skip the graphics card to save them some money, since they will rarely use the computer for anything but the internet. Not having ever built a system with integrated graphics before, are there any other issues I should be aware of?

Just on a side note, would integrated graphics have any significant effect on playing video files and DVDs?

Thanks in advance,
Jason Brown
jbrownos@earthlink.net
 

breklin

Junior Member
Aug 20, 2004
A good 64MB card would do them fine, or a low-end 128MB card. It's all about rendering the page. You will see a definite difference with a better video card even when viewing a plain 2D screen. I would recommend going to pricewatch.com and finding a decently priced (under $100) 128MB card.
 

Viper96720

Diamond Member
Jul 15, 2002
Get an nForce/nForce2 board with integrated graphics. That should be good enough for browsing.
 

Marsumane

Golden Member
Mar 9, 2004
Originally posted by: Viper96720
Get an nForce/nForce2 board with integrated graphics. That should be good enough for browsing.

I agree with the above. The integrated graphics on those boards are sufficient for what you need.
 

imported_Brown

Junior Member
Jun 17, 2004
Thanks for the recommendations, guys. I did some checking and found out that the nForce2 IGP chipset is paired with a GeForce4 MX (a lot better than those crappy Intel Extreme Graphics chipsets, I might add), which should be more than enough to smoothly run web pages and even some low-end 3D. My only other question is: does it come with built-in graphics memory, or is it going to allocate some of main memory to graphics?
 

imported_Brown

Junior Member
Jun 17, 2004
Oh yeah, almost forgot, if anyone has any specific recommendations for a motherboard with the Nforce 2 chipset, feel free to post them as I am not that familiar with lower-end boards. I'm partial to Asus these days myself, so I was looking at the A7N8X-VM/400.
 

railer

Golden Member
Apr 15, 2000
....for surfing the web? Is this a joke question? Get a Trident 512k ISA card and you'll be fine. Whoever says you need a 128 meg video card to surf the web deserves to have his house burned down.
If you noticed scrolling problems with integrated graphics, it's because SOMETHING WAS WRONG with that PC. <shakes head and staggers away>. I feel like that goose in the AFLAC commercial with Yogi Berra, when he's walking out of the barber shop.....
 

Peter

Elite Member
Oct 15, 1999
It's all CPU-rendered 2D, and in fact, shared-RAM graphics are FASTER at that than separate graphics. This is because the CPU writes to the shared VGA RAM at the same speed it writes to the system RAM, which is quite a bit faster than doing a CPU-to-AGP write. The latter, if FastWrites are unavailable (which is the usual case), happen as normal 66 MHz PCI cycles with a maximum bandwidth of around 220 MB/s. System RAM typically has five times the write throughput these days.
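
To put a number on the CPU-write side of this, here is a minimal C sketch of the kind of measurement involved: time pure CPU write traffic into a buffer and work out MB/s. An ordinary heap allocation stands in for the frame buffer (mapping real video memory is platform-specific), so as written it only measures system RAM; pointing the same loop at a mapped linear frame buffer is what would expose the CPU-to-AGP write penalty.

/* Minimal sketch: time how fast the CPU can stream writes into a buffer.
 * A malloc'd buffer stands in for the frame buffer; aim the same loop at
 * a mapped linear frame buffer to compare CPU-to-VGA write throughput. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE (16 * 1024 * 1024)   /* 16 MB per pass */
#define PASSES   16

int main(void)
{
    unsigned char *buf = malloc(BUF_SIZE);
    if (!buf)
        return 1;

    clock_t start = clock();
    for (int i = 0; i < PASSES; i++)
        memset(buf, i, BUF_SIZE);     /* pure write traffic, no reads */
    clock_t end = clock();

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    double mbytes  = (double)BUF_SIZE * PASSES / (1024.0 * 1024.0);
    if (seconds > 0)
        printf("CPU write throughput: %.1f MB/s\n", mbytes / seconds);

    free(buf);
    return 0;
}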
 

stnicralisk

Golden Member
Jan 18, 2004
Originally posted by: breklin
A good 64MB card would do them fine, or a low-end 128MB card. It's all about rendering the page. You will see a definite difference with a better video card even when viewing a plain 2D screen. I would recommend going to pricewatch.com and finding a decently priced (under $100) 128MB card.

At work we use integrated graphics and it's just fine. It is funny to see people still spreading the "VRAM is everything" attitude even though it clearly isn't true. For videos and web browsing he will never need more than a 32MB card, yet you recommend a 128MB card. He didn't say he wants to play Doom 3!

Brown, for a really good bang-for-the-buck graphics card I suggest a Radeon 9500/9550. They can be had for under 50 bucks, and it will be worth it over the integrated graphics.
 

VirtualLarry

No Lifer
Aug 25, 2001
Originally posted by: railer
....for surfing the web? Is this a joke question? Get a Trident 512k ISA card and you'll be fine. Whoever says you need a 128 meg video card to surf the web deserves to have his house burned down.
If you noticed scrolling problems with integrated graphics, it's because SOMETHING WAS WRONG with that PC. <shakes head and staggers away>. I feel like that goose in the AFLAC commercial with Yogi Berra, when he's walking out of the barber shop.....

Have you actually tried using an ISA card to browse the web? An un-accelerated one at that? Please.
I know that you are joking, but... it's pretty painful. Try installing Windows XP on a MediaGX system. There are no W2K/XP video drivers for it, so it runs in completely un-accelerated software mode at 800x600 16bpp. It's painful - you get to watch the windows being drawn in!

One of the problems that integrated shared-memory video causes is not just slower graphics (even with an "accelerated" integrated-graphics chipset), but the simple fact that shared memory robs overall system performance, in a way that is extremely noticeable to everything. So it's not the chipset that is the problem for laggy web browser performance, but the fact that it is using shared memory. *Any* shared-memory integrated graphics will cause that lag. (NF2 with dual-channel excepted, because that adds enough memory bandwidth to the system to largely offset the lag caused by the integrated video.)
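
To put a rough number on how much gets stolen just for screen refresh, here is a back-of-the-envelope calculation in C. The resolution, color depth and refresh rate are only illustrative assumptions, not figures from any particular board:

/* Rough estimate of system-RAM bandwidth consumed purely by display
 * refresh on a shared-memory system. All figures are assumed values. */
#include <stdio.h>

int main(void)
{
    const double width = 1024.0, height = 768.0;
    const double bytes_per_pixel = 4.0;   /* 32bpp desktop */
    const double refresh_hz = 85.0;

    double bytes_per_sec = width * height * bytes_per_pixel * refresh_hz;
    printf("Refresh traffic alone: ~%.0f MB/s of system RAM bandwidth\n",
           bytes_per_sec / 1e6);          /* roughly 267 MB/s */
    return 0;
}

That's a couple hundred MB/s off the top before the CPU or any other device gets to touch RAM, which is exactly the effect I'm describing.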

For another example, I was using a system once that had integrated shared-memory Trident Blade3D graphics (Via PLE133T chipset, I think it was). Even after dropping in a PCI TNT1 card to play UT with, on a fresh install of WinXP, you could tell something was clearly wrong. Turns out, there is no auto-disable of the onboard graphics when you put a PCI video card in, so you have to disable it in Device Manager instead. Once that was done, frame rates in UT returned to normal, for that class of system. Previously, frame rates were cut in half, all because of the shared memory. Even though the integrated graphics wasn't even being used, the fact that it was "active" on the system, was robbing it of performance.

The moral of the story? Integrated shared-memory video sucks hardcore. Friends don't let friends buy systems with integrated shared-memory video.
 

Peter

Elite Member
Oct 15, 1999
VL, of course an ISA card is going to stink at that - for the same reason why shared-RAM graphics is faster than AGP graphics at 2D: BANDWIDTH.

You'll never get more than roughly 2 MB/s written to an ISA card. PCI hits the ceiling at around 100 MB/s, AGP at 220 MB/s; integrated graphics may be above 1000 MB/s depending on the type of RAM used.

Did you know that the vast majority of systems, overall, are being sold and used (!) with integrated graphics? Users of separate graphics cards are a minority - and there's a reason. If you're not doing any 3D, then you're fine with integrated VGA. Perfectly fine.

Ye olde Trident CyberBlade aka VIA PLE133 chipset has its own bandwidth problem - it's a stoneage chipset, actually the very first integrated-graphics chipset for the Pentium II. You get a stoneage graphics unit, and it's running on 66 to 133 MHz SDRAM depending on the CPU you have. Bandwidth here is worse than on AGP. Technology has moved on since. Quite a bit.
 

VirtualLarry

No Lifer
Aug 25, 2001
I think that you would be surprised. The video RAM on modern AGP cards is orders of magnitude faster than system RAM. In many cases, burst writes to video card memory over the AGP port are faster than writing to system RAM. Reads, on the other hand, are much slower. TechReport did some analysis of that quite some time ago.

And for the ISA card, I think that I benched it at nearly 5 MB/s, on a 386DX-40 system with a 16-bit ISA bus running at ~13 MHz, with a Diamond SpeedStar 24X card. (Not that it matters, but I thought that might be an interesting data point. Surprisingly, even over a 16-bit ISA bus, it's much faster to do 32-bit writes than multiple 16-bit ones, at least according to my ASM code.)
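
For reference, the comparison boiled down to something like this - sketched here in portable C against an ordinary heap buffer rather than the real frame buffer, so it only shows the loop structure, not the original ISA-era numbers:

/* Sketch of the 16-bit vs 32-bit write comparison described above. The
 * buffer is ordinary heap memory standing in for a frame buffer; volatile
 * keeps the compiler from collapsing the write loops. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUF_SIZE (8 * 1024 * 1024)
#define PASSES   32

static double fill16(volatile uint16_t *p, size_t n)
{
    clock_t t0 = clock();
    for (int pass = 0; pass < PASSES; pass++)
        for (size_t i = 0; i < n; i++)
            p[i] = 0xAAAA;              /* one 16-bit write per element */
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

static double fill32(volatile uint32_t *p, size_t n)
{
    clock_t t0 = clock();
    for (int pass = 0; pass < PASSES; pass++)
        for (size_t i = 0; i < n; i++)
            p[i] = 0xAAAAAAAA;          /* one 32-bit write per element */
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    void *buf = malloc(BUF_SIZE);
    if (!buf)
        return 1;
    /* Same total number of bytes written in each case. */
    double t16 = fill16(buf, BUF_SIZE / 2);
    double t32 = fill32(buf, BUF_SIZE / 4);
    printf("16-bit writes: %.2f s, 32-bit writes: %.2f s\n", t16, t32);
    free(buf);
    return 0;
}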

Speaking in terms of real-world performance, a system with shared-RAM integrated graphics is slower, and always will be slower, than one using off-board AGP graphics with non-shared video memory. Not just in video performance, but also in overall system performance. The reason is the system RAM and bus bandwidth consumed simply to refresh the screen. That refresh also has priority over everything else in the system, which can interfere with burst transfers to/from RAM over the PCI bus, etc. It slows down the entire system by a noticeable percentage.

Your numbers there for both PCI and AGP are low, especially the AGP one. AGP is much faster than just 2 x 133 MB/s PCI, at least with any modern card. Perhaps you are talking about the V3 AGP, which only treated the AGP connector like a 66 MHz PCI bus? The numbers you claim are what you would get if you disabled your GART driver and disabled AGP Fast Writes. Not a common scenario.

Also, I didn't say Trident Blade, I said Blade3D. The Blade was the predecessor chipset. Likewise, I said PLE133T, with Tualatin support, not the original PLE133. (Not that it makes a whole lot of difference in terms of performance though, really.)

In fact, to make a point - if system RAM were so much faster than AGP memory (which I claim it is NOT), then why aren't more modern AGP cards using AGP texturing from system memory, like the old AGP 1x/2x cards were designed to? Reason - onboard video card RAM is much faster than system RAM. The fact that the host CPU can also write to that memory in a burst/buffered non-cache-coherent way through a direct-access port is also what gives it better performance than system RAM.

Likewise, in nearly every video benchmark that I've seen, the integrated graphics fall far short of their equivalent off-board AGP graphics scores.

(I will again note that the NF2's integrated graphics is the exception to the rule, that while the graphics performance is not better than an add-in AGP card, it also doesn't lag the system, due to the additional otherwise-useless-for-an-Athlon memory bandwidth provided by the dual-channel RAM.)
 

SickBeast

Lifer
Jul 21, 2000
Originally posted by: Brown
Oh yeah, almost forgot, if anyone has any specific recommendations for a motherboard with the Nforce 2 chipset, feel free to post them as I am not that familiar with lower-end boards. I'm partial to Asus these days myself, so I was looking at the A7N8X-VM/400.

DON'T GET THE ASUS!! My friend bought one and it has no overclocking options whatsoever. You can't even adjust your memory timings or voltage.

He ended up with the Abit NF7-M and was very happy with it. :)
 

Peter

Elite Member
Oct 15, 1999
VirtualLarry, burst writes to AGP cards are _NOT_ faster than writes to system RAM. Like it or not, when the CPU writes into the card's frame buffer, no accelerated AGP cycles are happening, but normal (66 MHz) PCI cycles. AGP cycles happen only when the GPU accesses system RAM. The _only_ exception is on systems that have "AGP Fast Writes" actually enabled and working. (Show me one.)

The immense speed of AGP cards' own RAM is only relevant when the local GPU is rendering; for CPU-rendered content like web pages it doesn't help zip.

This is where your theory falls apart. It's a well known fact, with equally well known effects like integrated-VGA chipsets being an order of magnitude faster in displaying HDTV video content.

In a nutshell: Of course, system memory is slower than today's AGP card memory. However, _CPU_ access to system RAM is lots faster than CPU access to RAM on the AGP bus.

This, as it happens, is one of the major reasons why GPU makers would switch to PCI-Express rather today than tomorrow: PCI-Express has the same bandwidth in both directions.

Shall I pick further, being a BIOS engineer? PLE133T is absolutely identical to PLE133, apart from the CPU bus side supporting the lower signalling voltage of the Tualatin. No functional changes were made. Nada. Zip. Niente. "Blade3D" was the name of the discrete part; CyberBlade is the integrated version of the same graphics core.

Sure, you get 5 MB/s out of the ISA bus when you overclock it to almost twice the intended speed of 8 MHz. Does that negate my ~2 MB/s number? No. 32-bit writes will be broken down into consecutive 16-bit ISA writes by the system south bridge; busses further up the hierarchy will be less loaded if you send 32-bit writes. No surprise there.

I suggest you download a simple diagnostics program like ctcm7.exe (from ftp.heise.de), and run a "CTCM7 /VID" benchmark. That'll give you a very precise measurement of CPU-to-VGA frame buffer write performance. You'll be surprised.
 

MDE

Lifer
Jul 17, 2003
Originally posted by: SickBeast
Originally posted by: Brown
Oh yeah, almost forgot, if anyone has any specific recommendations for a motherboard with the Nforce 2 chipset, feel free to post them as I am not that familiar with lower-end boards. I'm partial to Asus these days myself, so I was looking at the A7N8X-VM/400.

DON'T GET THE ASUS!! My friend bought one and it has no overclocking options whatsoever. You can't even adjust your memory timings or voltage.

He ended up with the Abit NF7-M and was very happy with it. :)
The thing is for his parents who will do nothing more than browse the web. You could underclock the CPU all the way down to 500MHz and they probably wouldn't notice.
 

Peter

Elite Member
Oct 15, 1999
There are few things in building computers that are more stupid than overclocking someone else's system. That spells out "support nightmare".
 

VirtualLarry

No Lifer
Aug 25, 2001
Originally posted by: Peter
VirtualLarry, burst writes to AGP cards are _NOT_ faster than writes to system RAM.
They should be, in most cases, because the AGP port interface on the chipset includes a buffered write queue - it can accept the write cycle from the CPU immediately, and return immediately, without having to wait for the DRAM memory controller or a PCI device. It also doesn't have to do any cache-snooping accesses. At worst, they would be the same speed, assuming similar write-buffering in the chipset. You claimed that they will be slower.

Originally posted by: Peter
Like it or not, when the CPU writes into the card's frame buffer, no accelerated AGP cycles are happening, but normal (66 MHz) PCI cycles. AGP cycles happen only when the GPU accesses system RAM. The _only_ exception is on systems that have "AGP Fast Writes" actually enabled and working. (Show me one.)

For a normal, correctly operating, AGP-capable system (AGP system board and AGP video card), having "Fast Writes" enabled is in fact the norm. My system works fine with it enabled, and so did my last system. Like I said, the only way that CPU writes over the AGP bus to video-card memory would be that slow is if you were running a V3, or had all of the performance features of AGP essentially disabled.
Read this thread to see the real-world effects of Fast Writes, and how many people have their AGP systems correctly configured rather than crippled.

Originally posted by: Peter
The immense speed of AGP cards' own RAM is only relevant when the local GPU is rendering; for CPU-rendered content like web pages it doesn't help zip.

Keep in mind your original assertion, which I am refuting here:
Originally posted by: Peter
VL, of course an ISA card is going to stink at that - for the same reason why shared-RAM graphics is faster than AGP graphics at 2D: BANDWIDTH.

My assertion is that this is completely wrong: shared-RAM integrated graphics do NOT have higher bandwidth than AGP - if they did, then why aren't all of the high-end gaming systems using integrated graphics instead of AGP cards? Why isn't Alienware shipping systems with big stickers on the side, proudly proclaiming "Integrated shared-memory graphics inside!"?

Because it's not true.

In fact, 2D graphics take a lot less bandwidth than 3D, and most people understand that on nearly any video card today, 2D graphics performance is not an issue, because it takes so little bandwidth overall.

Originally posted by: Peter
This is where your theory falls apart. It's a well known fact, with equally well known effects like integrated-VGA chipsets being an order of magnitude faster in displaying HDTV video content.

I would like to see some benchmarks that prove that; I don't believe it. Integrated shared-memory graphics subsystems rob the entire system of bandwidth; the fact that HDTV (assuming compressed-video decoding here on the order of the WMV-HD discussed in my linked thread, correct me if I'm wrong) is so CPU/bandwidth-intensive in the first place, robbing X-percent of system bandwidth off of the top, would seem to place systems with shared-memory integrated graphics subsystems at a distinct disadvantage.

That is the one point that you have consistently refused to acknowledge, that integrated shared-memory graphics steal overall system-memory bandwidth, right off the top. That's what makes them slower, period, for any system task, not just graphics. That is why the system I mentioned in my post ran UT slower, even though I wasn't even using the integrated shared-memory graphics to play the game, it was simply still active in hardware and stealing memory-access cycles in the background.

Originally posted by: Peter
In a nutshell: Of course, system memory is slower than today's AGP card memory. However, _CPU_ access to system RAM is lots faster than CPU access to RAM on the AGP bus.

Again, I totally disagree. That's only true if you completely cripple your AGP system, in which case it might as well not be an AGP system at all, but rather a PCI one.

With AGP correctly functioning, you have a directly accessible, write-buffered, non-cache-coherent port to a very high-speed pool of dedicated graphics memory on the AGP video card. The overall speed and bus width of that memory give peak theoretical bandwidth numbers high enough that even when the GPU is also accessing that memory, for both drawing/acceleration functions and refresh tasks, there is still enough bandwidth left over that CPU host write accesses are, for the most part, unhindered.

Compare that to accessing normal system DRAM, which has to wait for slower PCI bus devices that may want to access it, wait for the memory controller to open/close DRAM pages, handle bus-snooping cycles to ensure cache coherency, etc.

With integrated shared-memory graphics, it's even worse - system bandwidth is stolen for display-refresh tasks, and the CPU accesses to the same memory have to compete with that, along with the rest of the system devices, PCI, etc.

Originally posted by: Peter
This, as it happens, is one of the major reasons why GPU makers would switch to PCI-Express rather today than tomorrow: PCI-Express has the same bandwidth in both directions.

That is irrelevant to this discussion, I already mentioned that reads over the AGP bus are slow. Bi-directionality has nothing to do with write-bandwidth testing, which was the discussion here, as far as I can see.

Originally posted by: Peter
Shall I pick further, being a BIOS engineer? PLE133T is absolutely identical to PLE133, apart from the CPU bus side supporting the lower signalling voltage of the Tualatin. No functional changes were made. Nada. Zip. Niente. "Blade3D" was the name of the discrete part; CyberBlade is the integrated version of the same graphics core.

That doesn't really surprise me at all, that's why I said " (Not that it makes a whole lot of difference in terms of performance though, really)", because I figured that they would be more or less the same part. As far as the graphics core, I've seen chipset specs describing both Blade and Blade3D integrated graphics, I assumed that they were different models. I'm willing to concede that I may be incorrect about that, since I didn't do any background checking.

Originally posted by: Peter
Sure, you get 5 MB/s out of the ISA bus when you overclock it to almost twice the intended speed of 8 MHz. Does that negate my ~2 MB/s number? No. 32-bit writes will be broken down into consecutive 16-bit ISA writes by the system south bridge; busses further up the hierarchy will be less loaded if you send 32-bit writes. No surprise there.

"Classical" ISA bus speed was specced at ~8 MHz, but most ISA cards at the time were explicitly specified as supporting ISA bus speeds up to (usually) 12.5 MHz. (Remember, 486SX-25 systems with ISA busses were popular back then, say, in Packard Bell machines and Compaqs and whatnot, so SYSCLK/2 was a common divisor to derive the ISA clock from. Likewise 33/40 MHz divided by 3, which yielded, of course, 11 MHz or 13.3 MHz.)

Being a low-level ASM/C/C++ programmer with professional game dev experience, I'm quite aware that the memory controller breaks down writes, but it was quite interesting to me back in the day that even though the bus bandwidth should have been the constraining factor there, because of the inefficiencies of the 386's architecture, it was faster to use 32-bit writes than consecutive 16-bit writes, even on an un-cached CPU.

Originally posted by: Peter
I suggest you download a simple diagnostics program like ctcm7.exe (from ftp.heise.de), and run a "CTCM7 /VID" benchmark. That'll give you a very precise measurement of CPU-to-VGA frame buffer write performance. You'll be surprised.

I'd rather write my own, honestly, but I think that you should consider comparing buffered write speeds, to both video and system memory, on systems both with and without integrated shared-memory graphics, and with and without those graphics being actively used for display. Make sure to properly configure your AGP, Fast Writes enabled, etc.

PS. Don't forget those benchmarks, showing that integrated shared-memory video is faster for HD video decoding than a properly-configured modern AGP video card. :)
 

Peter

Elite Member
Oct 15, 1999
You're still missing the point that all the "bandwidth stealing" and whatnot does not matter to the topic here - 2D rendering done by the CPU. 2D rendering of a surprisingly small amount of data.

You might want to have a look at a couple of ongoing discussions in video driver development, e.g. http://bugs.xfree86.org/show_bug.cgi?id=1292 - analysis of various performance problems in HDTV rendering from XFree.

Finally, please, don't try to lecture me on chipsets, busses, buffered writes and the likes. I have been an engineer in mainboard design ever since the ISA days, I know these things from the inside out. Literally.

And try the CTCM7 benchmark. c't magazine isn't made by idiots - and from our engineering work, I know the throughput numbers it gives for /vid are very very accurate. What it does is switch to a graphics mode, blast data from the CPU into the frame buffer, and measure throughput. With and without write combining enabled, through VGA compatibility window and through linear frame buffer. Try it.
 

Peter

Elite Member
Oct 15, 1999
So much for the summary. Now, the details.

Originally posted by: VirtualLarry
Originally posted by: Peter
VirtualLarry, burst writes to AGP cards are _NOT_ faster than writes to system RAM.
They should be, in most cases, because the AGP port interface on the chipset includes a buffered write queue - it can accept the write cycle from the CPU immediately, and return immediately, without having to wait for the DRAM memory controller or a PCI device. It also doesn't have to do any cache-snooping accesses. At worst, they would be the same speed, assuming similar write-buffering in the chipset. You claimed that they will be slower.

Buffers help initially, for a short burst, but they don't help when you're trying to SUSTAIN throughput. Once they're full, the CPU will have to wait for write completion, while the data is leaving on the other end (at an undisputedly slower rate than the CPU bus can supply).

Originally posted by: Peter
Like it or not, when the CPU writes into the card's frame buffer, no accelerated AGP cycles are happening, but normal (66 MHz) PCI cycles. AGP cycles happen only when the GPU accesses system RAM. The _only_ exception is on systems that have "AGP Fast Writes" actually enabled and working. (Show me one.)

For a normal, correctly operating, AGP-capable system (AGP system board and AGP video card), having "Fast Writes" enabled is in fact the norm. My system works fine with it enabled, and so did my last system. Like I said, the only way that CPU writes over the AGP bus to video-card memory would be that slow is if you were running a V3, or had all of the performance features of AGP essentially disabled.
Read this thread to see the real-world effects of Fast Writes, and how many people have their AGP systems correctly configured rather than crippled.
Do NVidia or ATi, or any mainboard company, ship their systems with FastWrites enabled? No. And how many users know they should enable it, and how many know how to troubleshoot that if it doesn't work? A handful of enthusiasts do, the remaining 99.99 percent of people don't.

Originally posted by: Peter
The immense speed of AGP cards' own RAM is only relevant when the local GPU is rendering; for CPU-rendered content like web pages it doesn't help zip.

Keep in mind your original assertion, which I am refuting here:
Originally posted by: Peter
VL, of course an ISA card is going to stink at that - for the same reason why shared-RAM graphics is faster than AGP graphics at 2D: BANDWIDTH.

My assertion is that this is completely wrong: shared-RAM integrated graphics do NOT have higher bandwidth than AGP - if they did, then why aren't all of the high-end gaming systems using integrated graphics instead of AGP cards? Why isn't Alienware shipping systems with big stickers on the side, proudly proclaiming "Integrated shared-memory graphics inside!"?

Because it's not true.
One last time: 2D rendering is NOT about how much bandwidth the GPU has on its RAM. It's about how much write throughput the CPU can achieve into the frame buffer. And that's what's massively faster on an integrated-graphics system with DDR RAM. Even if you do have fast writes, you'll never get more than 2 GB/s through the AGP. Today's DDR RAM solutions, let alone dual channel ones, are quite a bit faster.
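
For the record, the peak-number arithmetic looks like this - textbook theoretical maxima, with DDR400 used as an assumed example, not a measurement of any particular board:

/* Peak-bandwidth arithmetic: AGP 8x versus single- and dual-channel
 * DDR400. Theoretical maxima, not measured throughput. */
#include <stdio.h>

int main(void)
{
    double agp8x       = 66.0e6 * 4 * 8;   /* 66 MHz, 32-bit wide, 8x strobing: ~2.1 GB/s */
    double ddr400      = 200.0e6 * 2 * 8;  /* 200 MHz, double data rate, 64-bit: ~3.2 GB/s */
    double ddr400_dual = 2 * ddr400;       /* ~6.4 GB/s */

    printf("AGP 8x:               %.1f GB/s\n", agp8x / 1e9);
    printf("DDR400, one channel:  %.1f GB/s\n", ddr400 / 1e9);
    printf("DDR400, dual channel: %.1f GB/s\n", ddr400_dual / 1e9);
    return 0;
}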

In fact, 2D graphics take a lot less bandwidth than 3D, and most people understand that on nearly any video card today, 2D graphics performance is not an issue, because it takes so little bandwidth overall.
See above. It's becoming an issue again: HDTV. PCI-Express cures that.

Originally posted by: Peter
This is where your theory falls apart. It's a well known fact, with equally well known effects like integrated-VGA chipsets being an order of magnitude faster in displaying HDTV video content.

I would like to see some benchmarks that prove that; I don't believe it. Integrated shared-memory graphics subsystems rob the entire system of bandwidth; the fact that HDTV (assuming compressed-video decoding here on the order of the WMV-HD discussed in my linked thread, correct me if I'm wrong) is so CPU/bandwidth-intensive in the first place, robbing X-percent of system bandwidth off of the top, would seem to place systems with shared-memory integrated graphics subsystems at a distinct disadvantage.

That is the one point that you have consistently refused to acknowledge, that integrated shared-memory graphics steal overall system-memory bandwidth, right off the top. That's what makes them slower, period, for any system task, not just graphics. That is why the system I mentioned in my post ran UT slower, even though I wasn't even using the integrated shared-memory graphics to play the game, it was simply still active in hardware and stealing memory-access cycles in the background.
Correct, but missing the point. Running UT, a massively CPU intensive task, does not compare to the TOPIC at all.
Compare that to accessing normal system DRAM, which has to wait for slower PCI bus devices that may want to access it, wait for the memory controller to open/close DRAM pages, handle bus-snooping cycles to ensure cache coherency, etc.
As if RAM controllers on graphics cards don't use the same paged DRAM technology. Cache coherency issues obviously aren't hindering CPU initiated cycles at all, since the CPU deals with that internally before it even presents the write to its own bus.
Originally posted by: Peter
This, as it happens, is one of the major reasons why GPU makers would switch to PCI-Express rather today than tomorrow: PCI-Express has the same bandwidth in both directions.

That is irrelevant to this discussion, I already mentioned that reads over the AGP bus are slow. Bi-directionality has nothing to do with write-bandwidth testing, which was the discussion here, as far as I can see.
Have we been discussing reads over AGP at all? No we haven't. You're distracting.

Originally posted by: Peter
Sure, you get 5 MB/s out of the ISA bus when you overclock it to almost twice the intended speed of 8 MHz. Does that negate my ~2 MB/s number? No. 32-bit writes will be broken down into consecutive 16-bit ISA writes by the system south bridge; busses further up the hierarchy will be less loaded if you send 32-bit writes. No surprise there.

"Classical" ISA bus speed was specced at ~8 MHz, but most ISA cards at the time were explicitly specified as supporting ISA bus speeds up to (usually) 12.5 MHz. (Remember, 486SX-25 systems with ISA busses were popular back then, say, in Packard Bell machines and Compaqs and whatnot, so SYSCLK/2 was a common divisor to derive the ISA clock from. Likewise 33/40 MHz divided by 3, which yielded, of course, 11 MHz or 13.3 MHz.)
No major brand system would ever ship with overclocked busses. There was a brief period when people attempted that, at the time when the 80286 got to 10 and then 12 MHz, but this was quickly disposed of because it became a support nightmare. Straight back to 8 MHz ISA it was, and that's where it stayed. Sure, some "enthusiast" BIOSes let you tweak it, but the shipping default was 8.something throughout, ever since the 386 days.

Being a low-level ASM/C/C++ programmer with professional game dev experience, I'm quite aware that the memory controller breaks down writes, but it was quite interesting to me back in the day that even though the bus bandwidth should have been the constraining factor there, because of the inefficiencies of the 386's architecture, it was faster to use 32-bit writes than consecutive 16-bit writes, even on an un-cached CPU.
That's because these CPUs didn't know about write combining. (In fact, the first to implement this was the Cyrix 6x86.) Hence, if you do 16-bit writes, they'll appear on the CPU and then PCI bus as single write cycles with half the bus width unused, and then pass through to ISA as they are. Do 32-bit writes, use up half as much CPU and PCI bus bandwidth, and have the south bridge produce two ISA cycles in quick succession.

Originally posted by: Peter
I suggest you download a simple diagnostics program like ctcm7.exe (from ftp.heise.de), and run a "CTCM7 /VID" benchmark. That'll give you a very precise measurement of CPU-to-VGA frame buffer write performance. You'll be surprised.

I'd rather write my own, honestly, but I think that you should consider comparing buffered write speeds, to both video and system memory, on systems both with and without integrated shared-memory graphics, and with and without those graphics being actively used for display. Make sure to properly configure your AGP, Fast Writes enabled, etc.

PS. Don't forget those benchmarks, showing that integrated shared-memory video is faster for HD video decoding than a properly-configured modern AGP video card. :)

Links given above. The benchmark utility doesn't tweak chipsets. It does use WC and UC MTRRs, and compares VGA window and linear-framebuffer performance. Write and read. Properly. No need to write your own, just run it and look at the numbers.