Write combining matters when data leaving the CPU is headed for an I/O device, such as a VGA card. That address range is uncached, so without write combining each write goes out on its own, as a single 8-, 16- or 32-bit transfer. This makes very poor use of the CPU and PCI (or AGP, which behaves the same way here) buses, because the best performance is achieved only when data is issued in larger chunks, in a so-called burst of 8 chunks of 64-bit data (64 bytes).
Now you can tell the CPU which I/O ranges allow write data to be collected into burst-sized chunks before it is presented on the CPU bus, without affecting the operation of the I/O device in question. A VGA card's linear frame buffer is a prime candidate: it is actual RAM, so reordering and collecting writes has no side effects, and a lot of write data is sent to it.
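To make the saving concrete, here is a minimal sketch in C that models a single write-combining buffer and counts bus transactions. It is a toy model, not how any particular CPU implements it: the names wc_buffer, wc_store, wc_flush and wc_fill are made up for illustration, the 64-byte line is simply the burst size quoted above, and real hardware has several such buffers and may flush them partially filled.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES 64u    /* one burst: 8 x 64-bit, as quoted above */

/* Hypothetical model of one write-combining buffer (illustration only). */
typedef struct {
    uint64_t line_addr;   /* 64-byte-aligned address being collected */
    int      active;      /* nonzero while the buffer holds pending data */
    unsigned bursts;      /* bus transactions issued so far */
} wc_buffer;

/* Send the pending line out on the bus as one burst. */
void wc_flush(wc_buffer *wc)
{
    if (wc->active) {
        wc->bursts++;
        wc->active = 0;
    }
}

/* Record one store of `size` bytes at `addr`; it must not cross a line. */
void wc_store(wc_buffer *wc, uint64_t addr, unsigned size)
{
    uint64_t line = addr & ~(uint64_t)(LINE_BYTES - 1);

    assert(size > 0 && (addr - line) + size <= LINE_BYTES);
    if (wc->active && wc->line_addr != line)
        wc_flush(wc);     /* store hits a different line: push the old one out */
    wc->line_addr = line;
    wc->active = 1;
}

/* Fill `bytes` of a frame buffer using 32-bit stores; return bursts used. */
unsigned wc_fill(uint64_t base, unsigned bytes)
{
    wc_buffer wc = {0, 0, 0};

    for (unsigned off = 0; off < bytes; off += 4)
        wc_store(&wc, base + off, 4);
    wc_flush(&wc);        /* drain whatever is still pending */
    return wc.bursts;
}
```

In this model, filling 1024 bytes with 32-bit stores takes 16 bursts, where the uncached path would issue 256 separate transfers, one per store.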
This is particularly effective on modern CPUs that have lots of buffers, deep FIFOs, and fast front-side buses.