I think there are a few different things going on here. First, the unified L3 effectively doubles the L3 size as seen by any single process. So, if the game's working set has high locality but spills a bit out of a 16MB L3, unifying the L3 into an effective 32MB will be a big help.
Second, from the rumors we've heard, the IF links between the CCD and the IOD retain the same number of pins. Before, there were two separate pathways between the two chips, one for each CCX. With a unified L3 and a single 8-core CCX, that means either there is a dual-ported link between the CCD and the IOD, or the single link has about twice the bus width. We also hear that the IF speed has increased.
Taken together, this means that in a situation with only a few heavy threads, as in most games, there is a massive increase in available bandwidth between an individual core and the memory controller.
Granted, that's all based on rumors that have been passed around, but it's certainly consistent with what we're seeing here.
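
To put a rough shape on the "spills a bit from 16MB" point, here is a toy sketch. Everything in it is an assumption for illustration: the cache is modeled as fully associative with strict LRU, one block stands in for 1MB, the 20-block working set is a made-up number, and the access pattern is a plain repeated loop over the working set. Real L3 replacement and real game access patterns are messier, so treat the numbers as direction, not magnitude.

```python
from collections import OrderedDict


def lru_hit_rate(cache_blocks, working_set_blocks, passes=10):
    """Hit rate of a toy fully associative LRU cache.

    The access pattern is a repeated linear sweep over the working set,
    which is close to the worst case for LRU once the set no longer fits.
    """
    cache = OrderedDict()  # keys are block ids, ordered oldest -> newest
    hits = 0
    total = 0
    for _ in range(passes):
        for block in range(working_set_blocks):
            total += 1
            if block in cache:
                hits += 1
                cache.move_to_end(block)  # mark as most recently used
            else:
                if len(cache) >= cache_blocks:
                    cache.popitem(last=False)  # evict least recently used
                cache[block] = True
    return hits / total


# Hypothetical 20-block working set: slightly too big for 16, fits in 32.
split_l3 = lru_hit_rate(cache_blocks=16, working_set_blocks=20)
unified_l3 = lru_hit_rate(cache_blocks=32, working_set_blocks=20)
print(f"16-block cache: {split_l3:.0%} hits")
print(f"32-block cache: {unified_l3:.0%} hits")
```

In this deliberately harsh pattern the smaller cache thrashes and never hits, while the larger one only pays the cold misses on the first pass. A real workload would land somewhere less extreme, but it shows why a working set that only slightly overflows the cache can benefit so much from the doubling.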
I'd be interested in seeing the memory setup for that 5800X...