Pedramrezai,
>or solutions with dual 16x can deliver up to 10-15% more performance
This is similar to my aunt saying "never ever buy non-'BIO' bananas, because the regular ones CAN get sprayed with insecticide UP TO 17 times!". So did your banana get sprayed 17 times or not at all? I haven't read any recent reviews of CrossFire / SLI 8x vs. 16x performance, because I don't care about it, so I have to ask you: does dual 16x deliver 10-15% more performance at fairly CPU-bound resolutions like 1024 and 1280? Because otherwise it doesn't matter for these tests.
>I am quite suspicious over some hefty optimizations in intel-cooked display driver
They're using standard drivers as far as I know, or at least that's what the reviewers said. Even if some Conroe optimization had been done, don't you think it would find its way into the regular drivers?
>this might be the beginning of a new SSEx game with unfair optimizations for a new technology.
So far I haven't read anything about special new instructions for Conroe. It seems a bit weird if Intel put no SSE update in, but it may be possible. That being said, Conroe did get a hefty SSE1/2/3 performance update: single-cycle 128-bit FP! (I guess for add and mul, but certainly not div.) This is quite important, and it doesn't require software optimization to work (though optimization would help it work well).
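To put a rough number on what single-cycle 128-bit FP means, here is a minimal sketch of peak throughput per SSE unit. The 2-cycle figure for the older design is my assumption (a 128-bit op split into two 64-bit halves), and the function name is made up for illustration; it says nothing about real-world speedups.

```python
def peak_flops_per_cycle(element_bits, cycles_per_128bit_op):
    """Peak FP results per cycle for one SSE unit (theoretical upper bound)."""
    lanes = 128 // element_bits          # 4 single-precision or 2 double-precision values
    return lanes / cycles_per_128bit_op


# Assumption: an older design needs 2 cycles per 128-bit op, Conroe needs 1.
for name, bits in (("single", 32), ("double", 64)):
    old = peak_flops_per_cycle(bits, 2)
    new = peak_flops_per_cycle(bits, 1)
    print(f"{name} precision: {old} -> {new} results per cycle")
```

Under that assumption the peak doubles, without any new instructions being involved.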
Please don't speak of an FSB when you're referring to the A64. FSB implies dependencies which aren't there with the A64. Same for the term "async": there are no "async penalties" on the A64.
But back to topic: AM2 + DDR2 will help to close the gap between Conroe and A64 a bit, but possibly not as much as you think. Do you still remember how much faster S939 was compared to S754? Normally 0-5%, sometimes (games) 10%. And that was already a doubling of the memory bandwidth. This time we need (still expensive) DDR2-800 4-4-4 to keep the latency constant and double the theoretical bandwidth again. This may well give games a 10% boost (quite important for these X2 vs. Conroe benchies), but it won't do anything for media encoding, rendering etc. (thanks to clever algorithms).
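The latency and bandwidth arithmetic behind the DDR2-800 4-4-4 remark can be sketched like this. The baseline of dual-channel DDR400 at CL2 is my assumption for what S939 systems run today, and the function names are made up for illustration; the numbers are theoretical peaks only.

```python
def cas_latency_ns(data_rate_mts, cas_cycles):
    """CAS latency in nanoseconds for DDR-type memory."""
    clock_mhz = data_rate_mts / 2        # DDR transfers twice per clock
    return cas_cycles * 1000 / clock_mhz


def peak_bandwidth_gbs(data_rate_mts, channels=2, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (64-bit bus per channel)."""
    return data_rate_mts * bus_bytes * channels / 1000


# Assumed baseline: DDR400 CL2. Candidates: DDR2-800 at CL4 and CL5.
for label, rate, cl in (("DDR400 CL2", 400, 2),
                        ("DDR2-800 CL4", 800, 4),
                        ("DDR2-800 CL5", 800, 5)):
    print(f"{label}: {cas_latency_ns(rate, cl)} ns, "
          f"{peak_bandwidth_gbs(rate)} GB/s")
```

With these assumptions only the CL4 modules match the old CAS latency while doubling the peak bandwidth; slower DDR2-800 timings trade latency away for that bandwidth.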
Concerning cache: look at how amazingly well the Durons and Semprons do with their small caches. K7/K8 has never been dependent on large L2 caches, thanks to its large L1 cache. Or, put negatively: a bigger L2/L3 won't help it as much as it does for Intel. And as you increase the cache size further, you get diminishing returns. Add to that the fact that a shared L3 has to be slower than L2, and you won't get anything like "huge performance gains". Games and server apps should profit the most here, maybe by 5-10% as an educated guess.
But don't expect any increase in cache size before 65nm at the end of 2006 / beginning of 2007. The 1MB L2 90nm X2s are already large (=expensive) enough.
To sum things up: it's reasonable to expect AM2 to close the gap a bit, but Conroe is clearly the superior design, with its P-M efficiency and its damn powerful FPU. And remember that a 2.66GHz Conroe was compared to a 2.8GHz X2. Intel won't have much trouble scaling Conroe to 3GHz (an EE is on the roadmaps). To catch up, AMD needs substantial changes in the A64 design, which we won't see before 65nm. One rumor I've seen talks about doubling the FPU units of K8. Together with the infrastructure to feed them with instructions, this might be what is necessary. Of course I'd like to see multithreading included in such a design.
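On the clock speed point: a benchmark lead measured at 2.66GHz against 2.8GHz understates the per-clock difference. A minimal sketch of the normalization follows; the 20% lead is a purely hypothetical input, not a measured result, and the function name is made up for illustration.

```python
def per_clock_ratio(measured_lead, own_ghz, rival_ghz):
    """Performance-per-clock ratio, given a measured lead (0.20 = 20% faster)."""
    return (1 + measured_lead) * rival_ghz / own_ghz


# Hypothetical example: a 20% lead for a 2.66GHz chip over a 2.8GHz chip.
ratio = per_clock_ratio(0.20, own_ghz=2.66, rival_ghz=2.8)
print(f"per-clock advantage: {(ratio - 1) * 100:.1f}%")
```

Even a dead heat in the benchmark would mean about 5% more work per clock for the lower-clocked chip.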
Regards, MrS