AMD talks Barcelona

Originally posted by: Duvie
Originally posted by: Hard Ball
Originally posted by: Nemesis 1
Ya, I know where the K8L name came from, and it wasn't the Inquirer. As for K8L being released at 2.6 GHz, you're dreaming. Duvie, I believe, was correct about the speed it will be released at. That would hold true with AMD's past releases.

Also keep in mind that these first K8Ls will be server parts. We won't see desktop processors till late in the 3rd or 4th quarter.

Intel will have Wolfdales and Yorkfields out by then, with better SSE4 instructions and high-k dielectrics with metal gates, with the 4-core models at 3.5 GHz and 12 MB of cache shared between all 4 cores. The Wolfdale is what I want, clocked at 4 GHz with 6 MB of shared cache.

I am sure Duvie will be eyeballing the Yorkfields.

It seems obvious that you are not basing any of your comments on anything more than your own speculation.



He is right about me eyeballing the Yorkfields!!! 🙂

Yes, 🙂 more or less. But that's clearly not the part of the comment that I'm referring to.
 
I know...LOL!!

I try not to get into the speculation game... I stopped doing that some time ago when I realized it is just a flamefest... I take all of this talk of 1.8x and 40% numbers as hype until I see numbers and facts to back it up...

I wait for numbers, and then, based on my experience, I can extrapolate from those numbers to some notion of performance. That piques my interest, but I still wait for the reviews.

 
Originally posted by: Nemesis 1
On K8L, I am going by what AMD has done in the past.

As for Intel, I am going by what the articles say.

http://www.xbitlabs.com/news/cpu/display/20060930234444.html

Personally, I already have a pretty good idea about the approximate performance characteristics of Barcelona for each set of instructions, as well as the release clocks of each of the aforementioned thermal envelopes. But since what I know comes from conversations with acquaintances and former university colleagues that are understood to be private, it's not something that I should be discussing here.
 
Originally posted by: Duvie
I know...LOL!!

I try not to get into the speculation game... I stopped doing that some time ago when I realized it is just a flamefest... I take all of this talk of 1.8x and 40% numbers as hype until I see numbers and facts to back it up...

I wait for numbers, and then, based on my experience, I can extrapolate from those numbers to some notion of performance. That piques my interest, but I still wait for the reviews.

LOL,

Duvie, you are always the guy who knows how to have a good time around here. 😀
 
Originally posted by: Nemesis 1
Ya, I know where the K8L name came from, and it wasn't the Inquirer. As for K8L being released at 2.6 GHz, you're dreaming.

K10 at 2.6 would be very consistent with AMD's past releases on new chips...what speed do YOU expect, and why?

Duvie, I believe, was correct about the speed it will be released at. That would hold true with AMD's past releases.

Considering this is what AMD has announced as their minimum spec already, he most assuredly IS correct...

News.com article
I disagree that it's too small though...a large cache for AMD doesn't make sense, for Intel it's necessary to overcome the lack of an on-die memory controller and keep latency under control.

Also keep in mind that these first K8Ls will be server parts. We won't see desktop processors till late in the 3rd or 4th quarter.

And? Yes, the K10 will be server in Q2, desktop in Q3, and mobile in Q4...this makes good sense from AMD's POV. In their conference call, AMD disclosed that in Q4 they:
Lost marketshare in servers
Stayed even in desktop, and
Gained marketshare in mobile

Intel will have Wolfdales and Yorkfields out by then, with better SSE4 instructions and high-k dielectrics with metal gates, with the 4-core models at 3.5 GHz and 12 MB of cache shared between all 4 cores. The Wolfdale is what I want, clocked at 4 GHz with 6 MB of shared cache.

Huh? Intel has already stated that the 45nm chips won't START shipping until the end of 2007...I think you are confusing shipping for starting production.

I am sure Duvie will be eyeballing the Yorkfields.

Intel showed C2D at spring IDF

Intel Developer Forum
March 7-9, 2006 | San Francisco | Moscone Center West

And they "showed" the 5 GHz P4 at another IDF...but they didn't release samples for review until April/May...



 
Originally posted by: Duvie



He is right about me eyeballing the Yorkfields!!! 🙂

That's probably around when I will finally get a quad-core. I can never quite keep up with you, Duvie. 😀
 
I am sooo tired of rehashing the cache argument again... I won't... However, I can tell you I know an app that is obviously loading large chunks into the L2 cache. The L2 cache runs at the clock speed and thus is tremendously faster than using the memory. It is clear from my testing that it uses blocks larger than 1 MB of cache and can use chunks larger than 2 MB of cache... I have tested this from 4 MB down to 1 MB and the gains are huge... the AMD goes from performing 2x slower to 5x slower in some of these tests.

I think as the cache gets larger there is more opportunity for apps to load things into it. At 512 KB I think it is file-size limited. While your argument may be correct in terms of Intel compensating, I think my argument is also correct. I think only some apps can take advantage of it. However, since Intel shares its cache in pools, some of these huge-cache CPUs they have coming will be able to really take advantage of the speed of the L2 cache... It runs at clock speed... for me that is SRAM running at 3.2 GHz...
 
Originally posted by: Duvie
I am sooo tired of rehashing the cache argument again... I won't... However, I can tell you I know an app that is obviously loading large chunks into the L2 cache. The L2 cache runs at the clock speed and thus is tremendously faster than using the memory. It is clear from my testing that it uses blocks larger than 1 MB of cache and can use chunks larger than 2 MB of cache... I have tested this from 4 MB down to 1 MB and the gains are huge... the AMD goes from performing 2x slower to 5x slower in some of these tests.

I think as the cache gets larger there is more opportunity for apps to load things into it. At 512 KB I think it is file-size limited. While your argument may be correct in terms of Intel compensating, I think my argument is also correct. I think only some apps can take advantage of it. However, since Intel shares its cache in pools, some of these huge-cache CPUs they have coming will be able to really take advantage of the speed of the L2 cache... It runs at clock speed... for me that is SRAM running at 3.2 GHz...

Yes, Duvie and I BOTH have observed that running 1 instance of F@H on a 1499 project on a C2D@3.2 will complete a step in 4-5 minutes. On the same machine, firing up a second 1499 will slow the perf to 9 minutes! If you run a GPU client as the 2nd app, it doesn't care and doesn't slow down. This is on a 6300 or 6400. On Duvie's 6600 and quad core, that does NOT happen. So the extra cache size makes a very big difference in performance.
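
For anyone who wants to see why a working set that just misses fitting in cache can hurt so much, here is a minimal sketch in Python. It is only a toy model, not a measurement of F@H or of any real CPU: it assumes a fully associative cache with strict LRU replacement and a program that sweeps its data in a loop, and the 3 MB combined working set (two instances at a hypothetical 1.5 MB each) is a made-up number chosen for illustration. The 2 MB and 4 MB sizes match the L2 on the 6300/6400 and the 6600.

```python
from collections import OrderedDict

LINE_BYTES = 64  # typical cache line size; an assumption of this toy model


def lru_hit_rate(cache_lines, working_set_lines, passes=4):
    """Hit rate of a fully associative LRU cache under a cyclic sweep.

    The 'program' touches lines 0..working_set_lines-1 in order and
    repeats that sweep `passes` times.
    """
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    hits = 0
    accesses = 0
    for _ in range(passes):
        for line in range(working_set_lines):
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)  # mark as most recently used
            else:
                cache[line] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used
    return hits / accesses


def lines(megabytes):
    """Convert a size in MB to a count of cache lines."""
    return int(megabytes * 1024 * 1024) // LINE_BYTES


if __name__ == "__main__":
    combined_working_set = lines(3.0)  # hypothetical: two 1.5 MB instances
    for cache_mb in (2, 4):
        rate = lru_hit_rate(lines(cache_mb), combined_working_set)
        print(f"{cache_mb} MB cache, 3 MB working set: hit rate {rate:.2f}")
```

In this model the hit rate does not degrade gradually. Once the looped working set is even slightly bigger than the cache, LRU evicts each line just before it is needed again and the hit rate collapses, which is at least consistent with a second instance roughly doubling the step time on the smaller-cache chips. Real caches are set-associative and have prefetchers, so the real cliff is softer than this.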
 
It's nice to see some people with some inside knowledge on these forums. 🙂

Do any of you know if the K8L/Barcelona/K10 will have an integrated memory controller for each core, as opposed to a single memory controller shared among multiple cores?

I'm thinking that if intel finally stops being stubborn and integrates a memory controller into the C2D chips, they will absolutely dominate AMD performance wise. Why have they resisted doing this for so long?

An 80% boost for AMD on a clock-for-clock basis is very good news, and really should put them miles ahead of intel given their current crop of chips (if this information is true, and if AMD can release these chips within the next few months).

As for the comments about the chips being server only, I don't think the chips will be server-exclusive for very long (maybe for a few months). AMD needs to get back into the performance lead desktop-wise, and I could see them aggressively pushing these new chips down the pipe. :beer:
 
Originally posted by: SickBeast
It's nice to see some people with some inside knowledge on these forums. 🙂

Do any of you know if the K8L/Barcelona/K10 will have an integrated memory controller for each core, as opposed to a single memory controller shared among multiple cores?

I'm thinking that if intel finally stops being stubborn and integrates a memory controller into the C2D chips, they will absolutely dominate AMD performance wise. Why have they resisted doing this for so long?

An 80% boost for AMD on a clock-for-clock basis is very good news, and really should put them miles ahead of intel given their current crop of chips (if this information is true, and if AMD can release these chips within the next few months).

As for the comments about the chips being server only, I don't think the chips will be server-exclusive for very long (maybe for a few months). AMD needs to get back into the performance lead desktop-wise, and I could see them aggressively pushing these new chips down the pipe. :beer:


That is what I would like to see as well.....

I would also like to see Intel incorporate HTT-type technology as well....



I didn't see it mentioned, but one thing that needs to happen is AMD needs to add a 4th execution unit as Intel did... Heck, I bet if they had done that along with at least dual integrated memory controllers, they would likely have taken the clock-for-clock crown back.

I think that accounts for the single biggest reason the C2D architecture, following that NetBurst architecture, is better clock for clock... it is about IPC!!!!!

Latency is nice with integrated controller but if intel wants to spend the money and area on the die for large cache pools let them go ahead....cache is going to trump the memory controller anyways if it is large enough to load many things into it...Anytime it has to go out and hit the system ram AMD will show its muscle.
 
Originally posted by: Duvie
That is what I would like to see as well.....

I would also like to see Intel incorporate HTT-type technology as well....



I didn't see it mentioned, but one thing that needs to happen is AMD needs to add a 4th execution unit as Intel did... Heck, I bet if they had done that along with at least dual integrated memory controllers, they would likely have taken the clock-for-clock crown back.

I think that accounts for the single biggest reason the C2D architecture, following that NetBurst architecture, is better clock for clock... it is about IPC!!!!!

Latency is nice with integrated controller but if intel wants to spend the money and area on the die for large cache pools let them go ahead....cache is going to trump the memory controller anyways if it is large enough to load many things into it...Anytime it has to go out and hit the system ram AMD will show its muscle.
I was under the impression that intel simply designed a brilliant and highly efficient memory controller for the C2D, and that the other performance aspects of the chip essentially nullified the HTT advantage that AMD has.

On the C2D, does each core have its own memory controller at least? That may explain a large portion of the performance advantage, if they do have dedicated memory controllers.

Also, don't the C2D and the A64 have very similar IPC stats? I'm very curious as to why the C2D is so fast. It's like intel just pulled that chip out of their hats...overnight they stole the performance crown, it's quite an amazing chip.
 
Originally posted by: SickBeast
I was under the impression that intel simply designed a brilliant and highly efficient memory controller for the C2D, and that the other performance aspects of the chip essentially nullified the HTT advantage that AMD has.

On the C2D, does each core have its own memory controller at least? That may explain a large portion of the performance advantage, if they do have dedicated memory controllers.

Also, don't the C2D and the A64 have very similar IPC stats? I'm very curious as to why the C2D is so fast. It's like intel just pulled that chip out of their hats...overnight they stole the performance crown, it's quite an amazing chip.


C2D doesn't have integrated memory controller....

Well, maybe their IPC is close, but Intel pulled off the coup simply by doubling the number of execution units from 2 in NetBurst to 4 in C2D....

I think they really ramped up their integer performance but AMD still does well in FPU areas...
 
Originally posted by: Duvie
C2D doesn't have integrated memory controller....

Well, maybe their IPC is close, but Intel pulled off the coup simply by doubling the number of execution units from 2 in NetBurst to 4 in C2D....

I think they really ramped up their integer performance but AMD still does well in FPU areas...
I realize that the C2D memory controller is non-integrated, but I was reading that it's highly efficient nonetheless (almost as effective as AMD's integrated controller, including latency).
 
"It speaks" -

I'm glad AMD is trying to say it has something in the works. I'm just hoping they have something to actually display in the next month or two rather than just numbers. I'm sick of this speculation.

Because, to be honest, AMD has stated this already 10 times over. We don't know what 40% means. It's all marketing mumbo jumbo. And if you want to talk big words with college mates, go ahead.
 
I never read anything like that... I may have to go look for that... Sometimes some of those comments come off as marketing lingo and hype, and not much more than a tweak... I can't say I have ever read a review that made much reference to this...
 
Originally posted by: Duvie
I never read anything like that... I may have to go look for that... Sometimes some of those comments come off as marketing lingo and hype, and not much more than a tweak... I can't say I have ever read a review that made much reference to this...
Here's a link to the AT article.
Intel's Core 2 processors now offer even quicker memory access than AMD's Athlon 64 X2, without resorting to an on-die memory controller. While Intel will eventually add one, the fact of the matter is that it's simply not necessary for competitive memory performance today thanks to Intel's revamped architecture.
I'm thinking that a memory controller at that level of efficiency would be blazingly fast on-die. Quite the impressive feat of engineering on intel's part!
 
Originally posted by: Duvie
I am sooo tired of rehashing the cache argument again... I won't... However, I can tell you I know an app that is obviously loading large chunks into the L2 cache. The L2 cache runs at the clock speed and thus is tremendously faster than using the memory. It is clear from my testing that it uses blocks larger than 1 MB of cache and can use chunks larger than 2 MB of cache... I have tested this from 4 MB down to 1 MB and the gains are huge... the AMD goes from performing 2x slower to 5x slower in some of these tests.

I think as the cache gets larger there is more opportunity for apps to load things into it. At 512 KB I think it is file-size limited. While your argument may be correct in terms of Intel compensating, I think my argument is also correct. I think only some apps can take advantage of it. However, since Intel shares its cache in pools, some of these huge-cache CPUs they have coming will be able to really take advantage of the speed of the L2 cache... It runs at clock speed... for me that is SRAM running at 3.2 GHz...

Fair enough Duvie...
To clarify though, would it be fair to say that:

1. The large cache is only more effective with apps that fall within those specific ranges (i.e. in this case between 1-4MB) and that they quickly become much slower when you use chunks that exceed the 4MB level
2. The vast majority of apps do not fall into this category
(can you think of any besides F@H?)

one thing that needs to happen is AMD needs to add a 4th execution unit as Intel did

I haven't found an app/OS combo yet that actually retires 4 ops/clock...of course that could just be my lack of experience. Have you observed this happening, and on which software?

Cheers
 
Originally posted by: SickBeast
Here's a link to the AT article.
Intel's Core 2 processors now offer even quicker memory access than AMD's Athlon 64 X2, without resorting to an on-die memory controller. While Intel will eventually add one, the fact of the matter is that it's simply not necessary for competitive memory performance today thanks to Intel's revamped architecture.
I'm thinking that a memory controller at that level of efficiency would be blazingly fast on-die. Quite the impressive feat of engineering on intel's part!

That's from the cache/prefetch Duvie and I were discussing, not the memory controller.
The biggest reason for having a large cache (assuming you have good prefetch) is that you can vastly decrease the latency by pre-loading into the L2.

Edit: Think of it this way...there are 2 good ways to cut down memory latency.
1. Decrease the latency of access to system memory
2. Preload what you are going to need into cache
AMD is supreme at the first (because of the ODMC), and Intel is supreme at the second (because of the very large cache and excellent prefetch).
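
The two approaches above map onto the textbook average-memory-access-time formula: average time = hit time + miss rate x miss penalty. Approach 1 shrinks the miss penalty, approach 2 shrinks the miss rate. A minimal sketch follows; every number in it is a hypothetical round figure picked for easy arithmetic, not a measurement of any AMD or Intel part.

```python
def average_access_ns(hit_ns, miss_rate, miss_penalty_ns):
    """Textbook average memory access time for a single cache level.

    hit_ns:          time to serve an access from cache
    miss_rate:       fraction of accesses that miss the cache (0.0 to 1.0)
    miss_penalty_ns: extra time to fetch from system memory on a miss
    """
    return hit_ns + miss_rate * miss_penalty_ns


# Hypothetical baseline: 5 ns cache hit, 10% misses, 100 ns trip to memory.
baseline = average_access_ns(5.0, 0.10, 100.0)

# Approach 1: cut the latency to system memory (e.g. an on-die controller).
lower_penalty = average_access_ns(5.0, 0.10, 60.0)

# Approach 2: cut the miss rate (bigger cache, better prefetch).
lower_miss_rate = average_access_ns(5.0, 0.05, 100.0)

print(f"baseline:        {baseline:.1f} ns")
print(f"lower penalty:   {lower_penalty:.1f} ns")
print(f"lower miss rate: {lower_miss_rate:.1f} ns")
```

With these made-up inputs both approaches land in the same ballpark, which is the point: which one wins on a real workload depends on how often that workload misses the cache in the first place.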
 
Originally posted by: Nemesis 1
Ya, I know where the K8L name came from, and it wasn't the Inquirer. As for K8L being released at 2.6 GHz, you're dreaming. Duvie, I believe, was correct about the speed it will be released at. That would hold true with AMD's past releases.

Also keep in mind that these first K8Ls will be server parts. We won't see desktop processors till late in the 3rd or 4th quarter.

Intel will have Wolfdales and Yorkfields out by then, with better SSE4 instructions and high-k dielectrics with metal gates, with the 4-core models at 3.5 GHz and 12 MB of cache shared between all 4 cores. The Wolfdale is what I want, clocked at 4 GHz with 6 MB of shared cache.

A few nitpicks.

Barcelona is SUPPOSED to come out in late Q2 or early Q3. FX Altairs are supposed to come out in Q3. Altair's clock speeds (a quarter later) are rumored to be between 2.6 and 2.9GHz. These, of course, are 125W parts, the 65W dual-core parts are also planned to hit the same clocks according to the same sources. I'd say that 2.6GHz clocks are pretty reasonable for the Barcelona SE versions (120W), though I'm not sure about the 89W versions.

I've heard rumors of SSE4 also being included as part of AMD's K8L/K10/whatever it's called this week, though I'm not absolutely certain about this.

Yorkfield, like all the other *fields, should be a dual-die 2x6MB L2 part. I've heard nothing at all about the 12MB being shared between the four cores (as this would be a major redesign, not just a shrink) and I have heard the 2x6MB L2 term being thrown around. Nehalem should be Intel's first native quad-core CPU.


About these numbers, they sound nice but like duvie said, it's hard to get excited without seeing anything more specific. Let us see a 2.0GHz part outperforming a 2.66GHz Clovertown and I'll believe it.
 
Originally posted by: Duvie

C2D doesn't have integrated memory controller....

Well, maybe their IPC is close, but Intel pulled off the coup simply by doubling the number of execution units from 2 in NetBurst to 4 in C2D....

I think they really ramped up their integer performance but AMD still does well in FPU areas...

...and shortening their pipelines...
 
I think AMD is nuts not to beef up their L2s/L3s. Intel is already smokin' them and even talking about, what, 8MB shared L2 between four cores? I can't see how this wouldn't provide some performance benefit across the board. If AMD sticks with the effective 512KL2/512KL3 per core I can't see how they can break out of 2nd place. That type of scheme just makes no sense to me. Why are they having so much trouble with larger caches? (Or ANY caches right now.) My uninformed speculation is that the enhanced execution cycles in K10 would be offset by longer waits due to the limited space in the caches. Still, we'll see.

Intel is also on a roll. Talking about a 1.8x delta vs. Clovertowns is basically meaningless. Let's see some new A64X4 chips with a full 1 MB L3 cache per core on a 65nm process, 3.0 GHz clock speeds, upgraded next-gen IMCs with DDR3, a serious fix for their cache latency problems, and unlocked multipliers. That's the only thing that would possibly keep me from getting a C2D or C4D by the end of this year. But I think Intel's pretty much invincible right now.
 
Originally posted by: Dadofamunky
I think AMD is nuts not to beef up their L2s/L3s. Intel is already smokin' them and even talking about, what, 8MB shared L2 between four cores? I can't see how this wouldn't provide some performance benefit across the board. If AMD sticks with the effective 512KL2/512KL3 per core I can't see how they can break out of 2nd place. That type of scheme just makes no sense to me. Why are they having so much trouble with larger caches? (Or ANY caches right now.) My uninformed speculation is that the enhanced execution cycles in K10 would be offset by longer waits due to the limited space in the caches. Still, we'll see.

Intel is also on a roll. Talking about a 1.8x delta vs. Clovertowns is basically meaningless. Let's see some new A64X4 chips with a full 1 MB L3 cache per core on a 65nm process, 3.0 GHz clock speeds, upgraded next-gen IMCs with DDR3, a serious fix for their cache latency problems, and unlocked multipliers. That's the only thing that would possibly keep me from getting a C2D or C4D by the end of this year. But I think Intel's pretty much invincible right now.

Increasing the size of the cache is the easiest way to increase performance (aside from ramping up clocks). AMD doesn't do it because the die-size increases too much for the little performance gain. Yes, cache helps out in games and the like but aside from games there's not much of a benefit to having a bigger cache.

K8L will support DDR3, though the motherboards with DDR3 support won't come until next year. Not that it matters, since DDR3 is just higher-latency DDR2 with more bandwidth.

K8L will double the data width for L2 (as far as I know) from 128bits to 256bits. This should be a HUGE thing.

I have yet to hear anything official on just how AMD will increase the IPC for this architecture though I've heard of out-of-order stores, increasing the CPU width, improving x86 decoder efficiency, increasing buffers, increasing prefetcher efficiency, increasing all kinds of internal data buses, etc. AMD only says "improved IPC cores" which just about any combination of the things I mentioned above could accomplish.
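
On the DDR3 remark above, the trade-off can be put into numbers with two simple formulas: first-word CAS latency in nanoseconds is the CAS cycle count divided by the I/O clock (half the transfer rate), and peak bandwidth of one 64-bit channel is the transfer rate times 8 bytes. The sketch below is only illustrative. The two modules it compares, DDR2-800 at CL5 and DDR3-1600 at CL11, are example speed grades assumed for the comparison, not parts named anywhere in this thread.

```python
def cas_latency_ns(cas_cycles, transfers_per_sec_millions):
    """CAS latency in ns for a DDR module.

    DDR moves two transfers per I/O clock, so the clock in MHz is half
    the MT/s rating; one clock period in ns is 1000 / MHz.
    """
    io_clock_mhz = transfers_per_sec_millions / 2.0
    return cas_cycles * 1000.0 / io_clock_mhz


def peak_bandwidth_mb_s(transfers_per_sec_millions, bus_bytes=8):
    """Peak bandwidth in MB/s of one channel (64-bit bus = 8 bytes)."""
    return transfers_per_sec_millions * bus_bytes


# Assumed example modules: (name, MT/s, CAS cycles)
modules = [
    ("DDR2-800 CL5", 800, 5),
    ("DDR3-1600 CL11", 1600, 11),
]

for name, mts, cl in modules:
    print(
        f"{name}: {cas_latency_ns(cl, mts):.2f} ns CAS latency, "
        f"{peak_bandwidth_mb_s(mts):.0f} MB/s peak per channel"
    )
```

For these two assumed modules the DDR3 part has twice the peak bandwidth but slightly worse CAS latency in absolute time, which is the shape of the claim. Other timing pairs would shift the numbers, so treat this as a way to check a specific pair rather than a general verdict.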
 
People, nobody knows for sure if that's true or false, but either way it's quite refreshing to hear something from AMD. I love this war because we all benefit from it. I'm buying my next computer later this year and I'll go with the best; I don't care if it's from Intel or AMD.
 
Any reason why AMD would leave K9 out of the architecture naming? I.e., all this talk about K10 being the name doesn't make sense? 🙂
 