Intel's new strategy.

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: Accord99
Originally posted by: Viditor
I think by real goal he means that Nehalem was to have all of what Merom has, plus the CSI.
As to Merom being faster than Turion X2, I'm sure it's possible and even probable...but we really don't know yet so I won't let you get away with that one m8. :)
Mobile processors have a certain power limitation. Right now, a Yonah 2.16GHz uses less power than a low-voltage single-core Turion MT 2.2GHz under load. It's no wonder AMD will have to resort to 1.075v to try to be competitive.

http://www.silentpcreview.com/article313-page5.html

Servers are the "weakest link" for Intel...1P and 2P should be faster in 32 bit, but I am going to bet that 64 bit still goes to AMD. 4P and higher is AMD hands down...
Higher than 4S goes to Intel by default, since there is no remotely decent 8S Opteron product and none from HP or Sun at all. Meanwhile IBM's X3 based servers using Xeon MPs go to 32S and scale well. Fujitsu-Siemens and Unisys also produce >4S servers using Xeon MPs.

IBM also has the fastest 4S x86 server in the all-important TPC-C benchmark and is considerably faster in all aspects than Intel's Truland chipset. Just because Intel makes a mediocre 4S chipset doesn't mean somebody else can't improve on it.

Well, 8P is CURRENTLY Xeon by default, but it currently doesn't matter either as they aren't selling any...this will change in Q3 or Q4 with Opteron+ and the Socket F platforms. They are rated to 16P and will work with quad-core IIRC.
As to IBM, while X3 is a brilliant development for them, they have currently (publicly) lamented the fact that they didn't go Opteron (article).
It seems at the large Enterprise level, clusters tend to be a better option over a 32P server...
BTW, the only reason the IBM server won the TPC-C benchmark is because they submitted it with a $500,000+ I/O system (about 10 times the price of the competition and far more expensive than the server itself). If you look at the TPC/$ ratio, it's about at the bottom of the list...

Edit: I forgot to comment on the mobile sector...AMD will be able to lower the voltage because the Turion X2s will be using the embedded SiGe straining which drops their power requirements significantly. When Tyler is released in early 07, it will be 65nm and most likely use metal gates...but if AMD can also incorporate the Z-Ram by then, we will see some truly amazing AMD mobile offerings (Z-Ram is 1/5 the size, uses almost no power, and is very fast...).
 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
Originally posted by: Viditor
Well, 8P is CURRENTLY Xeon by default, but it currently doesn't matter either as they aren't selling any...this will change in Q3 or Q4 with Opteron+ and the Socket F platforms. They are rated to 16P and will work with quad-core IIRC.
As to IBM, while X3 is a brilliant development for them, they have currently (publicly) lamented the fact that they didn't go Opteron (article).
It seems at the large Enterprise level, clusters tend to be a better option over a 32P server...
BTW, the only reason the IBM server won the TPC-C benchmark is because they submitted it with a $500,000+ I/O system (about 10 times the price of the competition and far more expensive than the server itself). If you look at the TPC/$ ratio, it's about at the bottom of the list...
The cost comes from using fibre-channel drives instead of SCSI drives. The performance of the two types is similar; the IBM system won because of its superior chipset.

Edit: I forgot to comment on the mobile sector...AMD will be able to lower the voltage because the Turion X2s will be using the embedded SiGe straining which drops their power requirements significantly. When Tyler is released in early 07, it will be 65nm and most likely use metal gates...but if AMD can also incorporate the Z-Ram by then, we will see some truly amazing AMD mobile offerings (Z-Ram is 1/5 the size, uses almost no power, and is very fast...).
Z-RAM is slow in comparison to SRAM. There's no evidence of metal gates in the IBM-Sony-Toshiba-AMD high performance 65nm process; most likely Intel will be first at 45nm. And if AMD is really getting a new power-reducing process, there would not be any need to lower voltages to such a low level.
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: Accord99
The cost comes from using fibre-channel drives instead of SCSI drives. The performance of the two types is similar; the IBM system won because of its superior chipset.

Ummm...fibre channel is SIGNIFICANTLY faster than SCSI (in real terms more than twice as fast), and since TPC is so heavily transaction based it makes a very large difference in the scores.


Z-RAM is slow in comparison to SRAM. There's no evidence of metal gates in the IBM-Sony-Toshiba-AMD high performance 65nm process; most likely Intel will be first at 45nm. And if AMD is really getting a new power-reducing process, there would not be any need to lower voltages to such a low level.

From Wikipedia
"The small cell size leads, in a roundabout way, to Z-RAM being faster than even SRAM, normally much faster than DRAM. SRAM's large cell size means that any "reasonable" amount of SRAM cache takes up a large portion of the CPU die. The long traces needed to carry current into the cells have a capacitance of their own, and requires the driver circuitry to "slow down" in order to allow the charge to settle. Although Z-RAM's individual cells are not as fast as SRAM, the lack of the long lines allows a similar amount of cache to be run at roughly the same speeds by avoiding this delay. Response times as low as 3ns have been stated"

As to metal gates, they've been on AMD's internal roadmap since 2002/3...and they made their first public presentation in Dec 2003 at the IEEE meeting in Washington. While there isn't any "evidence" for Intel or AMD on metal gates, most engineers I've spoken to believe it's almost assured that it will be in the second Rev of K8L@65nm, just prior to K10. Also, Intel indicated at IEDM 2005 that they might be delaying metal gates until the 32nm node...

From Real World Tech article
"Previously, Intel had stated that its plan of record was to integrate high-k gate with metal gate electrodes at the 45 nm node. However, the sentiment offered by technical presenters from Intel at IEDM 2005 was that the roadmap is subject to change, and Intel may wait until the 32 nm node to integrate high-k gates with metal gate electrodes in the case that the metal technology does not mature in time"

The power reduction process is in the embedded SiGe layer. Intel has also had this on their roadmap, but the yields on it were terrible. They have been using SiGe as well, but the stress layer is only for the NMOS but not for the PMOS...
IBM and AMD are releasing embedded SiGe for SOI with a Dual Stress Liner (both NMOS and PMOS)...the net gain is ~40% in leakage efficiency.
RealWorld Tech
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Um, considering the cap ratios and the fact it is a floating cap, I really doubt it is actually faster than 6T SRAM. Also, while this design style will scale to the sizes required for on-die caches, I suspect the speed will degrade much more severely as the structure gets larger. 3ns on what process, and what is the bank size?
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: dmens
Um, considering the cap ratios and the fact it is a floating cap, I really doubt it is actually faster than 6T SRAM. Also, while this design style will scale to the sizes required for on-die caches, I suspect the speed will degrade much more severely as the structure gets larger. 3ns on what process, and what is the bank size?

Good questions...there's no way we can guess yet about the actual speeds as we don't have any data on AMD's specific implementations (or if they can actually pull it off).
I agree that the speed would degrade at larger sizes, but the whole point is that Z-ram can be implemented at much (400-500%) higher densities. So for a comparable cache capability (say both are at 4MB), we should see Intel's cache being as much as 4-5 times larger and the traces on the Z-ram being 4-5 times shorter.
While I agree that this is only a potential new "club in the bag", it's a damn big club...it allows for not only huge cost savings on die space, but a much reduced power requirement as well.

What I was pointing out to Accord was that while he's absolutely correct that the individual Z-ram cells are indeed slower than SRAM, the reduced structure size has more than enough advantages in speed to overcome that.
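
As a rough check on the trace-length point above, here is a minimal sketch of the geometry. It assumes a single square cache array whose area shrinks in proportion to the density gain; the function name and the square-array assumption are illustrative only and are not taken from any AMD or Innovative Silicon data.

```python
import math


def side_ratio(density_gain: float) -> float:
    """Edge-length ratio between two square arrays of equal capacity.

    `density_gain` is how many times more bits per unit area the denser
    technology stores (an assumed input, e.g. 4.0 or 5.0).
    """
    if density_gain <= 0:
        raise ValueError("density_gain must be positive")
    # Area scales with density_gain, so each edge scales with its square root.
    return math.sqrt(density_gain)


if __name__ == "__main__":
    for gain in (4.0, 5.0):
        print(f"{gain:.0f}x density: area {gain:.0f}x smaller, "
              f"edge about {side_ratio(gain):.2f}x shorter")
```

Under that assumption the longest wire runs shrink with the square root of the density gain, so a 4-5x area saving would shorten them by roughly 2x rather than 4-5x. Real caches are split into banks, so this is only an order-of-magnitude guide.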
 

xit2nowhere

Senior member
Sep 15, 2005
438
0
0
I wonder how many people are "excited" about Conroe. I mean, granted we don't have "real" benchies yet, but what we've seen so far looks great :)
 

darkdemyze

Member
Dec 1, 2005
155
0
0
Originally posted by: Linux23
amd won't be able to keep up. buh-bye amd.:(

Just cause AMD hasn't released all of their future plans as Intel has doesn't mean they aren't doing anything..
 

ForumMaster

Diamond Member
Feb 24, 2005
7,792
1
0
well they need the $. so they want people to upgrade a lot. hell, i'm still using an AMD Athlon XP-M 2400+ with an AGP card and my rig is great for what i do. i don't plan on spending any $ over at intel's side either.
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
Edit: I forgot to comment on the mobile sector...AMD will be able to lower the voltage because the Turion X2s will be using the embedded SiGe straining which drops their power requirements significantly. When Tyler is released in early 07, it will be 65nm and most likely use metal gates...but if AMD can also incorporate the Z-Ram by then, we will see some truly amazing AMD mobile offerings (Z-Ram is 1/5 the size, uses almost no power, and is very fast...).

LOL, the Z-Ram claims keep getting more and more outrageous every time the story is repeated. Their fastest in-a-lab times are only a little faster than Conroe's L2, and much slower than its L1. And that's a REAL product, not something that so far is just a tech demo. I'll give you L3 as a great usage if it can be brought to market, but not for L1 or L2. Also, isn't Intel gonna be headed to FD-SOI on 32nm? They said they were going there on 45nm, but that turned out to be wrong, so I assume they are now shooting for 32nm, or have they just given this up completely?
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: BrownTown

LOL, the Z-Ram claims keep getting more and more outrageous every time the story is repeated. Their fastest in-a-lab times are only a little faster than Conroe's L2, and much slower than its L1. And that's a REAL product, not something that so far is just a tech demo. I'll give you L3 as a great usage if it can be brought to market, but not for L1 or L2. Also, isn't Intel gonna be headed to FD-SOI on 32nm? They said they were going there on 45nm, but that turned out to be wrong, so I assume they are now shooting for 32nm, or have they just given this up completely?

I think you're missing the point BT. Being faster isn't an issue...the advantages are huge even if Z-ram functions at the same speed. For example, if the Toledo core had Z-ram for its L2 cache:

1. Instead of being 199 mm2, it would be closer to 110 mm2 (which is ~47% more dice/wafer). This means that AMD would have been able to sell the X2 3800 for the same price as the PD 920 and make more profit than they are already...

2. It would have a power profile lower than Yonah's
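
For anyone who wants to sanity-check the dice/wafer figure in point 1, here is a minimal sketch using a common first-order gross-die approximation (wafer area over die area, minus an edge-loss term). The 200 mm wafer diameter is an assumption, the helper name is hypothetical, and yield, scribe lines, and edge exclusion are all ignored.

```python
import math


def dice_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate of gross (not yielded) dice on a round wafer."""
    if wafer_diameter_mm <= 0 or die_area_mm2 <= 0:
        raise ValueError("diameter and die area must be positive")
    # Whole-wafer area divided by die area.
    gross = math.pi * (wafer_diameter_mm / 2) ** 2 / die_area_mm2
    # Approximate count of partial dice lost around the circumference.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


if __name__ == "__main__":
    WAFER_MM = 200.0  # assumed wafer size, not stated in the thread
    current = dice_per_wafer(WAFER_MM, 199.0)   # Toledo die size from the post
    shrunk = dice_per_wafer(WAFER_MM, 110.0)    # hypothetical Z-RAM version
    print(current, shrunk, f"{shrunk / current - 1:.0%} more candidate dice")
```

By plain area ratio alone, 199/110 is about 1.8, so the gain in candidate dice depends on how it is counted and may be larger than the area saving by itself suggests. Smaller dice also tend to yield better, which this estimate does not capture.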

I agree that L3 is the very best usage (because it's so large) and that L1 (because it's so much smaller) would be better served as SRAM, but there's no reason I can think of that they couldn't mix Zram (for L2) and SRAM (for L1) on the die...

Interestingly, the larger the cache is, the faster its relative speed is. This means that a hypothetical X2 Toledo with 6MB Zram cache on 90nm would have a smaller die size than Conroe with 4MB cache and would probably be a faster chip...of course those numbers are so hypothetical as to be worthless, but it's something to consider as we start predicting what is coming in H1 07 from AMD.
 

coldpower27

Golden Member
Jul 18, 2004
1,676
0
76
Originally posted by: Viditor
Originally posted by: Accord99
Z-RAM is slow in comparison to SRAM. There's no evidence of metal gates in the IBM-Sony-Toshiba-AMD high performance 65nm process; most likely Intel will be first at 45nm. And if AMD is really getting a new power-reducing process, there would not be any need to lower voltages to such a low level.

From Wikipedia
"The small cell size leads, in a roundabout way, to Z-RAM being faster than even SRAM, normally much faster than DRAM. SRAM's large cell size means that any "reasonable" amount of SRAM cache takes up a large portion of the CPU die. The long traces needed to carry current into the cells have a capacitance of their own, and requires the driver circuitry to "slow down" in order to allow the charge to settle. Although Z-RAM's individual cells are not as fast as SRAM, the lack of the long lines allows a similar amount of cache to be run at roughly the same speeds by avoiding this delay. Response times as low as 3ns have been stated"

As to metal gates, they've been on AMD's internal roadmap since 2002/3...and they made their first public presentation in Dec 2003 at the IEEE meeting in Washington. While there isn't any "evidence" for Intel or AMD on metal gates, most engineers I've spoken to believe it's almost assured that it will be in the second Rev of K8L@65nm, just prior to K10. Also, Intel indicated at IEDM 2005 that they might be delaying metal gates until the 32nm node...

From Real World Tech article
"Previously, Intel had stated that its plan of record was to integrate high-k gate with metal gate electrodes at the 45 nm node. However, the sentiment offered by technical presenters from Intel at IEDM 2005 was that the roadmap is subject to change, and Intel may wait until the 32 nm node to integrate high-k gates with metal gate electrodes in the case that the metal technology does not mature in time"

The power reduction process is in the embedded SiGe layer. Intel has also had this on their roadmap, but the yields on it were terrible. They have been using SiGe as well, but the stress layer is only for the NMOS but not for the PMOS...
IBM and AMD are releasing embedded SiGe for SOI with a Dual Stress Liner (both NMOS and PMOS)...the net gain is ~40% in leakage efficiency.
RealWorld Tech

Forgive me if I can't accept Wikipedia as a valid source, as you are aware anyone can modify its contents.

And where are you drawing the conclusion that yields are terrible?
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: coldpower27

Forgive me if I can't accept Wikipedia as a valid source, as you are aware anyone can modify its contents.

And where are you drawing the conclusion that yields are terrible?

You might try a Google search then...I did find many, many sources. Here's the first that popped up:
Geek.com news story
"ZRAM offers the potential to completely level the playing field with regard to any bus-speed advantage Intel might have. With the potential to be 5 times as dense as existing SRAM cache solutions, and by having a 2-3 clock cycle latency for reads and writes, the same amount of silicon real-estate for a 1 MB cache today with a 5-20 clock cycle latency could now be upgraded to a 4 MB cache with a 2-3 clock cycle latency, greatly increasing performance. ZRAM only works with SOI, so it's not a technology that Intel would be able to use on its existing process technologies"

About the yields for Intel's PMOS stress liner attempts, I apologise that I didn't include that (and don't have time to look again at the moment). However, it should be fairly evident from the links I did include that there is a difference between straining techniques in SOI and bulk... At present, Intel is using (for 65nm) an epitaxial form of compressive PMOS straining (not a stress liner) that requires a nickel silicide layer.

Let me be clear...I'm not saying anything about Intel's yields, what I'm saying is that Intel had poor yields in the lab on their DSL experiments for bulk (which is why they chose the epitaxial form of PMOS compression).
The only really relevant point here is that AMD's chips will be using much less power moving forward...
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
Caches aren't a major contributor to power consumption, so that's not likely to be too big an effect (not sure whether you are suggesting this is due to ZRAM or SOI here, so maybe this is not what you are saying).

Also, let's be clear: if that news story were true and ZRAM had a 2-3 cycle latency at 4MB size, then when/if AMD gets it that will be game over in the market; that's L1 times for a 4MB cache. I find it highly unlikely therefore that ZRAM will be nearly that fast. Currently I also find it unlikely that ZRAM will see the light of day in the next 1.5 years, if ever.
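
To put the quoted latency figures on one scale, here is a small sketch that converts between nanoseconds and core clock cycles. The clock speeds are assumed for illustration only; neither the 3ns figure nor the 2-3 cycle claim in the thread comes with a stated frequency or array size.

```python
def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Latency in core clock cycles: 1 GHz means one cycle per nanosecond."""
    return latency_ns * clock_ghz


def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Latency in nanoseconds for a given cycle count and clock."""
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return cycles / clock_ghz


if __name__ == "__main__":
    for ghz in (2.0, 2.4, 3.0):  # assumed core clocks
        print(f"{ghz:.1f} GHz: 3 ns = {ns_to_cycles(3.0, ghz):.1f} cycles, "
              f"3 cycles = {cycles_to_ns(3, ghz):.2f} ns")
```

At an assumed 2.4 GHz, a 3ns access works out to roughly 7 cycles, while a 2-3 cycle access would have to complete in about 1ns or so, which is why the two claims are hard to reconcile without knowing the process and bank size.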
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: BrownTown
Caches aren't a major contributor to power consumption, so that's not likely to be too big an effect (not sure whether you are suggesting this is due to ZRAM or SOI here, so maybe this is not what you are saying).

Also, let's be clear: if that news story were true and ZRAM had a 2-3 cycle latency at 4MB size, then when/if AMD gets it that will be game over in the market; that's L1 times for a 4MB cache. I find it highly unlikely therefore that ZRAM will be nearly that fast. Currently I also find it unlikely that ZRAM will see the light of day in the next 1.5 years, if ever.

One of the main reasons that Yonah has such a low power profile is that it can flush cache to the RAM and effectively shut down portions of its L2 very quickly. Since ZRAM uses no capacitor and requires very little power, I expect that its cache power profile will surpass that substantially...
I must admit that I don't know what portion of the power draw the cache utilizes under load, and if you can provide some numbers it would be greatly appreciated!

As to the story, I am also discounting some of their claims...however, (as I said above) even if latencies are equivalent to L2 (instead of L1), it is a major coup for AMD.
 

Regs

Lifer
Aug 9, 2002
16,666
21
81
I would love to see this happen. Not even being sarcastic. There are only so many years you can scale a current architecture using MHz, given the speed at which software is developing.