
GlobalFoundries 7LP 7nm Leading Performance FinFET process and FX-7 ASIC platform

Something also important to remember is rack density. Even if your perf/watt is 10% lower, if you provide double the performance in the same amount of space, you still end up with a favorable TCO.
To put it into Epyc terms, even if it draws 250W for 32 cores at 5GHz, if you can cool that efficiently, you're looking at amazing amounts of performance per socket. Such a solution could be better than 4GHz at 180W in TCO terms due to rack density afforded.

This makes picking the optimal spot on the frequency/voltage curve extra difficult for servers, and why a high performance node is a boon not just for desktop users, but servers too.
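The back-of-the-envelope version of that density argument can be sketched with made-up numbers (every dollar figure below is purely hypothetical, chosen only to illustrate the trade-off, not taken from any real TCO model):

```python
# Toy TCO sketch: a chip with 10% worse perf/watt but 2x the performance
# per rack slot can still win on lifetime cost per unit of performance.
# power_cost_per_watt and slot_cost are hypothetical lifetime costs.

def tco_per_unit_perf(perf, watts, slots,
                      power_cost_per_watt=4.0,  # $/W, hypothetical
                      slot_cost=500.0):         # $/rack slot, hypothetical
    return (watts * power_cost_per_watt + slots * slot_cost) / perf

# Chip A: baseline, 100 perf at 180 W in one slot.
a = tco_per_unit_perf(perf=100, watts=180, slots=1)

# Chip B: 200 perf in the same slot, but perf/watt is 10% lower than A's.
perf_per_watt_a = 100 / 180
b_watts = 200 / (perf_per_watt_a * 0.9)
b = tco_per_unit_perf(perf=200, watts=b_watts, slots=1)

print(round(a, 2), round(b, 2))  # 12.2 10.5 -> B is cheaper per unit perf
```

With these (invented) costs the denser chip comes out ahead despite burning more watts, because the fixed per-slot cost is amortized over twice the performance.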
 
Something also important to remember is rack density. Even if your perf/watt is 10% lower, if you provide double the performance in the same amount of space, you still end up with a favorable TCO.

Well, obviously we won't be seeing double the performance/sq meter of floor space here - otherwise it would be an *easy* decision, especially for urban server ops.
_________________________________________________________________________________________________________

I'd be impressed as all get out if Zen2 could turbo up to 5 GHz. It's early, and this is a PR effort by GloFo (obviously, since they left out critical metrics). I do think that a 50% boost in cores/CCX is a straightforward ask - hopefully AMD agrees. As far as IPC, I'm in wait-and-see mode. It seems like AMD wrung an awful lot of IPC out of Zen already.
 
Well, obviously we won't be seeing double the performance/sq meter of floor space here - otherwise it would be an *easy* decision, especially for urban server ops.
_________________________________________________________________________________________________________

I'd be impressed as all get out if Zen2 could turbo up to 5 GHz. It's early, and this is a PR effort by GloFo (obviously, since they left out critical metrics). I do think that a 50% boost in cores/CCX is a straightforward ask - hopefully AMD agrees. As far as IPC, I'm in wait-and-see mode. It seems like AMD wrung an awful lot of IPC out of Zen already.
That statement reminds me of a question [unanswered] I once asked here.

Is there some theoretical limit to single-thread IPC? AFAIK, all we have to go on for high-performance x86 is Intel's products, and everyone assumes that whatever Intel achieves is the best possible at the time.

This reasoning was behind the low expectations for Zen in both IPC and HT. Well, they beat Intel in HT. The unquestioned general assumption is that the most any competitor can do is get close to Intel, not surpass them.
 
That statement reminds me of a question [unanswered] I once asked here.

Is there some theoretical limit to single-thread IPC? AFAIK, all we have to go on for high-performance x86 is Intel's products, and everyone assumes that whatever Intel achieves is the best possible at the time.

This reasoning was behind the low expectations for Zen in both IPC and HT. Well, they beat Intel in HT. The unquestioned general assumption is that the most any competitor can do is get close to Intel, not surpass them.

The fact is that improving IPC on such large cores is a difficult task and is costly in terms of power and transistors. There is a law of diminishing returns in trying to increase IPC, so AMD and Intel have to be very judicious about where they spend the extra transistors from a node jump.
 
Here we have multiple sources confirming that GF 7LP is launching with a process for high performance.

http://www.anandtech.com/show/11558...nm-plans-three-generations-700-mm-hvm-in-2018

The 7 nm platform of GlobalFoundries is called 7LP for a reason — the company is targeting primarily high-performance applications, not just SoCs for smartphones, which contrasts to TSMC’s approach to 7 nm. GlobalFoundries intends to produce a variety of chips using the tech, including CPUs for high-performance computing, GPUs, mobile SoCs, chips for aerospace and defense, as well as automotive applications. That said, in addition to improved transistor density (up to 17 million gates per mm2 for mainstream designs) and frequency potential, GlobalFoundries also expects to increase the maximum die size of 7LP chips to approximately 700 mm², up from the roughly 650 mm² limit for ICs the company is producing today. In fact, when it comes to the maximum die sizes of chips, there are certain tools-related limitations.

For their newest node, the company is focusing on two ways to reduce power consumption of the chips: implementing superior gate control, and reducing voltages. To that end, chips made using GlobalFoundries' 7LP technology will support 0.65 – 1 V, which is lower than ICs produced using the company’s 14LPP fabrication process today. In addition, 7LP semiconductors will feature numerous work-functions for gate control.


https://www.semiwiki.com/forum/content/6837-globalfoundries-7nm-euv-update.html

GF however is leading with a high performance (LP equals Lead Performance in IBM speak) version of 7nm for AMD while TSMC is first with a low power version of 7nm for Apple, Qualcomm, MediaTek, and the other SoC vendors.


https://www.globalfoundries.com/sites/default/files/product-briefs/7lp-product-brief.pdf
http://semimd.com/chipworks/2017/01/18/iedm-2016-setting-the-stage-for-75-nm/

Four Vt options are available in the TSMC 7-nm technology [1]. There are four device Vt options with a range of ~200 mV.

GF 7LP provides a choice of 5 core device Vt options, compared to the 4 that TSMC provides. Here again, the requirement to support both very high performance (5 GHz) and low-power mobile is driving the need for a broader range of Vt and a higher Vt.

TSMC is launching first with a low-power mobile version of N7 for its primary customer Apple, and GF is launching with a high-performance version of 7LP for AMD. It's going to be interesting to see the comparison of products manufactured on TSMC N7 HPC for Nvidia vs GF 7LP for AMD in 2019.
 
While this is all very interesting to read, knowing GF's track record of over-promising and under-delivering, I'm not jumping around and clapping my hands just yet.

IBM engineering mixed into this equation is raising my hopes to a new level though, so there seems to be a real chance of hitting the H2 2018 volume production target, rather than the Q4 2019 it would realistically be if GF were doing everything alone.
 
That statement reminds me of a question [unanswered] I once asked here.

Is there some theoretical limit to single-thread IPC? AFAIK, all we have to go on for high-performance x86 is Intel's products, and everyone assumes that whatever Intel achieves is the best possible at the time.

This reasoning was behind the low expectations for Zen in both IPC and HT. Well, they beat Intel in HT. The unquestioned general assumption is that the most any competitor can do is get close to Intel, not surpass them.

You're limited by the number of parallel operations present in your code. For instance, in the equation A = (B + C) / D, you can't run the addition and the division in parallel; the second is dependent on the result of the first. This level of parallelism is different for every piece of code, but it is the ultimate limit on instruction-level parallelism.
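That dependency limit can be made concrete with a toy critical-path calculation (a sketch only; the tuple encoding and `depth` function are illustrative, not any real scheduler):

```python
# The depth of an expression's dependency tree is a lower bound on the
# number of sequential steps, no matter how many execution ports the
# CPU has. Expressions are nested tuples ('op', lhs, rhs); leaves are
# register names.

def depth(expr):
    if not isinstance(expr, tuple):
        return 0                       # reading a value costs no ALU step
    _, lhs, rhs = expr
    return 1 + max(depth(lhs), depth(rhs))

# A = (B + C) / D: the divide waits on the add -> depth 2.
dependent = ('/', ('+', 'B', 'C'), 'D')

# B + C and D + E are independent, so 3 operations still finish in
# depth 2 on a machine with two adders.
balanced = ('+', ('+', 'B', 'C'), ('+', 'D', 'E'))

print(depth(dependent), depth(balanced))  # 2 2
```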
 
You're limited by the number of parallel operations present in your code. For instance, in the equation A = (B + C) / D, you can't run the addition and the division in parallel; the second is dependent on the result of the first. This level of parallelism is different for every piece of code, but it is the ultimate limit on instruction-level parallelism.

What you just described is a relationship between non/commutative and non/associative operations with respect to parallelism ... 🙂

If every program could be reduced to be either commutative (like addition) or associative (like multiplication) operations then programmers wouldn't be worth much since one could parallelize every program optimally with code being near maintenance free to boot and we'd all be rocking with GPUs ... 😱
 
What you just described is a relationship between non/commutative and non/associative operations with respect to parallelism ... 🙂

If every program could be reduced to be either commutative (like addition) or associative (like multiplication) operations then programmers wouldn't be worth much since one could parallelize every program optimally with code being near maintenance free to boot and we'd all be rocking with GPUs ... 😱
That particular example of A = (B+C)/D has nothing to do with commutativity. Assuming A, B, C are vectors, then D must be a scalar (in the mathematical sense), in which case it is a simple multiplication that can be done outside of the loop in one operation.
 
That particular example of A = (B+C)/D has nothing to do with commutativity. Assuming A, B, C are vectors, then D must be a scalar (in the mathematical sense), in which case it is a simple multiplication that can be done outside of the loop in one operation.

Actually I was intending for A, B and C to all be scalars as well 🙂 I was talking about instruction level parallelism, or ILP, which is what superscalar CPUs exploit.
 
Honestly, I have a hard time believing this is true.

Intel was first to 14nm back in 2014, while GF only reached 14nm in 2016, so GF was 2 years behind. And now they claim 7nm in 2018, while Intel will only be doing 10nm at that time? So we are supposed to believe that they have suddenly jumped from being 2 years behind to being one node ahead? 14nm to 7nm in 2 years, while Intel has taken 4 years to go from 14nm to 10nm? That would be quite miraculous if they pull it off, but I have my doubts.
 
That particular example of A = (B+C)/D has nothing to do with commutativity. Assuming A, B, C are vectors, then D must be a scalar (in the mathematical sense), in which case it is a simple multiplication that can be done outside of the loop in one operation.

Not what I had in mind. My point was that, if you read a little about algebraic fields, commutative and associative operations over certain number fields are order independent, which has huge ramifications for parallelization ...

For example ... (assuming that the set of numbers only consists of integers)

(A = A+B+C+D) contains only commutative operations, so it is easily amenable to parallelization ... (In fact, if one had a 4-operand addition unit, it would take only 1 instruction to execute!)

(A = A-B-C-D) does not have any commutative or associative operations, so it can't be parallelized as written. It would need 3 instructions to resolve the problem ... (A-B, (A-B)-C, ((A-B)-C)-D, in that order)
 
Not what I had in mind. My point was that, if you read a little about algebraic fields, commutative and associative operations over certain number fields are order independent, which has huge ramifications for parallelization ...

For example ... (assuming that the set of numbers only consists of integers)

(A = A+B+C+D) contains only commutative operations, so it is easily amenable to parallelization ... (In fact, if one had a 4-operand addition unit, it would take only 1 instruction to execute!)

(A = A-B-C-D) does not have any commutative or associative operations, so it can't be parallelized as written. It would need 3 instructions to resolve the problem ... (A-B, (A-B)-C, ((A-B)-C)-D, in that order)

And then of course we need to consider that we have limited precision in all of our instructions, so the amount of re-ordering that the CPU can do at runtime is limited 🙂 Mathematically speaking, an infinite chain of A = B + C + D + E + ... is associative and can be massively parallelised (with some kind of tree reduction at the end), but in reality changing the execution order can massively change the end result (especially for floating point datatypes). Of course at compile time the user can set compiler flags to indicate that he is more relaxed about accuracy (usually some sort of --fast_math flag) and let the compiler make similar optimizations, but the CPU does not have that information.
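A minimal illustration of that floating-point caveat, runnable in any IEEE 754 environment:

```python
# Floating-point addition is commutative but not associative, so
# reassociating a sum changes the rounded result - which is exactly
# why a CPU cannot silently tree-reduce a serial FP sum, and why
# compilers only do it under flags like -ffast-math.
left_to_right = (0.1 + 0.2) + 0.3
reassociated = 0.1 + (0.2 + 0.3)

print(left_to_right == reassociated)  # False
print(left_to_right, reassociated)
```

Both orders are "correct" under IEEE 754 rounding; they simply round differently, so the hardware has no license to pick one over the other on its own.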
 
Honestly, I have a hard time believing this is true.

Intel was first to 14nm back in 2014, while GF only reached 14nm in 2016, so GF was 2 years behind. And now they claim 7nm in 2018, while Intel will only be doing 10nm at that time? So we are supposed to believe that they have suddenly jumped from being 2 years behind to being one node ahead? 14nm to 7nm in 2 years, while Intel has taken 4 years to go from 14nm to 10nm? That would be quite miraculous if they pull it off, but I have my doubts.

From what I know, Intel's 10nm and GF 7nm will be very similar in most aspects. They will not be one node ahead.
 
From what I know, Intel's 10nm and GF 7nm will be very similar in most aspects. They will not be one node ahead.
Yep, and add that 14nm arrived from Samsung and 7LP is an IBM technology. The timeframes of them arriving are more about deals and about when IBM/Samsung is ready, not GF.
If GF had had to make 14nm by itself, we would still be waiting for it.

GF dropped the stupid idea that they had to do the basic process research themselves. A very simple decision, but it saved them - and AMD as a consequence. Now they can shop for the best research.

And AMD can shop for the best process. Products get better. Customers win. It's good for all.
 
Not what I had in mind. My point was if you read a little bit about algebraic fields, commutative and associative operations under certain number fields are order independent which has huge ramifications for parallelization
Well, speaking of linear spaces, of which vectors are an example: they form a group under addition, so they are guaranteed to be associative. As an added bonus, they are commutative under addition as well (and hence form an abelian group).
(A = A+B+C+D) contains only commutative operations which is easily amenable to parallelization ... (In fact if one had a 4-operand addition unit it would take only 1 instruction to execute!)

(A = A-B-C-D) does not have any commutative or associative operations so it can't be parallelized. It would need 3 instructions to resolve problem ... (A-B, (A-B)-C, ((A-B)-C)-D in that order)
Or you could flip the sign bit in B, C and D and do the same 4-operand addition, with one additional step to check the correctness of the result, of course.
Or just use a 3-operand addition on B, C and D, and only one more step would be needed. Less parallel, but not impossible.
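A quick sketch of the sign-flip rewrite with integer operands (the 4-operand adder is hypothetical; plain Python just shows the two forms agree):

```python
# A - B - C - D equals A + (-B) + (-C) + (-D): negating B, C and D
# turns the serial subtraction chain into a sum of order-independent
# addends, which a hypothetical 4-operand adder could consume in one
# step (integer addition being associative and commutative).
A, B, C, D = 100, 7, 11, 13

serial = ((A - B) - C) - D            # 3 dependent subtractions
rewritten = A + (-B) + (-C) + (-D)    # addends can go in any order

print(serial, rewritten)  # 69 69
```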
 
Well, speaking of linear spaces, of which vectors are an example: they form a group under addition, so they are guaranteed to be associative. As an added bonus, they are commutative under addition as well (and hence form an abelian group).

Or you could flip the sign bit in B, C and D and do the same 4-operand addition, with one additional step to check the correctness of the result, of course.
Or just use a 3-operand addition on B, C and D, and only one more step would be needed. Less parallel, but not impossible.
This.
 
Honestly, I have a hard time believing this is true.

Intel was first to 14nm back in 2014, while GF only reached 14nm in 2016, so GF was 2 years behind. And now they claim 7nm in 2018, while Intel will only be doing 10nm at that time? So we are supposed to believe that they have suddenly jumped from being 2 years behind to being one node ahead? 14nm to 7nm in 2 years, while Intel has taken 4 years to go from 14nm to 10nm? That would be quite miraculous if they pull it off, but I have my doubts.

Node names are meaningless. They could call it 7nm, 2nm or 100mm. It would all be the same thing.
 
Node names are meaningless. They could call it 7nm, 2nm or 100mm. It would all be the same thing.

While this is true, I think this is the closest other fabs have been to Intel in a long while. From what's been published, the gate and interconnect pitches for the 7 nm processes are pretty similar, with Intel only having a slight edge. Historically, Intel enjoyed both a much longer time-to-market advantage and better characteristics for a given process. Intel has had a lot of problems with the move to 10 nm, so I think it's more a case of them falling behind their usual pace than anything else. It remains to be seen how well TSMC and GlobalFoundries can execute on their 7 nm processes though, as they're no strangers to delays and setbacks either.
 
While this is true, I think this is the closest other fabs have been to Intel in a long while. From what's been published, the gate and interconnect pitches for the 7 nm processes are pretty similar, with Intel only having a slight edge. Historically, Intel enjoyed both a much longer time-to-market advantage and better characteristics for a given process. Intel has had a lot of problems with the move to 10 nm, so I think it's more a case of them falling behind their usual pace than anything else. It remains to be seen how well TSMC and GlobalFoundries can execute on their 7 nm processes though, as they're no strangers to delays and setbacks either.

Yes. This is the closest that the foundries have ever got to Intel in terms of transistor density. Intel 10nm is slightly more dense than TSMC N7 and GF 7LP.

Intel 10nm: CPP = 54 nm, MMP = 36 nm, Cell Area = CPP * MMP = 54 * 36 = 1944
TSMC 7nm: CPP = 54 nm, MMP = 40 nm, Cell Area = CPP * MMP = 54 * 40 = 2160 (CPP is estimated from area shrink; MMP is the actual disclosed value)
GF 7nm: CPP = 56 nm, MMP = 40 nm, Cell Area = CPP * MMP = 56 * 40 = 2240 (CPP and MMP are estimated from the disclosed area shrink)
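Re-running that arithmetic as a sanity check (the CPP/MMP figures are the post's estimates, not official foundry disclosures):

```python
# Relative cell-area comparison from the quoted CPP x MMP estimates.
# Pitches are in nm, so the products are in nm^2; smaller = denser.
nodes = {
    "Intel 10nm": (54, 36),  # CPP, MMP
    "TSMC 7nm": (54, 40),
    "GF 7LP": (56, 40),
}

for name, (cpp, mmp) in nodes.items():
    print(f"{name}: {cpp * mmp}")
# Intel 10nm: 1944
# TSMC 7nm: 2160
# GF 7LP: 2240
```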
 
While this is true, I think this is the closest other fabs have been to Intel in a long while. From what's been published, the gate and interconnect pitches for the 7 nm processes are pretty similar, with Intel only having a slight edge. Historically, Intel enjoyed both a much longer time-to-market advantage and better characteristics for a given process. Intel has had a lot of problems with the move to 10 nm, so I think it's more a case of them falling behind their usual pace than anything else. It remains to be seen how well TSMC and GlobalFoundries can execute on their 7 nm processes though, as they're no strangers to delays and setbacks either.

Let's wait and see. GF has a long history of over-promising on their node technology.
 
Let's wait and see. GF has a long history of over-promising on their node technology.
If it were just GF, I'd be skeptical as well. But at the end of the day, this is an IBM node. Does that make it certain? Nothing is 100% certain when it comes to nodes - Intel has had more than a few fumbles as well, I'd point out. All the foundries use the same litho gear from ASML, and they all face the same challenges. Intel has been very good at adopting new things. Not perfect, but very good is about as good as you're going to get. But IBM is very good too.
 