The K6 was utter trash in games even with a GeForce2 Ti, particularly in UT99 compared to a stock-clocked Celeron-A. Not to mention general stability, where Intel wins by a mile.
I actually meant K6-3
Very true, 3DNow! brought the gaming performance of the K6-2 to basically a dead heat with the Pentium III, clock for clock.
http://www.anandtech.com/show/160/10
Technically it isn't the fairest comparison in terms of the underlying architectures, since it gives the K6 SIMD support while the Pentium III was running x87 code, but SSE was a year later to market. While Nvidia never really took advantage of 3DNow!, 3dfx sure did, and a Voodoo2-equipped K6-2 was a very competent machine.
Windows scheduler performance has been improved greatly in Windows 8 and 10. Windows 7 likes to bounce threads around, which causes minor performance losses when cache needs to be flushed and rewritten. This behaviour can be seen on Intel CPUs as well (though Intel does a better job of masking this problem).
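If anyone wants to fight the bouncing themselves, here's a minimal sketch of pinning a thread to a single core with the standard Win32 affinity call; the core index and the bare-bones error handling are just for illustration:

```cpp
// Minimal sketch: pin the current thread to logical processor 0 so the
// Windows scheduler cannot migrate it and force its caches to be re-warmed.
// Core index 0 is just an example; real code would pick a core deliberately.
#include <windows.h>
#include <cstdio>

int main() {
    DWORD_PTR mask = 1;  // bit 0 -> logical processor 0
    DWORD_PTR prev = SetThreadAffinityMask(GetCurrentThread(), mask);
    if (prev == 0) {
        std::printf("SetThreadAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    // ... run the latency-sensitive work here ...
    return 0;
}
```

Once the thread can't migrate, its working set stays in that core's caches instead of being re-fetched after every hop.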
You are referring to stuff like this.
[snip]
There were barely any games that used 3DNow! (I don't recall any). The problem with the K6 FPU was that the shortest instruction took 2 cycles (FADD, FMUL, etc., even the 3DNow! ones), whereas the P6 was effectively 1 cycle because it was pipelined. Games back then were way more dependent on the FPU than they are now.
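Just to put rough numbers on that, a back-of-the-envelope sketch using the 2-cycle and 1-cycle figures from the post as assumptions (not measurements):

```cpp
// Back-of-the-envelope throughput comparison for a long chain of independent
// FP multiplies, using the cycle figures from the post (assumed, not measured):
// the K6 FPU starts a new op every 2 cycles, a pipelined P6 roughly every cycle.
#include <cstdio>

int main() {
    const long n = 1000000;           // FP ops in, say, a software transform loop
    const long k6_cycles = n * 2;     // 2-cycle issue interval, not pipelined
    const long p6_cycles = n * 1;     // pipelined, ~1 op started per cycle
    std::printf("K6: %ld cycles, P6: %ld cycles (%.1fx)\n",
                k6_cycles, p6_cycles, (double)k6_cycles / (double)p6_cycles);
    return 0;
}
```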
What's considered a good yield? Is 20% considered good, or is it normally considered bad but was good in this case?
My son works in the industry as a designer. Please let us know what AAA title you were a developer on and I'll have him provide confirmation.
I'm calling shens on him being a AAA Windows game dev.
If TheProgrammer provides us with details (he won't) my son could reach out to the PM for the game with a sample of his posts and ask if the writing style is anything like somebody that was on that particular project.
Barring that, he's just trolling. It's time for him to put up or shut up.
Only back then, most tech websites were neutral. Today most tech websites have been bought out by Intel and Nvidia.
Skylake is 2.3% faster than Broadwell, and AnandTech, among other sites, still gives it a rave review. Never in a million years would the respected Mr. Anand Lal Shimpi have given a new processor generation a recommended rating if it was only 2% faster than the previous generation.
^ that
Depends on many things.
If you are capacity constrained, good yield is in the 90% range because you can sell all the chips you can yield. But in an under-utilized situation (fab loadings at, say, 50% or 70%), having 95% yield versus 85% makes almost no difference; you just end up running fewer wafers through the fab (at marginally lower materials expense but the same depreciation and staffing costs), since you can't sell all the chips you would otherwise be making if you were running at full capacity.
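To illustrate with invented numbers why a yield bump barely matters when the fab is under-loaded (every figure below is made up; the fixed-versus-variable cost split is the only point):

```cpp
// Invented numbers to show why 95% vs 85% yield barely moves cost per good die
// when the fab is under-loaded: the fixed costs dwarf the per-wafer materials,
// and the only change is running fewer wafers.
#include <cstdio>
#include <initializer_list>

int main() {
    const double fixed_cost_per_year = 1.0e9;   // depreciation + staffing (invented)
    const double material_per_wafer  = 1500.0;  // variable cost per wafer (invented)
    const double good_die_needed     = 5.0e6;   // annual demand (invented)
    const double gross_die_per_wafer = 200.0;   // candidates per wafer (invented)

    for (double yield : {0.85, 0.95}) {
        const double wafers = good_die_needed / (gross_die_per_wafer * yield);
        const double total  = fixed_cost_per_year + wafers * material_per_wafer;
        std::printf("yield %.0f%%: %.0f wafer starts, $%.2f per good die\n",
                    yield * 100.0, wafers, total / good_die_needed);
    }
    return 0;
}
```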
In the parlance of fab speak though, we talk about "yield entitlement", which is the yield you (as a business/company) are entitled to attain on the basis of the investments you have made into developing/optimizing/maintaining a certain process node and accompanying fab equipment.
For example, let's say you decide to spend less on your air handlers or filtration equipment in the fab. You save ongoing expenses by reducing the filtration quality or some such, and in exchange you acknowledge and expect your D0 (base defectivity level) to rise by a certain (and predictable) amount, thus lowering your yield entitlement by a certain (and predictable) amount. It's a cost/opportunity trade-off.
It's no different going in the other direction: the only reason modern fabs have the D0 they have, and not better D0 values (thus enabling better yield entitlement), is that the numbers were crunched and it was determined that the cost-benefits of targeting the existing levels of D0 (while accepting the fact that yields will be lower because of it) made financial sense.
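For anyone who wants to see how D0 translates into a yield number, here's a minimal sketch using the simple Poisson yield model Y = exp(-A * D0); the die area and both D0 values are hypothetical:

```cpp
// Minimal sketch of the simple Poisson yield model: Y = exp(-A * D0).
// The die area and both D0 values are hypothetical, chosen only to show how a
// change in base defectivity moves the yield entitlement for a large die.
#include <cmath>
#include <cstdio>

int main() {
    const double die_area_cm2 = 3.0;    // hypothetical large server-class die
    const double d0_invested  = 0.50;   // defects/cm^2 with the full investment (made up)
    const double d0_cheaper   = 0.65;   // after skimping on filtration (made up)

    std::printf("yield at D0=%.2f: %4.1f%%\n", d0_invested,
                100.0 * std::exp(-die_area_cm2 * d0_invested));
    std::printf("yield at D0=%.2f: %4.1f%%\n", d0_cheaper,
                100.0 * std::exp(-die_area_cm2 * d0_cheaper));
    return 0;
}
```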
This is a lot of words, but what I am trying to say is that the answer to your question depends on the cost-benefits structure the particular fab owner (IDM/foundry) is targeting, as well as on fab utilization rates.
Yield entitlement calcs are imperative, though, in establishing milestone objectives and project management stages. Spend a lot of money (as Intel did with 14nm) and come in below yield entitlement and you have a problem on your hands (poor project management and planning), but spend a little money (as, say, UMC did with 28nm HKMG development) and come in above yield entitlement and you have the hallmarks of a well-managed development program and fab environment.
Anyways, for us at the time, 20% was well above yield entitlement. We did not make the monetary investments necessary to ensure that we would have D0 values that would enable us to hit 20% yields for the die sizes we were fabbing for SUN at the time. So when we hit those yields it was no surprise to fab planning that SUN asked us to reduce wafer starts; they had done their wafer start modeling on the basis of yield entitlement numbers, and we did no one any favors by coming in with better yields. It just meant the fab utilization numbers declined (decreasing costs by maybe 1% from a materials standpoint, but all the fixed costs remained unchanged).
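The wafer start modeling itself boils down to arithmetic like this (all figures invented for illustration):

```cpp
// The wafer-start modeling reduces to arithmetic like this (figures invented):
// plan on the entitlement yield, and if the actual yield comes in higher, the
// same unit demand needs fewer wafer starts, so fab utilization drops.
#include <cstdio>

int main() {
    const double good_die_needed = 100000.0;  // units the customer wants
    const double gross_die_wafer = 90.0;      // candidates per wafer (large die)
    const double yield_entitled  = 0.15;      // entitlement used for planning
    const double yield_actual    = 0.20;      // what the fab actually delivered

    std::printf("planned starts: %.0f, needed at actual yield: %.0f\n",
                good_die_needed / (gross_die_wafer * yield_entitled),
                good_die_needed / (gross_die_wafer * yield_actual));
    return 0;
}
```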
Actually... if you look at the DX12 benchmarks available, clock speed is king and a 6700K beats a 5960X. More cores won't help, especially not AMD cores.
Man, those were awful days: Glide, OpenGL, miniGL. Everyone wrote for Glide and everyone was too lazy to write a full OpenGL driver so they just made a small subset that Quake used.
The R&D reduction is always kind of misleading. AMD sold >80% of their analog/IO group to Synopsys, and outsourced their south bridge to ASMedia. That's about 20% or more of AMD's R&D workforce.
Nice post as usual, but not sure if 14nm is poor project management and planning. Intel is no amateur.
Yeah, just like Nvidia cards are going to be way faster than Radeons under DirectX 12..... Right? (sarcasm)
Blanket statements always backfire.......
For the front-end, we lose one cycle, minimum, no matter what, just in the need to determine to which thread an instruction belongs; this stage, IIRC, is part of the dispatch control logic. The next point where we lose a cycle is the FPU scheduler (same reason, mostly) and also SIMD... however, this cost is repeated on instruction retirement, as the FPU has to determine which core's load/store unit has the solemn duty of handling the computational result.
For integer loads, the front-end has an extra one-cycle penalty that is always present, but there's a minimum of three cycles for FPU or SIMD instructions. For latency-sensitive loads, this can be harmful. And all that is assuming AMD managed to keep things to the best possible design (which they probably did - if I can think of how they could do it, they probably did it :thumbsup:).
However, we know the dispatch controller gets backed up and is paired with a resource monitor that is doing double duty, which probably means there is another lost cycle in here somewhere; otherwise these CPUs could not clock as well as they do.
We also know that the branch prediction is where AMD has made most of its improvements, as misses are more costly with the module design due to the extra cycles involved, so those few extra cycles can really start to add up when a miss occurs.
Our next issue with performance, one that is specifically limited to the module design, is the shared L2 and the write coalescing cache. If the L2 was not shared, there'd be no need for concurrency control during cache accesses. When a core needs data for an instruction it must go through the CIU (forget what it's called, too lazy to look it up - almost bed time). This CIU has the task of enforcing read and write concurrency with the L2 and the L3/MC. This adds, at a minimum, one cycle just for concurrency concerns... for reads. For writes, however, we go through the WCC, which will add at least one more cycle.
All of the aforementioned factors are present regardless of whether or not another thread is executing. However, I'll run through what happens for different workloads, just for fun :wub:
When only one thread is executing, we have a 5~8 cycle overhead versus a non-module design, which will, often enough, be repeated due to branch prediction failures, data stalls, or interrupts resulting in context switches and pipeline flushes (this last item is the only thing Windows 8's scheduler really improved upon). When the other thread in a module is receiving just a 'HLT' instruction, we have full access to the front-end for one thread until something is scheduled for the other thread, so the front-end fills up the pipeline completely as prefetch and branch prediction work together to fill the L2 with impunity while attempting to keep the execution units fully utilized. When a branch prediction misses its mark, its entire chain is flushed from the pipeline (a simple operation), and that part of the pipeline simply goes dormant until it is filled up again as the normal progression of the chip's operation continues. Each time that happens, of course, it destroys a different number of in-flight instructions and related data, so it multiplies that extra 5~8 cycles by some variable amount.
Now, an interrupt wakes up the second thread. The incoming instructions are fed into the pipeline and data is fetched as needed, and the pipeline can become unevenly divided, but it tends towards equal division. The second thread goes through the same front-end and introduces little additional damage to the other execution thread at this point; aside from fewer speculative executions, the front-end can still pretty much keep the execution units fed, so there is very little loss at this point from executing two threads in the module versus one thread.
The primary losses when running two threads on the module come from cache contention and a reduction in branch predictions attempted during stalls. AMD profiled their design to determine where the greatest stall would be in this stage, naturally, and implemented the WCC so that stalls when writing results to the L2 (through the CIU) are less prone to stuffing up the execution pipeline (since the instructions cannot be retired until the result is written, naturally). The cache latency under contention is the greater evil when two threads are both taxing the module, but the added pipeline complexity doesn't help - and that difference is further compounded by a reduction in the amount of front-end and branch prediction capability that can be dedicated to one thread.
So, in conclusion, I hope you can see how the module design can so negatively affect single-threaded performance, even if the other thread is mostly, or even entirely, dormant. Even though this might contradict what many think, this is the simple truth. Even an optimal module design adds stages to the pipeline, which is always bad when it can't be coupled with a significant boost in clock rate or a sufficient reduction in branch prediction misses.
When AMD dumps the module design, they will lose the 5~8 cycle latency cost for every single instruction going through the CPU, and each time a branch prediction fails they will see a 5~8 cycle reduction in the penalty.
That much of a reduction in pipeline latency should be worth about 3% overall, though some workloads won't care at all - none will be hurt by this improvement. The massively reduced branch prediction failure penalty, however, is universally beneficial. It is estimated that Bulldozer's misprediction penalty is about 20 cycles, which seems right in line with my understanding of the architecture. Reducing that penalty back to the Phenom II's 12 cycles would do wonders, especially with all of the improvements AMD has made to the branch prediction unit that has to deal with this higher penalty.
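If you want to sanity-check that, here's the classic penalty math with the 20- and 12-cycle figures from above; the base CPI, branch fraction, and miss rate are assumptions I picked for illustration:

```cpp
// Classic penalty math: CPI = base_CPI + branch_fraction * miss_rate * penalty.
// Base CPI, branch fraction, and miss rate are illustrative assumptions;
// the 20- and 12-cycle penalties are the figures quoted above.
#include <cstdio>
#include <initializer_list>

int main() {
    const double base_cpi    = 1.0;   // assumed
    const double branch_frac = 0.20;  // ~1 in 5 instructions is a branch (assumed)
    const double miss_rate   = 0.05;  // predictor misses 5% of branches (assumed)

    for (double penalty : {20.0, 12.0}) {
        std::printf("penalty %2.0f cycles -> effective CPI %.2f\n",
                    penalty, base_cpi + branch_frac * miss_rate * penalty);
    }
    return 0;
}
```

With those assumptions, effective CPI drops from 1.20 to 1.12, roughly a 7% gain from the penalty reduction alone.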
EDIT:
Also, it should be noted that a considerable amount of Sandy Bridge's IPC improvement over Nehalem came from a variable 3-cycle reduction in the branch misprediction penalty.
I decided to look it up ;-)
http://www.anandtech.com/show/5057/the-bulldozer-aftermath-delving-even-deeper/2
No, it's not. That means AMD will have to pay royalties (included in COGS) during the entire lifetime of the product, rather than only as an expense during the R&D stages. AMD might end up with an even worse cost structure than they had before, and the company is only doing this because they have to save cash whatever the cost.
Think of it as the same situation as their fabs. Selling the fabs to Globalfoundries took a lot of debt and expenses out of AMD's balance sheet, but it certainly didn't solve the problem of not having profitable fabs, and the consequences are there for everyone to see.
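To make the cost-structure point concrete, a toy comparison (every number is invented):

```cpp
// Toy comparison of the two cost structures being argued about: one-time R&D
// expensed up front versus a per-unit royalty that sits in COGS for the whole
// product life. Every figure is invented purely for illustration.
#include <cstdio>

int main() {
    const double upfront_rnd      = 50.0e6;  // develop the IP in-house (invented)
    const double royalty_per_unit = 2.0;     // licence fee per chip shipped (invented)
    const double units_lifetime   = 40.0e6;  // chips over the product's life (invented)

    std::printf("in-house R&D:  $%.0fM, expensed once\n", upfront_rnd / 1e6);
    std::printf("royalty model: $%.0fM, booked against every unit sold\n",
                royalty_per_unit * units_lifetime / 1e6);
    return 0;
}
```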
Nonsense, all of it.
AMD divorced its chip design from its chip fabrication in order to distance itself from its massive fabrication expenses and losses. They are no longer concerned if the fabs are profitable or not, since they don't have any. If GlobalFoundries fails in the coming years (not likely), AMD will just build chips at TSMC or Samsung or IBM or wherever else has the ability to produce their chips. And ALL of those companies are AMD partners. And two of them are GlobalFoundries partners, and one is the competition :awe:
Had they a better estimate of their 14nm yield entitlement in 2013 then they would have either increased R&D so as to ensure a higher entry yield entitlement point come 2014, or they would have done what they just did with 10nm and elected to delay the process node a year or so such that the yield entitlement fell in-line with expectation on the basis of investments up to that point in time (on the then protracted timeline).
With that miscalculation thing you say, I guess you are hinting at what was said at ISSCC?
It is neither.
14nm was planned and executed well. But the phrase "yield entitlement" contains the word "entitlement" for a reason: it merely speaks to the opportunity one is due, not the actuality that comes of it.
At an individual level it is much like choosing to pursue higher education: you do so on the basis of a cost-benefits assessment, but the reality is that only the costs are certain while the benefits are a bit of a gamble. The benefits, what you are entitled to expect for your investment and efforts (get a B.S. and have the opportunity to earn $70k per year!), are more of a maximum possible envelope than a guaranteed outcome (you've been laid off, go to unemployment and collect your benefits).
Intel did all they needed to make sure 14nm was on time, healthy, and high yielding. They paid their dues, to be sure. That it hasn't panned out yet doesn't mean they failed; it just means at some point they miscalculated what their entitlement (for their given investment) was worth. They obviously underestimated what it would take in terms of R&D to have the 14nm node they desired ready for HVM in fall 2014.
Their yield entitlement estimates were in error, not the efforts their engineers made with the resources they were granted to craft the node some 4 years prior. (I say this based on data from real life interactions, not based on anything I can necessarily link to, so take it as anecdotal or hearsay, but it is a position I am confident in having at this time)
We were told that Intel has learned that the increase in development complexity of 14nm, which required more internal testing stages and masking implementations, was a major reason for the delay, along with requiring sufficient yields to go ahead with the launch. As a result, Intel is improving the efficiency of testing at each stage and expediting the transfer of wafers with their testing protocols in order to avoid delays. Intel tells us that their 10nm pilot lines are operating 50% faster than 14nm was as a result of these adjustments.
AMD will not be paying royalties; motherboard makers will be buying the parts from another company, and that's it. That won't hurt AMD at all, really, but it does have some effects: reduced R&D and the loss of the ability to gain any revenue from those products (assuming AMD isn't RECEIVING royalties - which is more likely than them paying royalties).
AMD divorced its chip design from its chip fabrication in order to distance itself from its massive fabrication expenses and losses.
With the south bridge moving to the SoC, I can't imagine ASmedia paying AMD to include their IP on AMD SoCs.
If GlobalFoundries fails in the coming years (not likely), AMD will just build chips at TSMC or Samsung or IBM or wherever else has the ability to produce their chips. And ALL of those companies are AMD partners. And two of them are GlobalFoundries partners, and one is the competition :awe:
With the south bridge moving to the SoC, I can't imagine ASmedia paying AMD to include their IP on AMD SoCs.
AMD divorced its south bridge business from its CPU business in order to distance itself from the massive R&D expenses and losses.... See the point? AMD dumps an unviable business it used to run onto someone else and now must pay that someone else something, but a bad business for AMD is also a bad business for someone else, and ceding it just transfers the problem; it doesn't eliminate it.
We can see this with Globalfoundries: the new owners have had to revamp the entire R&D framework and make a lot of acquisitions to give the former AMD foundry business the right scale.
With ASMedia, what prompted the spin-off was the lack of scale of AMD's CPU business and the R&D starvation caused by the lack of sales. But a lack of sales for AMD's chipset business is also a lack of sales for ASMedia's chipset business, as far as AMD chips are the target, so whatever problems AMD was forecasting for its chipset business will still be there for ASMedia.
IBM dumped its production fab capacity on GF so it won't be IBM. I think they still have some research facilities (maybe) but that's it.
I've seen no indication that upcoming Zen products will feature an integrated south bridge.
ASMedia already has the necessary IP, drivers, and OS support. AMD would have to develop and license all of these technologies in an effort to bring AM4 to market with modern features. This would delay the AM4 platform immensely. Instead, we are seeing ASMedia place its existing IP on an AMD-designed system/CPU interface at a much reduced cost to AMD, and with a massive boost in business for ASMedia. Win-win.
This is an ever-present situation. Even Intel uses third party solutions, products, and technology. Almost every Intel CPU has AMD technology on it, in fact.
Agreed.
I disagree with that statement. It was the opposite in my opinion. If you owned a 3dfx card, you pretty much were guaranteed that the game was going to run fast and smooth.... A pleasant gaming experience every time. Glide could even run games on gutless Cyrix or WinChip CPUs -- those CPUs always choked trying to run Direct3D or OpenGL.... But Glide could get the job done.
Yes, it did. There were games with lower CPU requirements if Glide was used, versus DirectX.
Glide didn't use any less CPU than DirectX or OpenGL.
The early games ran like crap with 32-bit color anyway. It was really a marketing feature. And later, with the Voodoo5, 32-bit color as well as larger textures were added (you could run any Glide game in 32-bit color, if desired). Sure, DirectX has massively evolved since, but back in the day, Glide was the preferable API to choose if one was looking for speed.
And Glide quickly became a problem when it couldn't get more than 16-bit colours. And DirectX quickly ran away from it in performance as well.
