The K6 was utter trash in games even with a GeForce2 Ti, particularly in UT99 compared to a stock-clocked Celeron-A. Not to mention general stability, where Intel wins by a mile.
I actually meant K6-3
Very true, 3DNow! brought the gaming performance of the K6-2 to basically a dead heat with the Pentium III, clock for clock.
http://www.anandtech.com/show/160/10
Technically it isn't the fairest comparison in terms of the underlying architectures, since it gives the K6 SIMD support while the Pentium III was running x87 code, but SSE was a year later to market. While Nvidia never really took advantage of 3DNow!, 3dfx sure did, and a Voodoo2-equipped K6-2 was a very competent machine.
Windows scheduler performance has been improved greatly in Windows 8 and 10. Windows 7 likes to bounce threads around, which causes minor performance losses when cache needs to be flushed and rewritten. This behaviour can be seen on Intel CPUs as well (though Intel does a better job of masking this problem).
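If anyone wants to fight the bouncing themselves, here's a minimal sketch of pinning a thread to a single core with the standard Win32 affinity call; the core index and the bare-bones error handling are just for illustration:

```cpp
// Minimal sketch: pin the current thread to logical processor 0 so the
// Windows scheduler cannot migrate it and force its caches to be re-warmed.
// Core index 0 is just an example; real code would pick a core deliberately.
#include <windows.h>
#include <cstdio>

int main() {
    DWORD_PTR mask = 1;  // bit 0 -> logical processor 0
    DWORD_PTR prev = SetThreadAffinityMask(GetCurrentThread(), mask);
    if (prev == 0) {
        std::printf("SetThreadAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    // ... run the latency-sensitive work here ...
    return 0;
}
```

Once the thread can't migrate, its working set stays in that core's caches instead of being re-fetched after every hop.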
You are referring to stuff like this.
[snip]
There were barely any games that used 3DNow! (I don't recall any). The problem with the K6 FPU was that the shortest instruction took 2 cycles (FADD, FMUL, etc., even the 3DNow! ones), whereas the P6 was effectively 1 cycle because it was pipelined. Games back then were way more dependent on the FPU than they are now.
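Just to put rough numbers on that, a back-of-the-envelope sketch using the 2-cycle and 1-cycle figures from the post as assumptions (not measurements):

```cpp
// Back-of-the-envelope throughput comparison for a long chain of independent
// FP multiplies, using the cycle figures from the post (assumed, not measured):
// the K6 FPU starts a new op every 2 cycles, a pipelined P6 roughly every cycle.
#include <cstdio>

int main() {
    const long n = 1000000;           // FP ops in, say, a software transform loop
    const long k6_cycles = n * 2;     // 2-cycle issue interval, not pipelined
    const long p6_cycles = n * 1;     // pipelined, ~1 op started per cycle
    std::printf("K6: %ld cycles, P6: %ld cycles (%.1fx)\n",
                k6_cycles, p6_cycles, (double)k6_cycles / (double)p6_cycles);
    return 0;
}
```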
What's considered a good yield? Is 20% considered good, or is it normally considered bad but was good in this case?
My son works in the industry as a designer. Please let us know what AAA title you were a developer on and I'll have him provide confirmation.
I'm calling shens on him being a AAA Windows game dev.
If TheProgrammer provides us with details (he won't) my son could reach out to the PM for the game with a sample of his posts and ask if the writing style is anything like somebody that was on that particular project.
Barring that, he's just trolling. It's time for him to put up or shut up.
Only back then, most tech websites were neutral. Today most tech websites have been bought out by Intel and Nvidia.
Skylake is 2.3% faster than Broadwell, and AnandTech, among other sites, still gives it a rave review. Never in a million years would the respected Mr. Anand Lal Shimpi have given a new processor generation a recommended rating if it was only 2% faster than the previous generation.
^ that
Depends on many things.
If you are capacity constrained, good yield is in the 90% range because you can sell all the chips you can yield. But in an under-utilized situation (fab loadings at, say, 50% or 70%), having 95% yield versus 85% makes almost no difference; you just end up running fewer wafers through the fab (at marginally lower materials expense but the same depreciation and staffing costs), since you can't sell all the chips you would otherwise be making if you were running at full capacity.
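To illustrate with invented numbers why a yield bump barely matters when the fab is under-loaded (every figure below is made up; the fixed-versus-variable cost split is the only point):

```cpp
// Invented numbers to show why 95% vs 85% yield barely moves cost per good die
// when the fab is under-loaded: the fixed costs dwarf the per-wafer materials,
// and the only change is running fewer wafers.
#include <cstdio>
#include <initializer_list>

int main() {
    const double fixed_cost_per_year = 1.0e9;   // depreciation + staffing (invented)
    const double material_per_wafer  = 1500.0;  // variable cost per wafer (invented)
    const double good_die_needed     = 5.0e6;   // annual demand (invented)
    const double gross_die_per_wafer = 200.0;   // candidates per wafer (invented)

    for (double yield : {0.85, 0.95}) {
        const double wafers = good_die_needed / (gross_die_per_wafer * yield);
        const double total  = fixed_cost_per_year + wafers * material_per_wafer;
        std::printf("yield %.0f%%: %.0f wafer starts, $%.2f per good die\n",
                    yield * 100.0, wafers, total / good_die_needed);
    }
    return 0;
}
```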
In the parlance of fab speak though, we talk about "yield entitlement", which is the yield you (as a business/company) are entitled to attain on the basis of the investments you have made into developing/optimizing/maintaining a certain process node and accompanying fab equipment.
For example, let's say you decide to spend less on your air handlers or filtration equipment in the fab. You save ongoing expenses by reducing the filtration quality or some such, and in exchange you acknowledge and expect your D0 (base defectivity level) to rise by a certain (and predictable) amount, thus lowering your yield entitlement by a certain (and predictable) amount. It's a cost/opportunity trade-off.
It's no different going in the other direction: the only reason modern fabs have the D0 they have, and not better D0 values (thus enabling better yield entitlement), is that the numbers were crunched and it was determined that the cost-benefits of targeting the existing levels of D0 (while accepting the fact that yields will be lower because of it) made financial sense.
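For anyone who wants to see how D0 translates into a yield number, here's a minimal sketch using the simple Poisson yield model Y = exp(-A * D0); the die area and both D0 values are hypothetical:

```cpp
// Minimal sketch of the simple Poisson yield model: Y = exp(-A * D0).
// The die area and both D0 values are hypothetical, chosen only to show how a
// change in base defectivity moves the yield entitlement for a large die.
#include <cmath>
#include <cstdio>

int main() {
    const double die_area_cm2 = 3.0;    // hypothetical large server-class die
    const double d0_invested  = 0.50;   // defects/cm^2 with the full investment (made up)
    const double d0_cheaper   = 0.65;   // after skimping on filtration (made up)

    std::printf("yield at D0=%.2f: %4.1f%%\n", d0_invested,
                100.0 * std::exp(-die_area_cm2 * d0_invested));
    std::printf("yield at D0=%.2f: %4.1f%%\n", d0_cheaper,
                100.0 * std::exp(-die_area_cm2 * d0_cheaper));
    return 0;
}
```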
This is a lot of words, but what I am trying to say is that the answer to your question depends on the cost-benefits structure the particular fab owner (IDM/foundry) is targeting, as well as on fab utilization rates.
Yield entitlement calcs are imperative, though, in establishing milestone objectives and project management stages. Spend a lot of money (as Intel did with 14nm) and come in below yield entitlement and you have a problem on your hands (poor project management and planning), but spend a little money (as, say, UMC did with 28nm HKMG development) and come in above yield entitlement and you have the hallmarks of a well-managed development program and fab environment.
Anyways, for us at the time, 20% was well above yield entitlement. We did not make the monetary investments necessary to ensure that we would have D0 values that would enable us to hit 20% yields for the die sizes we were fabbing for SUN at the time. So when we hit those yields it was no surprise to fab planning that SUN asked us to reduce wafer starts; they had done their wafer start modeling on the basis of yield entitlement numbers, and we did no one any favors by coming in with better yields. It just meant the fab utilization numbers declined (decreasing costs by maybe 1% from a materials standpoint, but all the fixed costs remained unchanged).
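The wafer start modeling itself boils down to arithmetic like this (all figures invented for illustration):

```cpp
// The wafer-start modeling reduces to arithmetic like this (figures invented):
// plan on the entitlement yield, and if the actual yield comes in higher, the
// same unit demand needs fewer wafer starts, so fab utilization drops.
#include <cstdio>

int main() {
    const double good_die_needed = 100000.0;  // units the customer wants
    const double gross_die_wafer = 90.0;      // candidates per wafer (large die)
    const double yield_entitled  = 0.15;      // entitlement used for planning
    const double yield_actual    = 0.20;      // what the fab actually delivered

    std::printf("planned starts: %.0f, needed at actual yield: %.0f\n",
                good_die_needed / (gross_die_wafer * yield_entitled),
                good_die_needed / (gross_die_wafer * yield_actual));
    return 0;
}
```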
Actually... if you look at the DX12 benchmarks available, clock speed is king and a 6700K beats a 5960X. More cores won't help, especially not AMD cores.
Man, those were awful days: Glide, OpenGL, miniGL. Everyone wrote for Glide and everyone was too lazy to write a full OpenGL driver so they just made a small subset that Quake used.
The R&D reduction is always kind of misleading. AMD sold >80% of their analog/IO group to Synopsys, and outsourced their south bridge to ASMedia. That's about 20% or more of AMD's R&D workforce.
Nice post as usual, but not sure if 14nm is poor project management and planning. Intel is no amateur.
Yeah, just like Nvidia cards are going to be way faster than Radeons under DirectX 12..... Right? (sarcasm)
Blanket statements always backfire.......
For the front-end, we lose one cycle, minimum, no matter what, just in the need to determine to which thread an instruction belongs; this stage, IIRC, is part of the dispatch control logic. The next point where we lose a cycle is the FPU scheduler (same reason, mostly) and also SIMD... however, this cost is repeated on instruction retirement, as the FPU has to determine which core's load/store unit has the solemn duty of handling the computational result.
For integer loads, the front-end has an extra one-cycle penalty that is always present, but there's a minimum of three cycles for FPU or SIMD instructions. For latency-sensitive loads, this can be harmful. And all that is assuming AMD managed to keep things to the best possible design (which they probably did - if I can think of how they could do it, they probably did it :thumbsup:).
However, we know the dispatch controller gets backed up and is paired with a resource monitor that is doing double duty, which probably means there is another lost cycle in here somewhere; otherwise these CPUs could not clock as well as they do.
We also know that the branch prediction is where AMD has made most of its improvements, as misses are more costly with the module design due to the extra cycles involved, so those few extra cycles can really start to add up when a miss occurs.
Our next issue with performance, one that is specifically limited to the module design, is the shared L2 and the write coalescing cache. If the L2 was not shared, there'd be no need for concurrency control during cache accesses. When a core needs data for an instruction it must go through the CIU (forget what it's called, too lazy to look it up - almost bed time). This CIU has the task of enforcing read and write concurrency with the L2 and the L3/MC. This adds, at a minimum, one cycle just for concurrency concerns... for reads. For writes, however, we go through the WCC, which will add at least one more cycle.
All of the aforementioned factors are present regardless of whether or not another thread is executing. However, I'll run through what happens for different workloads, just for fun :wub:
When only one thread is executing, we have a 5~8 cycle overhead versus a non-module design, which will, often enough, be repeated due to branch prediction failures, data stalls, or interrupts resulting in context switches and pipeline flushes (this last item is the only thing Windows 8's scheduler really improved upon). When the other thread in a module is receiving just a 'HLT' instruction, we have full access to the front-end for one thread until something is scheduled for the other thread, so the front-end fills up the pipeline completely as prefetch and branch prediction work together to fill the L2 with impunity while attempting to keep the execution units fully utilized. When a branch prediction misses its mark, its entire chain is flushed from the pipeline (a simple operation), and that part of the pipeline simply goes dormant until it is filled up again as the normal progression of the chip's operation continues. Each time that happens, of course, it destroys a different number of in-flight instructions and related data, so it multiplies that extra 5~8 cycles by some variable amount.
Now, an interrupt wakes up the second thread. The incoming instructions are fed into the pipeline and data is fetched as needed, and the pipeline can become unevenly divided, but it tends towards equal division. The second thread goes through the same front-end and introduces little additional damage to the other execution thread at this point; aside from fewer speculative executions, the front-end can still pretty much keep the execution units fed, so there is very little loss at this point from executing two threads in the module versus one thread.
The primary losses when running two threads on the module come from cache contention and a reduction in branch predictions attempted during stalls. AMD profiled their design to determine where the greatest stall would be in this stage, naturally, and implemented the WCC so that stalls when writing results to the L2 (through the CIU) are less prone to stuffing up the execution pipeline (since the instructions cannot be retired until the result is written, naturally). The cache latency under contention is the greater evil when two threads are both taxing the module, but the added pipeline complexity doesn't help - and that difference is further compounded by a reduction in the amount of front-end and branch prediction capability that can be dedicated to one thread.
So, in conclusion, I hope you can see how the module design can so negatively affect single-threaded performance, even if the other thread is mostly, or even entirely, dormant. Even though this might contradict what many think, this is the simple truth. Even an optimal module design adds stages to the pipeline, which is always bad when it can't be coupled with a significant boost in clock rate or a sufficient reduction in branch prediction misses.
When AMD dumps the module design, they will lose the 5~8 cycle latency cost for every single instruction going through the CPU, and each time a branch prediction fails they will see a 5~8 cycle reduction in the penalty.
That much of a reduction in pipeline latency should be worth about 3% overall, though some workloads won't care at all - none will be hurt by this improvement. The massively reduced branch prediction failure penalty, however, is universally beneficial. It is estimated that Bulldozer's misprediction penalty is about 20 cycles, which seems right in line with my understanding of the architecture. Reducing that penalty back to the Phenom II's 12 cycles would do wonders, especially with all of the improvements AMD has made to the branch prediction unit that has to deal with this higher penalty.
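If you want to sanity-check that, here's the classic penalty math with the 20- and 12-cycle figures from above; the base CPI, branch fraction, and miss rate are assumptions I picked for illustration:

```cpp
// Classic penalty math: CPI = base_CPI + branch_fraction * miss_rate * penalty.
// Base CPI, branch fraction, and miss rate are illustrative assumptions;
// the 20- and 12-cycle penalties are the figures quoted above.
#include <cstdio>
#include <initializer_list>

int main() {
    const double base_cpi    = 1.0;   // assumed
    const double branch_frac = 0.20;  // ~1 in 5 instructions is a branch (assumed)
    const double miss_rate   = 0.05;  // predictor misses 5% of branches (assumed)

    for (double penalty : {20.0, 12.0}) {
        std::printf("penalty %2.0f cycles -> effective CPI %.2f\n",
                    penalty, base_cpi + branch_frac * miss_rate * penalty);
    }
    return 0;
}
```

With those assumptions, effective CPI drops from 1.20 to 1.12, roughly a 7% gain from the penalty reduction alone.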
EDIT:
Also, it should be noted that a considerable amount of Sandy Bridge's IPC improvement over Nehalem came from a variable 3-cycle reduction in the branch misprediction penalty.
I decided to look it up ;-)
http://www.anandtech.com/show/5057/the-bulldozer-aftermath-delving-even-deeper/2
No, it's not. That means AMD will have to pay royalties (included in COGS) during the entire lifetime of the product, rather than only as an expense during the R&D stages. AMD might end up with an even worse cost structure than they had before, and the company is only doing this because they have to save cash whatever the cost.
Think of it as the same situation as their fabs. Selling the fabs to Globalfoundries took a lot of debt and expenses out of AMD's balance sheet, but it certainly didn't solve the problem of not having profitable fabs, and the consequences are there for everyone to see.
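To make the cost-structure point concrete, a toy comparison (every number is invented):

```cpp
// Toy comparison of the two cost structures being argued about: one-time R&D
// expensed up front versus a per-unit royalty that sits in COGS for the whole
// product life. Every figure is invented purely for illustration.
#include <cstdio>

int main() {
    const double upfront_rnd      = 50.0e6;  // develop the IP in-house (invented)
    const double royalty_per_unit = 2.0;     // licence fee per chip shipped (invented)
    const double units_lifetime   = 40.0e6;  // chips over the product's life (invented)

    std::printf("in-house R&D:  $%.0fM, expensed once\n", upfront_rnd / 1e6);
    std::printf("royalty model: $%.0fM, booked against every unit sold\n",
                royalty_per_unit * units_lifetime / 1e6);
    return 0;
}
```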
Nonsense, all of it.
AMD divorced its chip design from its chip fabrication in order to distance itself from its massive fabrication expenses and losses. They are no longer concerned if the fabs are profitable or not, since they don't have any. If GlobalFoundries fails in the coming years (not likely), AMD will just build chips at TSMC or Samsung or IBM or wherever else has the ability to produce their chips. And ALL of those companies are AMD partners. And two of them are GlobalFoundries partners, and one is the competition :awe:
Had they a better estimate of their 14nm yield entitlement in 2013 then they would have either increased R&D so as to ensure a higher entry yield entitlement point come 2014, or they would have done what they just did with 10nm and elected to delay the process node a year or so such that the yield entitlement fell in-line with expectation on the basis of investments up to that point in time (on the then protracted timeline).
With that miscalculation thing you say, I guess you are hinting at what was said at ISSCC?
It is neither.
14nm was planned and executed well. But the phrase "yield entitlement" contains the word "entitlement" for a reason: it merely speaks to the opportunity one is due, not the actuality that comes of it.
At an individual level it is much like choosing to pursue higher education: you do so on the basis of a cost-benefits assessment, but the reality is that only the costs are certain while the benefits are a bit of a gamble. The benefits, what you are entitled to expect for your investment and efforts (get a B.S. and have the opportunity to earn $70k per year!), are more of a maximum possible envelope than a guaranteed outcome (you've been laid off, go to unemployment and collect your benefits).
Intel did all they needed to make sure 14nm was on time, healthy, and high yielding. They paid their dues, to be sure. That it hasn't panned out yet doesn't mean they failed; it just means at some point they miscalculated what their entitlement (for their given investment) was worth. They obviously underestimated what it would take in terms of R&D to have the 14nm node they desired ready for HVM in fall 2014.
Their yield entitlement estimates were in error, not the efforts their engineers made with the resources they were granted to craft the node some 4 years prior. (I say this based on data from real life interactions, not based on anything I can necessarily link to, so take it as anecdotal or hearsay, but it is a position I am confident in having at this time)
We were told that Intel has learned that the increase in development complexity of 14nm, which required more internal testing stages and masking implementations, was a major reason for the delay, along with requiring sufficient yields to go ahead with the launch. As a result, Intel is improving the efficiency of testing at each stage and expediting the transfer of wafers with their testing protocols in order to avoid delays. Intel tells us that their 10nm pilot lines are operating 50% faster than 14nm was as a result of these adjustments.
AMD will not be paying royalties; motherboard makers will be buying the parts from another company, and that's it. That won't hurt AMD at all, really, but it does have some effects: reduced R&D and the loss of the ability to gain any revenue from those products (assuming AMD isn't RECEIVING royalties - which is more likely than them paying royalties).
AMD divorced its chip design from its chip fabrication in order to distance itself from its massive fabrication expenses and losses.
With the south bridge moving to the SoC, I can't imagine ASmedia paying AMD to include their IP on AMD SoCs.
If GlobalFoundries fails in the coming years (not likely), AMD will just build chips at TSMC or Samsung or IBM or wherever else has the ability to produce their chips. And ALL of those companies are AMD partners. And two of them are GlobalFoundries partners, and one is the competition :awe:
With the south bridge moving to the SoC, I can't imagine ASmedia paying AMD to include their IP on AMD SoCs.
AMD divorced its south bridge business from its CPU business in order to distance itself from the massive R&D expenses and losses.... See the point? AMD dumps an unviable business it used to run onto someone else and now must pay that someone else something, but a bad business for AMD is also a bad business for someone else, and ceding it just transfers the problem; it doesn't eliminate it.
We can see this with Globalfoundries: the new owners have had to revamp the entire R&D framework and make a lot of acquisitions to give the former AMD foundry business the right scale.
With ASMedia, what prompted the spin-off was the lack of scale of AMD's CPU business and the R&D starvation caused by the lack of sales. But a lack of sales for AMD's chipset business is also a lack of sales for ASMedia's chipset business, as far as AMD chips are the target, so whatever problems AMD was forecasting for its chipset business will still be there for ASMedia.
IBM dumped its production fab capacity on GF so it won't be IBM. I think they still have some research facilities (maybe) but that's it.
I've seen no indication that upcoming Zen products will feature an integrated south bridge.
ASMedia already has the necessary IP, drivers, and OS support. AMD would have to develop and license all of these technologies in an effort to bring AM4 to market with modern features. This would delay the AM4 platform immensely. Instead, we are seeing ASMedia place its existing IP on an AMD-designed system/CPU interface at a much reduced cost to AMD, and with a massive boost in business for ASMedia. Win-win.
This is an ever-present situation. Even Intel uses third party solutions, products, and technology. Almost every Intel CPU has AMD technology on it, in fact.
Agreed.
I disagree with that statement. It was the opposite in my opinion. If you owned a 3dfx card, you pretty much were guaranteed that the game was going to run fast and smooth.... A pleasant gaming experience every time. Glide could even run games on gutless Cyrix or WinChip CPUs -- those CPUs always choked trying to run Direct3D or OpenGL.... But Glide could get the job done.
Yes, it did. There were games with lower CPU requirements if Glide was used, versus DirectX.
Glide didn't use any less CPU than DirectX or OpenGL.
The early games ran like crap with 32-bit color anyway. It was really a marketing feature. And later, with the Voodoo5, 32-bit color as well as larger textures were added (you could run any Glide game in 32-bit color, if desired). Sure, DirectX has massively evolved since, but back in the day, Glide was the preferable API to choose if one was looking for speed.
And Glide quickly became a problem when it couldn't get more than 16-bit colours. And DirectX quickly ran away from it in performance as well.
