Discussion [Anandtech] Using AI to Build Better Processors: Google Was Just the Start, Says Synopsys

Gideon · Jun 30, 2021

Anandtech has had a great article on the front page for a while I don't see discussed here:

Using AI to Build Better Processors: Google Was Just the Start, Says Synopsys

www.anandtech.com

As Synopsys is the company providing tools to AMD (and AMD themselves say their partnership was essential to deliver Zen 2 as well as they did) I'm really looking forward to see these advancements at play also while designing CPUs.

The most relevant quote (though I really suggest to read the entire article):

In this graph, we are plotting power against wire delay. The best way to look at this graph is to start at the labeled point at the top, which says Start Point.

Start Point, where a basic quick layout is achieved

Customer Target, what the customer would be happy with

Best Human Effort, where humans get to after several months

Best DSO result (untrained), where AI can get to in just 24 hours

All of the small blue points indicate one full AI sweep of placing the blocks in the design. Over 24 hours, the resources in this test showcase over 100 different results, with the machine learning algorithm understanding what goes where with each iteration. The end result is something well beyond what the customer requires, giving them a better product.

There is a fifth point here that isn't labeled, and that is the purple dots that represent even better results. This comes from the DSO algorithm on a pre-trained network specifically for this purpose. The benefit here is that in the right circumstances, even a better result can be achieved. But even then, an untrained network can get almost to that point as well, indicated by the best untrained DSO result.

Synopsys has already made some disclosures with customers, such as Samsung. Across four design projects, time to design optimization was reduced by 86%, from a month do days, using up to 80% fewer resources and often beating human-led design targets.

I could see this shortening the design cycle for a CPU by several months with huge wins in perf/watt and in the long run radically change how we build chips. Well worth a separate discussion thread IMO.

BillClo1 · Jun 30, 2021

Its a win-win for the companies. Faster development of chips, and they get to fire the development teams because the AI works nonstop 24/7 for nothing but electricity.

Gideon · Jun 30, 2021

BillClo1 said:
Its a win-win for the companies. Faster development of chips, and they get to fire the development teams because the AI works nonstop 24/7 for nothing but electricity.

These are still tools not Terminator T-800's. Engineers will still be required (and probably the better / more experienced ones) just less of them for a given project.

Think more like an automated McDonald's with a single shift manager and a guy in the Drive-through rather than all the current employees.

ThatBuzzkiller · Jun 30, 2021

Last time "automated tools" didn't end so well for Bulldozer according to one of the former AMD engineer so it's highly doubtful that they've now become helpful in the most important cases ...

Hand optimized designs are still all the rage when we take a look at others like Apple and admittedly Intel as well ...

Gideon · Jun 30, 2021

ThatBuzzkiller said:
Last time "automated tools" didn't end so well for Bulldozer according to one of the former AMD engineer so it's highly doubtful that they've now become helpful in the most important cases ...

Hand optimized designs are still all the rage when we take a look at others like Apple and admittedly Intel as well ...

Wait, what? Bulldozer had many fails but automation was never part of it.

In fact the Bobcat/Jaguar cores were automated rather than hand-placed designs with great success.

Also when using automated high-density libraries bulldozer's FPU shrunk was considerably shrunk (it think Excavator was the first one to use it):

The power savings comes from not having to route clocks and signals as far, while the area savings are a result of the computer automated transistor placement/routing and higher density gate/logic libraries.

The tradeoff is peak frequency. These heavily automated designs won’t be able to clock as high as the older hand drawn designs. AMD believes the sacrifice is worth it however because in power constrained environments (e.g. a notebook) you won’t hit max frequency regardless, and you’ll instead see a 15 - 30% energy reduction per operation. AMD equates this with the power savings you’d get from a full process node improvement.

Carrizo slides show it even better:

Zen took it much further and the entire Zen family uses sythesized blocks much more extensivley but that has nothing to do with what the article above is talking about, that's on a different level (and has nothing to do with Layout). The AI tools won't replace engineers, they will just start working on a different abstraction level for some (not all!) problems.

Jim Keller also thinks it's coming very fast (see "Chips Made by AI, and Beyond Silicon")

Gideon · Jun 30, 2021

ThatBuzzkiller said:
Last time "automated tools" didn't end so well for Bulldozer according to one of the former AMD engineer

Ok, it seems you're referencing a fake, pure fiction story by an "Ex AMD Engineer". Here is what an actual engineer has to say about it 4 years ago.

I have work experience at both companies, and can assure you that both companies still do physical implementation of most of their high-performance CPUs "by hand". In particular, the second reference you listed (allegedly by a former AMD engineer) is almost pure fiction and has been widely discredited. Huge chunks of the Bulldozer core were hand drawn by very experienced and talented engineers; and the perceived performance problems of this core have little, if any, to do with the method of implementation used in its design.

Note that I specifically said "high performance CPUs". I am not at liberty to talk about how the Atom cores are implemented, but AMD's lower performance "Cat cores" (Bobcat / Jaguar etc.) were mostly synthesized; with the exception of some performance-sensitive paths and analog/mixed signal circuitry. The percentage of blocks generated through synthesis was certainly much higher than that in a high-performance core such as Bulldozer.

The fact that initial 2 Bulldozer designs were hand-placed is well known and Anandtech has covered it extensively. I've somehow missed that fake story you referenced, probably as I disregarded it as pure nonsense the second it was posted.

Synthesized blocks are wildly used and here to stay. The articile mentions automation done on a totally different level.

Atari2600 · Jun 30, 2021

Automated tools are only as good as the people using them.

If you don't understand how it got its results - then you are likely going to make a serious mistake in implementation as something will invariably have been concluded outside of the automated tool's operating window or beyond its inter-relationship understanding.

Doug S · Jun 30, 2021

Chip design has always evolved to greater automation over time. Engineers used to lay out individual gates and hand routed every single wire in the 4004 and 6502 days. As chips became more complex you saw the rise of stuff like VHDL, synthesized blocks, place and route software, etc. This is a logical step along that five decade path.

For all we know Google wasn't even the first to do this, just first to announce. Its quite possible one or more of Intel, AMD and Apple were already doing something similar, since they wouldn't have much incentive to make a press release about it. Intel & AMD because wouldn't want to give any hints to their rival, and Apple because they just like secrecy.

ThatBuzzkiller · Jun 30, 2021

Gideon said:
Ok, it seems you're referencing a fake, pure fiction story by an "Ex AMD Engineer". Here is what an actual engineer has to say about it 4 years ago.

Nope, he's real ...

Gideon said:
The fact that initial 2 Bulldozer designs were hand-placed is well known and Anandtech has covered it extensively. I've somehow missed that fake story you referenced, probably as I disregarded it as pure nonsense the second it was posted.

Synthesized blocks are wildly used and here to stay. The articile mentions automation done on a totally different level.

I don't believe Cliff personally worked worked on the Bulldozer architecture himself, I think one of his colleagues who was on another team was responsible for that. At the time it sounds like shortly just after the acquisition of ATI, there was a meteoric rise in synthesized blocks ...

For Bulldozer being a "hand optimized" design, it sure is a total dumpster fire that doesn't show it when compared to Nehalem which achieved higher performance on an older process technology, smaller die size, and at a lower power consumption ...

Automated tools still can't give everyone a level playing field when we look at the current state of the industry given the persistent performance or efficiency disparities between the different players. Hand optimizing designs is still going to remain as a pivotal competitive advantage ...

Gideon · Jul 23, 2021

And cadence as well:

Cadence Cerebrus to Enable Chip Design with ML: PPA Optimization in Hours, not Months

www.anandtech.com

Doug S · Jul 23, 2021

So like I mused above Google was not the first to do this, only the first to announce. Cadence has obviously been working on a long time to take this all the way to finished product. It would have been pretty easy for Google to have heard about Cadence's upcoming product, since there would have been customers utilizing beta versions of Cerebus for a long time under NDA. I doubt the timing of Google's announcement a few weeks earlier than Cadence was a coincidence...

MadRat · Jul 31, 2021

It looks like the AI creates round areas when it optimizes, which makes sense. As you work out from any one point your area grows rapidly.

Couldn't memory blocks be optimized for performance by using the same methods?

maddie · Jul 31, 2021

MadRat said:
It looks like the AI creates round areas when it optimizes, which makes sense. As you work out from any one point your area grows rapidly.

Couldn't memory blocks be optimized for performance by using the same methods?

Check out pictures of cultured petri dishes and a die shot of machine developed logic circuitry.

MadRat · Aug 1, 2021

I did some searching but its not an easy task to find what you described.

maddie · Aug 1, 2021

MadRat said:
I did some searching but its not an easy task to find what you described.

Try these. The CCX logic areas always made "see" a fungal growth. Probably similar math rules governing the organization.

Search

Discussion [Anandtech] Using AI to Build Better Processors: Google Was Just the Start, Says Synopsys

Gideon

Golden Member

Using AI to Build Better Processors: Google Was Just the Start, Says Synopsys

BillClo1

Junior Member

Gideon

Golden Member

ThatBuzzkiller

Golden Member

Gideon

Golden Member

Gideon

Golden Member

Atari2600

Golden Member

Doug S

Platinum Member

ThatBuzzkiller

Golden Member

Gideon

Golden Member

Cadence Cerebrus to Enable Chip Design with ML: PPA Optimization in Hours, not Months

Doug S

Platinum Member

MadRat

Lifer

maddie

Diamond Member

MadRat

Lifer

maddie

Diamond Member

Attachments

TRENDING THREADS