
News "Aurora’s Troubles Move Frontier into Pole Exascale Position" - HPCwire

moinmoin

With Sapphire Rapids delayed, Intel's Aurora exascale supercomputer misses yet another date. This means Frontier is now on course to become the first exascale supercomputer.

 
This article is dated 8 months ago, and Sapphire Rapids isn't on 7 nm.
Ponte Vecchio is supposed to be on Intel's 7nm. I'll admit that I missed the news last year that Aurora was confirmed to slip this much.

 
PV is TSMC N6, no? Was that originally supposed to be fabbed in-house? When did Intel switch it to external?
Maybe one can summarize PV as a magical mess? 😉

Raja teased that there are 7 advanced technologies at play here, and by our calculation, these would be:
  • Intel 7nm
  • TSMC 7nm
  • Foveros 3D Packaging
  • EMIB
  • 10nm Enhanced SuperFin
  • Rambo Cache
  • HBM2
Following is how Intel gets to 47 tiles on the Ponte Vecchio chip:
  • 16 Xe HPC (internal/external)
  • 8 Rambo (internal)
  • 2 Xe Base (internal)
  • 11 EMIB (internal)
  • 2 Xe Link (external)
  • 8 HBM (external)
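The per-type counts in the list above do add up to the quoted 47 tiles. A quick sanity check, using only the numbers from that list (the dictionary keys are just labels chosen here, not official Intel names):

```python
# Ponte Vecchio tile counts per type, taken from the list above.
# Keys are illustrative labels; only the numbers come from the post.
tiles = {
    "Xe HPC": 16,
    "Rambo": 8,
    "Xe Base": 2,
    "EMIB": 11,
    "Xe Link": 2,
    "HBM": 8,
}

# Sum all tile types to get the total tile count.
total = sum(tiles.values())
print(total)  # 47
```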
 

Argonne has gone ahead and bought a new, modest AMD+Nvidia cluster, which should be online by early next year. I don't think this means that Aurora is getting cancelled; it's only because Intel is going to miss the deadline now and will have to give it to them for free (?)

There's something odd here:

Powerful compute to improve modeling, simulation and data-intensive workflows using 560 2nd and 3rd Gen AMD EPYC™ processors
https://www.amd.com/en/products/epyc
 


Intel pre-announced a 300m charge against Q4 earnings, which most people interpret as some penalty payment for missing yet another Aurora deadline.

So, DoE has some pocket change to go on a shopping spree...
 
Guess they aren't picky about Rome vs Milan. The focus is really on the GPUs.

I think the focus is on speed of delivery. Apparently, Intel could not deliver even a modest system using engineering samples to get the programmers started, so DoE went elsewhere to get a system delivered by one of the old Aurora deadlines: the end of 2021.

This article says that DoE will make a slight upgrade in March 2022, swapping Rome for Milan (X?) and upgrading the interconnect:
Argonne’s 44-Petaflops ‘Polaris’ Supercomputer Will Be Testbed for Aurora, Exascale Era (hpcwire.com)
 
Ponte Vecchio has chiplets that could wind up being from the following:

  • Intel 10ESF/Intel 7
  • Intel 7nm/Intel 4
  • TSMC N6

The compute chiplet is on TSMC N5 and the link tile is on TSMC N7; nothing is on Intel 4, at least according to their Architecture Day presentation.
 
^^^ It was originally on Intel 7 nm. Then they said it would be dual-sourced, and now it's just TSMC.
I have something even better for you. Whenever Intel's 7nm progress was doubted, Aurora was the trump card played to counter any doubts: "They have a contract."

Yeah right... I suppose it served them well, seeing as they will now get a couple of billion taxpayer dollars as well, just to reassure them that they have no real obligation to meet government deadlines or tell the truth in quarterly reports.
 

Kind of makes you wonder why they didn't move to TSMC sooner to avoid paying the 300 mill penalty. That 300 mill is likely what's bankrolling this new cluster.
 
Pay a 300 mill fine here, get 20 times as much in incentives there 🙂 They know how to play the "free market" game very well 🤣🤣🤣
 
That's the crazy thing - how can the Govt trust that Intel's fabs will deliver?

Intel's fabs are guaranteed to deliver. Other than a few cases like Aurora the government is not looking for leading edge processes, and Intel is not suddenly going to become unable to deliver 14nm or older chips.
 
That's the crazy thing - how can the Govt trust that Intel's fabs will deliver?
I don't think they look at it that way. During the Hot Chips 33 talks, the DoE speaker mentioned that the ROI from the government's point of view is different from that of typical companies.
For example, during the Bulldozer days AMD was awarded FastForward 1/2, PathForward, DesignForward, etc. From an enthusiast's point of view, it would have seemed crazy to give such contracts to AMD, which was failing badly against Intel, but today AMD (with HPE) looks set to deliver the first exascale system for the US.
 
Intel's fabs are guaranteed to deliver. Other than a few cases like Aurora the government is not looking for leading edge processes, and Intel is not suddenly going to become unable to deliver 14nm or older chips.

This is totally about leading edge. The article mentions 18A as the Govt's target.
 

That may be their target for another Aurora-like supercomputer. The bulk of the chips they buy are far from leading edge. They will buy a tiny fraction of Intel's leading-edge output.
 

The funding Intel got was intended for theoretical leading edge nodes. There was nothing mentioned about old nodes.
 