Discussion Quo vadis Apple Macs - Intel, AMD and/or ARM CPUs? ARM it is!

Page 18 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

SarahKerrigan

Member
Oct 12, 2014
187
173
116
Ok. So code written for the ARM Fujitsu A64FX, with 512-bit SVE using 512-bit registers, will not run on Matterhorn with 256-bit SVE2? Another thing: 2048-bit SVE2 can be scaled in 128-bit increments, so there are 16 possible SVE2 lengths (128 * 16 = 2048)...... this looks very messy to me. It would also mean that you cannot pair little cores with bigger cores of different SIMD widths.

I think Vector Length Agnostic (VLA) instructions solve this:
  • Vectors cannot be initialised from a compile-time constant in memory, so... INDEX Zd.S, #1, #4: Zd = [1, 5, 9, 13, 17, 21, 25, 29]
  • Predicates also cannot be initialised from memory, so... PTRUE Pd.S, MUL3: Pd = [T, T, T, T, T, T, F, F]
  • The vector loop increment and trip count are unknown at compile time, so... INCD Xi: increment scalar Xi by the number of 64-bit dwords in a vector; WHILELT Pd.D, Xi, Xe: next-iteration predicate Pd = [while i++ < e]
  • Vector register spill & fill must adjust to the vector length, so... ADDVL SP, SP, #-4: decrement the stack pointer by (4*VL); STR Z1, [SP, #3, MUL VL]: store vector Z1 to address (SP + 3*VL)
https://indico.math.cnrs.fr/event/4705/attachments/2362/2940/ARM_SVE_tutorial.pdf
page 8.


As far as I understand VLA, it will be able to chop an arbitrarily long vector into lengths suitable for the local SIMD unit, so it can be processed regardless of the HW width. So little cores with a slow 128-bit SIMD FPU could be paired with 4x 256-bit SIMD FPUs thanks to VLA. Or do you see VLA working differently?
Having actually WRITTEN code for SVE, I can answer this one easily enough: SVE is vector-length-agnostic and therefore compatible across implementations. You seem to be imagining it as "you run a 2048b instruction and it simply takes 16 cycles" - which is how it would work on a traditional vector machine (SX, Cray X1/X2, etc.) That is NOT how SVE works. Instead, you essentially loop based on interrogated vector length; if you want to add all elements in a 32-element, 1024-bit vector, for instance, you get the number of elements your hardware supports, do the add operation, then subtract the number of elements computed from 32 for each iteration you run. So in a 128b machine, you end up running an inner loop for eight iterations, while on a 1024b machine the inner loop only runs for one iteration. (This is a slight oversimplification, but roll with it.)

In no case here is anything 2048b. There are no 2048b registers. There is no "add 2048b" instruction. There's just an "add vector" instruction of machine vector length, and a way to use it intelligently to compose code streams that scale to longer vector lengths. The only thing that's 2048b is the maximum length the hardware can support.

As for big.little, my expectation would be that SVE implementations within a core cluster must be consistent but I would be interested to see ways around that.
 

Stuka87

Diamond Member
Dec 10, 2010
4,761
500
126
These GPUs also don't have to support a ton of APIs. That works well for Apple, which is vertically integrated. They need to run the Metal API and that is basically it. No 10 different versions of DX, OpenGL, Vulkan, and so forth. Plus the aforementioned Apple GPU has video encode/decode in a separate block, whereas for AMD or Intel they are part of the GPU, and in these small GPUs a very significant part. You would have to count the size of the CUs only.
To be clear, Metal has been around for quite a long time now, and there have been continual updates to it. It's not like there is only one version of it. Each version of macOS and iOS improves on it. When a new version of iOS comes out, devs often have to update things in their games. But since iOS devices are almost always updated, they typically only have to target a few versions. On the Mac side, this is not the case.
 

Richie Rich

Senior member
Jul 28, 2019
421
196
76
So in a 128b machine, you end up running an inner loop for eight iterations, while on a 1024b machine the inner loop only runs for one iteration.

In no case here is anything 2048b. There are no 2048b registers. There is no "add 2048b" instruction. There's just an "add vector" instruction of machine vector length, and a way to use it intelligently to compose code streams that scale to longer vector lengths. The only thing that's 2048b is the maximum length the hardware can support.

As for big.little, my expectation would be that SVE implementations within a core cluster must be consistent but I would be interested to see ways around that.
Thanks for your answer. So it's even better than I thought. SVE2 via VLA supports vectors of unlimited length, and 2048-bit is just a HW limitation. Is that correct?
 

SarahKerrigan

Member
Oct 12, 2014
187
173
116
Thanks for your answer. So it's even better than I thought. SVE2 via VLA supports vectors of unlimited length, and 2048-bit is just a HW limitation. Is that correct?
The VLA concept can scale to whatever; that's the entire idea (although extreme vector lengths get diminishing returns, which is why general-purpose cores don't generally bother). SVE/SVE2 specifically is limited to 2048b implementations and extending it would require modification. Either way, there is absolutely zero "2048-bit SVE" in 256b SVE implementations. No 2048b ops, no 2048b registers, no 2048b anything. And honestly, you don't want it to be 2048b SIMD, because the spill costs of that would be painful; that means that any function call that touches the lower v-regs would have to spill 16kbit, even on a machine with only 128b functional units!

SVE was designed specifically to avoid the issues associated with ultra-wide vector machines, and is arguably a better solution. But a random mainstream SVE implementation is not "2048-bit SVE" in any way, so I'd suggest you stop claiming it is. SVE's well-designed and generally likable, even without having to make up exaggerated claims about it.
 
  • Like
Reactions: CHADBOGA

Richie Rich

Senior member
Jul 28, 2019
421
196
76
The VLA concept can scale to whatever; that's the entire idea (although extreme vector lengths get diminishing returns, which is why general-purpose cores don't generally bother). SVE/SVE2 specifically is limited to 2048b implementations and extending it would require modification. Either way, there is absolutely zero "2048-bit SVE" in 256b SVE implementations. No 2048b ops, no 2048b registers, no 2048b anything. And honestly, you don't want it to be 2048b SIMD, because the spill costs of that would be painful; that means that any function call that touches the lower v-regs would have to spill 16kbit, even on a machine with only 128b functional units!

SVE was designed specifically to avoid the issues associated with ultra-wide vector machines, and is arguably a better solution. But a random mainstream SVE implementation is not "2048-bit SVE" in any way, so I'd suggest you stop claiming it is. SVE's well-designed and generally likable, even without having to make up exaggerated claims about it.
Thanks a lot for clarifying this SVE topic. I think there are not many people here who have written SVE code. I guess you have access to Fujitsu A64FX HW, do you? Originally I thought the max length is 2048-bit and the HW breaks it into smaller pieces according to FPU width (something like what Bulldozer did with AVX, breaking 1x 256-bit AVX op into 2x 128-bit SIMD/FPU ops). Now I see SVE is a much more advanced approach; that's very impressive.

So, comparing x86 AVX and ARM SVE2: there is no need for Matterhorn to go to 256-bit-wide SIMD/FPUs, as they can stay at 128-bit, which is better for a little A55-like core using 1x 128-bit FPU. A big A79-class core can have 2x 128-bit FPUs, and a bigger X2 can simply add more FPU units for 4x 128-bit or even 6x 128-bit. With VLA SVE2 they can hook up and utilize a higher number of SIMD units than would normally be possible/effective. So they can further increase core width, which is the main condition for an IPC increase. Pretty smart.
 

SarahKerrigan

Member
Oct 12, 2014
187
173
116
Thanks a lot for clarifying this SVE topic. I think there are not many people here who have written SVE code. I guess you have access to Fujitsu A64FX HW, do you? Originally I thought the max length is 2048-bit and the HW breaks it into smaller pieces according to FPU width (something like what Bulldozer did with AVX, breaking 1x 256-bit AVX op into 2x 128-bit SIMD/FPU ops). Now I see SVE is a much more advanced approach; that's very impressive.

So, comparing x86 AVX and ARM SVE2: there is no need for Matterhorn to go to 256-bit-wide SIMD/FPUs, as they can stay at 128-bit, which is better for a little A55-like core using 1x 128-bit FPU. A big A79-class core can have 2x 128-bit FPUs, and a bigger X2 can simply add more FPU units for 4x 128-bit or even 6x 128-bit. With VLA SVE2 they can hook up and utilize a higher number of SIMD units than would normally be possible/effective. So they can further increase core width, which is the main condition for an IPC increase. Pretty smart.
I don't know of any code stream that's going to get much out of six 128b SIMD units, and in fact four is probably pushing it unless you're multithreaded (because there's going to be a fairly high percentage of dependent ops). There's a downside to all of this, too - narrower SVE units will result in a significantly higher dynamic instruction count than wider SVE units, and those extra ops are going to occupy slots in frontend and scheduling structures, etc. In a non-MT core that doesn't have to worry about compatibility with little cores in the same cluster, 2x256 is probably a reasonable spot, and that's around where I expect Matterhorn to end up - but we'll see, of course.

In summary, there's no free lunch - but SVE/SVE2 is still probably my favorite SIMD extension, or at least close to it, because it doesn't make me reinvent the wheel every time there's a significant change on the hardware side.
 

NostaSeronx

Platinum Member
Sep 18, 2011
2,990
579
126
NEON => V0-V31, only 128-bit registers
SVE => Z0-Z31, 128 to 2048-bit registers

A single 2048-bit op decode -> a single 2048-bit reservation -> can be executed across multiple 128-bit -> with results loaded from Zx(Vx*16) + Zy(Vy*16) and stored to Zb(Vb*16). => all the while only consuming a single SVE instruction slot.

Single 2048-bit instruction => Multiple 16*128-bit(2x64, 4x32, 8x16, 16x8) data.

Using JIT/AOT one can also run multiple 128-bit ops that are converted into a single 2048-bit op instruction. Before the 128-bit ops get flighted, the software OoO side can combine non-dependent 128-bit ops into a single 2048-bit op for op cache/op storage in RAM.

Instruction quantity goes down with longer vector lengths. If the code can support 2048-bit, optimize for 2048-bit. Code like 32 x 32 x 16-bit is better run on 2048-bit instructions than on 128/256/512-bit instructions, as the 2048-bit operation gets in flight sooner than lower-quantity-bit operations.

Must support 2048-bit Z0-Z31 width, or it isn't SVE compliant.
^== 512 128-bit registers, 256 256-bit registers, 128 512-bit registers. As the register gets wider it can do more things with less area.
== Another way is support V0-V31 with the PRF and do Z0-Z31 with a register cache(using SRAM devices).

Year 0(Arch 1) => 128-bit same time execution + 2048-bit instruction => Ultra slow
Year 2(Arch 2) => 256-bit same time execution + 2048-bit instruction => Slower
Year 4(Arch 3) => 512-bit same time execution + 2048-bit instruction => Slow
Year 6(Arch 4) => 1024-bit same time execution + 2048-bit instruction => Faster
Year 8(Arch 5) => 2048-bit same time execution + 2048-bit instruction => Ultra fast
Same code, different architecture.

x265, vp9, aom-av1, vvc, aom-av2 you'll want 2048-bit(SVE)/4096-bit(RVV) instruction support. Even if the units are 128-bit in ARM or 32-bit in RISC-V.
 

SarahKerrigan

Member
Oct 12, 2014
187
173
116
NEON => V0-V31, only 128-bit registers
SVE => Z0-Z31, 128 to 2048-bit registers

A single 2048-bit op decode -> a single 2048-bit reservation -> can be executed across multiple 128-bit -> with results loaded from Zx(Vx*16) + Zy(Vy*16) and stored to Zb(Vb*16). => all the while only consuming a single SVE instruction slot.

Single 2048-bit instruction => Multiple 16*128-bit(2x64, 4x32, 8x16, 16x8) data.

Using JIT/AOT one can also run multiple 128-bit ops that are converted into a single 2048-bit op instruction. Before the 128-bit ops get flighted, the software OoO side can combine non-dependent 128-bit ops into a single 2048-bit op for op cache/op storage in RAM.

Instruction quantity goes down with longer vector lengths. If the code can support 2048-bit, optimize for 2048-bit. Code like 32 x 32 x 16-bit is better run on 2048-bit instructions than on 128/256/512-bit instructions, as the 2048-bit operation gets in flight sooner than lower-quantity-bit operations.

Must support 2048-bit Z0-Z31 width, or it isn't SVE compliant.
^== 512 128-bit registers, 256 256-bit registers, 128 512-bit registers. As the register gets wider it can do more things with less area.
== Another way is support V0-V31 with the PRF and do Z0-Z31 with a register cache(using SRAM devices).

Year 0(Arch 1) => 128-bit same time execution + 2048-bit instruction => Ultra slow
Year 2(Arch 2) => 256-bit same time execution + 2048-bit instruction => Slower
Year 4(Arch 3) => 512-bit same time execution + 2048-bit instruction => Slow
Year 6(Arch 4) => 1024-bit same time execution + 2048-bit instruction => Faster
Year 8(Arch 5) => 2048-bit same time execution + 2048-bit instruction => Ultra fast
Same code, different architecture.

x265, vp9, aom-av1, vvc, aom-av2 you'll want 2048-bit(SVE)/4096-bit(RVV) instruction support. Even if the units are 128-bit in ARM or 32-bit in RISC-V.
Whatever that means, lol. Much though we all appreciate the patented Seronx gibberish, the manual is clear on this; there is no requirement for 2048-bit Z0-Z31. Nobody does it. I don't expect anyone to do it any time soon.
 
  • Haha
Reactions: Thunder 57

NostaSeronx

Platinum Member
Sep 18, 2011
2,990
579
126
there is no requirement for 2048-bit Z0-Z31.
Z0-Z31 are required to support 2048-bit widths. Base support is 2048-bit only. No 2048-bit equals not compliant with ARM-SVEx.

Full support is all 128-bit + 128-bit up to 15 times. Base support is 2048-bit with every NEON V0-V31 register.
 

SarahKerrigan

Member
Oct 12, 2014
187
173
116
Z0-Z31 are required to support 2048-bit widths. Base support is 2048-bit only. No 2048-bit equals not compliant with ARM-SVEx.

Full support is all 128-bit + 128-bit up to 15 times. Base support is 2048-bit with every NEON V0-V31 register.
Yeah, no.

"SVE introduces 32 scalable vector registers, Z0-Z31. The size of every vector register is an IMPLEMENTATION DEFINED multiple of 128 bits."

Thanks for playing, and I'm sure Tunnelborer and Crane are going to be here real soon now.
 

Eug

Lifer
Mar 11, 2000
22,698
301
126
It is my opinion that the base MacBook Pro 13-inch specs are very pedestrian (i5, 8GB, 128GB). The "Pro" moniker for a system like that is kind of laughable. I would imagine that most base-model MacBook Pro buyers are not using that type of system for anything other than text editing, email, internet, maybe some light Photoshop. It is much more reasonable to call the higher-end ones "Pros" when they have i7/i9 CPUs, 16GB RAM, and 512GB/1TB SSDs.
A14X in MacBook Pro 13" will satisfy 95% of users, including lots and lots of real "pros". Most pros don't run Geekbench all day. And even if they did, A14X performance won't disappoint.

For example, I know someone who does web design for billion dollar multi-national companies and he runs a 2014 iMac. He says for his work CPU performance is a non-issue. He mainly needs gobs of RAM. But even then, he doesn't need as much for his work as some overzealous geeks think they need just to play video games. 32 GB RAM is sufficient.

However, the bigger benefit of the Pros IMO in the past has been stuff like wide colour gamut and better keyboard and trackpad, etc.
 

podspi

Golden Member
Jan 11, 2011
1,925
22
81
Yes, I think people are overestimating the number of people who are checking benchmarks before buying a laptop. Even people doing creative work can often get by with much less than enthusiasts think, especially when custom-built silicon is in the mix, which it is here (AI acceleration, etc).
 

Eug

Lifer
Mar 11, 2000
22,698
301
126
P.S. Star Trek: Discovery is edited on a 2013 Mac Pro trash can. That's a 7 year-old machine. What's even more surprising though is that the "desk" the computer sits on during Covid is a plastic foldable portable table. :p


I'm not sure which CPU he has in the Mac Pro, but the iPad Pro A14X SoC coming within the year will likely be almost as fast for CPU speed as a mid-tier 8-core Mac Pro of that generation. Only the 12-core model would still be significantly faster.

Obviously I'm not recommending that a broadcast TV video editor buying a new computer for work today should buy such a machine, but it does illustrate the performance improvements Apple has made in its Arm processors over this period, and how Apple is more than prepared for the transition.

Apple simply does not need boutique chips for most MacBook Pros and iMacs. Their existing chip categories are already good enough for most pro users. It's really just at the very high end that Apple needs faster chips, but I think it's pretty safe to say Apple can make those too if it decides it wants to. They will be comparatively more costly for Apple to make, but that's OK, because those are more expensive machines anyway. Instead of paying Intel for high-end i7, i9, and Xeon chips, it can keep everything in-house (aside from paying TSMC for fab services).
 

jpiniero

Diamond Member
Oct 1, 2010
7,789
1,114
126
FWIW, it does sound like Apple is also ditching Radeon dGPUs too, based upon what they are telling developers. This is not definitive but does make sense with them going to their own CPU.
 
  • Like
Reactions: Etain05

soresu

Golden Member
Dec 19, 2014
1,179
443
136
FWIW, it does sound like Apple is also ditching Radeon dGPUs too, based upon what they are telling developers. This is not definitive but does make sense with them going to their own CPU.
It's a huge jump from scaling up a pre-existing and proven CPU core to introducing a dGPU at the level they scale up to on the current Mac Pro (2x Vega II Duos, or 4x Vega II GPUs).

The existing GPU core in the Axx SoC is basically a stripped down and souped up PowerVR design that hasn't been used for any truly significant compute work as no such application was available on iOS or its derivatives.

I have no doubt that they will certainly try for a custom GPU, but this is a fraught minefield for them with all the GPU patents floating about - the fact that they signed back up with PowerVR is apparent proof that they could not simply make a GPU that fits outside of the box that existing patents and compute needs constrain it to.

OTOH they have tremendous leeway with AMD semi-custom designs, and currently they have really not pushed that beyond some basic high-efficiency HBM SKUs and the Vega II Duos - they may well just go for a more extreme custom design; after all, licensing from Imagination does not prevent them from having AMD co-designs too, à la the Sony/MS console GPUs.

Especially if AMD does indeed have chiplet GPU coming and has disclosed these plans to semi custom partners.
 
  • Like
Reactions: Tlh97

Doug S

Member
Feb 8, 2020
143
172
76
If Apple really is going to use their own GPU design for the Mac Pro they must already have prototyped some high end GPUs and determined the performance is competitive. There's no way they'd do it if they weren't confident of that.

It still isn't clear what the new license with Imagination was about, but I have to think it is simply patent protection. After making a big deal about designing their own GPUs they wouldn't go back to using someone else's designs unless they hit a pretty big roadblock and were truly desperate. Imagination doesn't have experience with designs at the high end so they aren't likely to be of much help with the Mac Pro GPU.

Perhaps it is possible they might license the Mac Pro GPU from AMD, at least for the first gen, if they aren't able to compete at that level yet. But I question whether AMD would be very amenable to that. Today AMD has all of Apple's discrete GPU business, if they help Apple they would be helping Apple become their former customer. If they know Apple has a missing piece of "what to do about the Mac Pro" and they don't help them, maybe it takes Apple another year or two before they are ready to switch. On the other hand, helping Apple leave x86 hurts Intel so maybe it is an "enemy of my enemy" situation lol
 
  • Like
Reactions: Tlh97

senttoschool

Golden Member
Jan 30, 2010
1,358
82
91
If Apple really is going to use their own GPU design for the Mac Pro they must already have prototyped some high end GPUs and determined the performance is competitive. There's no way they'd do it if they weren't confident of that.

It still isn't clear what the new license with Imagination was about, but I have to think it is simply patent protection. After making a big deal about designing their own GPUs they wouldn't go back to using someone else's designs unless they hit a pretty big roadblock and were truly desperate. Imagination doesn't have experience with designs at the high end so they aren't likely to be of much help with the Mac Pro GPU.

Perhaps it is possible they might license the Mac Pro GPU from AMD, at least for the first gen, if they aren't able to compete at that level yet. But I question whether AMD would be very amenable to that. Today AMD has all of Apple's discrete GPU business, if they help Apple they would be helping Apple become their former customer. If they know Apple has a missing piece of "what to do about the Mac Pro" and they don't help them, maybe it takes Apple another year or two before they are ready to switch. On the other hand, helping Apple leave x86 hurts Intel so maybe it is an "enemy of my enemy" situation lol
I think Apple will use its own GPUs for Mac laptops but will continue to offer AMD as a choice for iMacs and Mac Pros.

There's no way Apple can whip out a dedicated GPU that can compete with the best of Nvidia and AMD in the next 5 years.
 

jpiniero

Diamond Member
Oct 1, 2010
7,789
1,114
126
It's a huge jump from scaling up a pre-existing and proven CPU core to introducing a dGPU at the level they scale up to on the current Mac Pro (2x Vega II Duos, or 4x Vega II GPUs).
I could see them offering AMD GPUs as compute accelerators on the Mac Pro.
 

Richie Rich

Senior member
Jul 28, 2019
421
196
76
I think Apple will use its own GPUs for Mac laptops but will continue to offer AMD as a choice for iMacs and Mac Pros.

There's no way Apple can whip out a dedicated GPU that can compete with the best of Nvidia and AMD in the next 5 years.
Apple surely can make a dedicated desktop GPU if they want. Their SoC GPUs are brutally efficient. The Apple GPU has 4x more FPS per watt compared to NVIDIA and Radeon GPUs; just look at the measurements:


The GPU in the A12X is literally beating older desktop-class Radeon 6000 and 7000 series parts:

There is nothing holding them back from adding more CUs. GPUs scale pretty well once you have a good architecture. And Apple has the best mobile architecture on the market. The same big bang that happened with their ARM CPUs can easily happen in GPUs too. The tough mobile environment forced them to develop a much more efficient and better architecture (valid for both CPUs and GPUs). Lazy desktop chipmakers like Nvidia and AMD will eat the sour fruits of their own laziness. That's simple.

ML and AI? Apple again has a best-in-class NPU for that. NVIDIA was so bad at ML/AI that Tesla dumped Nvidia and rather created their own silicon (with help from Jim Keller). AMD is much worse than NV at ML. The question is whether a GPU can be as good as a dedicated NPU; I really doubt it.

An ARM MacBook is a game changer. Always-on functionality thanks to little cores, like a smartphone. Face recognition for screen unlock, speech-to-text thanks to the NPU. Current desktops are last-century technology on steroids. Somehow still powerful, but really outdated, like dinosaurs.
 

DrMrLordX

Lifer
Apr 27, 2000
15,742
4,706
136
Apple: We'll transition the Mac to ARM.
Forum: Apple is making server chips and discrete GPUs.

I am genuinely amazed we don't have a dedicated Apple console thread by now.
I agree with where you're going with this. There's no real indicator that Apple is trying to replace every class of Intel chip they've used in Mac products in the past (much less dGPUs). They may be abandoning certain market segments, or at least changing how they serve them.
 

senttoschool

Golden Member
Jan 30, 2010
1,358
82
91
ML and AI? Apple again has a best-in-class NPU for that. NVIDIA was so bad at ML/AI that Tesla dumped Nvidia and rather created their own silicon (with help from Jim Keller). AMD is much worse than NV at ML. The question is whether a GPU can be as good as a dedicated NPU; I really doubt it.
Relax. Nvidia is the standard in A.I. acceleration. There is a huge difference between tiny inference ML/AI acceleration on a phone versus the Nvidia class A.I. hardware acceleration and industry-standard APIs.

Running inference for a face-recognition model on an iPhone is different from training a neural network on petabytes of data.

I guarantee you Apple employees are using Nvidia cards to do internal A.I. work.

You can be bullish on Apple's SoC designing capabilities, but you don't have to be foolish.
 
  • Like
Reactions: Glo. and beginner99

senttoschool

Golden Member
Jan 30, 2010
1,358
82
91
I agree with where you're going with this. There's no real indicator that Apple is trying to replace every class of Intel chip they've used in Mac products in the past (much less dGPUs). They may be abandoning certain market segments, or at least changing how they serve them.
The most important thing to Apple is a unified SoC architecture so iOS, iPadOS, and macOS can share applications.

The second is performance.

Apple won't be able to match everything that AMD, Nvidia, and Intel can provide right now to start with. But they're ok with that because their main goal is to unify their OSs.
 

blckgrffn

Diamond Member
May 1, 2003
6,829
126
106
www.teamjuchems.com
Apple: We'll transition the Mac to ARM.
Forum: Apple is making server chips and discrete GPUs.

I am genuinely amazed we don't have a dedicated Apple console thread by now.
I think the last couple of ATV launches must have been under your radar. Did you miss how those are game consoles?

And how they will leverage the App Store to bring in devs and immediately have a huge impact on the market?

I don't know about here because I was on a hiatus, but a good part of the Internet seemed really enthused about this possibility.

Just think of it: next-gen consoles come out and Apple is there and ready too, with two years of free Apple TV and an iPad SoC in a box (Apple TV 8K!) that costs just as much as the PS5 and the Xbox SX... how could we say no?!? :p
 

soresu

Golden Member
Dec 19, 2014
1,179
443
136
Perhaps it is possible they might license the Mac Pro GPU from AMD, at least for the first gen, if they aren't able to compete at that level yet. But I question whether AMD would be very amenable to that. Today AMD has all of Apple's discrete GPU business, if they help Apple they would be helping Apple become their former customer.
Licensing is not necessary.

The relationships AMD has with Sony and MS show that a semi-custom arrangement can be far more than just sticking some HBM on a custom SKU of their off-the-shelf uArchs.

Unlike nVidia, who are essentially providing a pre-designed (and rapidly aging) chip with the TX1 for the Switch, AMD have provided highly customised designs for their console partners - so as Apple, why bother making such a massive investment to create a whole new high-end GPU division when they have a partner doing all that custom work for them?

Personally I would not be inclined to go off on a tangent with a new design, patents or not - AMD are just getting started with RDNA2 IMHO, given their design strategies are aligning to Zen and passing learning from one project back to the other.

nVidia are probably thinking of Zen 2 and the possible future of RDNA when designing Hopper - while AMD are already evolving the physical chiplet architecture going from Zen 3 to Zen 4 and beyond. I don't expect the RDNA chiplet evolution to look like Zen 2 at all.
 
