Question Miscellaneous Questions thread

amd6502

Senior member
Apr 21, 2017
971
360
136
This is a thread for minor questions that maybe aren't quite suitable for their own thread.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
So I came across a mention that the Pentium 4 had double-pumped ALUs, or execution pipes, meaning that those pipes run at twice the processor frequency.

1. I'm wondering if anyone here knows whether there are any modern processors where this is also done.

2. Is this done on modern GPUs?
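
For context, my understanding (which could be off) is that the trick in the P4 fast ALUs was "staggering": the low 16 bits of an add were produced in one half-cycle and the high 16 bits in the next, so a dependent simple op could start every half clock. Purely as a toy illustration, not the real circuit:

```python
# Toy model of a "double-pumped" (staggered) 32-bit adder, loosely in the
# spirit of the Pentium 4 fast ALUs: low 16 bits in one half-cycle, high
# 16 bits (plus carry) in the next. Illustration only, not the real design.

MASK16 = 0xFFFF

def staggered_add(a, b):
    """Yield (half_cycle, partial_result) pairs for a 32-bit add."""
    # Half-cycle 1: low 16 bits and the carry out of bit 15.
    lo = (a & MASK16) + (b & MASK16)
    carry = lo >> 16
    yield 1, lo & MASK16

    # Half-cycle 2: high 16 bits, consuming the carry from the low half.
    hi = ((a >> 16) & MASK16) + ((b >> 16) & MASK16) + carry
    yield 2, ((hi & MASK16) << 16) | (lo & MASK16)

if __name__ == "__main__":
    for half_cycle, partial in staggered_add(0x0001FFFF, 0x00000001):
        print(f"half-cycle {half_cycle}: result so far = {partial:#010x}")
    # A dependent add only needs the low half, so it can begin on the very
    # next half-cycle -- which is where the "2x the core clock" wording comes from.
```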
 

NTMBK

Lifer
Nov 14, 2011
10,239
5,026
136
Sounds worthy of its own thread to me :)

It used to be done on Nvidia GPUs, where the "shader clock" ran at double the rest of the GPU. But they dropped it way back with Kepler (GTX 680).
 
  • Like
Reactions: Ottonomous

amd6502

Senior member
Apr 21, 2017
971
360
136
Thanks for the great info NTMBK.

As for the thread, it's just an odd curiosity that had been lingering a while, and I think if anyone else has a question in the back of their mind, they should feel free to tag it onto this random-question thread.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,675
3,801
136
So I came across a mention that the Pentium 4 had double-pumped ALUs, or execution pipes, meaning that those pipes run at twice the processor frequency.

1. I'm wondering if anyone here knows whether there are any modern processors where this is also done.

2. Is this done on modern GPUs?

1. Not that I know of.

Here's my question:

Why did K10 through BD have such highly set-associative L3 caches? I think K10 was 32-way, K10.5 was 48-way, and BD was 64-way, whereas we seem to have settled on 16-way these days. I just double-checked, and those values for K10 and K10.5 seem correct, but it was a bit difficult to find numbers for the BD family. I remember seeing 64, but now I am also seeing 16. I'll edit this if I find something conclusive.
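
To put rough numbers on it (cache sizes from memory, so treat them as approximate: ~2MB for K10, ~6MB for K10.5, ~8MB for BD, ~16MB per CCX for Zen 2, all assuming 64-byte lines), here's a quick back-of-the-envelope:

```python
# Back-of-the-envelope: sets and index bits for each L3, assuming 64-byte
# lines and the approximate, from-memory sizes listed below.
import math

LINE = 64  # bytes per cache line (assumed)

caches = [
    ("K10 (Phenom)",       2 * 2**20, 32),   # ~2MB, 32-way (approximate)
    ("K10.5 (Phenom II)",  6 * 2**20, 48),   # ~6MB, 48-way (approximate)
    ("Bulldozer",          8 * 2**20, 64),   # ~8MB, 64-way (approximate)
    ("Zen 2 (per CCX)",   16 * 2**20, 16),   # ~16MB, 16-way (approximate)
]

for name, size, ways in caches:
    sets = size // (LINE * ways)
    print(f"{name:20s} {size // 2**20:3d} MB, {ways:2d}-way -> "
          f"{sets:6d} sets ({int(math.log2(sets))} index bits)")
```

If those sizes are right, the set count stayed around 1K-2K from K10 through BD; the extra capacity went almost entirely into more ways, while a modern 16-way L3 grows by adding sets instead.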
 
  • Like
Reactions: Ottonomous

Ottonomous

Senior member
May 15, 2014
559
292
136
I was thinking of doing this, thanks.

Why did AMD insist on hardware scheduling on its Vega GPUs instead of moving it to the CPU (software) like Nvidia? Is it theoretically more efficient, or was it a shortcoming of GCN they needed to address for it to reach its potential?
 

Thunder 57

Platinum Member
Aug 19, 2007
2,675
3,801
136
I was thinking of doing this, thanks.

Why did AMD insist on hardware scheduling on its Vega GPUs instead of moving it to the CPU (software) like Nvidia? Is it theoretically more efficient, or was it a shortcoming of GCN they needed to address for it to reach its potential?

More of a GPU question, but my guess is that it was too difficult, would've broken GCN, or would have just taken too long. They call Navi RDNA, but IMHO it looks like RDNA2 next year will be the real "break", if you will, from GCN.

I'd say Navi is more like a Zen+, while RDNA2 is more like Zen 2. Expect to see some significant changes.
 
  • Like
Reactions: Ottonomous

Ottonomous

Senior member
May 15, 2014
559
292
136
1. Not that I know of.

Here's my question:

Why did K10 through BD have such highly set-associative L3 caches? I think K10 was 32-way, K10.5 was 48-way, and BD was 64-way, whereas we seem to have settled on 16-way these days. I just double-checked, and those values for K10 and K10.5 seem correct, but it was a bit difficult to find numbers for the BD family. I remember seeing 64, but now I am also seeing 16. I'll edit this if I find something conclusive.
Excuse my ignorance, but better inter-module communication plus shared execution resources? The L2 on its own probably wasn't efficient enough. Shooting in the dark here.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,675
3,801
136
Excuse my ignorance, but better inter-module communication plus shared execution resources? The L2 on its own probably wasn't efficient enough. Shooting in the dark here.

That might make sense, at least for BD. Still not sure why they went so many ways for Phenom I & II, though. I have to think that cost them latency-wise.

Well, I was definitely going about verifying the BD associativity the wrong way. I realized I could just search for a CPU-Z Bulldozer screenshot. Sure enough, it is 64-way.
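
On the latency angle, my understanding of the intuition: every lookup has to compare the tag against every way in the indexed set (in parallel, in hardware) and then mux out the matching line, so 64-way means four times the tag comparators and a wider way-select mux than 16-way. A purely illustrative sketch of the lookup:

```python
# Minimal sketch of a set-associative lookup: every way in the indexed set
# gets a tag comparison, so higher associativity means more comparators
# (and a wider way-select mux) on the critical path. Illustration only.

LINE_BITS = 6  # 64-byte lines (assumed)

def lookup(cache_sets, addr, ways, num_sets):
    """cache_sets: list of sets, each a list of `ways` (valid, tag) entries."""
    index = (addr >> LINE_BITS) % num_sets
    tag = addr >> LINE_BITS  # keep the full tag for simplicity
    # In hardware these comparisons happen in parallel, one comparator per way.
    for way in range(ways):
        valid, stored_tag = cache_sets[index][way]
        if valid and stored_tag == tag:
            return ("hit", way)
    return ("miss", None)

if __name__ == "__main__":
    ways, num_sets = 64, 2048
    cache = [[(False, 0)] * ways for _ in range(num_sets)]
    addr = 0x12345678
    idx = (addr >> LINE_BITS) % num_sets
    cache[idx][17] = (True, addr >> LINE_BITS)   # pretend way 17 holds the line
    print(lookup(cache, addr, ways, num_sets))   # ('hit', 17)
```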
 
Last edited:

amd6502

Senior member
Apr 21, 2017
971
360
136
Excuse my ignorance, but better inter-module communication plus shared execution resources? The L2 on its own probably wasn't efficient enough. Shooting in the dark here.

Why did K10 through BD have such highly set-associative L3 caches?

Not sure, but they had server ambitions, and BD and PD both had large L2s already. So I think they figured the L3 might as well be slow, and that any speed improvements should go into the L2 first.

(Of course, it then turned out that speed-demon CMT on 32nm or 28nm SOI was not that efficient, and they threw in the towel for servers after FDSOI was indefinitely delayed. After PD, the already large L2 became the LLC. And then from SR to XV it was halved to 1MB of L2 per module.)

Also, the L2 and L3 might integrate better together, as there were four 16-way L2s on the die. (I'm supposing they could share sets.)


Why did AMD insist on hardware scheduling on its Vega GPUs instead of moving it to the CPU (software) like Nvidia? Is it theoretically more efficient, or was it a shortcoming of GCN they needed to address for it to reach its potential?

Hardware scheduling might ease CPU-to-GPU communication, so maybe it was out of bandwidth concerns for APUs (whose size and performance are always limited by RAM bandwidth).
 
Last edited:
  • Like
Reactions: Ottonomous