Question Miscellaneous Questions thread

amd6502 · Dec 1, 2019

This is a thread for minor questions that maybe aren't quite suitable for their own thread.

amd6502 · Dec 1, 2019

so i came across a bit that where it's said that Pentirum 4 has double pumped ALU's or execution pipes, meaning that such pipes run at twice the frequency as the processor frequency.

1. I'm wondering if anyone here knows whether there are any modern processors where this is also done.

2. Is this done on modern GPU's?

NTMBK · Dec 1, 2019

Sounds worthy of its own thread to me 🙂

Used to be used on NVidia GPUs, where the "shader clock" was double that of the rest of the GPU. But they dropped it way back in Kepler (GTX 680).

amd6502 · Dec 1, 2019

Thanks for the great info NTMBK.

As for the thread, it's just kind of an odd curiosity I had lingering a while, and I think if anyone else has some question in the back of their minds they might as well feel free tag onto this random question thread.

Thunder 57 · Dec 1, 2019

amd6502 said:
so i came across a bit that where it's said that Pentirum 4 has double pumped ALU's or execution pipes, meaning that such pipes run at twice the frequency as the processor frequency.

1. I'm wondering if anyone here knows whether there are any modern processors where this is also done.

2. Is this done on modern GPU's?

1. Not that I know of.

Here's my question:

Why did K10 - BD have massive L3 set associative cache? I think K10 was 32 way, K10.5 was 48 way, and BD was 64 way. We seemed to have settled on to 16 way these days. I just double checked, and those values for K10 and K10.5 seem correct, but it was a bit difficult to find numbers on the BD family. I remember seeing 64 but now I am also seeing 16. I'll edit this if I find something conclusive.

Ottonomous · Dec 1, 2019

I was thinking of doing this, thanks.

Why did AMD insist on hardware scheduling on its Vega GPUs instead of transferring it to the CPU (SW) like nvidia? Is it theoretically more efficient or was it a shortcoming on GCN they needed to address to achieve its potential

Thunder 57 · Dec 1, 2019

Ottonomous said:
I was thinking of doing this, thanks.

Why did AMD insist on hardware scheduling on its Vega GPUs instead of transferring it to the CPU (SW) like nvidia? Is it theoretically more efficient or was it a shortcoming on GCN they needed to address to achieve its potential

More of a GPU question but my guess is that it was too difficult or would've broken GCN, or just taken too long. They call call Navi RDNA but IMHO it looks like RDNA2 next year will be the real "break", if you will, from GCN.

I'd say it's like Navi is more like a Zen+, while RDNA2 is more like Zen 2. Expect to see some significant changes.

Ottonomous · Dec 1, 2019

Thunder 57 said:
1. Not that I know of.

Here's my question:

Why did K10 - BD have massive L3 set associative cache? I think K10 was 32 way, K10.5 was 48 way, and BD was 64 way. We seemed to have settled on to 16 way these days. I just double checked, and those values for K10 and K10.5 seem correct, but it was a bit difficult to find numbers on the BD family. I remember seeing 64 but now I am also seeing 16. I'll edit this if I find something conclusive.

Excuse my ignorance, but better inter-module communication + plus shared execution resources? L2 probably wasn't as necessarily efficient. Shooting in the dark here.

Thunder 57 · Dec 1, 2019

Ottonomous said:
Excuse my ignorance, but better inter-module communication + plus shared execution resources? L2 probably wasn't as necessarily efficient. Shooting in the dark here.

That might make sense. At least for BD. Still not sure why they had so many sets for Phemon I & II though. I have to think that cost them latency wise.

Well I was definitely going about trying to verify the sets in BD the wrong way. I realized I could just search for cpu-z bulldozer and pull up a screenshot. Sure enough, it is 64 ways.

amd6502 · Dec 1, 2019

Ottonomous said:
Excuse my ignorance, but better inter-module communication + plus shared execution resources? L2 probably wasn't as necessarily efficient. Shooting in the dark here.

Thunder 57 said:
Why did K10 - BD have massive L3 set associative cache?

Not sure but they had server ambitions and BD and PD both had large L2's already. So I think they figured L3 may as well be slow, and if anything improve speed first on L2.

(Of course then it turned out speed demon CMT on 32nm or 28nm SOI was not that efficient and threw in the towel for servers after FDSOI was indefinitely delayed. After PD the already large L2 became the LLC. And then from SR to XV it was halved to 1MB of L2 per module.)

Also L2 and L3 might integrate better together as there were four 16-way L2's on the die. (I'm supposing they could share sets).

Ottonomous said:
Why did AMD insist on hardware scheduling on its Vega GPUs instead of transferring it to the CPU (SW) like nvidia? Is it theoretically more efficient or was it a shortcoming on GCN they needed to address to achieve its potential

Hardware scheduling might ease CPU to GPU communication, so maybe out of bandwidth concerns for APU's (whose size and performance are always limited by RAM bandwidth).

Question Miscellaneous Questions thread

amd6502

Senior member

amd6502

Senior member

NTMBK

Lifer

amd6502

Senior member

Thunder 57

Diamond Member

Ottonomous

Senior member

Thunder 57

Diamond Member

Ottonomous

Senior member

Thunder 57

Diamond Member

amd6502

Senior member

TRENDING THREADS