> Is this a good or a bad thing? What does it mean for performance impacts do you think?

Unified generally has better utilization, but requires more area and power than a distributed scheduler.
> but requires more area and power than distributed scheduler.

yeah so you get a lot less entries.
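(Editor's aside) The trade-off being discussed can be shown with a toy model. All numbers here are hypothetical, not from any real Zen or Intel design: a unified scheduler shares its entries across all ports, while a distributed design splits the same total capacity into small per-port queues, so a burst of uops aimed at one port can fill its queue and stall dispatch even though total capacity is identical.

```python
# Toy illustration only (hypothetical entry counts, not a real CPU):
# 96 shared entries vs. 6 distributed queues of 16 entries each.
UNIFIED_ENTRIES = 96
PER_QUEUE_ENTRIES = 16
NUM_QUEUES = 6  # same total capacity: 6 * 16 == 96

def can_accept_burst(burst_to_one_port: int) -> tuple[bool, bool]:
    """Return (unified_ok, distributed_ok) for a burst of uops that all
    target the same execution port / queue."""
    unified_ok = burst_to_one_port <= UNIFIED_ENTRIES
    distributed_ok = burst_to_one_port <= PER_QUEUE_ENTRIES
    return unified_ok, distributed_ok

# A 30-uop burst to a single port fits comfortably in the shared pool
# but overflows a 16-entry per-port queue, forcing a dispatch stall.
print(can_accept_burst(30))  # -> (True, False)
```

This is the utilization argument in one line: the unified pool absorbs skewed demand, at the cost of the bigger, hotter CAM/wakeup logic a shared structure needs.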
> yeah so you get a lot less entries.

More like Zen3 imo. Zen3 had 4 schedulers (3x ALU + AGU, 1x ALU + Branch). I think Zen6 is 6x ALU + AGU.
In any case, this looks like it went back to Zen1/2 scheduler layout?
> I think Zen6 is 6x ALU + AGU.

Will that improve integer perf substantially?
> I think Zen6 is 6x ALU + AGU.

RETVRN TO K10.5
> Will that improve integer perf substantially?

no.
What will it improve then - FP or just more suitable for SMT workloads?
> What will it improve then - FP or just more suitable for SMT workloads?

Well it's the int scheduler.
> Well it's the int scheduler.

Well if it schedules better then int perf should go up, right? Seems like a drastic change which would be a big risk if the perf improvements were not good.
> Well if it schedules better then int perf should go up, right?

Maybe.
> just that the unified scheduler discussion made me think about what Intel used to do (or still do?)

ah no LNC ditched the unified scheduler and also fixed the port favela that's been with us since P6.
Grok, find team B's fingerprints on this change.
Oh awesome, my speculation, at least for the integer scheduler, looks to likely be wrong and that's great!
> Oh awesome, my speculation, at least for the integer scheduler, looks to likely be wrong and that's great!

wanna bet on i$ size now?
I do love when I get shit wrong because that means I get to ask questions and learn more about why they did what they did... and I do like interesting twists and this is definitely an interesting twist...
> oh okay memory scheduler is also GONE.

I am really interested if they have added 2 more AGUs... because that would be up to 6 memory ops per cycle which is quite spicy IMO...
We're either back to Zen3/4 sched layout, or K10.5.
The latter is the funnier option.
> wanna bet on i$ size now?

I am going to assume it's the same 32KB as Zen 2-5 but I am kinda really hoping for 48+ KB... Maybe 64KB... maybe...
> Simple, you are having to feed a much larger frontend... you expect more L1i misses assuming the same structure size...

But by such a large margin though?
> So that is very dependent on the workload, for example compiling the Linux Kernel the L2 BTB overrides went down from about 12.86 MPKI to about 3 MPKI... whereas the L1 iTLB misses went up, which again isn't surprising considering that the L1 iTLB size didn't change from Zen 4 to Zen 5...

The only specint2017 subtest where L2 BTB overrides went down in Huang's testing was 500.perlbench 2. 500.perlbench 1 and 3, as well as all the other specint subtests he ran, saw a large L2 BTB override increase.
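(Editor's aside) For readers unfamiliar with the unit: MPKI is misses (or events) per kilo-instruction, i.e. events normalized to every 1000 retired instructions. A quick sketch of the arithmetic behind the kernel-compile comparison above, using a hypothetical instruction count purely for illustration:

```python
# MPKI = events per 1000 retired instructions.
def mpki(events: int, instructions: int) -> float:
    """Normalize an event count to events per kilo-instruction."""
    return events * 1000 / instructions

def events_from_mpki(mpki_value: float, instructions: int) -> float:
    """Recover an absolute event count from an MPKI figure."""
    return mpki_value * instructions / 1000

# Hypothetical 1-trillion-instruction compile job, with the ~12.86 vs
# ~3 MPKI L2 BTB override figures quoted in the discussion:
insns = 1_000_000_000_000
zen4_overrides = events_from_mpki(12.86, insns)
zen5_overrides = events_from_mpki(3.0, insns)
print(f"{zen4_overrides / zen5_overrides:.1f}x fewer overrides")  # ~4.3x
```

The point of the normalization is that MPKI lets you compare cores across runs of different lengths; the absolute counts themselves depend entirely on how many instructions the workload retires.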

> I am going to assume it's the same 32KB as Zen 2-5 but I am kinda really hoping for 48+ KB... Maybe 64KB... maybe...

Maybe a double node jump allows them to increase capacity. Intel did increase L1i capacity from GLC to RWC when they shrunk from Intel 7 to Intel 4.
> Maybe a double node jump allows them to increase capacity

oh nyo area isn't the problem.
> Could it be they are going back to previous designs because Zen6 is going to be used in more client focused products as well?

no. speed is life.
> In any case, this is the first time AMD is diddling schedulers for a derived core.

😉
