Discussion 12700k vs 5900x/3900x/5950x DC benchmarks info, and some build info.

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
I have a 12700K and the other parts coming on Monday. It will take a couple of days to get set up; then I would like some assistance benchmarking it against the 3900X/5900X/5950X. The build is in the Alder Lake builders thread and starts here: https://forums.anandtech.com/threads/alder-lake-builders-thread.2598850/post-40712563.

But I got off-topic there, and this thread will serve the benchmark part for DC purposes. Suggestions are welcome. So far, I can get a good PrimeGrid CPU comparison between a 3950X and a 5950X, as I can get both machines loaded 100%. Without that it's not fair, as the speed changes depending on the load.

Anyway, suggestions on how to do this are more than welcome. It would be easy if WCG were up.

Build info:
12700k
ASUS Prime Z690P motherboard
WD black SN770 PCIE gen4 250 gig HD/SSD
DDR4 4000 cl18-22-22-42
850 gold EVGA PSU.
 

StefanR5R

Elite Member
Dec 10, 2016
5,498
7,786
136
If you want to take PrimeGrid as a benchmarking tool, perhaps start with SGS-LLR (Sophie Germain Prime Search). Among PrimeGrid's subprojects without GPU use, this one is searching the smallest primes and thereby has got a tolerably low processor cache footprint. Meaning, it runs best in singlethreaded mode, allowing for straightforward apples-to-apples comparison. ("Singlethreaded" meaning one program thread per task. Whether to use 1 or 2 hardware threads of SMT-/ HT-capable cores is another question.)

The other PrimeGrid subprojects have larger cache requirements. This means that application performance on a given CPU depends a lot on how many application instances run simultaneously × how many program threads an application instance uses.
If you run too many application instances at once, the processor's execution units are starved waiting for memory accesses, as caches are no longer sufficient to feed them.

If you increase the program thread count, thread synchronization overhead increases, IOW the processor's execution units spend increasing time on management rather than on actual computation.

Yet if the product of simultaneous application instance count and program thread count is too small, the processor's execution units won't be fully utilized either.

Furthermore, CPUs with segmented caches — that is, most of AMD's Zen based CPUs — profit from taking the process scheduling partly out of the control of the operating system, because cache aware scheduling is either absent in operating system kernels or at least not as efficient as "manual" scheduling, a.k.a. CPU affinity.

All this means that one and the same CPU, testing one and the same prime candidate number, will perform quite differently depending on the setup.
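To make the "instances × program threads" trade-off concrete, here is a minimal sketch that only enumerates the setups which load every hardware thread exactly once. The function name and the 32-thread example are illustrative assumptions; the sketch says nothing about which setup is fastest, since that depends on the cache footprint and synchronization overhead described above and has to be measured.

```python
def fully_loaded_setups(hardware_threads):
    """List (instances, threads_per_instance) pairs that use every hardware thread exactly once."""
    return [
        (instances, hardware_threads // instances)
        for instances in range(1, hardware_threads + 1)
        if hardware_threads % instances == 0
    ]


# 32 hardware threads, as on a 16-core CPU with SMT enabled.
for instances, threads in fully_loaded_setups(32):
    print(f"{instances:2d} instance(s) x {threads:2d} program thread(s) each")
```

Each line of the output is one candidate configuration to benchmark; few instances with many threads lean towards synchronization overhead, many instances with one thread each lean towards cache pressure.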

Edit:
The performance of PrimeGrid's applications is heavily biased towards vector unit width and cache size. Perhaps you should check out TN-Grid. The variability between workunits is rather small, its application does not appear to be cache heavy, and it uses vector units, but in moderation. Also, TN-Grid is a biological project with direct or indirect medical implications, depending on workunit batches. The current work is related to neurodegenerative disorders and to hematopoietic tumors, according to the status page.
Due to the WCG pause, TN-Grid work is much in demand right now. You may have to issue some periodic project updates in order to receive work continuously.

Any given host will receive a few different application versions (SSE2, AVX, and FMA versions). These versions are all capable of processing the same workunits, and they share a single server-side work queue. As the server sends work for each of these application versions to a host for quite a while, it tries to figure out which version works best on the host, and will eventually send mostly work for the apparently best version. However, performance differences between the versions are rather small, and the server may in fact end up settling on a version which is not the precise optimum.
 

Markfw

Interesting... Can it be believed ? The 12700F

Code:
mark@mark-12700F:~$ lscpu --extended
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ    MINMHZ
0   0    0      0    0:0:0:0       yes    6100.0000 800.0000
1   0    0      0    0:0:0:0       yes    6100.0000 800.0000
2   0    0      1    1:1:1:0       yes    6100.0000 800.0000
3   0    0      1    1:1:1:0       yes    6100.0000 800.0000
4   0    0      2    2:2:2:0       yes    6300.0000 800.0000
5   0    0      2    2:2:2:0       yes    6300.0000 800.0000
6   0    0      3    3:3:3:0       yes    6100.0000 800.0000
7   0    0      3    3:3:3:0       yes    6100.0000 800.0000
8   0    0      4    4:4:4:0       yes    6100.0000 800.0000
9   0    0      4    4:4:4:0       yes    6100.0000 800.0000
10  0    0      5    5:5:5:0       yes    6100.0000 800.0000
11  0    0      5    5:5:5:0       yes    6100.0000 800.0000
12  0    0      6    6:6:6:0       yes    6300.0000 800.0000
13  0    0      6    6:6:6:0       yes    6300.0000 800.0000
14  0    0      7    7:7:7:0       yes    6100.0000 800.0000
15  0    0      7    7:7:7:0       yes    6100.0000 800.0000
16  0    0      8    8:8:8:0       yes    3600.0000 800.0000
17  0    0      9    9:9:8:0       yes    3600.0000 800.0000
18  0    0      10   10:10:8:0     yes    3600.0000 800.0000
19  0    0      11   11:11:8:0     yes    3600.0000 800.0000
 

Markfw

I have some preliminary results. In PrimeGrid, it looks like the 12700F running 20 units shows about the same ETA per unit as a 5950X, but the 5950X is running 31 units. The wattage is the same: 245 watts from the wall for the 5950X with 31 units vs. the 12700F at the same wattage doing only 19. So per-unit speed appears the same, but the 5950X runs 50% more units at the same wattage.
 
Jul 27, 2020
16,165
10,240
106
Interesting... Can it be believed ? The 12700F

mark@mark-12700F:~$ lscpu --extended
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ
0 0 0 0 0:0:0:0 yes 6100.0000 800.0000
4 0 0 2 2:2:2:0 yes 6300.0000 800.0000
Bug in lscpu? No way your 12700F is doing 6000+ MHz.
 

biodoc

Diamond Member
Dec 29, 2005
6,261
2,238
136
Bug in lscpu? No way your 12700F is doing 6000+ MHz.

I believe that number is the theoretical maximum. Oddly enough, the output shows the actual clock speed as 0.0000, so 'lscpu' is not showing the correct clock speed when the processor is under full load.

Output from my 5950X under full load:

5083 MHz is theoretical max and the second number, 3979 is actual clock speed in MHz. 2200 MHz is theoretical minimum.

Code:
mark@x32-linux:~$ lscpu --extended
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ    MINMHZ
  0    0      0    0 0:0:0:0          yes 5083.3979 2200.0000
  1    0      0    1 1:1:1:0          yes 5083.3979 2200.0000
  2    0      0    2 2:2:2:0          yes 5083.3979 2200.0000
  3    0      0    3 3:3:3:0          yes 5083.3979 2200.0000
  4    0      0    4 4:4:4:0          yes 5083.3979 2200.0000
  5    0      0    5 5:5:5:0          yes 5083.3979 2200.0000
  6    0      0    6 6:6:6:0          yes 5083.3979 2200.0000
  7    0      0    7 7:7:7:0          yes 5083.3979 2200.0000
  8    0      0    8 8:8:8:1          yes 5083.3979 2200.0000
  9    0      0    9 9:9:9:1          yes 5083.3979 2200.0000
 10    0      0   10 10:10:10:1       yes 5083.3979 2200.0000
 11    0      0   11 11:11:11:1       yes 5083.3979 2200.0000
 12    0      0   12 12:12:12:1       yes 5083.3979 2200.0000
 13    0      0   13 13:13:13:1       yes 5083.3979 2200.0000
 14    0      0   14 14:14:14:1       yes 5083.3979 2200.0000
 15    0      0   15 15:15:15:1       yes 5083.3979 2200.0000
 16    0      0    0 0:0:0:0          yes 5083.3979 2200.0000
 17    0      0    1 1:1:1:0          yes 5083.3979 2200.0000
 18    0      0    2 2:2:2:0          yes 5083.3979 2200.0000
 19    0      0    3 3:3:3:0          yes 5083.3979 2200.0000
 20    0      0    4 4:4:4:0          yes 5083.3979 2200.0000
 21    0      0    5 5:5:5:0          yes 5083.3979 2200.0000
 22    0      0    6 6:6:6:0          yes 5083.3979 2200.0000
 23    0      0    7 7:7:7:0          yes 5083.3979 2200.0000
 24    0      0    8 8:8:8:1          yes 5083.3979 2200.0000
 25    0      0    9 9:9:9:1          yes 5083.3979 2200.0000
 26    0      0   10 10:10:10:1       yes 5083.3979 2200.0000
 27    0      0   11 11:11:11:1       yes 5083.3979 2200.0000
 28    0      0   12 12:12:12:1       yes 5083.3979 2200.0000
 29    0      0   13 13:13:13:1       yes 5083.3979 2200.0000
 30    0      0   14 14:14:14:1       yes 5083.3979 2200.0000
 31    0      0   15 15:15:15:1       yes 5083.3979 2200.0000
 

Markfw

OK, a fully loaded 5950X, then a fully loaded 12700F, both at stock. Over 20 tasks, the 5950X averages 5525 seconds per task. The same data for the 12700F shows an average of 8677 seconds, both with the same average credit; the last tasks of each machine are listed below. So per task slot, the 12700F does 64% of the work of the 5950X (the 5950X needs 64% OF THE TIME), and on top of that it runs only about 2/3 as many tasks at a time (21 vs 31). Can someone look at my facts and tell me what I might be missing? I get 224k vs 91.5k PPD, calculated as tasks per day times average task credit times number of concurrent tasks per computer. The Excel calc was =(SUM(G25:G44)/20)*10*19 for the 12700F. It can do 10 tasks a day per slot, with 19 concurrent tasks.
5950X, 20 tasks:
Sent                       Reported                   Run time (s)  CPU time (s)  Credit    Application
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 0:26:07 UTC  9,123.90      8,995.14      481.75    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 0:31:20 UTC  9,434.02      9,293.32      481.75    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 16:25:31 UTC  9 Mar 2022 | 18:00:13 UTC  5,666.45      5,654.34      481.79    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 1:20:36 UTC  3,122.28      3,115.12      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 1:20:22 UTC  3,148.48      3,137.76      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 1:20:49 UTC  3,198.73      3,191.09      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 1:21:19 UTC  3,241.21      3,231.17      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 1:21:05 UTC  3,235.51      3,226.19      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 1:22:46 UTC  3,390.48      3,382.32      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 1:20:22 UTC  3,255.44      3,246.91      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 1:25:23 UTC  3,573.88      3,557.09      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 1:19:16 UTC  3,254.75      3,245.55      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 1:19:02 UTC  3,304.76      3,292.96      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 1:17:25 UTC  3,330.13      3,316.62      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 1:18:49 UTC  3,745.04      3,722.76      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 0:36:34 UTC  9,394.84      9,274.77      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 0:34:21 UTC  9,386.22      9,233.39      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 0:26:19 UTC  9,131.35      8,993.62      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 0:23:31 UTC  8,970.77      8,817.60      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 21:53:50 UTC  10 Mar 2022 | 0:33:57 UTC  9,592.42      9,457.36      481.82    Genefer 17 Mega v3.22 (cpuGFN17MEGA)
Average                                               5525.033      5469.254      481.8115
12700F, 20 tasks:
Sent                       Reported                   Run time (s)  CPU time (s)  Credit    Application
9 Mar 2022 | 0:54:10 UTC   10 Mar 2022 | 0:57:00 UTC  6,373.30      6,373.00      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:58 UTC   10 Mar 2022 | 0:41:40 UTC  9,371.31      9,371.31      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:46 UTC   10 Mar 2022 | 0:31:28 UTC  9,385.64      9,385.64      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:46 UTC   10 Mar 2022 | 0:16:19 UTC  8,673.82      8,673.73      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:46 UTC   10 Mar 2022 | 0:28:14 UTC  9,515.69      9,515.63      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:46 UTC   9 Mar 2022 | 23:40:02 UTC  6,703.29      6,702.42      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:46 UTC   10 Mar 2022 | 0:16:42 UTC  8,990.33      8,990.33      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:46 UTC   10 Mar 2022 | 0:00:31 UTC  8,366.95      8,366.95      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:35 UTC   10 Mar 2022 | 0:08:27 UTC  9,729.09      9,729.09      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:35 UTC   9 Mar 2022 | 23:41:25 UTC  8,221.78      8,221.35      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:35 UTC   9 Mar 2022 | 23:49:06 UTC  9,100.26      9,100.26      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:35 UTC   9 Mar 2022 | 23:56:07 UTC  9,670.61      9,670.61      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:35 UTC   9 Mar 2022 | 23:36:20 UTC  8,511.88      8,511.77      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:35 UTC   9 Mar 2022 | 23:31:51 UTC  8,498.85      8,498.61      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:24 UTC   9 Mar 2022 | 23:38:39 UTC  9,085.98      9,085.91      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:24 UTC   9 Mar 2022 | 23:27:52 UTC  8,909.76      8,909.41      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:24 UTC   9 Mar 2022 | 23:10:48 UTC  8,245.17      8,244.92      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:24 UTC   9 Mar 2022 | 22:58:23 UTC  8,206.98      8,206.47      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:24 UTC   9 Mar 2022 | 22:57:10 UTC  8,830.36      8,830.09      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
9 Mar 2022 | 0:53:24 UTC   9 Mar 2022 | 22:58:41 UTC  9,158.80      9,158.80      481.8     Genefer 17 Mega v3.22 (cpuGFN17MEGA)
Average                                               8677.493      8677.315      481.8
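The PPD arithmetic described in this post can be written out as a short sketch. It assumes the averages listed above (5525.033 s and 481.8115 credits for the 5950X, 8677.493 s and 481.8 credits for the 12700F) and the concurrent task counts of 31 and 19 mentioned in the thread; the function name `ppd` is just an illustrative choice. Unlike the Excel formula, it does not round tasks per day to 10 or 15, so the totals come out slightly different from the 224k and 91.5k quoted.

```python
SECONDS_PER_DAY = 24 * 3600


def ppd(avg_task_seconds, avg_credit, concurrent_tasks):
    """Points per day = tasks per day per slot * credit per task * number of slots."""
    tasks_per_day_per_slot = SECONDS_PER_DAY / avg_task_seconds
    return tasks_per_day_per_slot * avg_credit * concurrent_tasks


ppd_5950x = ppd(5525.033, 481.8115, 31)
ppd_12700f = ppd(8677.493, 481.8, 19)

print(f"5950X : {ppd_5950x:,.0f} PPD")
print(f"12700F: {ppd_12700f:,.0f} PPD")
print(f"12700F / 5950X: {ppd_12700f / ppd_5950x:.2f}")
```

With only 20 samples per machine, and a visibly uneven spread of run times in the 5950X list, these figures are rough estimates rather than settled results.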
 

StefanR5R

I am not sure what the cache footprint of cpuGFN17MEGA is. It may be a bit much for 20 simultaneous program instances on i7-12700F.

Core i7-12700F has got 1.25 MB L2$ for each of the 8 dualthreaded performance cores and 512 kB L2$ for each of the 4 singlethreaded efficiency cores, plus 25 MB unified L3$.

Ryzen 9 5950X has got 512kB L2$ for each of the 16 dualthreaded cores, and 32 MB L3$ for each of the two 8c/16t core complexes.

In either case, the hot data of one cpuGFN17MEGA instance will most likely exceed L2$ size. I *suspect* 20 instances¹ won't fit entirely into the 25 MB L3$ of the i7 either, but 2x16 instances² might still fit just barely into the 2x32 MB L3$ of the R9. (But again, I don't know for sure.)

That's why I suggested testing with SGS-LLR (Sophie Germain Prime Search). Cache footprint shouldn't be an issue with this one.

Furthermore, as you are taking an average of the throughput of 16 SMT threads of the performance cores and 4 efficiency cores, it's better to take a sample of, let's say, 60 results, not 20 results.

PS, the average number of task completions per day per thread is 24 hours/day * 3600 seconds/hour / average task duration in seconds = 24*3600/8677.493 = 9.96 average completions per day per thread.

________
¹) or 19, for that matter
²) or 15+16, if you prefer
plus whatever other workload you might have running in parallel to the test (e.g. FahCore_22 may be cache heavy too, I suspect, and certainly memory I/O heavy)
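As a rough illustration of the cache argument, the following sketch computes the L3 budget per program instance if the cache were shared evenly. The cache sizes and instance counts are the ones given in this post; the even split is a simplifying assumption, and the real footprint of cpuGFN17MEGA is not known here.

```python
def l3_budget_per_instance_mb(l3_total_mb, instances):
    """L3 cache available per program instance if it were split evenly (a simplification)."""
    return l3_total_mb / instances


# i7-12700F: 25 MB unified L3, 20 simultaneous instances.
i7_budget = l3_budget_per_instance_mb(25, 20)
# Ryzen 9 5950X: 2 x 32 MB L3 (one per core complex), 2 x 16 instances.
r9_budget = l3_budget_per_instance_mb(2 * 32, 2 * 16)

print(f"i7-12700F: {i7_budget:.2f} MB of L3 per instance")
print(f"R9 5950X : {r9_budget:.2f} MB of L3 per instance")
```

If the hot data of one instance falls between these two budgets, that would be consistent with the suspicion above, but only a measurement can confirm it.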
 

Markfw

Not sure I got all of that, but for one, NO other processes were running. NO folding, nothing. As far as cache sizes and all that go, this was just one test. I fully expect different results from, say, WCG. But it should be representative of how Alder Lake stacks up against Zen 3. And yes, by my calculations, ADL (12700F in this case) can do 10 units per day per core on average, where the 5950X can do 15, or 50% more units, and it also runs approx. 50% more units at once (31 vs 19).

So I think you validated my results. More test results when WCG comes back online. Rosetta is too irregular in handing out units, and in the way it does points and such. And this test shows that ADL draws as much power for 19 virtual cores as Zen 3 does for 31, so Zen 3 is more power efficient.

This was a unit-only test, but that's what I wanted, so P core vs E core is an average, it appears, and Unix doesn't care which is which. This is all about DC apps, nothing else.
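The efficiency claim can be put into numbers with a small sketch. It assumes the 245 W wall reading reported earlier in the thread for both machines and the 224k and 91.5k PPD estimates quoted above; `ppd_per_watt` is an illustrative helper name, not part of any tool.

```python
def ppd_per_watt(points_per_day, wall_watts):
    """Points per day delivered per watt drawn at the wall."""
    return points_per_day / wall_watts


WALL_WATTS = 245  # same reading reported for both machines under load

zen3 = ppd_per_watt(224_000, WALL_WATTS)
adl = ppd_per_watt(91_500, WALL_WATTS)

print(f"5950X : {zen3:.0f} PPD per watt")
print(f"12700F: {adl:.0f} PPD per watt")
print(f"5950X advantage: {zen3 / adl:.2f}x")
```

Because the wattage is identical in this comparison, the efficiency ratio is simply the ratio of the two PPD estimates.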
 

Skillz

Senior member
Feb 14, 2014
926
951
136
That's probably not right. Those 4 eco cores are most likely not in that data set which is why Stef is suggesting you take data from 60 or more tasks to compare to instead of 20.

The best thing to do in this situation is to create brand new instances on both computers. Then let them start running at the same time. Then, after 12, 24, 48 hours, etc., compare the number of tasks completed by each system as well as the total points each system has accumulated. (GFN has a wingman, so the points could be off by a good bit.)
 

StefanR5R

That's probably not right. Those 4 eco cores are most likely not in that data set which is why Stef is suggesting you take data from 60 or more tasks to compare to instead of 20.
I haven't researched ADL myself, but I imagine there are scenarios in which 1 E core performs similarly to each of 2 threads on one P core. (Notably in memory-bottlenecked situations.)

More test results where WCG comes back online. Rosetta is too irregular with handing out units, and the way it does points and such.
Maybe try TN-Grid next, as long as WCG is out.
Workunit sizes and credit/result vary somewhat at TN-Grid, but not much.
Probably best would be to run it for as long as it takes until the hosts settle to receive work of one type of tasks mostly (out of FMA/ AVX/ SSE2).

There are a few 5950Xes listed in top_hosts.php, but I spotted only Windows hosts. I don't think there are notable performance differences between Linux and Windows at TN-Grid though.
 
Jul 27, 2020
16,165
10,240
106
Also, you are testing with Linux Mint 19.2 if I'm not mistaken? That one has kernel version 4.15. Too old. You need minimum 5.16 for Alder Lake:

Windows 11 no longer the fastest OS for Alder Lake: Linux 5.16 on Core i9-12900K emerges winner in most benchmarks - NotebookCheck.net News

Also, Intel Thread Director support will be in upcoming 5.18:

Intel's Thread Director Coming to Linux 5.18 to Fix Alder Lake Performance Issues | Tom's Hardware (tomshardware.com)

You have to manually update the kernel since Mint is only at 5.4 even with the latest release.

So you have to do follow-up testing once 5.18 is released for a proper conclusion.

If you would rather not upgrade the kernel version, maybe disable the E-cores to prevent the Linux scheduler from doing suboptimal scheduling on the E-cores. It has no idea that the E-cores are low power cores.

Unlike Windows 11, Linux currently doesn't have the appropriate support for Intel's Thread Director technology that uses the Enhanced Hardware Feedback Interface (HFI). HFI allows the OS to properly allocate threads to the high-performance Golden Cove and energy-efficient Gracemont cores, which is why Intel's hybrid Alder Lake CPUs perform better under Windows.

Without HFI support, the Linux kernel makes decisions on whether to use P or E cores using Intel's ITMT/Turbo Boost Max 3.0 driver that relies on the information exposed by the firmware. That's why in many cases it prefers the fastest cores with the highest frequency (i.e., Golden Cove cores) and does not use Gracemont cores even when possible, which leads to their underutilization.
 
Jul 27, 2020
16,165
10,240
106
Intel Alder Lake continues use of memory gears - DVHARDWARE

With the launch of Rocket Lake-S, Intel started using gear modes on its memory controller. The idea here is that in Gear 1, the memory controller operates at the same frequency as the system memory, using a 1:1 ratio. This offers the lowest possible memory latency. In Gear 2 on the other hand, the memory controller runs at half the frequency of the memory, resulting in a 2:1 ratio. This allows the memory to achieve a much higher frequency -- at the expense of latency.

You have confirmed that RAM is running at 4000 MT/s, so your CPU is in Gear Mode 2. Dial back the RAM to 3200 MT/s and it will automatically switch to Gear Mode 1, allowing the memory controller to run at full speed, and your GB5 ST score will cross 1800. That is the optimum configuration.
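The gear ratios described above translate into memory controller clocks as in the following sketch. It assumes the usual DDR relationship (memory clock in MHz is half the data rate in MT/s) and the gear assignments stated in this post; the function names are illustrative only.

```python
def memory_clock_mhz(data_rate_mts):
    """DDR memory transfers twice per clock, so the clock is half the data rate."""
    return data_rate_mts / 2


def controller_clock_mhz(data_rate_mts, gear):
    """Gear 1 runs the controller at the memory clock (1:1), Gear 2 at half of it (2:1)."""
    return memory_clock_mhz(data_rate_mts) / gear


print("DDR4-4000, Gear 2:", controller_clock_mhz(4000, 2), "MHz controller clock")
print("DDR4-3200, Gear 1:", controller_clock_mhz(3200, 1), "MHz controller clock")
```

Under these assumptions, the slower DDR4-3200 setting ends up with the faster memory controller, which is the point the post is making. Whether that helps a given DC application still has to be tested.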
 

StefanR5R

Also, you are testing with Linux Mint 19.2 if I'm not mistaken? That one has kernel version 4.15. Too old. You need minimum 5.16 for Alder Lake:
[...]
If you would rather not upgrade the kernel version, maybe disable the E-cores to prevent the Linux scheduler from doing suboptimal scheduling on the E-cores.
Since Markfw loads all cores (and all threads on all SMT capable cores), this doesn't matter. (On the other hand, the bigLITTLE approach doesn't make sense on machines for workloads like this in the first place.)

There are certain DC applications which would benefit from a) loading less than all hardware threads, b) special scheduling. I am thinking of most of PrimeGrid's LLR based subprojects. But these run best with *manual* scheduling, as opposed to automatic scheduling by the operating system's kernel, on processors with more complex structure such as Zen/ Zen2/ Zen3 and, I presume, Alderlake.

You have confirmed that RAM is running at 4000 MT/s so your CPU is in Gear mode 2. Dial back the RAM to 3200 MHz and it will automatically switch to Gear Mode 1, allowing the memory controller to run at full speed and your GB5 ST score will cross 1800. That is the optimum configuration.
The optimum for Geekbench may differ from the optimum for, let's say, cpuGFN17MEGA.
 

StefanR5R

Code:
mark@mark-12700F:~$ lscpu --extended
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ    MINMHZ
0   0    0      0    0:0:0:0       yes    6100.0000 800.0000
1   0    0      0    0:0:0:0       yes    6100.0000 800.0000
2   0    0      1    1:1:1:0       yes    6100.0000 800.0000
3   0    0      1    1:1:1:0       yes    6100.0000 800.0000
4   0    0      2    2:2:2:0       yes    6300.0000 800.0000
5   0    0      2    2:2:2:0       yes    6300.0000 800.0000
6   0    0      3    3:3:3:0       yes    6100.0000 800.0000
7   0    0      3    3:3:3:0       yes    6100.0000 800.0000
8   0    0      4    4:4:4:0       yes    6100.0000 800.0000
9   0    0      4    4:4:4:0       yes    6100.0000 800.0000
10  0    0      5    5:5:5:0       yes    6100.0000 800.0000
11  0    0      5    5:5:5:0       yes    6100.0000 800.0000
12  0    0      6    6:6:6:0       yes    6300.0000 800.0000
13  0    0      6    6:6:6:0       yes    6300.0000 800.0000
14  0    0      7    7:7:7:0       yes    6100.0000 800.0000
15  0    0      7    7:7:7:0       yes    6100.0000 800.0000
16  0    0      8    8:8:8:0       yes    3600.0000 800.0000
17  0    0      9    9:9:8:0       yes    3600.0000 800.0000
18  0    0      10   10:10:8:0     yes    3600.0000 800.0000
19  0    0      11   11:11:8:0     yes    3600.0000 800.0000
Core i7-12700F has got 1.25 MB L2$ for each of the 8 dualthreaded performance cores and 512 kB L2$ for each of the 4 singlethreaded efficiency cores, plus 25 MB unified L3$.
So I have to correct that based on the lscpu output: 1.25 MB L2$ for each of the 8 dualthreaded performance cores and 2.0 MB L2$ combined for all 4 singlethreaded efficiency cores, plus 25 MB unified L3$.

@Markfw, you can use the CPU numbering from the lscpu output if you want to test what happens if you run work only on P cores, or only on E cores. (Or if you go through the pain of setting up two boinc client instances, you could load P and E cores at once but keep their work in separate client instances, visible as separate "hosts" on the respective BOINC project website.)

According to the lscpu output,
  • logical CPUs 0, 2, 4, …, 14 are one set of hardware threads of the 8 P cores,
  • logical CPUs 1, 3, 5, …, 15 are the other set of hardware threads of the P cores,
  • logical CPUs 16…19 are the E cores.
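The grouping in the list above can also be derived mechanically from the lscpu output. The sketch below parses the output posted in this thread and groups logical CPUs by their CORE column. Treating cores with two logical CPUs as P cores and cores with one as E cores is a heuristic that fits this i7-12700F, not a general rule, and the function name is an illustrative choice.

```python
from collections import defaultdict

# lscpu --extended output as posted in this thread (whitespace-separated columns).
LSCPU_OUTPUT = """\
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ
0 0 0 0 0:0:0:0 yes 6100.0000 800.0000
1 0 0 0 0:0:0:0 yes 6100.0000 800.0000
2 0 0 1 1:1:1:0 yes 6100.0000 800.0000
3 0 0 1 1:1:1:0 yes 6100.0000 800.0000
4 0 0 2 2:2:2:0 yes 6300.0000 800.0000
5 0 0 2 2:2:2:0 yes 6300.0000 800.0000
6 0 0 3 3:3:3:0 yes 6100.0000 800.0000
7 0 0 3 3:3:3:0 yes 6100.0000 800.0000
8 0 0 4 4:4:4:0 yes 6100.0000 800.0000
9 0 0 4 4:4:4:0 yes 6100.0000 800.0000
10 0 0 5 5:5:5:0 yes 6100.0000 800.0000
11 0 0 5 5:5:5:0 yes 6100.0000 800.0000
12 0 0 6 6:6:6:0 yes 6300.0000 800.0000
13 0 0 6 6:6:6:0 yes 6300.0000 800.0000
14 0 0 7 7:7:7:0 yes 6100.0000 800.0000
15 0 0 7 7:7:7:0 yes 6100.0000 800.0000
16 0 0 8 8:8:8:0 yes 3600.0000 800.0000
17 0 0 9 9:9:8:0 yes 3600.0000 800.0000
18 0 0 10 10:10:8:0 yes 3600.0000 800.0000
19 0 0 11 11:11:8:0 yes 3600.0000 800.0000
"""


def split_by_core_type(lscpu_text):
    """Return (first SMT threads, other SMT threads, single-thread CPUs) as sorted lists."""
    lines = lscpu_text.strip().splitlines()
    header = lines[0].split()
    cpu_col, core_col = header.index("CPU"), header.index("CORE")

    cpus_of_core = defaultdict(list)
    for line in lines[1:]:
        fields = line.split()
        cpus_of_core[int(fields[core_col])].append(int(fields[cpu_col]))

    first, others, single = [], [], []
    for cpus in cpus_of_core.values():
        cpus.sort()
        if len(cpus) == 1:
            single.append(cpus[0])      # no SMT sibling: E core on this CPU (heuristic)
        else:
            first.append(cpus[0])       # one hardware thread per SMT core
            others.extend(cpus[1:])     # the remaining sibling thread(s)
    return sorted(first), sorted(others), sorted(single)


p_first, p_second, e_cores = split_by_core_type(LSCPU_OUTPUT)
print("P cores, first hardware thread :", p_first)
print("P cores, second hardware thread:", p_second)
print("E cores                        :", e_cores)
```

Grouping by the CORE column rather than by even and odd CPU numbers also copes with layouts like the 5950X output earlier in the thread, where the sibling of logical CPU 0 is CPU 16.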
If a boinc client is up (but not yet running any work), you can set its CPU affinity. As soon as boinc starts some work, this will be in subprocesses of the boinc process and as such inherit the CPU affinity which you gave to the boinc client.

a) Look up the PID of the boinc process:
    ps -C boinc -o pid=
b) Tie this process to a subset of logical CPUs.
For example, boinc's PID is 1234 and you want to limit it to one set of hardware threads of the 8 P cores:
    sudo taskset -pc 0,2,4,6,8,10,12,14 1234
Or all threads of the P cores:
    sudo taskset -pc 0-15 1234
Or only E cores:
    sudo taskset -pc 16-19 1234
('sudo' only needs to be prepended if the process runs under a different user ID than the user who is issuing the taskset command.)
c) After that, let boinc launch new work.

You can verify the CPU affinity of any running process, be it boinc or one of the science tasks, again using the taskset command:
    ps -C boinc -o pid= | xargs -l taskset -pc
shows the affinity of any process with the name boinc.
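If you end up switching between CPU sets often, the CPU list argument for taskset can be generated instead of typed. This is a minimal sketch with hypothetical helper names (`cpu_list_string`, `taskset_command`); it only builds the argument list in the same `taskset -pc <cpu-list> <pid>` form used above and does not run anything. PID 1234 is the same placeholder as in the examples.

```python
def cpu_list_string(cpus):
    """Format logical CPU numbers as a taskset-style list, e.g. '16-19' or '0,2,4'."""
    cpus = sorted(set(cpus))
    if not cpus:
        raise ValueError("need at least one logical CPU")
    parts = []
    start = prev = cpus[0]
    for cpu in cpus[1:]:
        if cpu == prev + 1:          # still inside a consecutive run
            prev = cpu
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = cpu
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def taskset_command(pid, cpus):
    """Build the argument list for 'taskset -pc <cpu-list> <pid>' without executing it."""
    return ["taskset", "-pc", cpu_list_string(cpus), str(pid)]


print(" ".join(taskset_command(1234, range(0, 16, 2))))  # one thread per P core
print(" ".join(taskset_command(1234, range(16, 20))))    # E cores only
```

The resulting list could be passed to `subprocess.run`, prefixed with `sudo` where the note above applies.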
 

Skillz

Ah yeah, one E core might be as fast as a P thread. Didn't think about that.
 
Jul 27, 2020
16,165
10,240
106
OK, it is slightly faster at 3200, cl19, slower than factory rated cl18. This is stupid, but I guess it's how ADL works with DDR4.
At 3200, you may be able to reduce the CL to 16. Try it in UEFI. Probably in Advanced settings.
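To see why a lower CL matters at the lower data rate, CAS latency can be converted from clock cycles to nanoseconds. This sketch uses the standard relationship for DDR memory (one clock cycle lasts 2000 / data rate in MT/s nanoseconds); the function name is illustrative, and the three settings are the ones mentioned in this thread.

```python
def cas_latency_ns(cl_cycles, data_rate_mts):
    """CAS latency in ns: CL cycles times the clock period (2000 / MT/s for DDR)."""
    return cl_cycles * 2000 / data_rate_mts


for label, cl, rate in [
    ("DDR4-4000 CL18 (factory rating)", 18, 4000),
    ("DDR4-3200 CL19 (current setting)", 19, 3200),
    ("DDR4-3200 CL16 (suggested)", 16, 3200),
]:
    print(f"{label}: {cas_latency_ns(cl, rate):.3f} ns")
```

CAS latency is only one of several timings, so this is an indication of the direction of the change rather than a prediction of benchmark results.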
 
Jul 27, 2020
16,165
10,240
106
And this test shows that ADL draws as much power for 19 virtual cores as 31 for Zen 3, so Zen 3 is more power efficient.
How Does Temperature Affect The Performance of Computer Components? (chron.com)

Excessive heat lowers the electrical resistance of objects, therefore increasing the current. In addition, slowdown is a result of overheating.

Both CPUs should have the same or very similar sized heatsink with the same amount of airflow (fan with similar CFM) to ensure a fair comparison. I know that Zen 3 may still come out ahead but at least, it should be an apples to apples comparison by keeping as many of the factors as constant as possible.