
AMD APU not showing proper core count?

Phaetos

Senior member
Just installed Win 8.1 and looking at the Performance tab in Task Manager, it shows:
1 Socket (yes)
2 Cores (nope, got 4)
4 Logical Processors (shouldn't that be 8 if the cores were reading correctly?)

What's the deal here?
 

You have an FM2/FM2+ APU, correct? Not an AM1?

In that case, what Windows has listed is correct.
 

Correct. Device Manager shows 4 processors, and CPU-Z shows 1 processor, 4 cores, 4 threads. So what Windows is showing is not correct. It showed as 4 cores under Win7; 8.1 is reporting it incorrectly.
 
Windows 8.1 shows it correctly. (2 modules, 4 threads.)

Microsoft no longer counts AMD's CMT cores as real cores, but rather treats them on the same level as SMT.
 
Correct. Device Manager shows 4 processors, and CPU-Z shows 1 processor, 4 cores, 4 threads. So what Windows is showing is not correct. It showed as 4 cores under Win7; 8.1 is reporting it incorrectly.
How is Windows incorrect? You haven't even told us what the CPU is.

My guess is that Windows is being quite correct, though, both versions, and it's a BD-based APU on an FM2(+) socket.
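For what it's worth, the two reports can be reconciled arithmetically. A minimal sketch, assuming a 2-module CMT part like the A10-6800K (the module/core counts are hard-coded for illustration, nothing here queries real hardware):

```python
# Toy model of a 2-module AMD CMT chip (hypothetical figures for a
# part like the A10-6800K; nothing here queries real hardware).
MODULES = 2
INT_CORES_PER_MODULE = 2  # each module: 2 integer cores, shared FPU/front end

# Windows 7 view: every integer core counts as a "core".
win7_cores = MODULES * INT_CORES_PER_MODULE     # 4
win7_logical = win7_cores                       # 4 (no SMT layered on top)

# Windows 8.1 view: a module counts as a "core"; the sibling integer
# core is reported like an SMT thread (a logical processor).
win81_cores = MODULES                           # 2
win81_logical = MODULES * INT_CORES_PER_MODULE  # 4

print(win7_cores, win7_logical)    # 4 4
print(win81_cores, win81_logical)  # 2 4
```

Both views agree on 4 logical processors, which is why Task Manager's "4 Logical Processors" line is consistent with either count.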
 
What's the deal here?
Windows 7 does not have the scheduling patch pre-installed, while Windows 8 and 8.1 do.
Currently, the CPU scheduling techniques that are used by Windows 7 and Windows Server 2008 R2 are not optimized for the AMD Bulldozer module architecture. This architecture is found on AMD FX series, AMD Opteron 4200/4300 Series, and AMD Opteron 6200/6300 Series processors. Therefore, multithreaded workloads may not be optimally distributed on computers that have one of these processors installed in a lightly-threaded environment. This may result in decreased system performance for some applications.
AMD's Bulldozer module and Intel's Hyper-Threading use the same threading approach in the front end, so Microsoft applied the Hyper-Threading scheduling optimization to Bulldozer without correcting the terminology.

It's an erratum that most likely won't be fixed by Microsoft.

Microsoft officially considers the Bulldozer Module to be two cores, unlike what ShintaiDK states.
 
FM2 CPUs top out at 2 modules / 4 threads (cores). As NostaSeronx said, it's just how Windows needs to look at the chip when scheduling tasks, because loading up the second core before the third can cause a hefty performance penalty due to shared resources.
 
I disagree, it's generally accepted that AMD cores are real cores, albeit slow individually and sharing resources. There was some debate when the FX's first came out but I don't know of anyone who questions whether an FX-8350 is really an 8 core processor.
 
FM2 CPUs top out at 2 modules / 4 threads (cores). As NostaSeronx said, it's just how Windows needs to look at the chip when scheduling tasks, because loading up the second core before the third can cause a hefty performance penalty due to shared resources.
Well, the actual purpose of the patch is SPMD.

Windows 7;
Task A = 2 parallel threads
Task B = 1 serial thread

Task A(1A) and Task B(1) would be run in Module A on Cores A and B, while Task A(2A) would be run in Module B on Core A or B.

Windows 7+Hotfix / Windows 8 / Windows 8.1;
Task A = 2 parallel threads
Task B = 1 serial thread

Task A(1A) and Task A(2A) would be run in Module A on Cores A and B, while Task B(1) would be run in Module B on Core A or B.

SPMD = Single Program Multiple Data. The module is built to optimize for such workloads, so running Multiple Program Multiple Data (MPMD) workloads on a single module is non-optimal.
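The two placements described above can be sketched as a toy policy comparison (the 2x2 topology, slot-filling rule, and task/thread names are illustrative assumptions, not a real scheduler API):

```python
# Sketch of the two placement policies described above. The 2x2 topology
# and slot-filling rule are illustrative, not a real scheduler API.

SLOTS = [("ModA", "Core0"), ("ModA", "Core1"),
         ("ModB", "Core0"), ("ModB", "Core1")]

def place(threads):
    """Assign runnable threads to hardware slots in the given order."""
    return dict(zip(threads, SLOTS))

# Arrival order: Task A's first thread, then Task B's, then A's second.
threads = [("TaskA", "t1"), ("TaskB", "t1"), ("TaskA", "t2")]

# Unpatched Windows 7: all four "cores" look equal, so threads land in
# arrival order and TaskA's parallel pair ends up split across modules.
unpatched = place(threads)

# Hotfix / Windows 8 / 8.1: threads of the same task are packed into one
# module first (modeled here by simply grouping threads by task name).
patched = place(sorted(threads))

print(unpatched[("TaskA", "t1")][0], unpatched[("TaskA", "t2")][0])  # ModA ModB
print(patched[("TaskA", "t1")][0], patched[("TaskA", "t2")][0])      # ModA ModA
```

With packing, the SPMD pair shares one module's resources and the unrelated serial thread gets the other module to itself.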
 
I disagree, it's generally accepted that AMD cores are real cores, albeit slow individually and sharing resources. There was some debate when the FX's first came out but I don't know of anyone who questions whether an FX-8350 is really an 8 core processor.
There is no settled historical definition of what specifically is a core versus not a core until you get close to memory. Either way is correct, so long as the definitions are well defined and consistent.
 
How is Windows incorrect? You haven't even told us what the CPU is.

My guess is that Windows is being quite correct, though, both versions, and it's a BD-based APU on an FM2(+) socket.

I didn't mention the APU? My bad, A10-6800K, socket FM2.
 
Both are correct, then, in their view of the world. Windows 7, by default, does not recognize the shared caches, but that can be fixed. Windows 8 does out of the box, and treats it much like an HT CPU, which should be better for performance. But, with 4 sets of integer processing units and L1Ds, 4 cores isn't all wrong, just more superficial than would be ideal.
 
It's not really a "4 core", any more than that "HP Hexacore" is really a hex-core.

That's a totally different thing, and not entirely fair anyway. The CMT scaling is actually pretty good: a 2M/4C Steamroller-based CPU scales at around 80%. The problem isn't the CMT design, it's just the fact that the cores are plain bad/slow.
 

Until you add FP loads. Then it scales 0%.
 

FP scales very nicely even in Bulldozer.

Edit: Phenom II x6, FX8150 and Core i7 2600K

[image: benchmark chart]
 
You know that it's wrong, don't you?

If I were wrong, there wouldn't be a need to share the FP unit in a module.

You can try running Linpack or something and tell me the throughput. By some odd chance it will end up in the ballpark of a dual-core SB/IB 😉
 
As I understand it, with scaling of about "80%" you actually only get about 160% of single-core performance out of two cores.

Anandtech bench results of FX-6300:

[image: FX-6300 benchmark chart]


470/6 = 78.33, which means each core is performing at around 81.5% of its solo speed due to sharing. Using a second core in a module doesn't make it 80% faster; rather, both cores take a roughly 20% hit, so loading up the module fully gets you about 60% more performance.


An i5 4690 by comparison is 3% short of linear scaling with 4 cores (used as a control to show potential scaling in Cinebench):

[image: i5-4690 benchmark chart]


EDIT: Is Cinebench an FP-heavy bench? It may not be representative of the average task. If so, what are some other multithreaded benches that don't use the FPUs as heavily?
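Spelling out that arithmetic explicitly (the 470 score and the rough 80% per-core factor come from this thread, not independent measurement):

```python
# Arithmetic behind the "80% scaling" claim, using the post's own figures.
total_6t = 470.0             # FX-6300 6-thread score quoted in the post
per_thread = total_6t / 6    # per-thread score with all modules loaded

e = 0.8                      # assumed fraction of solo speed each core
                             # keeps when its module sibling is also busy
module_gain = 2 * e          # fully loaded module vs one solo core
second_core_gain = module_gain - 1   # extra performance from core #2

print(round(per_thread, 2))        # 78.33
print(module_gain)                 # 1.6 -> "160% of one core"
print(round(second_core_gain, 2))  # 0.6 -> "~60% more"
```

So "80% scaling" and "60% more from the second core" are the same claim viewed from different baselines.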
 
I still think that if they had done a 3+2-wide decode instead of 2+2-wide, they could have gotten 100% scaling right out of the box for like 99% of workloads.
 
If I were wrong, there wouldn't be a need to share the FP unit in a module.

You can try running Linpack or something and tell me the throughput. By some odd chance it will end up in the ballpark of a dual-core SB/IB 😉

Scaling is not equal to throughput.

Here's another one:
[image: benchmark chart]
 
If I were wrong, there wouldn't be a need to share the FP unit in a module.

You can try running Linpack or something and tell me the throughput. By some odd chance it will end up in the ballpark of a dual-core SB/IB 😉


What is the relevance of a comparison with SB/IB with respect to your claim that it didn't scale at all?

Don't try to move the goalposts; you did say that scaling was 0% with more threads. I guess you mean 0% for more than 2 threads in a 2-module configuration. Either find us data that says so, and you know that you can't, or else it will mean that you're deliberately misleading the general public.
 
If I were wrong, there wouldn't be a need to share the FP unit in a module.

Even on a pure, 100% FPU workload with no integer code whatsoever, you would get >0% scaling. It can swap in the second thread when the first one stalls on memory access, branch misprediction, whatever. The FPU is basically SMT, and as such pure FPU workloads will scale much the same as on an SMT core. So yes, a 2 module PD chip scales much like a 2 core Sandy Bridge chip in that case.

But of course the vast majority of code isn't purely FPU-bound. What scaling you get depends on your use case. *shrug*
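That latency-hiding argument can be made concrete with a toy model (the solo utilization number is an assumption for illustration, not a measurement):

```python
# Toy latency-hiding model for the shared FPU: each thread alone keeps
# the FPU busy a fraction u of the time (stalling on memory, mispredicts,
# etc. the rest); a second thread can fill those stall slots, but combined
# demand is capped by the single shared unit.

def fpu_utilization(u, n_threads):
    """FPU utilization with n threads sharing one FPU, simple cap model."""
    return min(1.0, n_threads * u)

u = 0.7                          # hypothetical solo FPU utilization
solo = fpu_utilization(u, 1)     # 0.7
pair = fpu_utilization(u, 2)     # 1.0, capped by the shared unit
print(round(pair / solo, 2))     # 1.43 -> >0% but <100% scaling
```

The point is only qualitative: as long as a single thread leaves stall slack (u < 1), the second thread buys you something, just never a full 2x.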
 
As I understand it, with scaling of about "80%" you actually only get about 160% of single-core performance out of two cores.

Anandtech bench results of FX-6300:

470/6 = 78.33, which means each core is performing at around 81.5% of its solo speed due to sharing. Using a second core in a module doesn't make it 80% faster; rather, both cores take a roughly 20% hit, so loading up the module fully gets you about 60% more performance.

These estimates don't hold for Kaveri, since it solved the shared front-end penalty.

[image: Cinebench 11.5 chart]
 