
AMD APU not showing proper core count?

Phaetos

Senior member
Just installed Win 8.1 and looking at the Performance tab in Task Manager, it shows:
1 Socket (yes)
2 Cores (nope, got 4)
4 Logical Processors (shouldn't that be 8 if the cores were reading correctly?)

What's the deal here?
 

You have an FM2/FM2+ APU, correct? Not an AM1?

In that case, what Windows has listed is correct.
 

Correct. Device Manager shows 4 processors, and CPU-Z shows 1 processor, 4 cores, 4 threads. So what Windows is showing is not correct. It showed as 4 cores under Win7; 8.1 is reporting it incorrectly.
 
Windows 8.1 shows it correctly. (2 modules, 4 threads.)

Microsoft no longer counts AMD's CMT cores as real cores, but rather treats them on the same level as SMT.
 
Correct. Device Manager shows 4 processors, and CPU-Z shows 1 processor, 4 cores, 4 threads. So what Windows is showing is not correct. It showed as 4 cores under Win7; 8.1 is reporting it incorrectly.
How is Windows incorrect? You haven't even told us what the CPU is.

My guess is that Windows is being quite correct, though, both versions, and it's a BD-based APU on an FM2(+) socket.
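For what it's worth, the two reports can be reconciled arithmetically. A minimal sketch, assuming a 2-module CMT part like the A10-6800K (the module/core counts are hard-coded for illustration, nothing here queries real hardware):

```python
# Toy model of a 2-module AMD CMT chip (hypothetical figures for a
# part like the A10-6800K; nothing here queries real hardware).
MODULES = 2
INT_CORES_PER_MODULE = 2  # each module: 2 integer cores, shared FPU/front end

# Windows 7 view: every integer core counts as a "core".
win7_cores = MODULES * INT_CORES_PER_MODULE     # 4
win7_logical = win7_cores                       # 4 (no SMT layered on top)

# Windows 8.1 view: a module counts as a "core"; the sibling integer
# core is reported like an SMT thread (a logical processor).
win81_cores = MODULES                           # 2
win81_logical = MODULES * INT_CORES_PER_MODULE  # 4

print(win7_cores, win7_logical)    # 4 4
print(win81_cores, win81_logical)  # 2 4
```

Both views agree on 4 logical processors, which is why Task Manager's "4 Logical Processors" line is consistent with either count.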
 
What's the deal here?
Windows 7 does not have the scheduling patch pre-installed, while Windows 8 and 8.1 do.
Currently, the CPU scheduling techniques that are used by Windows 7 and Windows Server 2008 R2 are not optimized for the AMD Bulldozer module architecture. This architecture is found on AMD FX series, AMD Opteron 4200/4300 Series, and AMD Opteron 6200/6300 Series processors. Therefore, multithreaded workloads may not be optimally distributed on computers that have one of these processors installed in a lightly-threaded environment. This may result in decreased system performance for some applications.
AMD's Bulldozer module and Intel's Hyper-Threading use the same threading approach in the front end, so Microsoft applied the Hyper-Threading scheduling optimization to Bulldozer without correcting the terminology.

It's an erratum that most likely won't be fixed by Microsoft.

Microsoft officially considers the Bulldozer Module to be two cores, unlike what ShintaiDK states.
 
FM2 CPUs top out at 2 modules / 4 threads (cores). As NostaSeronx said, it's just how Windows needs to look at the chip when scheduling tasks, because loading up the second core before the third can cause a hefty performance penalty due to shared resources.
 
I disagree, it's generally accepted that AMD cores are real cores, albeit slow individually and sharing resources. There was some debate when the FX's first came out but I don't know of anyone who questions whether an FX-8350 is really an 8 core processor.
 
FM2 CPUs top out at 2 modules / 4 threads (cores). As NostaSeronx said, it's just how Windows needs to look at the chip when scheduling tasks, because loading up the second core before the third can cause a hefty performance penalty due to shared resources.
Well, the actual purpose of the patch is SPMD.

Windows 7;
Task A = 2 parallel threads
Task B = 1 serial thread

Task A(1A) and Task B(1) would be run in Module A on Cores A and B, while Task A(2A) would be run in Module B on Core A or B.

Windows 7+Hotfix / Windows 8 / Windows 8.1;
Task A = 2 parallel threads
Task B = 1 serial thread

Task A(1A) and Task A(2A) would be run in Module A on Cores A and B, while Task B(1) would be run in Module B on Core A or B.

SPMD = Single Program Multiple Data. The module is built to optimize for such workloads, so running Multiple Program Multiple Data (MPMD) workloads on a single module is non-optimal.
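The two placements described above can be sketched as a toy policy comparison (the 2x2 topology, slot-filling rule, and task/thread names are illustrative assumptions, not a real scheduler API):

```python
# Sketch of the two placement policies described above. The 2x2 topology
# and slot-filling rule are illustrative, not a real scheduler API.

SLOTS = [("ModA", "Core0"), ("ModA", "Core1"),
         ("ModB", "Core0"), ("ModB", "Core1")]

def place(threads):
    """Assign runnable threads to hardware slots in the given order."""
    return dict(zip(threads, SLOTS))

# Arrival order: Task A's first thread, then Task B's, then A's second.
threads = [("TaskA", "t1"), ("TaskB", "t1"), ("TaskA", "t2")]

# Unpatched Windows 7: all four "cores" look equal, so threads land in
# arrival order and TaskA's parallel pair ends up split across modules.
unpatched = place(threads)

# Hotfix / Windows 8 / 8.1: threads of the same task are packed into one
# module first (modeled here by simply grouping threads by task name).
patched = place(sorted(threads))

print(unpatched[("TaskA", "t1")][0], unpatched[("TaskA", "t2")][0])  # ModA ModB
print(patched[("TaskA", "t1")][0], patched[("TaskA", "t2")][0])      # ModA ModA
```

With packing, the SPMD pair shares one module's resources and the unrelated serial thread gets the other module to itself.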
 
I disagree, it's generally accepted that AMD cores are real cores, albeit slow individually and sharing resources. There was some debate when the FX's first came out but I don't know of anyone who questions whether an FX-8350 is really an 8 core processor.
There is no settled historical definition of what specifically is a core versus not a core until you get close to memory. Either way is correct, so long as the definitions are well defined and consistent.
 
How is Windows incorrect? You haven't even told us what the CPU is.

My guess is that Windows is being quite correct, though, both versions, and it's a BD-based APU on an FM2(+) socket.

I didn't mention the APU? My bad, A10-6800K, socket FM2.
 
Both are correct, then, in their view of the world. Windows 7, by default, does not recognize the shared caches, but that can be fixed. Windows 8 does out of the box, and treats it much like an HT CPU, which should be better for performance. But, with 4 sets of integer processing units and L1Ds, 4 cores isn't all wrong, just more superficial than would be ideal.
 
It's not really a "4 core", any more than that "HP Hexacore" is really a hex-core.

That's a totally different thing, and not entirely fair anyway. The CMT scaling is actually pretty good: a 2M/4C Steamroller-based CPU scales at around 80%. The problem isn't the CMT design, it's just the fact that the cores are plain bad/slow.
 

Until you add FP loads. Then it scales 0%.
 

FP scales very nicely even in Bulldozer.

Edit: Phenom II x6, FX8150 and Core i7 2600K

[image: benchmark chart]
 
You know that it's wrong, don't you?

If I were wrong, there wouldn't be a need to share the FP unit in a module.

You can try running Linpack or something and tell me the throughput. By some odd chance it will end up in the ballpark of a dual-core SB/IB 😉
 
As I understand it, with scaling of about "80%" you actually only get about 160% of single-core performance out of two cores.

Anandtech bench results of FX-6300:

[image: FX-6300 benchmark chart]


470/6 = 78.33, which means each core is performing at around 81.5% of its solo speed due to sharing. Using a second core in a module doesn't make it 80% faster; rather, both cores take a roughly 20% hit, so loading up the module fully gets you about 60% more performance.


An i5 4690 by comparison is 3% short of linear scaling with 4 cores (used as a control to show potential scaling in Cinebench):

[image: i5-4690 benchmark chart]


EDIT: Is Cinebench an FP-heavy bench? It may not be representative of the average task. If so, what are some other multithreaded benches that don't use the FPUs as heavily?
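Spelling out that arithmetic explicitly (the 470 score and the rough 80% per-core factor come from this thread, not independent measurement):

```python
# Arithmetic behind the "80% scaling" claim, using the post's own figures.
total_6t = 470.0             # FX-6300 6-thread score quoted in the post
per_thread = total_6t / 6    # per-thread score with all modules loaded

e = 0.8                      # assumed fraction of solo speed each core
                             # keeps when its module sibling is also busy
module_gain = 2 * e          # fully loaded module vs one solo core
second_core_gain = module_gain - 1   # extra performance from core #2

print(round(per_thread, 2))        # 78.33
print(module_gain)                 # 1.6 -> "160% of one core"
print(round(second_core_gain, 2))  # 0.6 -> "~60% more"
```

So "80% scaling" and "60% more from the second core" are the same claim viewed from different baselines.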
 
I still think that if they had done a 3+2-wide decode instead of 2+2-wide, they could have gotten 100% scaling right out of the box for like 99% of workloads.
 
If I were wrong, there wouldn't be a need to share the FP unit in a module.

You can try running Linpack or something and tell me the throughput. By some odd chance it will end up in the ballpark of a dual-core SB/IB 😉

Scaling is not equal to throughput.

Here's another one:
[image: benchmark chart]
 
If I were wrong, there wouldn't be a need to share the FP unit in a module.

You can try running Linpack or something and tell me the throughput. By some odd chance it will end up in the ballpark of a dual-core SB/IB 😉


What is the relevance of a comparison with SB/IB with respect to your claim that it didn't scale at all?

Don't try to move the goalposts; you did say that scaling was 0% with more threads. I guess you mean 0% for more than 2 threads in a 2-module configuration. Either find us data that says so, and you know that you can't, or else it will mean that you're deliberately misleading the general public.
 
If I were wrong, there wouldn't be a need to share the FP unit in a module.

Even on a pure, 100% FPU workload with no integer code whatsoever, you would get >0% scaling. It can swap in the second thread when the first one stalls on memory access, branch misprediction, whatever. The FPU is basically SMT, and as such pure FPU workloads will scale much the same as on an SMT core. So yes, a 2 module PD chip scales much like a 2 core Sandy Bridge chip in that case.

But of course the vast majority of code isn't purely FPU-bound. What scaling you get depends on your use case. *shrug*
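That latency-hiding argument can be made concrete with a toy model (the solo utilization number is an assumption for illustration, not a measurement):

```python
# Toy latency-hiding model for the shared FPU: each thread alone keeps
# the FPU busy a fraction u of the time (stalling on memory, mispredicts,
# etc. the rest); a second thread can fill those stall slots, but combined
# demand is capped by the single shared unit.

def fpu_utilization(u, n_threads):
    """FPU utilization with n threads sharing one FPU, simple cap model."""
    return min(1.0, n_threads * u)

u = 0.7                          # hypothetical solo FPU utilization
solo = fpu_utilization(u, 1)     # 0.7
pair = fpu_utilization(u, 2)     # 1.0, capped by the shared unit
print(round(pair / solo, 2))     # 1.43 -> >0% but <100% scaling
```

The point is only qualitative: as long as a single thread leaves stall slack (u < 1), the second thread buys you something, just never a full 2x.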
 
As I understand it, with scaling of about "80%" you actually only get about 160% of single-core performance out of two cores.

Anandtech bench results of FX-6300:

470/6 = 78.33, which means each core is performing at around 81.5% of its solo speed due to sharing. Using a second core in a module doesn't make it 80% faster; rather, both cores take a roughly 20% hit, so loading up the module fully gets you about 60% more performance.

These estimates don't hold for Kaveri, since it solved the shared front-end penalty.

[image: Cinebench 11.5 chart]
 