From what I understand, Abwx is implying that, due to their OoOE design, cores end up executing instructions from a thread even after its time slice is over and the CPU is servicing another thread. I find that hard to believe, but I'm already near the limit of my CPU & OS understanding, which is that a thread requires an execution context in order to run.
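Just to make "execution context" concrete, here's roughly the bundle of state I mean, sketched in C (the struct and field names are mine and purely illustrative, not from any real kernel):

```c
#include <stdint.h>

/* Hypothetical per-thread execution context: roughly the state an OS
 * must save on a context switch and restore before the thread can run
 * again. Field names are illustrative, not from any real kernel. */
struct thread_context {
    uint64_t gpr[16];  /* general-purpose registers (rax..r15)     */
    uint64_t rip;      /* instruction pointer: where to resume     */
    uint64_t rsp;      /* stack pointer                            */
    uint64_t rflags;   /* flags register                           */
    uint64_t cr3;      /* root of the thread's page tables, i.e.   */
                       /* which address space its loads/stores see */
    /* plus FPU/SSE/AVX state, segment bases, etc.                 */
};
```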
Yeah, I don't buy Abwx's theory either. In HT-enabled Intel CPUs, that much IS true: a core can be executing instructions from two different threads at once. And I suppose it's possible that switching thread contexts doesn't serialize the core's pipeline, since a full serialization would flush and delay the other HyperThread. I had always assumed that a thread switch serializes the core, but now that I think about it, maybe it doesn't. If it doesn't, then some of that could come into play. But without HyperThreading enabled on the core, you wouldn't have "lingering" instructions from a prior thread executing alongside a newly-scheduled thread.
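For what it's worth, you can actually see two hyperthreads fighting over one core's execution resources from userspace. Here's a rough Linux sketch, entirely my own and not from anything in this thread; the {0,4} sibling pair and {0,1} separate-core pair are assumptions about one particular machine's topology, so check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list and adjust:

```c
/* Pin two busy-loop threads either to HT siblings or to separate
 * physical cores and compare wall-clock time.
 * Build: gcc -O2 -pthread ht_demo.c */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define ITERS 500000000ULL

static void *spin(void *arg) {
    int cpu = *(int *)arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    /* Four independent chains, to keep multiple ALU ports busy. */
    uint64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    for (uint64_t i = 0; i < ITERS; i++) {
        a0 += i; a1 ^= i; a2 += i; a3 ^= i;
    }
    volatile uint64_t sink = a0 + a1 + a2 + a3;  /* defeat dead-code elim. */
    (void)sink;
    return NULL;
}

int main(void) {
    int pairs[2][2] = { {0, 4},    /* assumed HT siblings    */
                        {0, 1} };  /* assumed separate cores */
    for (int p = 0; p < 2; p++) {
        struct timespec t0, t1;
        pthread_t a, b;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        pthread_create(&a, NULL, spin, &pairs[p][0]);
        pthread_create(&b, NULL, spin, &pairs[p][1]);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("CPUs %d+%d: %.2f s\n", pairs[p][0], pairs[p][1],
               (t1.tv_sec - t0.tv_sec) +
               (t1.tv_nsec - t0.tv_nsec) / 1e9);
    }
    return 0;
}
```

If the sibling pairing comes out noticeably slower, that's the two logical threads sharing the physical core's execution ports.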
That may be so, but for me the more important finding is that the CPU execution engine is not the main culprit for the performance loss; otherwise we would have seen the same loss on the HEDT platform, and we do not. Even with WinRAR not scaling past 8 threads, the HEDT chip should still behave somewhat like the mainstream i7, and it does not.
That observation confirms even more strongly, IMO, that the issue is L3 cache size. If it were contention for core execution resources, Haswell-E / HEDT should behave the same way the mainstream Haswell desktop CPUs do, since they use the same cores. The fact that they don't points away from a core resource issue and toward something else that differs between the HEDT design and the mainstream desktop. Besides PCI-E bandwidth, the major differences are L3 cache size and quad-channel RAM. (I had forgotten about the RAM; the extra bandwidth could come into play here as well. But the FX also has only dual-channel RAM, so I feel more confident that the issue is L3 cache size.)
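One way to test the L3 theory directly is a pointer-chasing microbenchmark: chase a randomized chain through working sets of increasing size and watch the per-load latency jump once the set spills out of L3. This is my own sketch, not anything from the article, and the 8 MB vs. 20 MB figures in the comment are just illustrative Haswell-ish sizes:

```c
/* Random pointer chase over growing working sets. Per-load latency
 * steps up as the set outgrows each cache level; the L3 cliff lands
 * somewhere different on an 8 MB mainstream i7 than on a 20 MB
 * Haswell-E (check your own CPU's cache sizes).
 * Build: gcc -O2 chase.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const size_t steps = 5 * 1000 * 1000;
    for (size_t mb = 1; mb <= 64; mb *= 2) {
        size_t n = mb * 1024 * 1024 / sizeof(size_t);
        size_t *next = malloc(n * sizeof(size_t));
        if (!next) return 1;

        /* Sattolo's shuffle: one cycle through all n slots, in an
         * order the hardware prefetchers can't predict. */
        for (size_t i = 0; i < n; i++) next[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (((size_t)rand() << 16) ^ (size_t)rand()) % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        struct timespec t0, t1;
        size_t idx = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t s = 0; s < steps; s++)
            idx = next[idx];              /* serially dependent loads */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = ((t1.tv_sec - t0.tv_sec) * 1e9 +
                     (t1.tv_nsec - t0.tv_nsec)) / (double)steps;
        printf("%3zu MB: %5.1f ns/load (idx=%zu)\n", mb, ns, idx);
        free(next);
    }
    return 0;
}
```

If WinRAR's hot working set straddles the mainstream chip's L3 but fits inside the HEDT's, behavior like we saw would be exactly what you'd expect.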
Edit: Thinking about it more, I feel strongly that there has to be at least SOME serialization in the core when switching thread contexts, because otherwise a few in-flight micro-ops from the prior thread could access pages of memory belonging to the newly-scheduled thread, and that would be a security issue at the level of the processor architecture.
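FWIW, on x86 the architecture does back that up: MOV to a control register (CR8 excepted) is a serializing instruction, and a write to CR3 also flushes non-global TLB entries (PCID aside). A hypothetical kernel-side sketch, just to show where that happens in a context switch; the names are mine, this is NOT real OS code, and it only runs in ring 0:

```c
#include <stdint.h>

/* Hypothetical context-switch core. The point is the CR3 write: on
 * x86 it is serializing, so the pipeline drains before it executes,
 * and it flushes non-global TLB entries (ignoring PCID). No in-flight
 * micro-op from the previous thread can reach the next thread's pages
 * through stale translations. */
struct context {
    uint64_t cr3;   /* physical address of the thread's page tables */
    uint64_t rsp;   /* its saved kernel stack pointer                */
};

static void switch_address_space(struct context *next) {
    __asm__ volatile("mov %0, %%cr3"     /* serializing on x86 */
                     :: "r"(next->cr3) : "memory");
    /* ...restore next->rsp and pop its saved registers here... */
}
```

So at least on the OS-driven context-switch path, there IS a hard serialization point.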
Likewise, there would have to be two banks of registers for the memory-access descriptors in the processor, one for each hyperthread. (This is my speculation.) Unless the AGUs and load/store units somehow deal only with physical addresses.