For consumer workloads, more cores generally result in worse delivered performance. AKA you are quite far from both reality and theory.
how does it deliver worse performance? especially in relation to the mobile market...
More cores require more overall power, which results in lower overall clock speeds and therefore lower ST performance, which is the only performance that actually matters for 99.99% of actual consumer applications.
You may be thinking of PCs with a CPU; these were never conceived for real-time use with the internet, whereas mobile phone SoCs were. The A53 is efficient because of its simplicity; if the real-time load can be met by A53s, then this is sufficient. The A57 or A72 is there for larger, intermittent real-time loads; it is a responsive system.
Maybe back in the Core 2 era, before DVFS, turbo boost, and core power gating were things being done. While mobile cores may not call it turbo boost, they still either limit their clocking ability based on activity or start throttling clock speeds based on thermal or power limits. For any given power load, a dual core and quad core variant are allowed to clock about the same if only one or two cores need to be active, as per your single threaded scenario.
DVFS doesn't turn off power, it only reduces it. More cores, even with DVFS, will result in a lower top frequency point. If we are going by your theory, the 8-core Haswell-E should have the same or higher top turbo frequency as the 6-core Haswell-E, and oh look, it doesn't... And why not? Because thermal limits matter, and more cores, even in low DVFS states, still take power, both active and passive. And just a reminder, this is a Haswell-E design with the best DVFS on the market AND FIVR on-die voltage regulation, giving it by far the best power management and response in the industry.
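To put rough numbers on that thermal-budget argument, here is a minimal sketch of the trade-off: a fixed package power budget, per-core dynamic power growing roughly as V²·f, and voltage rising with frequency. All constants below are made-up illustrative values, not measured Haswell-E or mobile figures; the point is only the shape of the result, namely that the more cores you keep active, the lower the top frequency that fits the budget.

```python
# Toy model: a fixed package power budget shared by however many cores are active.
# All constants are illustrative assumptions, not measured silicon data.

def core_power(freq_ghz, k=2.0, v0=0.6, dv_per_ghz=0.25):
    """Per-core dynamic power, P ~ k * V^2 * f, with voltage rising with frequency."""
    v = v0 + dv_per_ghz * freq_ghz
    return k * v * v * freq_ghz

def max_sustained_freq(active_cores, budget_w=15.0, uncore_w=3.0):
    """Highest frequency (in 0.1 GHz steps) whose total power still fits the budget."""
    freq = 0.0
    while active_cores * core_power(freq + 0.1) + uncore_w <= budget_w:
        freq += 0.1
    return round(freq, 1)

for n in (1, 2, 4, 6, 8):
    print(f"{n} active core(s): ~{max_sustained_freq(n)} GHz sustainable")
```

With these assumed constants the single-core point lands around 3 GHz and the 8-core point around 1 GHz; the absolute numbers are meaningless, but the monotonic drop is exactly the "8-core vs 6-core turbo bin" effect described above.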
I do not think this discussion is going anywhere, because now I understand that you are talking in the abstract (i.e. in the ideal). I had based my assumptions on what has actually happened. I don't see what this has to do with the efficiency of 2+2 vs 4+4 cores, could you explain your line of reasoning to me? Or do you mean that 2+2 using global task switching can be more efficient than 4+4 using cluster migration?
The normal performance state is the system running all eight cores at their designed frequency targets, so nothing is unusual there. The "Balance" and "Power Saving" states differ from what Samsung employs in its own devices in that instead of modifying the scaling logic of the SoC, they simply disable CPU cores entirely via hot-plugging. The "Balance" mode disables three A15 cores, effectively turning the system into a 5-core system with only one big CPU and four little ones. The "Power Saving" mode entirely shuts off the big cluster and runs the SoC as if it were a quad-core A7 system. To see how this affects performance and power, we turn to the PCMark power measurements.
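For reference, the hot-plugging described above is exposed on Linux through sysfs, so a "Balance" or "Power Saving" style mode can be approximated from userspace. A rough sketch, assuming cores 4-7 are the A15 cluster (the numbering varies per SoC) and root access:

```python
# Sketch: take CPUs offline/online via the Linux CPU-hotplug sysfs interface.
# Assumes cpu4-cpu7 are the big (A15) cluster, which varies by SoC; needs root.
from pathlib import Path

def set_cpu_online(cpu: int, online: bool) -> None:
    """Hot-(un)plug a core by writing 1/0 to its sysfs 'online' switch."""
    Path(f"/sys/devices/system/cpu/cpu{cpu}/online").write_text("1" if online else "0")

def balance_mode() -> None:
    """Leave one big core online and disable the other three (5-core system)."""
    set_cpu_online(4, True)
    for cpu in (5, 6, 7):
        set_cpu_online(cpu, False)

def power_saving_mode() -> None:
    """Shut off the whole big cluster, leaving a quad-core A7 system."""
    for cpu in (4, 5, 6, 7):
        set_cpu_online(cpu, False)
```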
You may be confusing your fantasy land with reality. PCs have been doing real-time workloads for DECADES. Mobile phone SoCs do basically no real-time workloads. Nothing about mobile phones or their workloads makes them in any way more capable of dealing with real-time workloads, nor do mobile phones and their workloads make more use of multiprocessor systems than PCs.
Cores that are power gated consume virtually zero power; this is totally separate from DVFS (which is why I listed it separately). It involves a big fat transistor acting as an analog switch for the power supply. Save for a tiny leakage current, it's like cutting the power line completely. Everything else created equal, there's no reason why simply having more cores (that can be power gated) would have a tangible impact on power consumption when they're off. But everything isn't always created equal.
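As a back-of-the-envelope illustration of "virtually zero power" (the figures below are assumptions picked purely for the arithmetic, not measurements of any real core):

```python
# Illustrative leakage arithmetic with assumed, not measured, values.
V = 0.9               # supply voltage in volts (assumed)
I_LEAK = 0.150        # leakage current of an idle, ungated core in amps (assumed)
GATE_RESIDUAL = 0.01  # fraction of leakage left through the sleep transistor (assumed)

p_idle  = V * I_LEAK                  # clock-gated but still connected to the rail
p_gated = V * I_LEAK * GATE_RESIDUAL  # power-gated: only residual leakage remains

print(f"idle, not gated: {p_idle * 1000:.1f} mW per core")   # ~135 mW
print(f"power gated:     {p_gated * 1000:.2f} mW per core")  # ~1.35 mW
```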
There are more factors than core count that could be affecting Haswell-E, for example 5MB more L3 cache that probably isn't gated off and more PCI-E lanes that may not be dynamically power budgeted. It does break Intel's previous tradition of having E-series processors that turbo'd slightly higher than the non-E series ones in addition to having more cores (but also having a higher TDP).
Try reading:
Big.LITTLE Processing with ARM Cortex-A15 & Cortex-A7
by Peter Greenhalgh of ARM. He states that, at 1 GHz, handover between clusters of processors can be accomplished in 20 microseconds (see the quick cycle count after the link below).
There is more detail in:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0318e/index.html
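To put that 20-microsecond figure in perspective, here is the quick cycle count promised above (plain arithmetic on the numbers quoted, nothing more):

```python
# Cluster handover cost quoted above: roughly 20 microseconds at 1 GHz.
freq_hz = 1_000_000_000   # 1 GHz
handover_s = 20e-6        # 20 microseconds

print(f"handover cost: {freq_hz * handover_s:,.0f} cycles")         # ~20,000 cycles

# Compared against a 16.7 ms frame budget (60 fps UI), the migration is tiny:
frame_s = 1 / 60
print(f"fraction of one 60 fps frame: {handover_s / frame_s:.2%}")  # ~0.12%
```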
I believe that PCs were introduced with the 8088 + optional 8087 FPU. They were used for office work. It seems to me that the A7 and A15 and big.LITTLE were designed to be a very responsive real-time system.
ARM power management is described in:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0318e/index.html
Section 3 refers. Cores are powered down.
Power gating itself has lots of ancillary issues: di/dt issues, ripple issues, latency issues, etc. It can also have floorplan impacts. So it isn't exactly a panacea. Mostly useful for full sleep states and little else.
Cache power is absolutely minimal. The 5930K and 5960X have the exact same number of PCI-E lanes. The frequency differential is really down to the additional power of the two extra cores in the design.
We can talk about complications that limit the scenarios you can enter power gating in, but none of that changes the fact that when the cores are off and have been off for any significant amount of time, they're not really using power. Every modern multicore design that I know of allows power gating individual cores (unless you count the Bulldozer line as an exception, where it's individual modules...), so whatever impact it had on the design is moot.
You really don't know that.
Can you point to an example in the mobile SoC world (you know, what we're talking about) where an SoC with more of the same type of cores clocks lower? I can't.
The reality is most designs can't power gate a core unless the device is in a suspend or sleep state.
Let's just say I'm pretty certain. Granted, it's been a bit, but I've seen the numbers.
Can you point to an example where it doesn't or doesn't increase the power?
That hasn't been true for a mobile SoC since Tegra 2 (and that was probably a big reason why it didn't get design wins in phones). If it worked like that, there'd be no reason for them to include per-core power gating in the first place as opposed to just one big gate on every core. Power gating actually goes finer-grained than per-core too; for example, SIMD units can be power gated.
For a while Android used the hotplug governor to power gate individual cores that had been idle for a while (and bring them back up when there was sufficient sustained demand). There have also been apps that manually gate individual cores. More recently, power gating functionality has been moving into the cpuidle framework.
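A hotplug governor of that kind is conceptually just a control loop driving the same sysfs switch. A minimal sketch, where the managed core, the load thresholds, and the poll interval are all assumptions rather than the actual Android governor's logic:

```python
# Minimal hotplug-governor-style loop: take a secondary core offline after
# sustained idleness, bring it back under sustained load. Thresholds, poll
# interval, and the managed core number are assumptions for illustration.
import time
from pathlib import Path

CPU = 3  # secondary core to manage (assumed; cpu0 normally cannot be offlined)
ONLINE = Path(f"/sys/devices/system/cpu/cpu{CPU}/online")

def system_load_pct() -> float:
    """Approximate overall CPU busy percentage over one second, from /proc/stat."""
    def snapshot():
        fields = Path("/proc/stat").read_text().splitlines()[0].split()[1:]
        values = list(map(int, fields))
        return sum(values), values[3]        # (total jiffies, idle jiffies)
    total0, idle0 = snapshot()
    time.sleep(1.0)
    total1, idle1 = snapshot()
    busy = (total1 - total0) - (idle1 - idle0)
    return 100.0 * busy / max(total1 - total0, 1)

idle_streak = busy_streak = 0
while True:
    load = system_load_pct()
    idle_streak = idle_streak + 1 if load < 20 else 0
    busy_streak = busy_streak + 1 if load > 80 else 0
    if idle_streak >= 5:          # ~5 s of low load: hot-unplug the core
        ONLINE.write_text("0")
        idle_streak = 0
    elif busy_streak >= 2:        # ~2 s of high load: bring it back online
        ONLINE.write_text("1")
        busy_streak = 0
```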
You'll understand if that's not very meaningful to me...
There are several examples of mobile SoCs where new versions came out that included more of the same type of cores, made on the same process, and clock speeds stayed the same or increased. Some examples:
Snapdragon S2: single core Scorpion 1.5GHz -> S3: dual core Scorpion 1.7GHz
Snapdragon S4: dual core Krait 200 1.5GHz -> S4 Pro: quad core Krait 200 1.7GHz
Snapdragon 400: dual core Krait 300 1.7GHz -> 600: quad core Krait 300 1.9GHz
Snapdragon 618: 2x 1.8GHz Cortex-A72 + 4x 1.2GHz Cortex-A53 -> Snapdragon 620: 4x 1.8GHz Cortex-A72 + 4x 1.2GHz Cortex-A53
Tegra 2: dual core Cortex-A9 1.2GHz -> Tegra 3: quad core Cortex-A9 1.6GHz ("up to 1.7GHz in single core mode")
Exynos 4: dual core Cortex-A9 1.5GHz -> quad core Cortex-A9 1.6GHz
Exynos 5: dual core Cortex-A15 1.7GHz -> 4x Cortex-A15 2.0GHz + 4x Cortex-A7 1.3GHz
i.MX6 dual: dual core Cortex-A9 1.2GHz -> i.MX6 quad: quad core Cortex-A9 1.2GHz
(in this case the SoCs are the same other than the core count)
MT6571: dual core Cortex-A7 1.3GHz -> MT6589: quad core Cortex-A7 1.5GHz
MT6588: quad core Cortex-A7 1.7GHz -> MT6592: octa core Cortex-A7 2GHz
MT6732: quad core Cortex-A53 1.5GHz -> MT6752: octa core Cortex-A53 1.5GHz
Z2480: single core Saltwell 2GHz -> Z2580: dual core Saltwell 2GHz
Z3480: dual core Silvermont 2.13GHz -> Z3580: quad core Silvermont 2.33GHz
I don't have any power consumption numbers for how one SoC with higher core counts and cores disabled compares to another with lower counts, that are otherwise of similar/same design because I haven't seen anyone test for this. Not a lot of people really do good testing for power consumption to begin with. But suffice it to say that in the mobile SoC world core count has not been a limiter of clock speed in any case I know of.
Maybe things are different in the server world. Maybe the extra cores not being fused off really do cost thermal headroom, which could be due to different design priorities. Or maybe there are other reasons that make them not as aggressive as possible with throttling. The server world is very different; there, when you get a 12-core processor you're not going to be expected to only need one or two cores active very often. But on mobile devices this is the case much of the time, and it's critical that power consumption is minimized.
Power savings of using big.LITTLE vs big only doesn't really have anything to do with what we're talking about (the assertion that extra cores eat away at peak clocks in mobile SoCs)
That's nonsense. Basically all SoCs beginning with the A15/A7 have low-latency fine-grained per-core power gating when they're idling for more than 1ms. Before that they still had it at a coarser granularity via hot-plugging (around 100-500ms periods) or when only 1 core was online (Samsung's AFTR powers down the whole CPU complex in running mode - this has been around since the Galaxy S2). And this is full-core power gating implemented by the vendor, not internal architectural power gating like gating off the FPUs and stuff like that, which happens on instruction-cycle latencies. How do you assume Apple handles leakage on those gigantic cores without burning through the battery?
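One way to check this on an actual device is to read the per-core cpuidle statistics the kernel exposes in sysfs: a core that really is being power gated will accumulate residency in the deeper idle states. A quick sketch (which state corresponds to full power gating varies by SoC and kernel, so the state names have to be interpreted per device):

```python
# Dump per-CPU cpuidle state residency from sysfs. Deeper states are usually
# listed last; which of them map to full power gating depends on the SoC/kernel.
from pathlib import Path

for cpu_dir in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
    cpuidle = cpu_dir / "cpuidle"
    if not cpuidle.is_dir():
        continue
    print(cpu_dir.name)
    for state in sorted(cpuidle.glob("state*")):
        name = (state / "name").read_text().strip()
        usage = (state / "usage").read_text().strip()    # number of entries
        time_us = (state / "time").read_text().strip()   # total residency in microseconds
        print(f"  {state.name:>7}  {name:<12} entries={usage:>10} time_us={time_us}")
```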
