sefsefsefsef
Senior member
Some ARM cores are really tiny. For example, at 40 nm, an ARM Cortex-A5 is only 0.53 mm^2, including caches, and would consume only 80 mW if you could run it at 1 GHz.
http://www.arm.com/products/processors/cortex-a/cortex-a5.php
In terms of die area, and ignoring on-chip networks and uncore-type stuff, you could fit 554 such ARM cores in the area of a single quad-core Sandy Bridge CPU (and this is comparing TSMC 40 nm to Intel 32 nm, so on the same node the count would be even higher). In terms of power budget, you could fit 1187 ARM cores in the power budget of a single quad-core Sandy Bridge CPU.
Let's say you want to build a ~300 mm^2 chip (like 4C Sandy Bridge) out of these ARM cores, and that you're going to have to dedicate 50% of your die area to things like uncore, memory controllers, and some last-level cache. That leaves room for roughly 280 cores, so let's round down to a nice number and call it a 256-core ARM chip. This is a totally feasible thing that could be built right now.
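If you want to check the back-of-envelope math, here is a rough Python sketch. The constants are just the figures quoted in this post (0.53 mm^2 and 80 mW per core, a ~300 mm^2 die, 50% overhead); the function names are made up for illustration, and rounding down to a power of two is my reading of "nice numbers".

```python
import math

# Figures taken from the post; treat them as rough estimates.
CORE_AREA_MM2 = 0.53      # Cortex-A5 at 40 nm, including caches
CORE_POWER_W = 0.080      # per core at 1 GHz
DIE_AREA_MM2 = 300.0      # roughly the die size assumed above
OVERHEAD_FRACTION = 0.50  # uncore, memory controllers, last-level cache


def cores_that_fit(die_mm2, overhead, core_mm2):
    """Number of whole cores that fit in the die area left after overhead."""
    usable_mm2 = die_mm2 * (1.0 - overhead)
    return math.floor(usable_mm2 / core_mm2)


def round_down_to_power_of_two(n):
    """Largest power of two that is <= n (n must be >= 1)."""
    return 1 << (n.bit_length() - 1)


raw_cores = cores_that_fit(DIE_AREA_MM2, OVERHEAD_FRACTION, CORE_AREA_MM2)
nice_cores = round_down_to_power_of_two(raw_cores)
core_power_w = nice_cores * CORE_POWER_W  # cores only, no uncore power

print(f"cores that fit: {raw_cores}")
print(f"rounded down:   {nice_cores}")
print(f"core power:     {core_power_w:.2f} W")
```

Note that the power figure covers only the cores themselves, so a real chip would burn more once you add the interconnect, memory controllers, and cache.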
FYI, Amdahl's law is not why these things aren't being made. If you can find just the right workload it might make sense to build such a chip. Namely, an embarrassingly parallel / data parallel workload that is bound neither by memory bandwidth/core nor memory capacity/core, and for some reason doesn't map well to a SIMD GPU, or needs more memory capacity than a GPU can offer. Gustafson's law trumps Amdahl's law in most practical situations, so people on this board shouldn't be worried about Amdahl's law, in my opinion.
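To make the Amdahl vs. Gustafson point concrete, here is a small sketch using the textbook forms of both laws. The 95% parallel fraction is an arbitrary number I picked for illustration, not a measurement of any real workload, and the function names are my own.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl: fixed problem size, speedup = 1 / ((1 - p) + p / n)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)


def gustafson_speedup(parallel_fraction, n_cores):
    """Gustafson: problem size grows with core count, speedup = (1 - p) + p * n."""
    serial = 1.0 - parallel_fraction
    return serial + parallel_fraction * n_cores


P = 0.95  # assumed parallel fraction, purely illustrative
for n in (4, 64, 256):
    a = amdahl_speedup(P, n)
    g = gustafson_speedup(P, n)
    print(f"{n:4d} cores: Amdahl {a:7.2f}x, Gustafson {g:7.2f}x")
```

With a fixed problem size, Amdahl's speedup can never exceed 1 / (1 - p), which is 20x here no matter how many cores you add. If the problem grows with the machine, which is the usual case when people buy bigger hardware, the Gustafson figure keeps scaling almost linearly with core count.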
