I am going to say it's because it wouldn't be possible to make a 3072-core Fermi chip right now, and there are many reasons for that. First, the Fermi architecture might have internal bottlenecks elsewhere, which means adding extra cores wouldn't allow it to scale well. Second, not all cores are equal, as you noticed: the 2880 CUDA core 780Ti is only about 2X faster than the 512 CUDA core GTX580, despite having 5.6X the cores. So don't assume that a single Fermi CUDA core takes as much transistor space as a single Maxwell core; they cannot be compared directly.
You cannot use that logic even when comparing Kepler vs. Maxwell, as 128 Maxwell cores provide roughly 90% of the performance of 192 Kepler cores at the same clocks. Per-core efficiency changes from generation to generation.
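To put a rough number on that, here is a quick Python sketch using nothing but the ~90% figure above; treat it as back-of-the-envelope, not a benchmark:

# Per-core comparison based on the claim that a 128-core Maxwell SMM
# delivers ~90% of the performance of a 192-core Kepler SMX at the same clocks.
kepler_cores_per_sm = 192
maxwell_cores_per_sm = 128
relative_sm_perf = 0.90  # Maxwell SMM throughput relative to a Kepler SMX

kepler_per_core = 1.0 / kepler_cores_per_sm
maxwell_per_core = relative_sm_perf / maxwell_cores_per_sm

print(f"Per-core throughput, Maxwell vs. Kepler: {maxwell_per_core / kepler_per_core:.2f}x")
# -> ~1.35x: a 'CUDA core' is not a fixed unit of performance across generations

In other words, a Maxwell core does roughly 35% more work per clock than a Kepler core, which is exactly why raw core counts across architectures tell you very little. Here's a much more extreme example of the same thing: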
The 6800 Ultra had:
16 pipelines @ 400MHz
The GTX580 has:
512 cores @ 772MHz core / 1544MHz shader
If you use simple math, the GTX580 should be somewhere between 61.76X and 123.52X faster, right?
No, it's only 18X faster.
http://forums.anandtech.com/showthread.php?t=2298406
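For anyone curious, here is that "simple math" spelled out (Python sketch; the ~18X figure is from the benchmark thread linked above, everything else is just the specs quoted):

# Naive scaling estimate: (execution unit ratio) x (clock ratio)
# 6800 Ultra: 16 pixel pipelines @ 400 MHz
# GTX 580:    512 CUDA cores @ 772 MHz core / 1544 MHz shader clock
units_ratio = 512 / 16                # 32x the execution units
low = units_ratio * (772 / 400)       # vs. core clock   -> 61.76x
high = units_ratio * (1544 / 400)     # vs. shader clock -> 123.52x

print(f"Naive estimate: {low:.2f}x to {high:.2f}x")
print("Measured (per the linked thread): ~18x")

The gap between ~62-124X "on paper" and ~18X measured is the whole point: pipelines, cores and clocks are not interchangeable across architectures.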
That's why the few top electrical engineers who design GPUs get paid so much $, and why there are so few of them. They have to figure out the most effective way to design future GPU architectures, and that means abandoning what you thought worked well over the last 2-4 years and adapting. That's why whoever designed VLIW and GCN is almost a genius: that person designed architectures that stayed flexible for so many years. Think about it, who wouldn't want a GPU architecture that could scale and scale and scale with just newer nodes, more functional units and higher GPU clocks? It's not that simple, because eventually every GPU architecture hits bottlenecks and requires a full redesign.
Someone who actually designs GPUs or has GPU architectural knowledge could give you a great answer, but if it were so easy to just scale existing architectures with higher transistor density, NV/AMD wouldn't be spending $3-4 billion on new architectures every so often. Look at DX12 and asynchronous compute shaders: if you need massive parallelism, lots of DirectCompute power and flexibility for future software, then Fermi, Kepler and Maxwell are already outdated for NV. You need something entirely new or heavily redesigned. That's why you cannot just scale up architectures that were never meant to adapt to software that takes advantage of things the old architecture was never designed to do well. This is similar to how HD5870/6970 were good for games but horrible for parallelism/compute -- that's how GCN came about. Likewise, GCN was arguably ahead of its time for parallelism, but nothing is free: to get better at one thing it had to give up something else, and pixel shading power, geometry performance, INT16 texture filtering, polygon throughput, voxel lighting/voxelization performance and conservative rasterization are big problem areas for that AMD architecture.
The irony here is that some programs you run as an end user may not benefit much from future GPU architectures, because future architectures focus on the most popular software and on future trends. If the program(s) you use start becoming outdated from a software perspective, future architectures won't linearly improve your performance, and it sounds to me like Octane is one of those programs. Think about it: if the goal is to make more realistic games, you have to focus on, say, lighting. The engineers then have to weigh the future trends and all the known lighting techniques, and decide how to spend scarce transistors to maximize their goals. Let's say they bet that voxel lighting is the future and throw their resources at that technique, producing a dramatic improvement for that one workload.
But if the software you use doesn't even touch voxelization, well, too bad: the next-gen graphics card just spent 5% of its transistors making that kind of lighting run 3X faster than last generation's hardware, because that's the future trend. That's the risk of designing new GPU architectures: you must focus on something, knowing you can't do everything for everyone. Why am I telling you this? Because we cannot possibly predict how much Pascal will improve performance in Octane; we'd have to know which areas of the GPU architecture Octane stresses the most and where NV's focus for Pascal lies. That's how you could easily end up with a situation where an $80 GTX580 offers just half the performance of a $1000 Pascal chip in Octane -- as stupid as that sounds, that is the reality of GPU design.
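To make that concrete, here's an Amdahl-style Python sketch with made-up numbers (the 3X lighting speedup is the hypothetical figure from above; the frame-time splits are pure assumptions for illustration):

# How much a workload-targeted speedup helps depends entirely on the workload.
def overall_speedup(fraction_accelerated, local_speedup):
    # Whole-frame speedup when only part of the frame time gets faster
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / local_speedup)

# Hypothetical voxel-heavy game: 40% of frame time in voxel lighting, now 3x faster
print(f"Voxel-heavy game:   {overall_speedup(0.40, 3.0):.2f}x")  # ~1.36x
# Hypothetical renderer that never touches the new lighting path
print(f"Non-voxel renderer: {overall_speedup(0.00, 3.0):.2f}x")  # 1.00x

Same silicon, very different payoff depending on what your software actually does.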
I'll give you an example of this on the AMD side. During the HD7970 series there was a heavy focus on double precision performance, but since the focus shifted to perf/watt, that was set aside for the current Fiji architecture. There are some distributed computing users on this very forum who use double precision software. Look what happens for them:
2048 shader HD7970GHz = 1.075 TFLOPS of FP64 performance
4096 shader Fury X = 0.538 TFLOPS of FP64 performance
That means a $650 2015 Fury X with twice as many cores/shaders is theoretically
half as fast as a $120 HD7970GHz in FP64 software.
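Those theoretical numbers fall straight out of the shader count, the clock and the FP64 rate each chip shipped with (the 1/4 and 1/16 FP64 ratios for Tahiti and Fiji are published specs; the Python sketch below just redoes the arithmetic):

# Theoretical FP64 rate = shaders * 2 FLOPs/clock (FMA) * clock (GHz) * FP64 ratio
def fp64_tflops(shaders, clock_ghz, fp64_ratio):
    return shaders * 2 * clock_ghz * fp64_ratio / 1000.0

# HD 7970 GHz (Tahiti): 2048 shaders @ 1.05 GHz, FP64 at 1/4 of FP32 rate
print(f"HD 7970 GHz: {fp64_tflops(2048, 1.05, 1/4):.3f} TFLOPS")   # ~1.075
# Fury X (Fiji): 4096 shaders @ 1.05 GHz, FP64 cut to 1/16 of FP32 rate
print(f"Fury X:      {fp64_tflops(4096, 1.05, 1/16):.3f} TFLOPS")  # ~0.538

Twice the shaders at a quarter of the per-shader FP64 rate nets out to half the double precision throughput.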
Thankfully for you, there are people who use Octane and will benchmark the latest NV cards, so you'll know whether something is worth purchasing. You don't have to spend 1000 euros on Pascal without knowing how well it will perform. Thank you, Internet!