And why not both? 4x shader engines just don't work. With 6x shader engines they gain +50% geometry performance, +50% pixel fillrate and far better shader utilization. And what if they add a 512-bit bus like with Hawaii? 😀
4224 SP
96 ROPs
6x shader engines
all +50% vs Hawaii
with a 512-bit bus and 14 Gbps GDDR6 it will have 896 GB/s bandwidth
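A quick sanity check of those numbers, as a minimal sketch. The Hawaii baseline (2816 SP, 64 ROPs, 4 shader engines) comes from public specs rather than this post, and the function names are made up for illustration.

```python
# Hawaii reference figures (assumed from public specs, not stated in the post).
HAWAII = {"stream_processors": 2816, "rops": 64, "shader_engines": 4}


def scale_config(base, factor):
    """Scale every unit count in `base` by the same factor."""
    return {name: int(count * factor) for name, count in base.items()}


def memory_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: bus width (bits) * per-pin rate (Gbit/s) / 8."""
    return bus_width_bits * data_rate_gbps / 8


scaled = scale_config(HAWAII, 1.5)           # the "+50% vs Hawaii" config
bandwidth = memory_bandwidth_gb_s(512, 14)   # 512-bit bus, 14 Gbps GDDR6

print(scaled)
print(bandwidth, "GB/s")
```

So the 4224 / 96 / 6 figures are just Hawaii times 1.5, and 896 GB/s is the theoretical peak for that bus, not what a game would actually see.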
But I think Navi will be a small GPU (still GCN and 256-bit) around GTX 1080 performance, competing with the GTX 2060.
http://www.freepatentsonline.com/20180121386.pdf
Quote from the patent:
A super single instruction, multiple data (SIMD) computing structure and a method of executing instructions in the super-SIMD is disclosed. The super-SIMD structure is capable of executing more than one instruction from a single or multiple thread and includes a plurality of vector general purpose registers (VGPRs), a first arithmetic logic unit (ALU), the first ALU coupled to the plurality of VGPRs, a second ALU, the second ALU coupled to the plurality of VGPRs, and a destination cache (Do$) that is coupled via bypass and forwarding logic to the first ALU, the second ALU and receiving an output of the first ALU and the second ALU. The Do$ holds multiple instructions results to extend an operand by-pass network to save read and write transactions power. A compute unit (CU) and a small CU including a plurality of super-SIMDs are also disclosed.
This patent lets AMD schedule more work each cycle, because the instructions are smaller and easier for the GPU to execute, and it fills the gaps in workload distribution more efficiently. It saves a lot of register file size, memory bandwidth and power. However, to fully function it requires a second piece:
http://www.freepatentsonline.com/20180144435.pdf
Direct quote from the patent:
The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.
Both of those patents give you a pretty nice look at what the bottleneck in the graphics pipeline of AMD GPUs actually was, apart from geometry performance.
If this is the basis of Navi, it will be interesting to see how it affects performance. At the very least, the GPUs should be more efficient at the architectural level.
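To get a feel for what "co-issue more than one instruction per clock cycle" buys you, here is a toy model. It is only a sketch: the `Instr` type and `cycles_needed` function are hypothetical, every instruction is assumed to take one cycle, and it does not model the Do$ or the split VGPR file from the patents. It only counts how many cycles an in-order issue stage needs when independent instructions can go out together.

```python
from collections import namedtuple

# Hypothetical toy instruction: one destination register, a tuple of sources.
Instr = namedtuple("Instr", ["dst", "srcs"])


def cycles_needed(program, issue_width):
    """Count cycles for in-order issue of up to `issue_width` instrs per cycle.

    An instruction cannot issue in the same cycle as an earlier instruction
    whose result it reads or whose destination it also writes.
    """
    cycles = 0
    i = 0
    while i < len(program):
        written = set()  # registers written by instructions issued this cycle
        issued = 0
        while i < len(program) and issued < issue_width:
            ins = program[i]
            if ins.dst in written or any(s in written for s in ins.srcs):
                break  # depends on something issued this cycle, so wait
            written.add(ins.dst)
            issued += 1
            i += 1
        cycles += 1
    return cycles


program = [
    Instr("v0", ("v4", "v5")),
    Instr("v1", ("v6", "v7")),  # independent of v0, can pair with it
    Instr("v2", ("v0", "v1")),  # needs both earlier results
    Instr("v3", ("v8", "v9")),  # independent of v2, can pair with it
]

single_issue = cycles_needed(program, 1)
dual_issue = cycles_needed(program, 2)
print(single_issue, dual_issue)
```

With this particular instruction mix the dual-issue case needs half the cycles. Real shader code will have longer dependency chains, so the gain depends on how much independent work the scheduler can find.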
And remember, guys: a 4 Shader Engine, 2304 GCN core chip with high enough core clocks (1.9 - 2.1 GHz) will be in the range of the GTX 1080. And at 80 MTr/mm² on TSMC 7 nm HPC, a 100 mm² GPU die will have 8 billion transistors. That is roughly 40% more than Polaris 10 had with the same core count. AMD will have a ton of transistors to burn on the silicon design.
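The transistor budget is simple arithmetic, sketched below. The 80 MTr/mm² density and the 100 mm² die are the figures from this post. The Polaris 10 count of 5.7 billion is an assumption taken from public specs, not from the post, and the names are made up for illustration.

```python
def transistor_count(density_mtr_per_mm2, die_area_mm2):
    """Transistors = density (million transistors per mm^2) * die area (mm^2)."""
    return density_mtr_per_mm2 * 1e6 * die_area_mm2


navi_estimate = transistor_count(80, 100)

# Assumed from public specs, not stated in the post.
POLARIS_10_TRANSISTORS = 5.7e9

ratio = navi_estimate / POLARIS_10_TRANSISTORS
print(f"{navi_estimate / 1e9:.1f} billion transistors, {ratio:.2f}x Polaris 10")
```

Keep in mind that quoted density is a best case for the process. Real GPU dies usually land below it, so treat 8 billion as an upper bound for a 100 mm² chip.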
A balanced 4096 GCN core chip can still give amazing results, if it can be scheduled efficiently.
I am actually baffled that nobody is concerned about RTX 2070 performance: it will process only 3 triangles per clock, because it has only 3 GPCs (Graphics Processing Clusters), compared to TU104 and TU102, which have 6 each. Why won't it have problems? Because Nvidia GPUs have had the tech AMD recently patented ever since the Maxwell architecture.