Bulldozer wasn't supposed to be good at everything; it was supposed to excel at INT workloads while remaining reasonable on FP workloads. It also relied on higher core counts to compensate for its lower IPC.
Not that I think AMD will release a product as bad as Bulldozer, but the strategy behind it isn't anything new.
Cache latency was one of the big IPC killers on Bulldozer and its progeny. AMD's presentation slides specifically indicated improved cache performance for Zen, so at least there's internal awareness of the issue and effort going into fixing it.
Also, it's important to distinguish between trading off FPU performance in general (which Bulldozer did) and trading off AVX performance in particular. The FPU in Bulldozer handles a lot of work that routinely shows up in mainstream apps: not just actual floating-point math, but also MMX and SSE2 integer vector operations go through it. That's where the corner cutting really hurts. Sacrificing AVX performance, on the other hand, is much more acceptable: in practice AVX often gives only marginal improvements over SSE2, and far fewer applications use it.
It's worth pointing out that even Sandy Bridge made some compromises with AVX to save die space:
Sandy Bridge allows 256-bit AVX instructions to borrow 128 bits of the integer SIMD datapath. This minimizes the impact of AVX on execution die area while doubling FP throughput: you get two 256-bit AVX operations per clock (plus one 256-bit AVX load).
I think we can expect something similar from AMD this time around.
Also, remember that Intel didn't consider AVX-512 important enough to justify its die-space cost on desktop and mobile SKUs. Above 128 bits, vector operations run into diminishing returns - unless your whole code path is massively parallel, in which case you should just be running it on a GPU instead.