Remember, the reason we were even having this conversation in the first place is SIMD performance. Bandwidth is extremely important for sustained wide-vector SIMD performance, and the primary reason Intel doubled the bandwidth of the L1 cache in Haswell was to make using AVX2 worth it.
And we should all know by now that AMD's AVX2 implementation is greatly subpar compared to Intel's.
I never said that Intel's is better, or that AMD's is worse. We were talking about SIMD performance and I mentioned that cache bandwidth is crucial to its performance. Intel's cache design philosophy seems to be focused on making their L1 cache as fast and as accurate as possible, and using the L2 cache for support only. The L3 is used for interprocessor communications and reducing memory traffic, as it's completely unified.
AMD's cache setup is obviously a lot different and I don't know what their focus is, but it's probably not SIMD performance.
You are so wrong and so clueless it's not even funny.
Intel's AVX/AVX2 implementation is not significantly better than AMD's.
1. Intel takes a big hit when executing 256-bit instructions in 128-bit mode, compared with a traditional 128-bit SIMD unit, until the top half of the 256-bit unit becomes active.
2. They both have around the same instruction latency/throughput, with some instructions better on AMD and some better on Intel.
3. They both decode similarly (this was an actual problem for Bulldozer).
Now on to the cache: I don't know what you are smoking, but it is so wrong.
The L2 in Core is not for "support"; it is where the streaming prefetchers are, and the L1 and L2 aren't inclusive or exclusive of each other but are both included in the L3.
In Zen the L1 is write-back, so I assume it is exclusive of the L2 (it might be inclusive). The L3 holds L2+L1 tag data and maybe some inclusive lines, but is largely exclusive.
What's the difference between the two? Really it's about multi-core scaling and handling of cache coherency. In terms of general performance, you can treat them as largely equal.
Now, one obvious difference is the width of the read/write ports. Intel has end-to-end 256-bit datapaths (execution, load/store, cache). AMD has end-to-end 128-bit datapaths (execution, load/store, cache).
Intel can most definitely hit higher data throughput, but its instruction throughput and latency aren't any better.
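To put rough numbers on the datapath width point, here is a minimal sketch that turns a port width into peak L1 bytes per cycle. The port counts (two loads plus one store per cycle for both designs) are my assumption for illustration, not something stated above, and the function name is made up.

```python
# Peak L1 data-cache bandwidth per core per cycle, derived from port width.
# ASSUMPTION: 2 load ports + 1 store port per cycle on both designs;
# these counts are illustrative and not taken from the post.

def peak_l1_bytes_per_cycle(width_bits, load_ports=2, store_ports=1):
    """Return peak load/store/total bytes per cycle for a given port width."""
    bytes_per_access = width_bits // 8
    return {
        "load": load_ports * bytes_per_access,
        "store": store_ports * bytes_per_access,
        "total": (load_ports + store_ports) * bytes_per_access,
    }

wide = peak_l1_bytes_per_cycle(256)    # 256-bit end-to-end datapath
narrow = peak_l1_bytes_per_cycle(128)  # 128-bit end-to-end datapath

print("256-bit:", wide)
print("128-bit:", narrow)
```

Under those assumptions the wider path moves exactly twice the bytes per cycle, which is a statement about peak data movement only and says nothing about instruction latency.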
What this all actually means is that Intel's design offers an advantage only in workloads where 128-bit load and store becomes a bottleneck. Go look at The Stilt's data to see how many apps across a large suite that actually is. If high-ILP 256-bit code was actually that common, they wouldn't be shutting off half of their SIMD units by default, would they?
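The "only when load/store is the bottleneck" argument can be sketched as a toy model: an iteration takes as long as the slower of its load time and its execute time. Everything here is hypothetical (the function name, the two example kernels, the 2 ops/cycle execute rate), and it assumes equal instruction throughput on both designs, as argued above. It is not a measurement of any real CPU.

```python
# Toy bottleneck model: cycles per loop iteration is the slower of
# "time to load the data" and "time to execute the SIMD ops".
# ASSUMPTION: all numbers below are invented for illustration.

def cycles_per_iteration(bytes_loaded, simd_ops,
                         load_bytes_per_cycle, simd_ops_per_cycle):
    load_cycles = bytes_loaded / load_bytes_per_cycle
    exec_cycles = simd_ops / simd_ops_per_cycle
    return max(load_cycles, exec_cycles)

OPS_PER_CYCLE = 2   # assumed equal on both designs
NARROW_LOAD = 32    # bytes/cycle, e.g. two 128-bit loads
WIDE_LOAD = 64      # bytes/cycle, e.g. two 256-bit loads

# Streaming kernel: lots of data, almost no math per byte.
stream_narrow = cycles_per_iteration(64, 1, NARROW_LOAD, OPS_PER_CYCLE)
stream_wide = cycles_per_iteration(64, 1, WIDE_LOAD, OPS_PER_CYCLE)

# Compute-heavy kernel: little data, lots of math per byte.
math_narrow = cycles_per_iteration(32, 8, NARROW_LOAD, OPS_PER_CYCLE)
math_wide = cycles_per_iteration(32, 8, WIDE_LOAD, OPS_PER_CYCLE)

print("streaming:", stream_narrow, "vs", stream_wide)
print("compute-heavy:", math_narrow, "vs", math_wide)
```

In this model the streaming kernel halves its cycle count on the wide path, while the compute-heavy kernel takes the same time on both, because the loads were never the limit.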
So no, AMD's AVX/AVX2 is not greatly subpar compared to Intel's; that's just FUD from someone who has no clue what they are talking about!