The difference between heckling and accurate assertions backed by industry-standard information:
https://www.lanl.gov/conferences/salishan/salishan2011/3moore.pdf

Frequency and single-threaded performance, the focus of processor design for decades, hit a fundamental wall, and the focus shifted to multi-core/multi-threaded performance. Single-threaded performance now primarily achieves gains with gate shrinks rather than fundamental architectural changes, as one would expect from a mature, well-engineered execution pipeline refined over decades.

Single-threaded performance hit its walls, and the industry transitioned to multi-core. Multi-core architectures have negative impacts on single-threaded performance, hence the divergence in single-thread performance going forward, as reflected in both slide captions. Later comes HSA, which uses custom accelerators for workloads that cannot be accelerated well by CPUs or GPUs, both of which have fundamental limitations. Take, for instance, the dedicated hardware encode/decode ASICs found on GPUs for H.264 and H.265.
http://www.gotw.ca/publications/concurrency-ddj.htm
CFD
A cute deflection from the core reality that can't be refuted: single-threaded performance has fundamentally plateaued. The hacks and optimizations have largely been figured out, and present-day improvements are not, and will not be, in line with the past. This is shown by the graph I posted and by the numerous other individuals who have put forth their own graphs. Out-of-order execution only helps when the current instruction has no dependency on the prior instruction's output. Random data access leads to cache misses and tanks performance. Modern pipelines have been tuned close to their reasonable limits, with room for small iterative gains. With gate shrinks comes the ability to efficiently target elements that may yield single-threaded performance boosts; however, the hard work and the big gains in this area were done over decades, which is known to anyone who knows computing history or has taken an intro computing course. That is why single-threaded performance can accurately be stated as sufficient. Given that memory is the true bottleneck, and that during a memory stall the CPU can't compute anything for the stalled thread, the speedup for single-threaded performance will actually come from memory advancements, as it has for some time.
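To make the memory point concrete, here is a minimal sketch (my own, not from the linked slides): the exact same dependent load chain is timed twice, and only the memory layout changes. The array size, seed, and timing method are illustrative assumptions; build with something like g++ -O2.

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

// Follow a dependent chain of loads: each iteration needs the previous result,
// so the core cannot hide a cache miss behind other work for this thread.
static double chase_ms(const std::vector<std::uint32_t>& link) {
    std::uint32_t idx = 0;
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < link.size(); ++i) idx = link[idx];
    const auto t1 = std::chrono::steady_clock::now();
    volatile std::uint32_t sink = idx;                   // keep the loop from being optimized away
    (void)sink;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const std::size_t n = 1u << 24;                      // ~16M entries, well past typical L3 capacity

    // Sequential chain 0 -> 1 -> 2 -> ...: prefetch-friendly, mostly cache hits.
    std::vector<std::uint32_t> seq(n);
    for (std::size_t i = 0; i < n; ++i) seq[i] = static_cast<std::uint32_t>((i + 1) % n);

    // Random single-cycle chain (Sattolo shuffle): nearly every hop is a cache miss.
    std::vector<std::uint32_t> rnd(n);
    std::iota(rnd.begin(), rnd.end(), 0u);
    std::mt19937 gen{42};
    for (std::size_t i = n - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(rnd[i], rnd[pick(gen)]);
    }

    std::cout << "sequential chase: " << chase_ms(seq) << " ms\n";   // core stays fed
    std::cout << "random chase:     " << chase_ms(rnd) << " ms\n";   // core stalls on memory
    return 0;
}

The instruction count is identical in both runs; the gap you will see is purely the cost of waiting on memory.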
The solver portion of CFD is embarrassingly parallel, which is why compute clusters and supercomputers are used. That is where the embarrassing speedup comes from, not from a 300 MHz bump in clock speed that executes instructions slightly faster.
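A toy illustration of why the solver side scales out (again my own sketch, not real CFD code): a Jacobi-style relaxation sweep where every interior cell update depends only on the previous iteration's values, so the sweep splits cleanly across cores, and across cluster nodes with halo exchanges at partition boundaries. The grid size and iteration count are made-up numbers; build with g++ -O2 -fopenmp if you want the pragma honored.

#include <cstdio>
#include <vector>

int main() {
    const int nx = 512, ny = 512, iters = 100;
    std::vector<double> u(nx * ny, 0.0), u_new(nx * ny, 0.0);
    for (int i = 0; i < nx; ++i) { u[i] = 1.0; u_new[i] = 1.0; }   // fixed value along one boundary edge

    for (int it = 0; it < iters; ++it) {
        // Every interior cell is updated only from the previous sweep's values,
        // so all (nx-2)*(ny-2) updates are independent of each other.
        #pragma omp parallel for collapse(2)
        for (int j = 1; j < ny - 1; ++j) {
            for (int i = 1; i < nx - 1; ++i) {
                u_new[j * nx + i] = 0.25 * (u[j * nx + i - 1] + u[j * nx + i + 1] +
                                            u[(j - 1) * nx + i] + u[(j + 1) * nx + i]);
            }
        }
        u.swap(u_new);
    }
    std::printf("value at grid center after %d sweeps: %g\n", iters, u[(ny / 2) * nx + nx / 2]);
    return 0;
}

Throwing more cores (or more nodes) at that loop keeps scaling; throwing 300 MHz more clock at it does not move the needle the same way.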
Single-threaded performance is going to be what it's going to be: sufficient.
IF/ID/EX/MEM/WB. Comp Arch 101. Take even the most complex microarchitecture and it all comes down to the basics: instruction fetch, decode, execute, memory access, write-back. The bottleneck is memory access, which is why so much work goes into memory hacks and optimizations: caching, prefetching, and things like out-of-order execution.
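For a flavor of those memory hacks, here is a small sketch of software prefetching using the GCC/Clang builtin __builtin_prefetch: start the load for a future element while the core works on the current one. The index pattern, the prefetch distance of 16, and the sizes are illustrative assumptions and would need tuning on real hardware.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

// Sum data[] through a scattered index array, prefetching a few iterations ahead
// so the cache-miss latency overlaps with useful work instead of stalling the core.
std::uint64_t sum_indirect(const std::vector<std::uint32_t>& idx,
                           const std::vector<std::uint32_t>& data) {
    const std::size_t ahead = 16;                        // prefetch distance (would need tuning)
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < idx.size(); ++i) {
        if (i + ahead < idx.size())
            __builtin_prefetch(&data[idx[i + ahead]]);   // start a future load now
        sum += data[idx[i]];                             // use the (hopefully) cached value
    }
    return sum;
}

int main() {
    const std::size_t n = 1u << 22;                      // ~4M elements, larger than cache
    std::vector<std::uint32_t> data(n);
    std::iota(data.begin(), data.end(), 0u);
    std::vector<std::uint32_t> idx = data;
    std::shuffle(idx.begin(), idx.end(), std::mt19937{7});   // scattered access pattern
    std::cout << sum_indirect(idx, data) << "\n";
    return 0;
}

Hardware prefetchers, deeper cache hierarchies, and out-of-order windows are all attacking the same problem from the hardware side: keeping the EX stage fed so the pipeline doesn't sit idle in MEM.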
So one can rant all they want about their specific use case to deflect from the core discussion. The debate can become as complicated as one would like, but that doesn't change the physical reality and the basics on which all chip architectures are built. As it relates to CFD: workstations with lower core counts, higher clocks, and strong single-threaded performance for pre-/post-processing; compute clusters for the solvers. The future has been multi-core and multi-threaded for some time. A workstation isn't a single core; the current recommendation is 8 cores, and AMD is recommended as of Ryzen (because it's sufficient). Absolutely everyone who legitimately works in the industry will laugh at you if you suggest otherwise, for basic reasons that are taught freshman year in college or are standard industry knowledge.
For execution: A->B->C->D with chained dependencies can only execute as fast as you can clock the processor, get data into the CPU, and keep the pipeline simple for the given flow. This goes directly against multi-core design requirements, which is why the graph shows single-threaded performance diverging and in some cases decreasing going forward.
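Here is a minimal sketch of that A->B->C->D point (mine, with made-up sizes): the first loop is one long dependency chain, so every add waits on the previous result; the second does the same arithmetic as four independent chains the out-of-order core can overlap. The data is kept small enough to sit in cache so the memory bottleneck doesn't mask the effect. Build with g++ -O2 (without -ffast-math, so the compiler can't reassociate the sums).

#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    const std::size_t n = 4096, sweeps = 1 << 14;        // data fits in L1, so memory isn't the limiter here
    std::vector<double> v(n, 1.0);

    // One dependency chain: every add must wait for the previous result.
    const auto t0 = std::chrono::steady_clock::now();
    double s = 0.0;
    for (std::size_t r = 0; r < sweeps; ++r)
        for (std::size_t i = 0; i < n; ++i) s += v[i];
    const auto t1 = std::chrono::steady_clock::now();

    // Same arithmetic split into four independent chains the core can overlap.
    double a = 0.0, b = 0.0, c = 0.0, d = 0.0;
    for (std::size_t r = 0; r < sweeps; ++r)
        for (std::size_t i = 0; i < n; i += 4) {
            a += v[i]; b += v[i + 1]; c += v[i + 2]; d += v[i + 3];
        }
    const auto t2 = std::chrono::steady_clock::now();

    std::cout << "1 chain:  " << std::chrono::duration<double>(t1 - t0).count()
              << " s (sum " << s << ")\n";
    std::cout << "4 chains: " << std::chrono::duration<double>(t2 - t1).count()
              << " s (sum " << a + b + c + d << ")\n";
    return 0;
}

When the work genuinely forms one chain, as in the A->B->C->D case, the second trick isn't available, and the only levers left are clock speed, data delivery, and pipeline latency.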
The bottleneck for single-threaded performance is memory access.
When your CPU doesn't have to stall for 10-100 cycles for the data it needs to push a thread through the pipeline, your thread will execute faster. Comp Arch 101.
This is the quote that started off the entire argument....
It was an accurate one, backed by data. The assumption that I was an idiot was stated directly by one user, which is why this went off the rails. I find that stance to be common in people who don't actually know what they're talking about (projection), and I eventually walk away from such mediums. A huge number of threads are filled with the usual detractors and the same template nonsensical arguments.
I don't think most competent/knowledgeable people in this thread would argue against you regarding the importance of MT performance. But please don't mistake others' qualification of your statement as a personal attack. I'm here trying to keep my interest/passion in computer architecture up in the face of my boring classes, and it makes me sad if people who (claim to) work in the field I want to work in can't have a sane discussion.
And yet that's exactly what certain users did without the competency/knowledge, which is why this went off the rails. But I'm the bad guy for calling them out with supported data?
Sad? You are in for a rollercoaster of emotions in this industry if you think you can make inaccurate assertions, feign knowledge you don't have, and talk down someone's data-backed points with nonsense... I've directly witnessed cases where people were fired on the spot for doing so. If you don't know what you're talking about, you had better ask questions and otherwise stay silent. Trying to talk over someone with unbacked foolishness is a quick way to put yourself out of a job, especially in an at-will employment state.
My time here has expired.