Easily achievable? No. Also, no way Zen 3 raises IPC by 40%.
IPC is application specific. There are two areas where AMD does not have Intel beat. One is certain high-end server applications (database servers and such) that can make use of a large monolithic last-level cache. In Zen 2, any one core can only reach 16 MB of L3 with good latency. Intel Xeons mostly go up to 38.5 MB, with one specialized 55 MB part. AMD will be looking to surpass Intel in this area with Zen 3 as well. They need to make sure IT departments don’t have any excuses to keep buying Intel. I hope that they have a large-cache variant for niche server applications and HPC.
The other area is AVX-512 applications. Some of the cases are obviously software de-optimizations: Intel software forcing bad code paths on competing products. I suspect that AMD may support AVX-512 in Zen 3. Zen 2 only has 64 bytes per clock of read and 32 bytes per clock of write bandwidth to the L1 cache. That fits pretty well with 256-bit AVX (AVX2), since 256 bits = 32 bytes. Even just to increase 256-bit throughput, they would probably need to double that to 128 bytes per clock of read bandwidth. They could then support AVX-512 with a full 512-bit unit, by combining two 256-bit units, or by executing 512-bit instructions over 2 clocks. Intel has 2 AVX-512 units per core in some CPUs, though. AMD would need at least four 256-bit units to match that, so they would need to double the cache bandwidth over Zen 2. I tend to think that if you have code that can really make use of 512-bit units, then you should probably be looking at running it on a GPU anyway, which is much more parallel and has much more bandwidth available. Memory capacity will be much less of an issue on GPUs with the new HBM2E. It allows 16 GB per stack, so a 4-stack device could have 64 GB. I assume AMD wants people to use GPUs more as well, but they still need to keep up with Intel, or surpass them if possible.
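To make the bandwidth arithmetic above concrete, here is a rough sketch. The function names are made up for illustration, and the assumption that the L1 needs to feed two vector loads per clock is just a simple way to reproduce the 64 and 128 bytes per clock figures, not a statement about how any real core is built.

```python
# Back-of-the-envelope check of the L1 bandwidth and HBM capacity figures.
# All names here are hypothetical; the "two loads per clock" default is an
# assumption chosen to match the 64 bytes/clock read figure quoted for Zen 2.

def vector_bytes(vector_bits: int) -> int:
    """Size of one vector register in bytes."""
    return vector_bits // 8


def l1_read_bytes_per_clock(vector_bits: int, loads_per_clock: int = 2) -> int:
    """Read bandwidth needed to issue `loads_per_clock` full-width loads each clock."""
    return vector_bytes(vector_bits) * loads_per_clock


def hbm_capacity_gb(stacks: int, gb_per_stack: int = 16) -> int:
    """Total capacity of a device with the given number of HBM stacks."""
    return stacks * gb_per_stack


if __name__ == "__main__":
    print("256-bit vector:", vector_bytes(256), "bytes")
    print("L1 read for 256-bit loads:", l1_read_bytes_per_clock(256), "bytes/clock")
    print("L1 read for 512-bit loads:", l1_read_bytes_per_clock(512), "bytes/clock")
    print("4-stack HBM2E device:", hbm_capacity_gb(4), "GB")
```

The point is just that going from 256-bit to 512-bit loads at the same rate doubles the read bandwidth the L1 has to supply.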
About the only other thing is raw clock speed. There are probably some applications that do respond to raw clock speed, but I don’t think they are that important. I have to wonder if the ridiculously low-resolution gaming benchmarks are responding more to high clock speed than to low memory latency. Intel is going to have a very hard time competing with themselves if that is the case. Their 10 nm and 7 nm parts are not going to clock as well as the 14 nm process that they have been tweaking for like 5+ years. Matching those clocks isn’t something AMD is going to be able to do, but it looks like Intel can’t do it either. Saying that Intel is better for gaming seems ridiculous at this point. At actual reasonable quality settings, most benchmarks are GPU-limited and there is no difference.
There doesn’t seem to be a core count increase for Zen 3, but I don’t think they really need it. I had thought that SMT4 might be a substitute for a core count increase, but Intel isn’t going to be able to compete on core count anytime soon. Also, after seeing the speed increases of the 3950X and 3960X, I don’t think they will need a higher core count if Zen 3 is another large increase. The 2000 series Threadrippers are way behind the 3000 series parts, and that wasn’t even supposed to be that big of an upgrade.
So, a 40% IPC increase in general is certainly not going to happen. For some specific cases, it may be possible. If you go from a working set that doesn’t fit in cache to one that does, the performance increase can be huge. They could also get massive performance increases if they double the floating point throughput again, but only for specific applications.
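A toy model shows why fitting in cache matters so much. This is only a sketch of the usual average-access-time formula; the 40 and 300 cycle latencies and the 50% hit rate are made-up illustrative numbers, not measurements of any Zen or Intel part, and the function names are hypothetical.

```python
# Toy average-memory-access-time model: one cache level in front of DRAM.
# The latencies and hit rate used in the example are hypothetical values
# picked for illustration, not measured figures for any real CPU.

def average_access_cycles(hit_rate: float, cache_cycles: float, memory_cycles: float) -> float:
    """Average cycles per access given the fraction of accesses that hit in cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_cycles + (1.0 - hit_rate) * memory_cycles


def speedup_from_fitting(hit_rate_before: float, cache_cycles: float, memory_cycles: float) -> float:
    """Access-time speedup when the working set goes from partly missing to fully cached."""
    before = average_access_cycles(hit_rate_before, cache_cycles, memory_cycles)
    after = average_access_cycles(1.0, cache_cycles, memory_cycles)
    return before / after


if __name__ == "__main__":
    # Hypothetical: half the accesses miss before, none miss after.
    ratio = speedup_from_fitting(hit_rate_before=0.5, cache_cycles=40, memory_cycles=300)
    print(f"access-time speedup: {ratio:.2f}x")
```

This only models memory access time, so it would overstate the gain for anything that isn’t purely memory bound, but it shows how a workload-specific jump far above the general-case number is plausible.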