AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

bjt2 · Feb 17, 2017

The Stilt said:
I don't think anything particular happens. If the reviewers would only use 256-bit workloads, then there wouldn't be too many workloads in their test suite.
I don't see any reason to avoid 256-bit workloads either, since they are becoming more and more common (which they would already be, if Intel hadn't been sand bagging for years). For example it is hard if not impossible to find a modern video encoder which wouldn't support 256-bit AVX/2 (X264, X265, VPX).

For example in X265, the gain from AVX2 is >20% on Haswell and newer. Also many of the heavier workloads e.g rendering (Blender, Embree, etc) support AVX/AVX2 as do many of the scientific workloads / libraries. Since Ryzen in 8C/16T configuration is a HEDT oriented
part, I see no reason to exclude those workloads.

Since in Handbrake from new horizon seems than Zen is 7% faster than BDWE (and so 4% faster than skylake), I suppose that also Zen gains from 256 bit instructions...

itsmydamnation · Feb 17, 2017

The Stilt said:
Indeed.
NBody, Linpack, X265, WinRar and Himeno do.
5/26.

Awesome and are you going to also test on a 8 core haswell/broadwell with the same forced vector widths?

looks like your going to have the best data of any of the reviewers! Are you going to publish it or just dump it all in a forum post?

bjt2 said:
Since in Handbrake from new horizon seems than Zen is 7% faster than BDWE (and so 4% faster than skylake), I suppose that also Zen gains from 256 bit instructions...

I think it would be very much determined by load store, if you get good register reuse and thus lower the percentage of loads and stores per x86 op then Zen should be fine.

for example linpack will probably fall quite a long way behind broadwell-E

The number of floating-point operations is 2/3n3 and the number of data references, both loads and stores, is 2/3n3. Thus, for every add/multiply pair we must perform a load and store of the elements, unfortunately obtaining little reuse of data

Justinbaileyman · Feb 17, 2017

Um does Ryzen actually have 256bit instructions??I thought it was 128bit split into 2 sets acting like 256bit?? I am really really liking the fact its kicking butt in Handbrake as this is the whole main purpose of me purchasing this CPU!!

imported_jjj · Feb 17, 2017

cytg111 said:
Its an old debate, how much software do you use that is properly threaded? That needs threading?

To me the lack of scaling above 4 cores feels like a myth at this point and that's why i am asking as i can't quite find what needs to scale and doesn't.
Wasn't being hostile, just trying to find the truth.

itsmydamnation · Feb 17, 2017

Justinbaileyman said:
Um does Ryzen actually have 256bit instructions??I thought it was 128bit split into 2 sets acting like 256bit?? I am really really liking the fact its kicking butt in Handbrake as this is the whole main purpose of me purchasing this CPU!!

Yes Zen supports 256bit instructions (bulldozer did as well). They decode to 1 uop and when they reach the FPU scheduler/dispatch it is split into 2 uop one for the lower order and one for the higher order. These are then executed over the 4 128bit pipelines as needed. Remember these pipelines are full pipelined meaning in the typical case even if an instruction takes 5 cycles to complete it can still issue a new instruction to the same pipeline in the next cycle, so your 256bit op takes 6 cycles where your 128bit op takes 5.

Because Zen has 4 128bit pipelines that can all perform a large amount of operations throughput in the FPU shouldn't be a problem except for FMA. The bigger problem is getting these 256bit vectors in and out of the core. Zen can execute 512bits of AVX/2 data a cycle but can only store 128bits. Skylake can execute 512bits of AVX/2/FMA a cycle but can store 256bits.

imported_jjj · Feb 17, 2017

The Stilt said:
Frankly I cannot see anything wrong in the "conclusion" example you gave, as long as both the strengths and the weaknesses are equally taken into account, well documented & explained.

That is slightly harder to arrange than before, but I should be able to provide application specific power measurements at least for some workloads.

I am very curious about 2-5% load too,maybe you could try to plot a curve - W vs load in percentage, at least vs the 6900k since doing it for everything would be a lot of work.
Multitasking gaming plus light workloads could be seen as an attempt to favor Ryzen but i think it's a valid real world scenario, seen it done for Broadwell-E before.

The Stilt · Feb 17, 2017

itsmydamnation said:
Awesome and are you going to also test on a 8 core haswell/broadwell with the same forced vector widths?

looks like your going to have the best data of any of the reviewers! Are you going to publish it or just dump it all in a forum post?

I'm wasn't using any specific compiler options in terms of the vector width. I left that to the compiler to decide.
As for the other compiler settings for the custom binaries, I used QaxCORE-AVX2 for ICL (combined with /O2 and /fp: precise) and -O3 together with the common (µarch) supported instruction sets for GCC.

No µarch specific optimization (ICL 2017 now allows that too, for some reason

) was done for any of the binaries and everything that was compiled with ICL / iFortran was manually patched to treat both of the vendors equally (dispatcher).

I didn't choose ICL by an accident, since I tested all of the compilers (when possible) prior selecting the one I ended up using. On average ICL 2017 performed 13% better than MSVC 2015 or GCC in FP code, while total range of a difference was -100% - +700%... Pretty funny that the µarchs which gained most from using ICL 2017 were Excavator and Zen. The difference with Zen wasn't nearly as large as it was with XV thou.

In my tests Zen goes against Excavator, Haswell-E and Kaby Lake for IPC tests, and head to head (absolute performance) with 5960X and 7700K. The SMT yield is also measured on all three.

Due to the amount of stuff in my write-up it isn't really feasible to publish it at the forums. I plan to release it as a PDF document. Regardless what it end up being, I need to have it looked at before I release any of it. Some of the stuff might need some sanitization...
If everything goes as planned, it will contain some explanation how various things work in Zen, among with other interesting things.

CentroX · Feb 17, 2017

people say this is flagship 1800X score

if so, it is kicking a 1800 dollar cpu!!

Justinbaileyman · Feb 17, 2017

Ok thanks for the info.. So basically Skylake should have faster throughput since it can store 256bit vs. Rysen which only can store 128bit? Is this only with AVX and AVX2 Instructions?

itsmydamnation · Feb 17, 2017

Justinbaileyman said:
Ok thanks for the info.. So basically Skylake should have faster throughput since it can store 256bit vs. Rysen which only can store 128bit? Is this only with AVX and AVX2 Instructions?

Yes only with 256bit AVX and AVX2 instructions both can have 128bit ops as well, also FMA. But if your instruction mix has a lower level of load/store then there is enough time to load and store the data without it becoming a bottleneck. But a lot of SIMD FP workloads are very load /store heavy.

imported_jjj · Feb 17, 2017

We also need memory scaling ,clocks and timings.
I usually favor tight timings over clocks but with 8 cores and 2 channels, decent BW might matter more, to a point.

PPB · Feb 17, 2017

The Stilt said:
I don't think anything particular happens. If the reviewers would only use 256-bit workloads, then there wouldn't be too many workloads in their test suite.
I don't see any reason to avoid 256-bit workloads either, since they are becoming more and more common (which they would already be, if Intel hadn't been sand bagging for years). For example it is hard if not impossible to find a modern video encoder which wouldn't support 256-bit AVX/2 (X264, X265, VPX).

For example in X265, the gain from AVX2 is >20% on Haswell and newer. Also many of the heavier workloads e.g rendering (Blender, Embree, etc) support AVX/AVX2 as do many of the scientific workloads / libraries. Since Ryzen in 8C/16T configuration is a HEDT oriented
part, I see no reason to exclude those workloads.

Vray supports embree, but only for certain calvulations (and uses fp32 instead of DP) . The rest is good ol' SSE2/3 and mostly fp64

And we are talking about THE biased renderer of the archviz/cg industry

Sent from my XT1040 using Tapatalk

CatMerc · Feb 17, 2017

The Stilt said:
I'm wasn't using any specific compiler options in terms of the vector width. I left that to the compiler to decide.
As for the other compiler settings for the custom binaries, I used QaxCORE-AVX2 for ICL (combined with /O2 and /fp: precise) and -O3 together with the common (µarch) supported instruction sets for GCC.

No µarch specific optimization (ICL 2017 now allows that too, for some reason ) was done for any of the binaries and everything that was compiled with ICL / iFortran was manually patched to treat both of the vendors equally (dispatcher).

I didn't choose ICL by an accident, since I tested all of the compilers (when possible) prior selecting the one I ended up using. On average ICL 2017 performed 13% better than MSVC 2015 or GCC in FP code, while total range of a difference was -100% - +700%... Pretty funny that the µarchs which gained most from using ICL 2017 were Excavator and Zen. The difference with Zen wasn't nearly as large as it was with XV thou.

In my tests Zen goes against Excavator, Haswell-E and Kaby Lake for IPC tests, and head to head (absolute performance) with 5960X and 7700K. The SMT yield is also measured on all three.

Due to the amount of stuff in my write-up it isn't really feasible to publish it at the forums. I plan to release it as a PDF document. Regardless what it end up being, I need to have it looked at before I release any of it. Some of the stuff might need some sanitization...
If everything goes as planned, it will contain some explanation how various things work in Zen, among with other interesting things.

Make sure to spam that PDF link everywhere, wouldn't want to miss it

imported_jjj · Feb 17, 2017

CentroX said:
people say this is flagship 1800X score

if so, it is kicking a 1800 dollar cpu!!

It it were real and the 6 cores result is also real , would suggest that XFR pushes ST clocks at some 4.3GHz for this result.
If it is real.

bjt2 · Feb 17, 2017

itsmydamnation said:
Awesome and are you going to also test on a 8 core haswell/broadwell with the same forced vector widths?

looks like your going to have the best data of any of the reviewers! Are you going to publish it or just dump it all in a forum post?

I think it would be very much determined by load store, if you get good register reuse and thus lower the percentage of loads and stores per x86 op then Zen should be fine.

for example linpack will probably fall quite a long way behind broadwell-E

If the data are truly streamed, without reuse, then only the RAM BW matters. Even if you have 512 bit datapath, the bottleneck will be RAM BW...

lopri · Feb 17, 2017

I really don't need anything other than these two:

1. Super Pi (any digits 1M or more)
2. Linpack using SSE4. (turn H/T off) Problem size something like 50000.

Someone please run them on Zen and report back.

lolfail9001 · Feb 17, 2017

CentroX said:
people say this is flagship 1800X score

Listen and believe? Please, you know better.

Besides, that SMT scaling looks ******* good.

EDIT: Noticed The Stilt's explanation. Well, that should cover it, actually. May just be sort of legitimate.

.vodka · Feb 17, 2017

CentroX said:
people say this is flagship 1800X score

if so, it is kicking a 1800 dollar cpu!!

Where did you find that?

KTE · Feb 17, 2017

The Stilt said:
Frankly I cannot see anything wrong in the "conclusion" example you gave, as long as both the strengths and the weaknesses are equally taken into account, well documented & explained.

That is slightly harder to arrange than before, but I should be able to provide application specific power measurements at least for some workloads.

This might be a lot harder for you to test, but is Docker containerisation (build) also something you can test?

The reason I'm asking is as it is critical for enterprise cloud migrations right now.

Sent from HTC 10
(Opinions are own)

lobz · Feb 17, 2017

Arachnotronic said:
Oh, really?

http://wccftech.com/amd-bristol-ridge-a12-9800-am4-platform-performance/

Bristol Ridge was to Carrizo what Kaby Lake is to Skylake.

Really?? Did kaby lake get ddr5? Have I missed something?

inf64 · Feb 17, 2017

CentroX said:
people say this is flagship 1800X score

if so, it is kicking a 1800 dollar cpu!!

Looks legit if compared to 1500 12T model:
ST 1888 x 4.2 (XFR?) / 3.7 = 2143 ; XFR at work? just by using 4Ghz as Turbo we land 7% short of the alleged score of top model.
MT 12544 x 8 /6 x 3.6 / 3.4 = 17709 ; 6C part seems to have ran this test at 3.4Ghz.

zinfamous · Feb 17, 2017

richierich1212 said:
Should've waited for official reviews of Ryzen since it was going to launch soon. It all depends on if you need to build your system now.

If not, there's no harm in waiting for Ryzen and/or Intel's response.

Intel is already offering a 7740k. It's like a 7700k, but it has a new hat.

IEC · Feb 17, 2017

imported_jjj said:
It it were real and the 6 cores result is also real , would suggest that XFR pushes ST clocks at some 4.3GHz for this result.
If it is real.

I'm going to assume it's fake until proven otherwise. I've seen a bunch of fakes from Chinese forums already...

unseenmorbidity · Feb 17, 2017

I automatically assume anything from Asia is fake...

Teizo · Feb 17, 2017

imported_jjj said:
It it were real and the 6 cores result is also real , would suggest that XFR pushes ST clocks at some 4.3GHz for this result.
If it is real.

Noticed in the supposed 6 core picture that it was running at only .374v at 3.4ghz without turbo (unless CPU-Z just isn't reading everything correctly). Color me skeptical on how legit that picture is. Not that I want it to be...just we live in the era of fake news now.

http://pclab.pl/zdjecia/artykuly/blind/2017/02/amdryz/amd582.jpg

AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

Senior member

Platinum Member

Golden Member

Senior member

Platinum Member

Senior member

Golden Member

Senior member

Golden Member

Platinum Member

Senior member

Golden Member

Golden Member

Senior member

Senior member

Elite Member

Golden Member

Golden Member

Senior member

Platinum Member

Diamond Member

No Lifer

Elite Member

Golden Member

Golden Member