So, where is AMD Seattle?

jpiniero · Apr 23, 2015

imported_ats said:
The problem is that microservers as you envision them don't have enough single-thread performance. That is why companies like Facebook are using the D-1540 instead of Seattle/Avoton for their light tiers.

I wouldn't think ST would matter too much. Now, MT perf/W... yes. And I would expect them to be using it for far more than just their light tiers. Application serving, some sort of distributed database (I'm sure they are using one), memory caching... all would be better off on the 1540 than on Intel's (or anyone else's) traditional server-type products.

imported_ats · Apr 23, 2015

jpiniero said:
I wouldn't think ST would matter too much. Now, MT perf/W... yes. And I would expect them to be using it for far more than just their light tiers. Application serving, some sort of distributed database (I'm sure they are using one), memory caching... all would be better off on the 1540 than on Intel's (or anyone else's) traditional server-type products.

You would be wrong thinking that ST wouldn't matter too much. This is not really a great fault in your thinking, because many have made the exact same mistake before. This all goes back to Cray's great line about the 1000 chickens and Amdahl's law.

The problem is really expected response latency. While an infinite number of slow cores could theoretically serve an infinite number of users, it is highly unlikely to serve any of them at a reasonable latency. At a maximum, companies and people do not like a perceived response time greater than 1 second. Realistically, you want a perceived response time much less than 1 second, and of course that 1 second includes both sides of the transaction, and all communication latency. As such the acceptable response time on any given part of the transaction is significantly less than 1 second.
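The Amdahl's law point can be put into numbers. The following is a minimal sketch, not a model of any real workload: the 400 ms single-core time and the 50% parallel fraction are hypothetical values chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def request_latency_ms(single_core_ms: float, parallel_fraction: float, cores: int) -> float:
    """Latency of one request, given its single-core time and the number of cores."""
    return single_core_ms / amdahl_speedup(parallel_fraction, cores)


# Hypothetical request: 400 ms on one slow core, half of the work parallelizable.
for cores in (1, 8, 1000):
    latency = request_latency_ms(400.0, 0.5, cores)
    print(f"{cores:>4} cores: {latency:.1f} ms")
```

Under these assumptions, adding cores never pushes the latency of a single request below the serial portion (200 ms here), which is why a faster core helps where more cores do not.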

Then you have to factor in the actual software stacks, etc. The reality is that ST performance really does matter, even on the light tiers of an application infrastructure. Though you don't have to take my word for it, Facebook has said as much wrt their experiments with lightweight/microservers.

As for why the 1540 would be restricted to the light tier, that's mainly an issue of scale. For places like Facebook, the majority of their application/DB tier is handled by moderately high-end DP Xeon systems. This is because they have significant frequency advantages (upwards of 50%), significant concurrency advantages (upwards of 4x the processors), significant interconnect advantages (upwards of 40-50x), and significant capacity advantages (upwards of 8x for memory and 20x for local storage). As an example, the MS DP Xeon node (I know this node better, and it's probably the best design out at the moment) supports up to 1TB of memory, 8 M.2 x4 PCIe 3.0 SSDs, and 8 in packaging HDDs. In contrast, the 1540 nodes that people are putting together are generally composed of 32-128GB of memory and 1-2 SATA-based SSDs. As such, the DP Xeon nodes allow significantly more in-memory work and the caching of significantly more of the total DB. This makes it significantly easier for the programmer to get sufficient performance and throughput.

Abwx · Apr 23, 2015

mrmt said:
They are wrong in not pricing it low enough. A 30% lower price isn't enough to offset 20% lower performance with 25% more power consumption compared to Atom C. If you will run this server in your garage or in a corner of your office, then it's a fine deal, but if you are going into a datacenter, then Seattle's TCO will be bad.

Cluelessness. The 10GbE power consumption is not factored into those inaccurate computations, and neither is the performance, since the 103 SPEC score is for SPECint_rate, which is a measure of bandwidth more than anything else. For the record, an A10-7850K is at 90-93 in SPECint_rate while its SPECint score is 31, compared with Avoton's 17.5.

https://www.spec.org/cgi-bin/osgresults
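For reference, the quoted price/performance/power figures work out as follows. This is only a sketch of the arithmetic behind mrmt's claim, taking his three percentages at face value with Atom C normalized to 1.0; it does not account for the 10GbE controller or the benchmark choice disputed above, and the variable names are made up for illustration.

```python
# Seattle relative to an Atom C baseline of 1.0, using the quoted figures as-is.
price = 1.0 - 0.30   # 30% lower price
perf = 1.0 - 0.20    # 20% lower performance
power = 1.0 + 0.25   # 25% more power consumption

perf_per_dollar = perf / price  # above 1.0 means better than the baseline
perf_per_watt = perf / power    # below 1.0 means worse than the baseline

print(f"perf/$ vs. baseline: {perf_per_dollar:.2f}")
print(f"perf/W vs. baseline: {perf_per_watt:.2f}")
```

On those inputs the purchase-price advantage is modest while the perf/W deficit is large, which is the basis of the datacenter TCO argument. Change the inputs (for example, by adding the power of an external 10GbE controller to the baseline) and the conclusion can shift.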

DrMrLordX · Apr 23, 2015

imported_ats said:
The problem is that the kinds of workloads viable for Seattle have also been viable for Avoton, and Avoton has been available for almost 2 years and is already in production environments.

Yeah, it is a problem. AMD is shooting at a moving target, and they haven't even pulled the trigger. Delays are never good. Add to that the problem of the software ecosystem . . . now, in all fairness to the server ARMy, there are ARM servers out there, and there is an existing software stack on Linux (for people who want that) that has been maturing for years, at least in non-production environments anyway. The potentially big issue I see is device driver headaches for onboard hardware that comes with Seattle systems and may not currently be supported out of the box by standard ARM-compiled Linux distros.

I'm still not 100% sold on the idea that Seattle is below the minimum latency threshold for light-duty server work. There's a cap (perhaps a soft cap) on how much raw ST throughput is necessary to respond to and retire a webserver request, database query, or whatever other task you have in mind. The entire idea of microservers is predicated on the idea that said cap is pretty low. That's the main reason why firms like the now-defunct Calxeda, Applied Micro, and Cavium took the ARM server plunge in the first place.

I'm not trying to pooh-pooh Broadwell-D. That's an excellent chip for the power envelope, and it's no surprise to me why people are buying so many of them. I'm just not sure that is the precise cutoff point where people issuing queries to the db/webserver will stop being pissed about overall turnaround time. Until we see production silicon (hurry up AMD) it will be hard to know exactly what kind of latency issues we'll see with Seattle.

Abwx · Apr 24, 2015

ARM unveiled their next-gen performance figures, which will set the bar for AMD's K12 and the A1100's replacements.

[Image: Cortex-A72, A15, and A53 power consumption at different processes]

http://wccftech.com/arm-cortex-a72-unveiled/

ShintaiDK · Apr 24, 2015

Shame it's basically a copy/paste of the X-Gene type marketing.

jpiniero · Apr 24, 2015

ShintaiDK said:
Shame it's basically a copy/paste of the X-Gene type marketing.

You know, I only just realized that X-Gene is on TSMC 40 nm. So that's a big part of the problem, heh.

C2750 gets around 6650 or so in the Geekbench MT score. The Galaxy S6/Edge? Roughly 5000 or so. I'm sure Samsung is cheating, but still it's a phone. So it'd be quite disappointing if AMD can't beat the C2750 at 25 W or so.

ShintaiDK · Apr 25, 2015

jpiniero said:
You know, I only just realized that X-Gene is on TSMC 40 nm. So that's a big part of the problem, heh.

C2750 gets around 6650 or so in the Geekbench MT score. The Galaxy S6/Edge? Roughly 5000 or so. I'm sure Samsung is cheating, but still it's a phone. So it'd be quite disappointing if AMD can't beat the C2750 at 25 W or so.

Even at 28nm, for example, the X-Gene won't be touching Atom in performance/watt. The difference is simply too great. And the Core series is even further away for ARM.

Fjodor2001 · Apr 25, 2015

Looks like Intel will get spanked by ARM in the server segment once the new ARM cores on 14/16 nm hit the market. Better on both price and perf/watt.

Enigmoid · Apr 25, 2015

SPECint_rate is a memory benchmark. Not surprising.

This looks like the X-Gene marketing claim where the Intel setup was configured with 4 GB of RAM.

Notice that the comparison is power consumption (CPU, interconnect, and cache) for the ARM platform against total TDP for the Intel platform.

Let's not forget this.

http://www.anandtech.com/show/8718/the-samsung-galaxy-note-4-exynos-review/6

It's not really optimized and still preliminary, but the A57 is pretty much on par with the A15 on the same process (20nm).

Nothingness · Apr 25, 2015

Enigmoid said:
SPECint_rate is a memory benchmark.

Really? Anything to back this claim?

Enigmoid · Apr 25, 2015

Nothingness said:
Really? Anything to back this claim?

Look up the literature and benchmarks.

The tests put a lot of cache pressure on the CPU and combined with the large data size means a ton of memory accesses.

Considering the X-Gene never wins a single real-world benchmark against Avoton by this margin, it is suspicious (X-Gene is quad channel). It's not wholly memory, but memory strongly influences it.

I would expect the A1100 to do much better than this chart indicates.

Nothingness · Apr 25, 2015

Enigmoid said:
Look up the literature and benchmarks.

You're the one making the claim, so how about linking the literature you read?

The tests put a lot of cache pressure on the CPU and combined with the large data size means a ton of memory accesses.

That doesn't make it a memory benchmark.

ShintaiDK · Apr 25, 2015

Nothingness said:
You're the one making the claim, so how about linking the literature you read?

That doesn't make it a memory benchmark.

It doesn't take more than a quick look at spec.org to see that the rate tests are very memory/cache focused.

Example:
i7 4790K, 4 cores, 4 GHz Haswell.
~225 INT_rate
~160 FP_rate

E5-2637 v3 (1 chip), 4 cores, 3.5 GHz Haswell.
~244 INT_rate (expected ~197)
~225 FP_rate (expected ~140)

The benefit of 7MB more cache and quad-channel DDR4 is 23.8% for INT_rate and 60.7% for FP_rate.
Remember, this case is actually not as bandwidth limited as one with more cores would be.
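The "expected" figures above appear to come from scaling the i7's scores linearly with clock speed. A minimal sketch of that arithmetic follows; linear clock scaling is a simplifying assumption, and the function names are made up for illustration.

```python
def clock_scaled(score: float, base_ghz: float, target_ghz: float) -> float:
    """Scale a score linearly with clock speed (a simplifying assumption)."""
    return score * target_ghz / base_ghz


def uplift_pct(measured: float, expected: float) -> float:
    """Percentage by which the measured score exceeds the expected one."""
    return (measured / expected - 1.0) * 100.0


# Approximate scores from the post.
i7_int, i7_fp = 225.0, 160.0      # i7 4790K at 4.0 GHz
xeon_int, xeon_fp = 244.0, 225.0  # E5-2637 v3 at 3.5 GHz

expected_int = clock_scaled(i7_int, 4.0, 3.5)
expected_fp = clock_scaled(i7_fp, 4.0, 3.5)

print(f"INT: expected {expected_int:.0f}, uplift {uplift_pct(xeon_int, expected_int):.1f}%")
print(f"FP:  expected {expected_fp:.0f}, uplift {uplift_pct(xeon_fp, expected_fp):.1f}%")
```

The results land within rounding of the post's 23.8% and 60.7%. The uplift left over after clock scaling is what gets attributed to the extra cache and memory channels.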

Enigmoid · Apr 25, 2015

ShintaiDK said:
It doesn't take more than a quick look at spec.org to see that the rate tests are very memory/cache focused.

Example:
i7 4790K, 4 cores, 4 GHz Haswell.
~225 INT_rate
~160 FP_rate

E5-2637 v3 (1 chip), 4 cores, 3.5 GHz Haswell.
~244 INT_rate (expected ~197)
~225 FP_rate (expected ~140)

The benefit of 7MB more cache and quad-channel DDR4 is 23.8% for INT_rate and 60.7% for FP_rate.
Remember, this case is actually not as bandwidth limited as one with more cores would be.

Thanks.

IMO SPEC rate isn't terribly useful, as it can give misleading results with regard to real-world performance.

I believe that Seattle will perform quite a bit better than Avoton despite the preliminary SPEC rate benchmarks.

Seattle has a much better caching system with L3 (remember that Avoton is based on dual-core L2 modules, and there is an access penalty for syncing threads across modules) and will perform better than expected against Avoton in server benchmarks. It absolutely will not be able to compete with the 8-core BW-D (which seems to be around a Haswell 6-core) but may put up a decent fight against the 4-core D-1520.

Edit: Though it certainly does not look good for Seattle to have such low SPEC rate scores.

monstercameron · Apr 25, 2015

Enigmoid said:
Thanks.

IMO SPEC rate isn't terribly useful, as it can give misleading results with regard to real-world performance.

I believe that Seattle will perform quite a bit better than Avoton despite the preliminary SPEC rate benchmarks.

Seattle has a much better caching system with L3 (remember that Avoton is based on dual-core L2 modules, and there is an access penalty for syncing threads across modules) and will perform better than expected against Avoton in server benchmarks. It absolutely will not be able to compete with the 8-core BW-D (which seems to be around a Haswell 6-core) but may put up a decent fight against the 4-core D-1520.

Edit: Though it certainly does not look good for Seattle to have such low SPEC rate scores.

Is Avoton BT or CT cores? And did CT get an IPC bump over BT?

Arachnotronic · Apr 25, 2015

monstercameron said:
Is Avoton BT or CT cores? And did CT get an IPC bump over BT?

Avoton is Silvermont; same CPU core found inside of Bay Trail.

ShintaiDK · Apr 25, 2015

monstercameron said:
Is Avoton BT or CT cores? And did CT get an IPC bump over BT?

CT is simply a shrunk BT. It's mainly better turbos, cost savings, and a substantially better IGP.

Avoton (name not used anymore) is 22nm, so it's "BT". 14nm comes later this year. Expect 2x or more cores with "Denverton".

jpiniero · Apr 25, 2015

Looking at the Avoton series, I would think that the C2730 would be a bigger problem for something like Seattle. The clock speed on it drops to 1.7 GHz, but the TDP is only 12 W. Intel did cripple the memory to only support 32 GB of RAM instead of 64, though.

Abwx · Apr 25, 2015

According to Geekbench, the A57 is somewhat faster per MHz than Kabini. On integer workloads Kabini has 20-25% better IPC than Silvermont, so the Geekbench scores suggest that a 2GHz A57 is roughly as fast as a 2.7-2.8GHz Silvermont.

Besides, as I pointed out, the A1100's TDP includes a 2 x 10GbE Ethernet controller, which is not the case for Avoton's 25W. Add a chip for those controllers and the total TDP will exceed that of the A1100.

http://www.anandtech.com/show/7724/...arm-based-server-soc-64bit8core-opteron-a1100
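The per-MHz comparison above can be checked with a quick calculation. This is only a sketch: the post does not quantify "somewhat faster", so the 1.12x A57-over-Kabini factor below is a hypothetical value picked to illustrate the chain of ratios, not a measured number.

```python
def silvermont_equivalent_ghz(a57_ghz: float,
                              a57_over_kabini: float,
                              kabini_over_silvermont: float) -> float:
    """Clock a Silvermont core would need to match an A57, given per-MHz ratios."""
    return a57_ghz * a57_over_kabini * kabini_over_silvermont


A57_OVER_KABINI = 1.12  # hypothetical: "somewhat faster per MHz" is not quantified

low = silvermont_equivalent_ghz(2.0, A57_OVER_KABINI, 1.20)   # Kabini +20% IPC
high = silvermont_equivalent_ghz(2.0, A57_OVER_KABINI, 1.25)  # Kabini +25% IPC

print(f"2.0 GHz A57 ~ {low:.2f}-{high:.2f} GHz Silvermont")
```

With that assumed factor the range comes out close to the 2.7-2.8GHz figure in the post. A different A57-to-Kabini ratio would move the result proportionally.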

DrMrLordX · Apr 25, 2015

Sadly, Geekbench is just one more data point, and we haven't actually seen Seattle running Geekbench yet. AMD might have screwed up their implementation of the A57 somehow.

Abwx · Apr 25, 2015

They screwed up nothing. The SPECint_rate number just says that one platform is likely using 1330MHz RAM while the other is at 1600MHz. The SPECint2006 score of the C2750 is 17.5, and this number is indicative of integer computation throughput. For the record, an A10-7850K has a SPECint2006 score of 31 or so.

https://www.spec.org/cgi-bin/osgresults

https://www.spec.org/cpu2006/results/res2014q3/cpu2006-20140715-30431.html

imported_ats · Apr 25, 2015

jpiniero said:
Looking at the Avoton series, I would think that the C2730 would be a bigger problem for something like Seattle. The clock speed on it drops to 1.7 GHz, but the TDP is only 12 W. Intel did cripple the memory to only support 32 GB of RAM instead of 64, though.

Memory really isn't an issue. Getting to 64GB on Avoton requires 16GB DIMMs, which means buying IM products that carry an extreme premium, so about the only people who would ever do 64GB are web reviewers.

imported_ats · Apr 25, 2015

Abwx said:
According to Geekbench, the A57 is somewhat faster per MHz than Kabini. On integer workloads Kabini has 20-25% better IPC than Silvermont, so the Geekbench scores suggest that a 2GHz A57 is roughly as fast as a 2.7-2.8GHz Silvermont.

Besides, as I pointed out, the A1100's TDP includes a 2 x 10GbE Ethernet controller, which is not the case for Avoton's 25W. Add a chip for those controllers and the total TDP will exceed that of the A1100.

http://www.anandtech.com/show/7724/...arm-based-server-soc-64bit8core-opteron-a1100

You might one day be able to make a point if you used actual data. FYI, the C2750's TDP is 20W.

And as always, Geekbench is useless.

Abwx · Apr 25, 2015

imported_ats said:
You might one day be able to make a point if you used actual data. FYI, the C2750's TDP is 20W.

And as always, Geekbench is useless.

This is actual data :

Abwx said:
They screwed up nothing. The SPECint_rate number just says that one platform is likely using 1330MHz RAM while the other is at 1600MHz. The SPECint2006 score of the C2750 is 17.5, and this number is indicative of integer computation throughput. For the record, an A10-7850K has a SPECint2006 score of 31 or so.

https://www.spec.org/cgi-bin/osgresults

https://www.spec.org/cpu2006/results/res2014q3/cpu2006-20140715-30431.html

So...?
