How should AMD proceed forward with Validated ECC Support for AM4?

cbn

Lifer
Mar 27, 2009
12,968
221
106
According to the following link, ECC on AM4 is only partially functional:

http://www.hardwarecanucks.com/foru...ws/75030-ecc-memory-amds-ryzen-deep-dive.html

In conclusion, what is currently available on the AM4 platform is an incomplete implementation of ECC. This is very likely why motherboard manufacturers have been relatively hesitant about claiming that their products support ECC memory in ECC mode. Based on our findings, there is clearly some level of ECC functionality that is working right now, but it does not cover the full spectrum of memory error detection and correction. Having said that, the status quo is arguably better than nothing, especially since single-bit errors are much more likely than multi-bit errors (which are often caused by a failing memory module), so I suspect that many people will still want the extra protection that is available right now.

While actual ECC validation will likely never occur on this consumer platform, if public interest in this feature keeps growing we fully expect motherboard manufacturers to step up to the plate and improve their ECC support. However, we strongly suspect that AMD will first have to release an update to their CPU microcode to fully unlock all of the necessary settings. Furthermore, there definitely needs to be some work done at the operating system level to let users know when ECC is enabled and what it is doing, more so on the Windows side than the Linux one.

So what do you think AMD should do?

Fully validate ECC on the new AM4 when Pinnacle Ridge debuts (Feb 2018)?

Maybe offer ECC-enabled Opteron CPUs (and ECC-enabled FirePro APUs) in addition to non-ECC Pinnacle Ridge Ryzen offerings, but also allow the existing Summit Ridge Ryzen and Bristol Ridge APU processors to work with ECC on the new AM4.

Other ideas?

P.S. The Bristol Ridge APUs have 1/2 rate double precision floating point, but this needs ECC in order to be useful. (SIDE NOTE: In the past, I have thought APUs for desktop were not desirable for reasons I posted here, but if the APU dies were used for FirePro desktop APUs* (rather than regular consumer desktop APUs)...)

*Hopefully Raven Ridge also has 1/2 rate double precision floating point (and also HBM2 to enhance the usage of the 1/2 rate double precision floating point). If such a processor existed, I think it could provide an incentive for software makers (like Autodesk with their CFD application) to shift simulations over to the GPU (from the CPU).
 

cbn

Thoughts on chipsets for validated ECC on the new AM4?

X470 only?

X470 plus B450?

Or specific server/workstation chipset?
 

formulav8

Diamond Member
Sep 18, 2000
7,004
522
126
AMD will not spend the time or money on that. They did that for the platforms where it is used more, like Epyc and TR. It's up to mobo makers, and I've yet to see a mobo maker fully commit to ECC on regular Ryzen. Although there is a market for it. At least AMD doesn't block it on mainstream like Intel typically does. Intel even locks ECC out of their SK-X setup, it seems.
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
If there is a validated version of AM4, it will not be as a platform. It will be Ryzen Pro plus the OEMs taking the time to validate their boards.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,327
10,034
126
Topweasel said:
If there is a validated version of AM4, it will not be as a platform. It will be Ryzen Pro plus the OEMs taking the time to validate their boards.

You don't think that they will come out with an "A320 Pro" chipset, or "B350 Pro", or whatnot, that supports validated ECC? Or do you think that it will solely come down to the CPU validation plus mobo vendor validation?
 

cbn

VirtualLarry said:
You don't think that they will come out with an "A320 Pro" chipset, or "B350 Pro", or whatnot, that supports validated ECC? Or do you think that it will solely come down to the CPU validation plus mobo vendor validation?

If AMD comes out with Opteron and/or FirePro APU for AM4......then I'm thinking that any motherboard (X470, B350, A420, X/B/A 4xx, etc.) that supports these workstation chips would have validated ECC support. In contrast, AM4 boards that don't support Opteron and FirePro wouldn't have validated ECC Support.

With that mentioned, what would happen if a person has a B450 board (supporting Opteron) and then installs a Ryzen chip from the current AM4 generation (e.g., 1800X)? Does this processor also have validated ECC support?

P.S. Something else to think about besides ECC UDIMM support would be ECC RDIMM support.
 

cbn

formulav8 said:
Although there is a market for it. At least AMD doesn't block it on mainstream like Intel typically does.

Low-end Intel chips (Celeron, Pentium, Core i3) do have ECC support. It's the Core i5 and Core i7 that don't support ECC. For these higher core counts a person needs an E3 Xeon.
 

cbn

formulav8 said:
Intel even locks ECC out of their SK-X setup, it seems.

Intel also did that with the LGA 2011-3 consumer processors (e.g., Core i7 6800K). However, a person could still install an E5 Xeon v3 or v4 on an X99 board and get ECC support (even for ECC RDIMMs).
 

Topweasel

VirtualLarry said:
You don't think that they will come out with an "A320 Pro" chipset, or "B350 Pro", or whatnot, that supports validated ECC? Or do you think that it will solely come down to the CPU validation plus mobo vendor validation?

CPU + mobo vendor. This isn't even a segmentation thing. I doubt that AMD wants to invest a bunch of money into validating a consumer/general business platform. AMD might (key word) take the time to validate the Pro CPUs if there is enough demand, but leave the general validation to the OEMs to negotiate.
 

cbn

Topweasel said:
I doubt that AMD wants to invest a bunch of money into validating a consumer/general business platform.

What about Workstation? Opteron (or FirePro APU) processor rather than Ryzen or Ryzen Pro.

P.S. At one time Intel used a consumer chipset for some of its professional workstations. (Example: Dell Precision T3500, HP Z400 and Lenovo S20 (LGA 1366 1P Workstations) used the X58 chipset along with Xeons and ECC UDIMMs. The LGA 1366 2P Workstations did, of course, use a server chipset (Intel 5520).)
 

Topweasel

cbn said:
What about Workstation?

P.S. At one time Intel used a consumer chipset for some of its professional workstations. (Example: Dell Precision T3500, HP Z400 and Lenovo S20 (LGA 1366 1P Workstations) used the X58 chipset along with Xeons and ECC UDIMMs)

AMD sells two single-socket platforms with validated ECC support. Both offer consumer core counts at consumer CPU prices with workstation- and server-level feature sets. Ryzen 7 Pro is "Pro" in the sense of having business-level features like remote management and advanced encryption; Ryzen Pro chips are still meant for consumer-level system usage. Ryzen 7 might be able to compete with previous workstations in throughput, but it is still a basic product stack like an i7 or i5.

Keep in mind that Intel with the X58 was only in the early stages of their segmentation. But this isn't about Intel anyway. The X399 is X370; AMD has said as much. It's not the chipset, it's the platform, and while AMD could take X370 with AM4 Ryzens and sell them as X370 Pro with ECC, I doubt it. That's not the kind of segmentation that AMD seems interested in. This is a resource vs. value lineup, and Intel hasn't taken any post-Nehalem Core i product line or platform and "workstationed" it. AMD isn't doing that either. The core count makes it seem like a light server or workstation system, but it's not: it's limited in IO, memory capacity, and memory bandwidth. AMD is a resource-limited company. Validated ECC support on this type of product is a severely limited niche, at least in the sense of OEM support. This is an off-market use case, and allowing ECC without validation is probably the most we will see from AMD. It's only compounded by the fact that the people who want this functionality are unlikely to be buying validated OEM systems. While I can understand the desire not to pay more than you want to, the $200-$300 cost increase of offering a 1900X over an 1800X as an AMD Precision offering is really a drop in the bucket for the companies that would be purchasing them.
 

cbn

Topweasel said:
AMD sells two single-socket platforms with validated ECC support. Both offer consumer core counts at consumer CPU prices with workstation- and server-level feature sets.

That is true (EDIT: This TR board didn't have ECC working on launch so doesn't that mean TR ECC validation is on the OEM side, not from AMD?), but these platforms don't support APUs.

And ECC is needed for the 1/2 rate double precision floating point of the Bristol Ridge APU iGPU*. (See discussion below about the old Nvidia Titan Black vs. Nvidia Tesla for an example of why ECC is important with FP64)

http://www.advancedclustering.com/hpc-cluster-blog-gtx-vs-tesla/

First, and most importantly, the high-end GTX GPUs, unlike the Tesla, do not use ECC (error checking and correction) memory. ECC memory includes extra memory bits designed to detect and fix memory errors, which is of paramount importance to the successful completion of high performance, double-precision code. ECC memory ensures that the results of computations run on a Tesla are the same every time; the same tasks run on a high-end GTX card like the Titan can vary from job to job. Clearly, for scientific computing, the Tesla offers the best consistency.

*With that mentioned, Bristol Ridge does lack memory bandwidth. Perhaps if Raven Ridge also gets 1/2 rate double precision on its iGPU, this problem will be fixed via a single stack of HBM2 (with ECC).
 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
What AMD really needs is more OEM support. AM4 is their "low end" catch-all platform, which allows for a full range of products from low-priced budget systems to high-end workstations. Leaving ECC support enabled but leaving validation up to the OEMs allows the OEMs to both segment and cut corners. In the ideal case, OEM support would eventually reach a level where solely cutting corners is no longer the best way to offer new AMD-based products, and ECC validation turns out to be an efficient way of differentiating on value against competitors.
 

Octoploid

Junior Member
Sep 21, 2017
3
1
16
ECC is fully working with Ryzen even on cheap ASROCK motherboards.
You can easily check by carefully overclocking your RAM timings until
corrected ECC errors occur while running a userspace memory checker.

If you want your machine to stop on uncorrected errors, run:
Code:
 echo "1" > /sys/module/edac_core/parameters/edac_mc_panic_on_ue
(It looks like the hardwarecanucks guy doesn't know about this
and rambles about possible data corruption.)
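
For anyone who wants to read the counters directly instead of waiting for a log message, here is a minimal sketch that sums the corrected and uncorrected error counts the Linux EDAC subsystem exposes in sysfs. It assumes the EDAC driver for the memory controller (amd64_edac on Ryzen, as far as I know) is loaded and that the counters sit in ce_count and ue_count files under /sys/devices/system/edac/mc/mc*/. The function name edac_summary and its optional directory argument are made up for illustration.

```shell
# Sum corrected (CE) and uncorrected (UE) error counts across all
# EDAC memory controllers. The optional first argument overrides the
# sysfs directory (hypothetical convenience for trying it on test data).
edac_summary() {
    root="${1:-/sys/devices/system/edac/mc}"
    ce_total=0
    ue_total=0
    found=0
    for mc in "$root"/mc[0-9]*; do
        # Skip anything that does not expose both counters.
        [ -r "$mc/ce_count" ] && [ -r "$mc/ue_count" ] || continue
        found=1
        ce=$(cat "$mc/ce_count")
        ue=$(cat "$mc/ue_count")
        ce_total=$((ce_total + ce))
        ue_total=$((ue_total + ue))
    done
    if [ "$found" -eq 0 ]; then
        echo "no EDAC memory controller found under $root"
        return 1
    fi
    echo "corrected=$ce_total uncorrected=$ue_total"
}
```

Calling edac_summary with no argument reads the live counters. A corrected count that rises while a memory checker runs on slightly overclocked RAM would suggest that correction is actually happening.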
 

Topweasel

cbn said:
That is true, but these platforms don't support APUs.

Not to be a jerk, but it feels better saying it this way: you say that like it means something. Unlike, let's say, Intel, which has no other solution than Knights Landing as a co-processor/dedicated task CPU, AMD sells these compute units as part of their almost-as-large GPU business. In a server where you could fit 2 APUs, AMD can sell you 4 cards with 256 CUs total. The APUs, even if EPYCfied, would at best be only 88. The cards also have much better clock speeds and memory bandwidth. In many ways, for the APU lineup outside of compact but "powerful" "ultrabook" workstation laptops, there is almost no need for ECC and even less for double precision FP from the GPU part of the APU. Even those workstation laptops go waaaay against the grain, because maximizing performance by adding weight and lowering portability has never been a problem for that market.
 

cbn

Topweasel said:
In many ways, for the APU lineup outside of compact but "powerful" "ultrabook" workstation laptops, there is almost no need for ECC and even less for double precision FP from the GPU part of the APU. Even those workstation laptops go waaaay against the grain, because maximizing performance by adding weight and lowering portability has never been a problem for that market.

Workstation (and HPC) would have been the original reason AMD included 1/2 rate double precision floating point on Bristol Ridge. If Raven Ridge has the same 1/2 rate double precision floating point with HBM2 (ECC enabled) then I think that could be a real breakthrough for programs like Autodesk CFD that use a double precision solver on the CPU only. (Re: Having 1/2 rate double precision become more affordable (and prevalent) on graphics should encourage the addition of GPU offloading capability).

SIDE NOTE: OpenFOAM (an open source CFD program) does have the ability to use the GPU for FP64 via the SpeedIT plugin.

Topweasel said:
AMD sells these compute units as part of their almost-as-large GPU business. In a server where you could fit 2 APUs, AMD can sell you 4 cards with 256 CUs total. The APUs, even if EPYCfied, would at best be only 88. The cards also have much better clock speeds and memory bandwidth.

The current 64 CU dGPUs (Radeon, Instinct, Frontier edition, FirePro) only have 1/16 rate double precision floating point.

It won't be until Vega 20 (also 64 CU) that an AMD dGPU has 1/2 rate double precision floating point.

With that mentioned, I don't expect too many workstation users to have a Vega 20 dGPU (it will be very expensive like a Quadro GP100).
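
To put the 1/16 vs. 1/2 rate difference in numbers, here is a rough sketch of theoretical peak FP64 throughput. It relies on each GCN compute unit having 64 lanes, each doing one fused multiply-add (2 FLOPs) per clock. The function name fp64_gflops and the 1500 MHz clock used in the example are illustrative assumptions, not the specification of any particular card.

```shell
# Theoretical peak FP64 throughput in GFLOPS for a GCN GPU.
# usage: fp64_gflops <compute_units> <clock_mhz> <rate_denominator>
#   rate_denominator is 16 for a 1/16 rate part, 2 for a 1/2 rate part.
fp64_gflops() {
    awk -v cu="$1" -v mhz="$2" -v denom="$3" 'BEGIN {
        lanes = 64   # stream processors per GCN compute unit
        flops = 2    # one fused multiply-add per lane per clock
        fp32 = cu * lanes * flops * mhz / 1000   # peak FP32 in GFLOPS
        printf "%.0f\n", fp32 / denom
    }'
}
```

With those assumptions, `fp64_gflops 64 1500 16` prints 768 and `fp64_gflops 64 1500 2` prints 6144, so the same 64 CU part would have eight times the FP64 peak at 1/2 rate.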
 

cbn

moinmoin said:
What AMD really needs is more OEM support. AM4 is their "low end" catch-all platform, which allows for a full range of products from low-priced budget systems to high-end workstations. Leaving ECC support enabled but leaving validation up to the OEMs allows the OEMs to both segment and cut corners. In the ideal case, OEM support would eventually reach a level where solely cutting corners is no longer the best way to offer new AMD-based products, and ECC validation turns out to be an efficient way of differentiating on value against competitors.

Intel is very strong in business desktops and lower end consumer desktops. (They have the chip volume to really push low prices in this area.)

So maybe it is just better for AMD to focus on Workstations (CPUs, APUs with 1/2 rate DP FP on the iGPU) and Enthusiast (CPUs) with AM4?