- Jul 3, 2005
I recently purchased a couple of 1GB sticks of ECC RAM from Newegg but hadn't gotten around to testing them until now. I just finished a two-month process of reconfiguring my main system (buying this RAM was an early part of it), and I tested the new sticks together with another pair of 1GB sticks that I have had for a while. My system setup is:
Antec True550 EPS12V
Asus K8N-DL
2 X Opteron 275
GF 6800NU
2 X SATA
... ...
The K8N-DL is a NUMA board, with NUMA enabled on the current version of BIOS (1006) of my machine.
The 4 X 1GB RAM is configured as two sticks on the two channels of CPU0 in the first socket and the other two sticks on the two channels of CPU1 in the second socket. The new sticks are on CPU0, while the old sticks are on CPU1.
This system had never given me any major trouble before. To my dismay, memtest reported its first error about 5 minutes into the test and has reported numerous errors since. It's still running as we speak (although I'm not home right now).
This was within the first hour:
tst  pass  failing address      good      bad       err-bits  count  chan
3    0     0003368748c 822.4MB  fefefefe  fefffefe  00010000  5
6    0     000336850ac 822.3MB  fffeffff  ffffffff  00400000  1
6    0     0003368748c 822.4MB  fffeffff  ffffffff  00010000  1
6    0     000336870cc 822.4MB  fffbffff  ffffffff  00040000  1
7    0     000336850ac 822.3MB  49b9fab8  49f9fab8  00400000  2
8    0     0003dd0798c 989.4MB  00000000  00100000  00100000  1
8    0     000336850ac 822.3MB  00000000  00400000  00400000  1
3    1     0003dd0798c 989.4MB  efefefef  efffefef  00100000  1
All of the errors in fact were concentrated around two locations in physical memory.
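For what it's worth, the err-bits column looks like it is simply the XOR of the good and bad values, so each of these errors is a single flipped bit. Below is a quick Python sketch to check that. The function name is my own, and I only typed in three of the rows, so treat it as a sanity check rather than anything definitive:

```python
def flipped_bits(good: int, bad: int) -> list[int]:
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = good ^ bad
    return [i for i in range(32) if (diff >> i) & 1]

# (good, bad) pairs copied by hand from three rows of the memtest output
samples = [
    (0xFEFEFEFE, 0xFEFFFEFE),
    (0x49B9FAB8, 0x49F9FAB8),
    (0x00000000, 0x00100000),
]

for good, bad in samples:
    # Print the XOR in the same 8-digit hex form memtest uses for err-bits
    print(f"{good ^ bad:08x}", flipped_bits(good, bad))
```

In these three rows exactly one bit differs each time (bits 16, 22 and 20).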
I was wondering how NUMA maps physical addresses. Would it simply map the first 1GB module, in the first channel of CPU0, onto the first 1024MB, so that the module in that channel has addresses 00000000 to 3fffffff, the module in the second channel of CPU0 maps onto the second 1024MB, from 40000000 to 7fffffff, and so on?
Am I correct, or even close to being correct? If so, that would mean all of the errors are on the first module on CPU0, since all of the errors occur within the first 1024MB of physical address space, and that module is the only one I would need to RMA?
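Here is the arithmetic behind my guess as a small Python sketch. It assumes exactly the layout I described: each 1GB module gets one contiguous 1GB slice, in order, with no channel or node interleaving. I don't know that the board really works this way, and if it doesn't, the result means nothing. The slot names and the function name are just labels I made up:

```python
MODULE_SIZE = 1 << 30  # 1GB per module (assumed contiguous, no interleaving)

# Assumed order in which the four modules appear in physical address space
SLOTS = [
    "CPU0 first channel",
    "CPU0 second channel",
    "CPU1 first channel",
    "CPU1 second channel",
]

def guess_module(addr: int) -> str:
    """Map a physical address to a module under the linear-layout assumption."""
    return SLOTS[addr // MODULE_SIZE]

# The distinct failing addresses from the memtest output
failing = [0x3368748C, 0x336850AC, 0x336870CC, 0x3DD0798C]

for addr in failing:
    print(f"{addr:08x} -> {guess_module(addr)}")
```

Under that assumption, every failing address lands in the first slot, since all of them are below 40000000.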
If anyone has any comments or suggestions, please let me know. I'll take any help I can get.
Thanks in advance.
--HB