Custom dual Xeon build needs several restart attempts before it finally starts up

Animajosser

Junior Member
Apr 21, 2018
11
0
6
This is my first post on this forum, but I've been into computer hardware and electronics for some time now. I've stumbled upon a problem I think is interesting and I would like to know if someone else knows what it could be.
I did post on another forum, but couldn't find people there who knew what to do.

I bought a pair of second-hand Intel Xeon X5650 CPUs, and to use them I bought a refurbished HP Z800 motherboard (Rev 1.02) and upgraded the BIOS to 3.61 Rev. A using a 2-dollar quad-core that fitted the socket. I also bought two 8GB ECC memory sticks (M393B-CH9051K70CH0). I knew the motherboard only needs 12V and a little 5V, so I bought a second-hand 1000W server PSU (DPS800GB A) and some ATX wires. I soldered the connections solidly, used a transistor to drive PS_On, and it worked.

At first I tested with one X5650 and a desktop GPU from AMD (since then I've swapped it for others, with no change). The computer worked great. Of course I wanted more power, so I put in the second CPU, and that's when it started to act weird. Sometimes it starts up on the first try, but the next time it won't, and I have to try a few times to get it working.

Of course I thought: does the PSU have issues? So I took the oscilloscope and measured the voltages, but they are all good: 25mV ripple. Under load (around 400W) I thought I could see a little noise, up to 60mV, but that is still very low ripple and has nothing to do with starting up. The second thing I checked was the Pwr_OK line; I don't have a storage scope, but it didn't take too long to trigger (I think around 500ms). The voltages are fine, even under load, and the motherboard's self-generated 3.33V is fine too. I can't find sensors with Windows, HWMonitor or Linux lm-sensors that read the CPU voltages or the other voltages, so I have to measure at the PSU. The temperatures after long use are fine too, especially the CPUs'.
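As a rough sanity check on the readings above, they can be compared against the ripple limits usually quoted from the ATX12V design guide (120 mV peak-to-peak on +12V, 50 mV on +5V and +3.3V) and its 100-500 ms PWR_OK delay window. This is only a sketch: the PSU here is a server unit rather than an ATX one, the post does not say which rail the ripple was measured on (+12V is assumed), and the names below are made up for illustration.

```python
# Assumed reference limits (ATX12V design guide figures, used only as a yardstick;
# the server PSU in this build is not an ATX unit).
RIPPLE_LIMIT_MV = {"+12V": 120, "+5V": 50, "+3.3V": 50}
PWR_OK_WINDOW_MS = (100, 500)


def ripple_ok(rail: str, ripple_mv: float) -> bool:
    """Return True if the peak-to-peak ripple is within the assumed limit for the rail."""
    return ripple_mv <= RIPPLE_LIMIT_MV[rail]


def pwr_ok_delay_ok(delay_ms: float) -> bool:
    """Return True if the PWR_OK delay falls inside the assumed window."""
    low, high = PWR_OK_WINDOW_MS
    return low <= delay_ms <= high


if __name__ == "__main__":
    # Readings from the post: 25 mV, up to 60 mV at around 400 W, roughly 500 ms PWR_OK.
    # Assumption: both ripple figures were taken on the +12V rail.
    for label, mv in (("idle", 25), ("~400 W load", 60)):
        verdict = "ok" if ripple_ok("+12V", mv) else "out of range"
        print(f"+12V {label}: {mv} mV -> {verdict}")
    verdict = "ok" if pwr_ok_delay_ok(500) else "out of range"
    print(f"PWR_OK delay 500 ms -> {verdict}")
```

By this yardstick the measured ripple is comfortably inside the limit, while the estimated PWR_OK delay sits right at the upper edge of the window, so a proper timing capture would be needed before ruling it in or out.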

Things I noticed when it didn't start up (without HDD):
Fans start spinning (not too fast, at moderate speed).
Power-on LED lights up.
HDD LED lights up when I push the button and goes out after that.
No picture: the monitor detects that something is connected, but doesn't get a signal.
When I push the power button once, the HDD LED lights up, but nothing happens.

Things that are different when it does start up:
HDD LED lights up, goes out, then lights up again and goes out.
I do get a picture.
When I push the power button once, the HDD LED lights up and the computer immediately shuts down.

I hooked up a speaker to the PB/LED pins (I remembered the board didn't have a beeper). It works: I can hear music when playing a song, and during startup it beeps once for every error message, because I did not connect the front USB, front audio, rear chassis fan and memory fan. When the machine doesn't start up, though, it doesn't make any sound, so there are no error codes I can examine.

That's everything so far, and I really don't know where to look next. I considered checking the capacitors on the motherboard or replacing the PSU, but the first is a lot of work and the latter is very expensive. So what do you think?

Moved from Highly Technical.
admin allisolm
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,552
14,510
136
Normally there are multiple PSU connectors. I don't know that motherboard, but most have the 24-pin plug and two 8-pin plugs. If all three are not connected, that can cause issues like what you see.
 

Animajosser

Junior Member
Apr 21, 2018
11
0
6
Thank you for your answer, I should have clarified that.
It has a custom 18-pin connector for the motherboard:
atx_pinout.png

And a separate one for the memory:
mem_pinout.png

And of course it has an 8-pin connector for the CPUs.

All are connected as they should be (I've connected them really carefully and checked them several times and couldn't find anything wrong).
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,552
14,510
136
That sounds very proprietary; I've never heard of an 18-pin cable. It could also be that, while it boots after a BIOS flash, this motherboard may not really be designed to support these CPUs.

Edit: Also, those are quad-core, correct? So you have 8 cores and 16 threads at like 2.8 GHz? A used 1700 from the last Ryzen series (they should be plentiful and cheap now) probably has more horsepower than your 2 Xeons.
 

Animajosser

Junior Member
Apr 21, 2018
11
0
6
Originally this motherboard supported X55xx, but there have been BIOS updates that made it possible to use X56xx. It would be odd for it to work only sometimes with 2 CPUs and always with 1.

It has 2 hexa-cores with hyperthreading, which is nice for rendering. You're probably right that one modern AMD CPU is better, but I am happy with the power it has and I don't really want to spend more money on another computer right now.
 

Animajosser

Junior Member
Apr 21, 2018
11
0
6
I found out, by carefully reading little side notes and comments on this blog (http://andybrown.me.uk/2014/11/01/z800/), that there might be a problem with the bootblock. People said they wrote a new bootblock to a blank BIOS chip, which can be soldered in pretty easily. It isn't really expensive (3 dollars on AliExpress for 5 ICs (25vf016b) and a USB BIOS programmer for under 10 dollars).

EDIT:
I found a CH341A Minprogrammer on AliExpress to flash the BIOS and I bought some ICs. From this page (http://andybrown.me.uk/forum/index.php/topic,24.0.html) I downloaded the image that has to be written to the chip. I'm curious whether it'll work and what software I'll need (maybe this one: https://sourceforge.net/projects/ch341eepromtool/).
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,552
14,510
136
Animajosser said: "I found out by carefully reading little side notes and comments on this blog (http://andybrown.me.uk/2014/11/01/z800/), that there might be a problem with the bootblock and people said they wrote a new bootblock to a blank BIOS chip, which can be soldered in pretty easily (http://andybrown.me.uk/forum/index.php/topic,24.0.html). It isn't really expensive (3 dollars on AliExpress for 5 ICs (25vf016b) and a USB BIOS programmer for under 10 dollars). Would it be worth a try?"
I never mess with hardware. I just do BIOS flashes and build the boxes. Maybe someone else will answer.
 

Charlie22911

Senior member
Mar 19, 2005
614
228
116
Sounds like it could possibly be a memory training thing... are the sticks known good and configured correctly?

EDIT: a word
 

Animajosser

Junior Member
Apr 21, 2018
11
0
6
There is very little information on that topic, but I went through their checklist and all my memory should be fine. I also used this memory with only one CPU and it worked fine. I'm gonna go with what I read on the internet and try to flash a newer version of the BIOS. It is not really messing with hardware: I am good at soldering and I am only writing new software, which can't be written with a normal BIOS flash (I suppose to make sure it can't be corrupted and leave the board unable to boot). I am keeping the old BIOS chip to make sure I have a way back.
 

XavierMace

Diamond Member
Apr 20, 2013
4,307
450
126
If it's 100% stable with only a single CPU, I don't think it's the bootblock. Have you swapped the CPUs? Meaning take the one that's currently in CPU2, put it in CPU1 and run it standalone? The memory controller is on the CPU, so I want to see if the issues follow the CPU. Have you tried putting the memory in a different pair of slots? Possibly one of the banks on CPU2 is dead.
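The "does the issue follow the CPU" test is easier to judge when each cold-boot attempt is written down per configuration. Below is a minimal sketch of such a tally; the configuration labels and the sample log are placeholders invented for illustration, not measurements from this thread.

```python
from collections import defaultdict


def success_rates(attempts):
    """Map each configuration label to its fraction of successful cold boots.

    `attempts` is an iterable of (config_label, booted) pairs, where `booted`
    is True when the machine reached POST on that attempt.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for config, booted in attempts:
        totals[config] += 1
        if booted:
            successes[config] += 1
    return {config: successes[config] / totals[config] for config in totals}


if __name__ == "__main__":
    # Placeholder log, NOT real measurements: replace with your own observations.
    log = [
        ("cpuA-in-socket1-alone", True),
        ("cpuA-in-socket1-alone", True),
        ("cpuB-in-socket1-alone", True),
        ("cpuB-in-socket1-alone", False),
    ]
    for config, rate in success_rates(log).items():
        print(f"{config}: {rate:.0%} of attempts booted")
```

If one CPU shows a clearly lower rate when run alone in the first socket, the fault likely follows that CPU; if both boot reliably alone, the problem is more likely tied to dual-CPU operation itself.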
 

Animajosser

Junior Member
Apr 21, 2018
11
0
6
I did swap the memory a lot and I tried all kinds of configurations and none of that seemed to help.

I did not swap the CPUs yet and that is certainly something I'm going to try, but I really think this has to do with the bootblock. On the aforementioned blog I read, after much searching, that people with rev. 001 and rev. 002 of this motherboard had problems with X56xx CPUs: after the BIOS update they had a 100% stable system, but it needed several tries to start up when using 2 CPUs. With one CPU it would boot normally, and with a rev. 003 board everything would boot normally. I have rev. 002 and this is exactly what I am experiencing. Some people said that the problem could be related to the bootblock in the BIOS, and that part is read-only during a normal BIOS flash. Using a new BIOS IC and flashing it with the right bootblock should then solve the problem. It is little work, not dangerous at all and not very expensive, so I think I should try it.
 

Animajosser

Junior Member
Apr 21, 2018
11
0
6
I just did it. I flashed a new BIOS chip with the image downloaded from andybrown.me.uk using the CH341A BIOS programmer, replaced the BIOS chip on the motherboard with it, and booted it up. It has not failed to start once, and I have tried around 10 times now. Normally it would start up once in 5-10 times. I will continue testing it, but it seems like the software on the BIOS chip really was the problem. It is not a hardware mod, but a software mod. I only had to replace the BIOS chip because I couldn't flash that particular part of the chip from the motherboard, and I wanted to keep the old BIOS chip just in case. Also, reflashing that particular part has been said to brick the BIOS chip, so a new one was needed. Now it has the software from a revision of this motherboard that officially supports these CPUs, and it has the newest BIOS software (2018), so I think this is the solution I was looking for.
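To put the 10 clean boots in perspective: assuming each attempt on the old chip succeeded independently with a probability somewhere between 1 in 10 and 1 in 5 (the rates quoted above), the chance of 10 successes in a row by luck alone can be estimated as follows. The function name is made up for illustration, and treating attempts as independent is a simplification.

```python
def chance_of_streak(p_success: float, attempts: int) -> float:
    """Probability that `attempts` independent boot attempts all succeed."""
    return p_success ** attempts


if __name__ == "__main__":
    # Assumed per-attempt success rates on the old chip: 1 in 10 and 1 in 5.
    for p in (1 / 10, 1 / 5):
        print(f"p = {p:.2f}: chance of 10 in a row = {chance_of_streak(p, 10):.1e}")
```

Even at the more favourable 1-in-5 rate this comes out at roughly one in ten million, so a streak of 10 is strong evidence that the new chip changed the behaviour.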

EDIT: How I did it: I bought a CH341A BIOS programmer and five SST 25VF016B ICs on AliExpress and flashed the new BIOS chip with the image mentioned earlier. I used a tool from this page: https://tosiek.pl/ch341-eeprom-and-spi-flash-programmer/ and I installed the drivers with Snappy Driver Installer. One thing I noted is that the software shows the wrong orientation for the chip in the programmer: the notch of the chip needs to be rotated 180 degrees from what it shows, otherwise it won't work. Finally, I unsoldered the old BIOS chip, kept it as insurance, and soldered in the new one.
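Before soldering a freshly programmed chip in, it can be worth checking that the image has the size of the chip and that a dump read back from the chip matches it. The sketch below assumes the image is a full-chip image for the 16 Mbit (2 MiB) 25VF016B and that the programmer software can save a read-back dump to a file; the function names and file names are hypothetical.

```python
import hashlib
from pathlib import Path

# SST 25VF016B capacity: 16 Mbit = 2 MiB.
CHIP_SIZE_BYTES = 16 * 1024 * 1024 // 8


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def image_fits_chip(image: Path, chip_size: int = CHIP_SIZE_BYTES) -> bool:
    """True if the image is exactly as large as the flash chip."""
    return image.stat().st_size == chip_size


def readback_matches(image: Path, readback: Path) -> bool:
    """True if a dump read back from the chip is byte-identical to the image."""
    return sha256_of(image) == sha256_of(readback)
```

For example, with the downloaded image saved as `z800_image.bin` and the programmer's dump as `readback.bin` (both names are placeholders), `image_fits_chip(Path("z800_image.bin"))` and `readback_matches(Path("z800_image.bin"), Path("readback.bin"))` should both return True before the chip goes onto the board.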
 

Mohammed Kadhim

Junior Member
Jun 22, 2018
1
0
1
@Animajosser,
Thanks for your input. It looks like I'm having the exact same issue, except I'm using dual Xeon X5680s instead. I believe I tried everything possible: swapping CPUs, trying different RAM sticks, replacing the PSU with a brand new one, and updating the BIOS from 3.60 to 3.61. Nothing seems to work.
The issue is exactly as you described: sometimes the computer boots on the first try, and sometimes it takes up to 5 (in rare cases 6 or 7) tries to boot properly. I expected that updating the BIOS firmware to the latest 2018 version would fix the problem, but it didn't. So I think I'll end up following your route, and hopefully this will finally fix the problem. Before I do, I wanted to double-check with you that this has indeed fixed your problem?
 

Animajosser

Junior Member
Apr 21, 2018
11
0
6
Yes. It really seems like it has fixed the problem. It hasn't misbehaved since I installed the new BIOS chip. So I really recommend it if you're comfortable with soldering small ICs.