Just when I was boasting about my gurr-eat Z170 system . . .

BonzaiDuck

Lifer
Jun 30, 2004
16,327
1,888
126
HARDWARE:

Sabertooth Z170 S motherboard
i7-6700K Skylake processor
G.SKILL 4x8GB = 32 GB (16GB removed yesterday for troubleshooting)
Samsung 960 Pro NVME M.2 PCIE x4 boot-system disk
Samsung 960 EVO NVME M.2 PCIE x4 250GB caching drive and volume (see caching discussion below)
Crucial MX300 1TB SATA SSD
Two Seagate 2.5" 2TB HDDs (#1 is media; #2 is for Macrium Backup)
GTX 1070 Gigabyte OC "Mini" graphics card
There is also a Hauppauge 2250 PCIE x1 tuner card -- been meaning to remove it.
There is a Marvell PCIE x1 SATA controller as well (connected to external eSATA ports)

PSU: Seasonic PRIME Titanium 650W

====
OK. I had been chewing the fat with colleagues on Memory & Storage about using Primo Cache to cache SATA drives to NVME (the EVO) and to RAM (heretofore, the additional kit of TridentZ 3200 14-14-14 I added to the system last summer with the Crucial MX300).

This had been an extremely reliable, flawless system for the last eight months (since the additional hardware was added), and before that as well. The only problems had come with the Creators Build 1703 of April 2017, and we got through all that.

So, just a few days after posting my benchmark scores for the caching system and comparing them to scores others posted regarding their Intel 3D Xpoint M.2 devices, I put the system into hibernation. I had not had trouble with hibernation previously, or -- only in regard to a Primo Cache setting required to actually have hibernation at all: "Unload cache for hybrid sleep or hibernation."

Attempting to bring the system out of hibernation, it would not post. Keep in mind this is a dual-boot Win7/Win10 system. I couldn't get it to post; couldn't get into the BIOS. Finally, I cleared the CMOS. I could get into BIOS, but could not boot to Windows without a stop-code 21A. I removed 16GB of RAM. Finally able to get it to post in Safe Mode; uninstalled the Primo Cache. Was able to boot into Win 10 again. On rare occasions, often occurring when swapping hardware, one is wise to uninstall the caching until the problem is resolved.

System seemed stable and working normally, but after a day, I decided to Shutdown. Powering up again, I had to hit the reset button a few times to get it to post.

This problem persists. I must resolve it. It doesn't appear to be the RAM. It COULD be the motherboard, but this is a 5-year-warrantied Sabertooth furchrissake.

I could suspect the power supply, but this is a year-and-a-half-old PRIME Titanium. Luckily, I have a spare in another system I was building -- another PRIME Titanium -- unused.

I suspect that will be my next step.

But there have been BIOS and chipset updates to this system, which likely addressed the vulnerabilities announced last year. I'm wondering if it isn't possible the latest build of Win 10 is interacting with the BIOS and chipset in some way to cause this problem.

Or -- maybe -- it's the PSU.

Comments? Observations? Suggestions? Before I proceed? I FREAKING HATE things like this happening, when this had been the best system I've ever built since I started fiddling with computers in the 1980s.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,571
10,206
126
Well, it could be the PSU, but ... did you install any BIOS updates, since the "system was working for 8 months"? Just wondering, how much of the trouble that you're currently seeing, is due to Meltdown / Spectre patches. I've heard that current patches are a bit half-baked, and can cause issues.

If the answer is NO, no updates to BIOS or OS, then perhaps the issue is hardware-related.
 

BonzaiDuck

Lifer
Jun 30, 2004
16,327
1,888
126
Well, it could be the PSU, but ... did you install any BIOS updates, since the "system was working for 8 months"? Just wondering, how much of the trouble that you're currently seeing, is due to Meltdown / Spectre patches. I've heard that current patches are a bit half-baked, and can cause issues.

If the answer is NO, no updates to BIOS or OS, then perhaps the issue is hardware-related.

[That was quick!] The last BIOS update was a year ago, before the 1703 Creators Build. [I must insert a remark here -- it wasn't "just working" for 8 months. It was stellar-in-development through March 2017; any difficulties I had were a matter of discovering how to make the dual-boot work flawlessly and keep it backed up with overall Macrium imaging for all partitions/volumes across drives. 1703 was a hurdle, because it F**ed up a lot of things for a lot of people. It was a single-handed triumph for me. Then I added and tested the remaining RAM and SSD. Flawless. Then, suddenly, after posting on your thread about Optane this month, I discover this . . .problem.]

I had deferred my 1709 Fall Creators Build until it somehow insisted on installing itself. That is -- it was deferred until some few weeks ago, and there may have been large blocks of time when the system was asleep or in use after that, but never hibernated. Event ID 41 errors are few and far between on this rig. Of course, one can panic over something and it happens -- or you decide to lean on your computer case and accidentally press the reset button. The 1709 build installed leaving the dual-boot configuration intact and operable. I can't really say if it was a matter of the 1709 build, or something else. HOWEVER, EVEN I could suspect that this was somehow a result of the caching program or configuration, but -- nah. That wouldn't cause the system to fail hibernation, fail to boot, in some cases fail to reset, with the problem persisting after I temporarily uninstalled the program and its caches in Safe Mode.
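For anyone wanting to check how rare those Event ID 41 entries really are, here is a minimal sketch in Python that builds a `wevtutil` query for the System log. The function name `kernel_power_query` and the default of 10 events are assumptions made for illustration; the provider filter assumes the entries come from Kernel-Power, which is the usual source of Event ID 41.

```python
import subprocess
import sys


def kernel_power_query(max_events: int = 10) -> list[str]:
    """Build a wevtutil command that lists the newest Event ID 41 records.

    Hypothetical helper: the name and the default count are illustrative.
    """
    # XPath filter: System log entries from Kernel-Power with Event ID 41.
    xpath = ("*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] "
             "and (EventID=41)]]")
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",
        f"/c:{max_events}",   # cap on the number of events returned
        "/rd:true",           # newest events first
        "/f:text",            # plain-text output
    ]


if __name__ == "__main__":
    cmd = kernel_power_query()
    if sys.platform == "win32":
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or result.stderr)
    else:
        # wevtutil only exists on Windows; just show the command elsewhere.
        print("Not on Windows; the command would be:", " ".join(cmd))
```

Comparing the timestamps against the dates of the failed cold boots would show whether Windows logged those as unexpected power losses at all.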

This is what I did this evening, after troublesome cold-boots following an orderly shutdown.

I had noticed there are these additional voltage settings in the BIOS, allowing you to fix the boot-time VCORE and such things as the VCCIO (IMC). So I bumped up the VCCIO to 1.1875V, and set the VCORE boot-time voltage to 1.35V.

Then, I hibernated the system, let it sit dead and dark for 20 minutes, and clicked my mouse with anxious anticipation. I'm not "out of the woods" if the lights come on and the monitor initializes but the system doesn't post.

But -- it did. I'll experiment some more and report any further observations. I could swap out the PSU immediately, but I'll need to take things apart. At least I have a duplicate PSU.

Just another remark: I somehow imagined that the Spectre and Meltdown patches, together with the 1709 CU I installed recently, might have required more power at boot time. There's a lot of hardware in this rig. But this is just idle guessing on my part, at this time.

I even thought that it could be an interaction between the latest Win 10 updates and the old BIOS and chipset driver: there would've been efforts to resolve the recent threats through OS, BIOS and chipset drivers alike -- or so I could guess again. What do I know?

If I can just keep it working for my personal needs at the moment, I should be able to arrange testing one thing at a time. I think even the graphics card could be suspect, so I'll need to try setting up the onboard iGPU and monitor connection.

See -- ordinarily this is all great fun, building and tweaking systems. But I have business to perform, people to e-mail, bank accounts to make online bill-pays from. Sure, it's not that difficult to arrange for those activities on my other computers. But it's still a pain in the arse.
 

BonzaiDuck

Lifer
Jun 30, 2004
16,327
1,888
126
OK . . . . The system will go into sleep state (not "hybrid" sleep, but just "sleep"), and recovers without a problem.

Further, it will hibernate and return from hibernation. Once I re-enabled hibernation, it did this twice so far without fail or flaw. So after two iterations, no indication yet that there is a statistical scatter of "success" or "failure" for return from hibernation.
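To put a number on how little two clean resumes prove, here is a rough sketch in Python. It assumes each resume is an independent trial with the same failure probability, which is an idealization, and the function name `failure_rate_upper_bound` is made up for this example. It solves (1 - p)^n = 1 - confidence for p, the largest failure rate still consistent with n successes in a row.

```python
def failure_rate_upper_bound(successes: int, confidence: float = 0.95) -> float:
    """Largest per-attempt failure probability still consistent with
    `successes` clean attempts in a row, at the given confidence level.

    Assumes independent attempts with a constant failure probability.
    """
    if successes < 1:
        raise ValueError("need at least one successful attempt")
    # Solve (1 - p) ** successes = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / successes)


for n in (2, 10, 30):
    bound = failure_rate_upper_bound(n)
    print(f"{n:>2} clean resumes -> failure rate could still be up to {bound:.1%}")
```

Under those assumptions, two clean resumes cannot rule out a failure rate as high as roughly 78%, so a couple dozen more cycles would be needed before calling hibernation fixed.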

But an orderly Shutdown of the system leads to this troublesome sequence of getting it to post on cold-start or boot-up.

The two easiest items to check at this point would be the video card (G-byte GTX 1070) and the Seasonic PSU. As stable as the system is in a variety of overclock profiles from "stock" to 4.7GHz, there's no indication there is a problem with RAM. If it's the motherboard, that would be the most disruptive change-out, meaning the least tolerable lapse of "down time."

As for the graphics card, it has run at stock speeds for many more hours than under its overclock setting. It defaults to the stock speeds at every restart or boot up. The graphics card informs the monitor through the troublesome cold startup -- initializing the monitor when it should, with the monitor reporting the connection mode, resolution, sleep and wake reset that you would expect at boot time. But on start-up, the system fans spin up, then slow down (as you would expect), and the system won't post without various combinations of "reset", power-down and restart, CTRL-ALT-DEL and/or holding down SHIFT-CTRL-ALT left and right together.

My money is on the power-supply. But I started this thread hoping to get additional points of view.

Even easier than replacing those other items, I can remove unnecessary cards that I was planning to uninstall anyway, like the Hauppauge tuner. IN FACT, the tuner driver had been indicated in the Blue-Screen-View analysis of a few mini-dumps from the occasional BSODs I'd had. Those seem to have disappeared; they occurred maybe once in 30 or 40 days.

The system is powered through a UPS with a set of fairly recent battery replacements.
 

XavierMace

Diamond Member
Apr 20, 2013
4,307
450
126
The full error message on the BSOD would be helpful but rule #1 when troubleshooting hardware issues is remove all unnecessary hardware, return settings to defaults, and add things back in one at a time until you find the problem.
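As a rough sketch of that procedure in Python: start from the stripped-down baseline, add parts back one at a time, and stop at the first addition that breaks a cold boot. The function name `find_culprit` and the `boots_cleanly` callback are hypothetical; in practice the "callback" is a person doing a shutdown and cold start after each change, and the part names are just the ones listed in this thread.

```python
from typing import Callable, Iterable, Optional


def find_culprit(optional_parts: Iterable[str],
                 boots_cleanly: Callable[[list[str]], bool]) -> Optional[str]:
    """Add parts back one at a time and return the first one whose
    addition breaks a cold boot.

    Returns "baseline" if the stripped-down system already fails,
    or None if every part can be added without trouble.
    """
    installed: list[str] = []
    if not boots_cleanly(installed):
        return "baseline"
    for part in optional_parts:
        installed.append(part)
        if not boots_cleanly(installed):
            return part
    return None


parts = ["second RAM kit", "960 EVO", "Hauppauge 2250",
         "Marvell eSATA card", "2.5-inch HDDs"]


def fake_check(installed: list[str]) -> bool:
    # Stand-in for a real manual cold-boot test. Purely for illustration,
    # this pretends the tuner card is at fault.
    return "Hauppauge 2250" not in installed


print(find_culprit(parts, fake_check))  # prints "Hauppauge 2250"
```

The point of the ordering is that each test changes exactly one thing, so a failure can be pinned on the last part added rather than on a combination.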
 

vailr

Diamond Member
Oct 9, 1999
5,365
54
91
But there have been BIOS and chipset updates to this system, which likely addressed the vulnerabilities announced last year. I'm wondering if it isn't possible the latest build of Win 10 is interacting with the BIOS and chipset in some way to cause this problem.
The latest bios firmware available for that board is version 3703, dated January 12:
https://www.asus.com/us/Motherboards/SABERTOOTH-Z170-S/HelpDesk_BIOS/
A still more recent bios firmware update (containing a more current Intel CPU microcode, which more accurately addresses the Spectre/Meltdown security issue) dated March 2018 or later should be made available shortly.
So: either wait for that newer update to appear, or go with the January 2018 version for now, and maybe hope that solves the problem.
Other than that, you could also fresh install using the latest Win 10 version 1803, which is due for release any time now.
 

BonzaiDuck

Lifer
Jun 30, 2004
16,327
1,888
126
The full error message on the BSOD would be helpful but rule #1 when troubleshooting hardware issues is remove all unnecessary hardware, return settings to defaults, and add things back in one at a time until you find the problem.
=====
Yes indeed. As long as the system will sleep and wake, and as long as its running stability shows no problems, I'm going to take my time to solve this to minimize cost and minimize overall work. So I'm plotting, planning and posting. We're definitely going to follow a plan of disconnection and removal.

It is interesting you had said that my project had already increased complexity -- perhaps with the spare memory kit, replacement of HDD with an SSD, etc. -- with the inaccurate assumption that I'd built the whole system around Primo-Cache. But none of those things have contributed to this situation. Worst thing for me would be the motherboard, even with a five-year warranty on Sabertooths. Unless they can cross-ship a replacement board, that sort of thing can take weeks.
So . . . first thing that comes out will be the tuner-card. I've got network tuners. It just provides an input for digitizing some VCRs, and I can get TV via internet if the network tuners are down. And I have a Hauppauge USB device for that purpose, anyway.

But you'd likely agree -- I bought the motherboard to be "TUF." I think that they'd touted it for having a "MIL-SPEC." They did enough things right with that motherboard, I even saw it touted as "rare limited edition, hard to find . . ." So far, I see that I can get a new one for twice what I paid as a minimum, and four times my Egg price as a maximum. Much about the board's public reputation and my own experience offers some hope, but . . . What if?

I've got four clocking profiles -- "stock," "4500," "4600" and "4700." There was no change in behavior for reverting to "stock." I could do one more thing in that regard: reset the board to its "ASUS optimal" defaults, and then tweak it for the attached hardware. I think I may have done that for the "stock" profile, but I could take another pass at it and create "stock#2".


The latest bios firmware available for that board is version 3703, dated January 12:
https://www.asus.com/us/Motherboards/SABERTOOTH-Z170-S/HelpDesk_BIOS/
A still more recent bios firmware update (containing a more current Intel CPU microcode, which more accurately addresses the Spectre/Meltdown security issue) dated March 2018 or later should be made available shortly.
So: either wait for that newer update to appear, or go with the January 2018 version for now, and maybe hope that solves the problem.
Other than that, you could also fresh install using the latest Win 10 version 1803, which is due for release any time now.
=====
I'm trying to think where I'd insert that step into my plan. Let me say smugly that I downloaded the latest BIOS and chipset driver day before yesterday.

Now sometimes, they offer extra instructions about installing a BIOS and related chipset. I can install the chipset drivers in Windows. I'll flash the BIOS from within the BIOS.
Either way, the system will need to go through a power cycle and reboot.

I think I can install the chipset drivers right away. But I'll wait until I can remove a couple pieces of hardware. Unplanned troubles generate panic, and panic causes mistakes.

Another thing I thought of was the drive enumeration process that occurs at boot time. One of the HDDs had a reputation for throwing yellow warning events early after boot-time, because of features used in laptops. This had also been a problem initially with either a USB or eSATA optical drive I needed in order to install the original Windows 7 with slip-streamed drivers provided by ASUS. Their initial presence caused long delays in system post. But there was nothing wrong with the operation of that HDD -- only the yellow events after boot-up. Yes -- all laptop HDDs. I can drop them both temporarily.

This is just the wrong time for this stuff to happen.

But isn't it always, anyway?
 

BonzaiDuck

Lifer
Jun 30, 2004
16,327
1,888
126
PROGRESS! But not for removing hardware per se.

I removed the 960 EVO caching drive and the Hauppauge 2250 card -- the latter was overdue for removal when I decided some time ago that it wasn't needed.

I ran Macrium Reflect Rescue media from a bootable USB stick. First time -- it wouldn't automatically detect ANY operating system, nor would its "search" function. System continues from Macrium to easily boot to the Windows boot-menu with both OSes. Before proceeding, I added a drive letter in Windows 10 for the system volume of Windows 7.

Second pass: Macrium recognizes only Windows 7. Exit Macrium, boot to Windows 7 this time, and added a drive letter for the Windows 10 volume.

Third pass: Macrium recognizes both OS volumes on the disk, and repairs the boot record for multi-boot.

So after rebooting to Windows 10, putting the system to sleep for a few hours and waking it up, I decided to select "Shutdown" from the Start menu. System shuts down. I wait a minute or so, and hit the power button. System posts almost immediately, and boot-time to the multi-boot menu seemed almost too quick.

Seems like WINDOWS UPDATES was guilty -- once again. It fouled up my multi-boot menu with #1703 last year, and #1709 apparently caused some sort of corruption in the boot record. This time, the boot menu functioned properly if you could just get the system to post. But I'll want to test a few more times.

If there's anything else, I'll post another note.
 

BonzaiDuck

Lifer
Jun 30, 2004
16,327
1,888
126
ADVISORY: Avoid going on a goose-chase for hardware worries.

Have something like a Macrium WinPE bootable rescue disk or USB stick handy IMMEDIATELY after installing a Feature Update or Creators Update on a dual-boot system with Windows 10.

A Windows 10 feature update can mess with your MBR or boot record. So just boot to such a rescue environment and repair the boot record as a matter of routine. You may not even notice the symptoms for a while if you don't.
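As a quick sanity check after a feature update, one could also confirm that both operating systems still show up as boot loader entries. The sketch below, in Python, parses text in the layout that `bcdedit /enum` prints on an English-language Windows. The function name `boot_loader_descriptions` is made up, and the sample text is a hand-written, abbreviated illustration, not output captured from the system in this thread. It does not repair anything; it only lists what the boot store contains.

```python
def boot_loader_descriptions(bcdedit_output: str) -> list[str]:
    """Return the 'description' of every 'Windows Boot Loader' entry.

    Assumes the English-language block layout of `bcdedit /enum`:
    a title line, a dashed line, then key/value lines, with blank
    lines separating the entries.
    """
    text = bcdedit_output.replace("\r\n", "\n").strip()
    descriptions = []
    for block in text.split("\n\n"):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        if not lines or lines[0].strip() != "Windows Boot Loader":
            continue
        for ln in lines[2:]:  # skip the title and the dashed line
            key, _, value = ln.partition(" ")
            if key == "description":
                descriptions.append(value.strip())
    return descriptions


# Hand-written, abbreviated sample for illustration only.
SAMPLE = r"""
Windows Boot Manager
--------------------
identifier              {bootmgr}
description             Windows Boot Manager

Windows Boot Loader
-------------------
identifier              {current}
path                    \Windows\system32\winload.exe
description             Windows 10

Windows Boot Loader
-------------------
identifier              {default}
path                    \Windows\system32\winload.exe
description             Windows 7
"""

print(boot_loader_descriptions(SAMPLE))  # ['Windows 10', 'Windows 7']
```

On a real system the input would come from running `bcdedit /enum` in an elevated command prompt. If one of the two entries is missing, that is a hint to run the rescue media's boot repair before chasing hardware.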