Reboot problem with TR (Threadripper)

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
OK, I'm trying a mild OC, and I enabled boot logging, but I can't find the log to see what the problem is. Currently I'm only running 1.25 V vcore at 3900 MHz. It seems fine for ages (hours), then I find it has rebooted, and I want to know why. Memory is NOT overclocked, and this CPU has run at 4.1 GHz for an hour straight with no problem. Maybe my LLC is set wrong? It's at level 2 for CPU and SOC, with everything else on auto.

Max temps: CPU 75°C, NVMe 52°C, and video cards 47°C (full load, both 1080 Tis).
 

formulav8

Diamond Member
Sep 18, 2000
I've had a power supply issue reboot me more than once, even at stock settings. Do you have a good enough PSU? If you OC, make sure you set a decent LLC for your CPU, but it looks like you may have already. More likely, though, you just need a 'touch' more CPU vcore above 1.25 V, like 1.3 V or so. Try a small CPU and/or SOC vcore boost; at the least, it should help isolate your issue. Overclocking a 16-core CPU above a certain point can be tricky, even with the best of dice.
 

Elixer

Lifer
May 7, 2002
Hmm, check Event Viewer to see if it is a BSOD.
I usually turn off automatic restart, so I can see exactly which BSOD happened instead of Windows just rebooting.

The only other causes of rebooting I'm aware of are the PSU or voltages, like formulav8 said.
Oh, and one time a flaky USB drive caused it.
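The "turn that off" step refers to Windows' automatic restart on system failure. A minimal sketch, assuming the standard CrashControl registry value `AutoReboot` (0 = stay on the blue screen, 1 = restart automatically); it only builds the `reg.exe` command line rather than running it, and `autoreboot_command` is a hypothetical helper name:

```python
# Hypothetical helper: build (but do not run) the reg.exe command that turns
# Windows' "Automatically restart" on system failure off or on.
CRASH_CONTROL_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\CrashControl"


def autoreboot_command(enabled: bool) -> list[str]:
    """Return a reg.exe argument list that sets CrashControl\\AutoReboot.

    AutoReboot = 0 keeps the blue screen visible; 1 restarts automatically.
    """
    return [
        "reg", "add", CRASH_CONTROL_KEY,
        "/v", "AutoReboot",
        "/t", "REG_DWORD",
        "/d", "1" if enabled else "0",
        "/f",  # overwrite the existing value without prompting
    ]


if __name__ == "__main__":
    # Print the command so it can be pasted into an elevated Command Prompt.
    print(" ".join(autoreboot_command(enabled=False)))
```

The same setting is also exposed in the GUI under System Properties, Startup and Recovery.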
 

Markfw

So, I have a Corsair AX1200, possibly the best PSU ever made, and it is drawing 850 watts from the wall, so I don't think that's it. As for vcore, the temps are already at my maximum 24/7/365 comfort level, so if that's it, I have to scale back the OC. But I ran 4.0 GHz on 1.3 V and 4.1 GHz on 1.36 V, so I really think I have enough vcore. I want to see the boot log; where is it?
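A rough sanity check on PSU headroom: the DC load is the wall draw times the conversion efficiency. A minimal sketch, assuming about 90% efficiency at this load (an assumption; the thread only gives the 850 W wall figure and the 1200 W rating):

```python
def dc_load_watts(wall_watts: float, efficiency: float) -> float:
    """Estimate DC output from AC wall draw: P_dc = P_wall * efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return wall_watts * efficiency


def load_fraction(wall_watts: float, efficiency: float, rated_watts: float) -> float:
    """Fraction of the PSU's rated DC output that is in use."""
    return dc_load_watts(wall_watts, efficiency) / rated_watts


if __name__ == "__main__":
    # 850 W at the wall is from the thread; 0.90 efficiency is an ASSUMPTION.
    dc = dc_load_watts(850, 0.90)
    frac = load_fraction(850, 0.90, 1200)
    print(f"~{dc:.0f} W DC, ~{frac:.0%} of a 1200 W rating")
```

Under that assumption the unit sits at roughly 765 W, about two-thirds of its rating. That supports the "plenty of headroom" argument, but it says nothing about whether a particular unit has aged or is faulty.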
 

formulav8

Try what Elixer mentioned, the Event Viewer, and see what you can come up with. Is your cooler covering your entire heat spreader? Some so-called TR-compatible coolers don't fully cover the heat spreader; that may be perfectly fine at stock or with mild OCs, but not for real-deal overclocking.
 

Elixer

It's ntbtlog.txt, in the root of C:\Windows, I think... not 100% sure, since I can't reboot right now to test.
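For reference, boot logging is normally switched on with `bcdedit /set {current} bootlog yes` (or the "Boot log" checkbox in msconfig), and the log lands in %SystemRoot%\ntbtlog.txt. As far as I know, it only records which drivers loaded or failed to load during startup, so it won't explain why the previous session rebooted. A minimal sketch that summarizes the file, assuming lines start with "Loaded driver" or "Did not load driver" (check your own file); the helper names are hypothetical:

```python
import os
from collections import Counter
from pathlib import Path


def boot_log_path() -> Path:
    """Default boot-log location: %SystemRoot%\\ntbtlog.txt (usually C:\\Windows)."""
    return Path(os.environ.get("SystemRoot", r"C:\Windows")) / "ntbtlog.txt"


def read_boot_log(path: Path) -> str:
    """Decode the log, handling a UTF-16 BOM in case it was written as Unicode."""
    raw = path.read_bytes()
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return raw.decode("utf-16")
    return raw.decode("utf-8", errors="replace")


def summarize_boot_log(text: str) -> Counter:
    """Count loaded vs. not-loaded driver lines (assumed line prefixes)."""
    counts: Counter = Counter()
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Did not load driver"):
            counts["not_loaded"] += 1
        elif line.startswith("Loaded driver"):
            counts["loaded"] += 1
    return counts


if __name__ == "__main__":
    log = boot_log_path()
    if log.exists():
        print(summarize_boot_log(read_boot_log(log)))
    else:
        print(f"No boot log at {log}; is boot logging enabled?")
```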
 

Fir

Senior member
Jan 15, 2010
Test the voltages with a DMM (digital multimeter). My AX1200s started slipping after a year of aging.
The AX1500i is the best Corsair unit today.
I'm using the EVGA 1600 T2. It seems to do a bit better than the AX1500i, and I don't need all the fancy link stuff.
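To make the DMM check concrete: the ATX12V design guide allows ±5% on the +12 V, +5 V, and +3.3 V rails. A minimal sketch, assuming those tolerances (check the spec revision your PSU targets), that flags readings outside the window; the readings in the example are made up for illustration, not measurements from this thread:

```python
# Nominal rail voltages and the +/-5% tolerance from the ATX12V design guide
# (assumed here; the -12 V rail, which allows +/-10%, is left out).
ATX_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}
TOLERANCE = 0.05


def rail_window(nominal: float, tolerance: float = TOLERANCE) -> tuple[float, float]:
    """Return the (min, max) allowed voltage for a rail."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def check_rails(readings: dict[str, float]) -> dict[str, bool]:
    """Map each measured rail to True if it is inside its ATX window."""
    result = {}
    for rail, volts in readings.items():
        low, high = rail_window(ATX_RAILS[rail])
        result[rail] = low <= volts <= high
    return result


if __name__ == "__main__":
    # Hypothetical DMM readings taken under full load.
    print(check_rails({"+12V": 11.92, "+5V": 5.03, "+3.3V": 3.12}))
```

Measuring under the same load that precedes the reboots is the more telling test, since a marginal unit may only sag when heavily loaded.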

My system restarted too, after an 8-hour run of RealBench, and it was 100% stock.
I checked Event Viewer: it was a damn Windows Update! I'm really beginning to hate MS...

I'm going to mount the H115i cooler and see how that fares. The TT360 isn't great, and it's because of that highly concave cold plate.
 

Markfw

OK, since I posted this, it's been over 5 hours at 100% CPU and GPU load at 75°C, with no problems. It might be Windows Update; if I could find the boot logs, I'd know. I have disabled the service, but you never know. I hate not knowing.
 

Topweasel

Diamond Member
Oct 19, 2000
Just as a side comment: I helped my brother build an SB-E system a while back, and it ran fine for a little while, but after a week or so it started spontaneously rebooting. I thought it was memory, but it turned out to be the PSU (he went with a Seasonic-based EVGA with a lot of headroom). So while the AX1200 might be the best PSU ever when it's working, it's possible you got a dud like my brother did.
 

Markfw

OK, here is an odd thing. It rebooted about 3 times in 8 hours before I posted this. Now it's gone 15 hours straight and is still going, and I haven't touched anything; I didn't even reboot.
 

AdamK47

Lifer
Oct 9, 1999
Is it hitting any current limits? Are there current limits for Threadripper in the BIOS?
 

Markfw

Not that I know of. And the temps, voltages, etc. are WAY below what they were at 4.1 GHz. Also, it's still going strong. Blame it on Windows? I will look for those log files.
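A rough way to see why 3.9 GHz at 1.25 V should run much cooler than 4.1 GHz at 1.36 V: CMOS dynamic power scales approximately with frequency times voltage squared (P ≈ C·V²·f). A minimal sketch comparing the operating points mentioned in this thread; it ignores leakage and temperature effects, so treat the numbers as a ballpark only:

```python
def relative_dynamic_power(f_ghz: float, vcore: float,
                           ref_f_ghz: float, ref_vcore: float) -> float:
    """Dynamic power of (f, V) relative to a reference point, using P ~ V^2 * f.

    The switched capacitance C cancels out; leakage/static power is ignored.
    """
    return (f_ghz * vcore ** 2) / (ref_f_ghz * ref_vcore ** 2)


if __name__ == "__main__":
    reference = (4.1, 1.36)  # 4.1 GHz @ 1.36 V, from the thread
    for f, v in [(3.9, 1.25), (4.0, 1.30), (4.1, 1.36)]:
        ratio = relative_dynamic_power(f, v, *reference)
        print(f"{f} GHz @ {v} V: {ratio:.0%} of the 4.1 GHz dynamic power")
```

By this estimate, 3.9 GHz at 1.25 V needs about 80% of the 4.1 GHz dynamic power, and 4.0 GHz at 1.3 V about 89%, which fits the much lower temps reported at the current settings.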
 

AdamK47

I would think that if it was Windows, it would write something about the reboot in the System event log. No log entry giving a reason for the reboot most likely means it's hardware related.
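This can be checked directly against the System log. A minimal sketch that only builds a `wevtutil` query (run it on the affected machine); it assumes the commonly cited event IDs: 41 (Kernel-Power, rebooted without a clean shutdown), 6008 (previous shutdown was unexpected), 1001 (rebooted from a bugcheck), and 1074 (a process such as Windows Update initiated the restart). Under AdamK47's reasoning, a 1074 points at software, while a 41/6008 with no matching bugcheck leans toward hardware. The helper names are hypothetical:

```python
# Event IDs commonly associated with restarts in the Windows System log.
REBOOT_EVENT_IDS = {
    41: "Kernel-Power: rebooted without cleanly shutting down",
    6008: "EventLog: previous system shutdown was unexpected",
    1001: "BugCheck: rebooted from a bugcheck (BSOD)",
    1074: "User32: a process initiated the restart/shutdown",
}


def xpath_for_events(event_ids) -> str:
    """Build an XPath filter such as *[System[(EventID=41 or EventID=6008)]]."""
    clause = " or ".join(f"EventID={eid}" for eid in sorted(event_ids))
    return f"*[System[({clause})]]"


def wevtutil_command(event_ids, count: int = 20) -> list[str]:
    """Argument list for wevtutil: newest `count` matching System events as text."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath_for_events(event_ids)}",
        f"/c:{count}",
        "/rd:true",  # newest first
        "/f:text",   # human-readable output
    ]


if __name__ == "__main__":
    for eid, meaning in sorted(REBOOT_EVENT_IDS.items()):
        print(eid, meaning)
    # On Windows, pass this list to subprocess.run(...); if pasting it into a
    # prompt by hand, quote the /q: argument because it contains spaces.
    print(wevtutil_command(REBOOT_EVENT_IDS))
```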
 

GoNavy1776

Member
Jul 7, 2017
I'm gonna second his opinion. The AX1200 may actually be the best PSU ever crafted, even by today's standards. It has some of the lowest ripple ever measured in a PC power supply, or in much else for that matter. I wouldn't even attempt to OC TR until I had a full-cover water block or a huge air cooler.
 

mmaenpaa

Member
Aug 4, 2009
If there is anything useful in the system logs, you can use this *great* tool:

http://www.resplendence.com/whocrashed

(free for personal use)

If the reboot is hardware related, as in "pressing the reset button," there won't be much to read.
But I have found some driver-related problems with it (GPU/Nvidia, audio on a Lenovo notebook, etc.).

Markku
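WhoCrashed works from the crash dumps Windows writes; in this thread they sit at C:\Windows\Minidump\*.dmp and C:\Windows\memory.dmp (see the report paths that follow). If a reboot leaves no new dump, that hints it wasn't a bugcheck, which fits the "pressing reset button" case. A minimal sketch that lists dumps newest first; `find_crash_dumps` is a hypothetical helper, and the default locations are assumed from those paths:

```python
import os
from datetime import datetime
from pathlib import Path
from typing import Optional


def find_crash_dumps(system_root: Optional[str] = None) -> list[tuple[datetime, Path]]:
    """Return (modified time, path) for crash dumps, newest first.

    Looks in <SystemRoot>\\Minidump\\*.dmp and <SystemRoot>\\MEMORY.DMP.
    """
    root = Path(system_root or os.environ.get("SystemRoot", r"C:\Windows"))
    minidump_dir = root / "Minidump"
    candidates = list(minidump_dir.glob("*.dmp")) if minidump_dir.is_dir() else []
    full_dump = root / "MEMORY.DMP"
    if full_dump.exists():
        candidates.append(full_dump)
    dumps = [(datetime.fromtimestamp(p.stat().st_mtime), p) for p in candidates]
    # Sort on the timestamp only, newest first.
    return sorted(dumps, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    for when, path in find_crash_dumps():
        print(f"{when:%Y-%m-%d %H:%M:%S}  {path}")
```

Comparing the newest dump time with the time of the last unexpected reboot shows whether that reboot was a BSOD at all.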
 

Markfw

Up for almost 24 hours now.

OK, that tool is great. Most of the reboots were caused by the Nvidia driver, but here are the details on the last 3:

On Wed 8/16/2017 2:21:08 PM your computer crashed
crash dump file: C:\Windows\Minidump\081617-6546-01.dmp
This was probably caused by the following module: ntoskrnl.exe (nt+0x16C560)
Bugcheck code: 0x1 (0x7FF91B041E04, 0x0, 0xFFFF, 0xFFFF908090A11B80)
Error: APC_INDEX_MISMATCH
file path: C:\Windows\system32\ntoskrnl.exe
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This indicates that there has been a mismatch in the APC state index.
The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time.



On Wed 8/16/2017 2:21:08 PM your computer crashed
crash dump file: C:\Windows\memory.dmp
This was probably caused by the following module: ntkrnlmp.exe (nt!KeBugCheckEx+0x0)
Bugcheck code: 0x1 (0x7FF91B041E04, 0x0, 0xFFFF, 0xFFFF908090A11B80)
Error: APC_INDEX_MISMATCH
Bug check description: This indicates that there has been a mismatch in the APC state index.
The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time.



On Wed 8/16/2017 12:44:46 PM your computer crashed
crash dump file: C:\Windows\Minidump\081617-6828-01.dmp
This was probably caused by the following module: nvlddmkm.sys (nvlddmkm+0x24E4C9)
Bugcheck code: 0xD1 (0xFFFFB70EE2337005, 0xB, 0x1, 0xFFFFF80D8D0CE4C9)
Error: DRIVER_IRQL_NOT_LESS_OR_EQUAL
file path: C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_9ab613610b40aa98\nvlddmkm.sys
product: NVIDIA Windows Kernel Mode Driver, Version 385.28
company: NVIDIA Corporation
description: NVIDIA Windows Kernel Mode Driver, Version 385.28
Bug check description: This indicates that a kernel-mode driver attempted to access pageable memory at a process IRQL that was too high.
This appears to be a typical software driver bug and is not likely to be caused by a hardware problem.
A third party driver was identified as the probable root cause of this system error. It is suggested you look for an update for the following driver: nvlddmkm.sys (NVIDIA Windows Kernel Mode Driver, Version 385.28 , NVIDIA Corporation).
Google query: NVIDIA Corporation DRIVER_IRQL_NOT_LESS_OR_EQUAL
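The report is easier to compare across machines once it's tallied. Note that the two 2:21:08 PM entries are the same crash read from two dump files (the minidump and memory.dmp), so they should count once. A minimal sketch, assuming WhoCrashed's plain-text layout exactly as pasted here ("your computer crashed", "caused by the following module:", and "Bugcheck code:" lines), that counts distinct crashes by module and bugcheck code:

```python
import re
import sys
from collections import Counter

CRASH_RE = re.compile(r"^On (.+?) your computer crashed", re.MULTILINE)
MODULE_RE = re.compile(r"caused by the following module: (\S+)")
BUGCHECK_RE = re.compile(r"Bugcheck code: (0x[0-9A-Fa-f]+)")
# Names as they appear in the report above.
BUGCHECK_NAMES = {"0x1": "APC_INDEX_MISMATCH", "0xD1": "DRIVER_IRQL_NOT_LESS_OR_EQUAL"}


def tally_crashes(report: str) -> Counter:
    """Count (module, bugcheck) pairs, one per distinct crash timestamp."""
    matches = list(CRASH_RE.finditer(report))
    seen = set()
    tally: Counter = Counter()
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        block = report[match.start():end]
        when = match.group(1)
        if when in seen:  # same crash reported from a second dump file
            continue
        seen.add(when)
        module = MODULE_RE.search(block)
        code = BUGCHECK_RE.search(block)
        key = (module.group(1) if module else "?", code.group(1) if code else "?")
        tally[key] += 1
    return tally


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally.py report.txt  (report saved from WhoCrashed)
    with open(sys.argv[1], encoding="utf-8") as fh:
        for (module, code), n in tally_crashes(fh.read()).most_common():
            print(f"{n}x {module} {code} {BUGCHECK_NAMES.get(code, '')}")
```

Applied to the three entries above, this gives one ntoskrnl.exe 0x1 crash and one nvlddmkm.sys 0xD1 crash.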
 

mmaenpaa

Regarding the Nvidia driver, it seems to have been released on 14.08.2017 (that is, 08/14/2017).

Not many days to find those hard-to-find combination bugs :)

Markku
 

formulav8

I'm leaning towards RAM. If it does it again, lower the RAM speed or loosen the timings. Some of what comes up for APC_INDEX_MISMATCH errors is RAM and/or driver issues.
 

Markfw

It just struck me: I have another computer that keeps rebooting, and it ALSO has a 1080 Ti! I think it's a driver issue with this card. I just had a reboot on my other machine!
 

Fir

What are your GPU temps?
Are these cards 100% load (compute)?

As for the CPU, I've tried the H115i, and the temp results are similar to the TT360's. I'm done with overclocking on this platform until I find a proper block/plate that ensures the best contact and cooling.

And TBH, I don't feel like I'm losing much either. The system is very fast and very close to SILENT! It's like the movement in my Breitling Twin Sixty: I have to put my ear right next to the board to hear anything. And that I'm fine with! :D
 

Markfw

The cards run at 80% fan 24/7 and only reach 48-55°C. The CPU is at 75°C, peaking at 84°C. Since the cards are the problem on 2 different systems, and temps are fine, I assume it's the drivers at this point.
 

Fir

Quite possibly. They are more focused on gaming performance than on compute stability.
I remember back in 2011, dual Hydro Copper 590s with CUDA compute were a nightmare.
 

FiLeZz

Diamond Member
Jun 16, 2000
I have killed two of those AX1200 power supplies. Just saying.
 

StefanR5R

Elite Member
Dec 10, 2016
Re: NVIDIA driver 385.28 being the common factor on both systems: I am currently using driver version 384.94. I installed it on August 1 and have had no issues yet with Folding, SETI, and Collatz (Win 7, X99- and X79-based PCs, 1080 Ti, 1080, 1070).