DNS server randomly died. Can only resolve local hostnames, not internet

Red Squirrel

No Lifer
May 24, 2003
67,395
12,141
126
www.anyf.ca
At noon today my DNS server just kind of half died. It still resolves local names but not internet ones. I thought my internet was down, but I can still ping the gateway and the ISP DNS server, and if I run nslookup and point it at the ISP DNS server, I can resolve hostnames. I just can't resolve anything external through my own DNS server.
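
For anyone wanting to repeat that comparison in a script, here is a minimal sketch. It assumes `dig` is installed, that the local named answers on 127.0.0.1 (an assumption, nothing in this post states it), and it takes 142.166.166.166 from the log further down as the upstream server. The helper names `dig_cmd` and `resolve` are made up for illustration. The optional `+cd` flag asks the server to skip DNSSEC validation, which may help tell a validation problem from a connectivity one.

```python
import subprocess

LOCAL = "127.0.0.1"           # assumed address of the local named
UPSTREAM = "142.166.166.166"  # upstream address taken from the log in this post


def dig_cmd(name, server=None, rtype="A", checking_disabled=False):
    """Build a dig command line; server=None means use the system resolver."""
    cmd = ["dig", "+short", "+time=2", "+tries=1"]
    if checking_disabled:
        cmd.append("+cd")  # set the CD bit: ask the resolver not to validate
    if server:
        cmd.append("@" + server)
    cmd += [name, rtype]
    return cmd


def resolve(name, server=None, **kwargs):
    """Run dig and return the answer lines, dropping dig's ';;' status lines."""
    result = subprocess.run(dig_cmd(name, server, **kwargs),
                            capture_output=True, text=True, check=False)
    return [line for line in result.stdout.splitlines()
            if line and not line.startswith(";")]


# Only print the commands here; call resolve() on the machine being debugged.
for server in (LOCAL, UPSTREAM):
    print(" ".join(dig_cmd("iceteks.com", server)))
print(" ".join(dig_cmd("iceteks.com", LOCAL, checking_disabled=True)))
```

If the upstream server answers but the local one only answers with `checking_disabled=True`, that would point at validation on the local server and not at the network.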

I also noticed when I restart it, I get this error:

Code:
service named restart
Stopping named: .umount: /var/named/chroot/var/named: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
                                                           [  OK  ]
Starting named:                                            [  OK  ]

OS is CentOS 6.10.

I don't even know where to start. What would cause this to happen suddenly? I tried rebooting the firewall (the first thing I tried) and also the DNS server itself (a physical machine).

I googled the umount error but I'm only getting results about actual disk mounting/unmounting, nothing to do with named.


Edit:

Getting this in /var/log/messages too:

Code:
Mar 25 15:15:27 hal9000 named[11221]:   validating @0x7f88444f3380: uogateway.com SOA: bad cache hit (uogateway.com.dlv.isc.org/DLV)
Mar 25 15:15:27 hal9000 named[11221]: error (broken trust chain) resolving 'uogateway.com/AAAA/IN': 142.166.166.166#53
Mar 25 15:15:27 hal9000 named[11221]:   validating @0x7f883c01e150: uogateway.com SOA: bad cache hit (uogateway.com.dlv.isc.org/DLV)
Mar 25 15:15:27 hal9000 named[11221]: validating @0x7f883c01ede0: uogateway.com A: bad cache hit (uogateway.com.dlv.isc.org/DLV)
Mar 25 15:15:27 hal9000 named[11221]: error (broken trust chain) resolving 'uogateway.com/AAAA/IN': 142.166.166.166#53
Mar 25 15:15:27 hal9000 named[11221]: error (broken trust chain) resolving 'uogateway.com/A/IN': 142.166.166.166#53
Mar 25 15:15:29 hal9000 named[11221]: validating @0x7f883404f510: ssl.empirehost.me A: bad cache hit (ssl.empirehost.me.dlv.isc.org/DLV)
Mar 25 15:15:29 hal9000 named[11221]: error (broken trust chain) resolving 'ssl.empirehost.me/A/IN': 142.166.166.166#53
Mar 25 15:15:34 hal9000 named[11221]: validating @0x7f883c01e150: ssl.empirehost.me A: bad cache hit (ssl.empirehost.me.dlv.isc.org/DLV)
Mar 25 15:15:34 hal9000 named[11221]: error (broken trust chain) resolving 'ssl.empirehost.me/A/IN': 142.166.166.166#53
Mar 25 15:15:39 hal9000 named[11221]: validating @0x7f883c0008c0: ssl.empirehost.me A: bad cache hit (ssl.empirehost.me.dlv.isc.org/DLV)
Mar 25 15:15:39 hal9000 named[11221]: error (broken trust chain) resolving 'ssl.empirehost.me/A/IN': 142.166.166.166#53
Mar 25 15:15:40 hal9000 named[11221]:   validating @0x7f883c01e150: ipv6.microsoft.com SOA: bad cache hit (ipv6.microsoft.com.dlv.isc.org/DLV)
Mar 25 15:15:40 hal9000 named[11221]: error (broken trust chain) resolving 'teredo.ipv6.microsoft.com/A/IN': 142.166.166.166#53
Mar 25 15:15:44 hal9000 named[11221]: validating @0x7f88445320c0: localhost.stackoverflow.tech A: bad cache hit (localhost.stackoverflow.tech.dlv.isc.org/DLV)
Mar 25 15:15:44 hal9000 named[11221]: error (broken trust chain) resolving 'localhost.stackoverflow.tech/A/IN': 142.166.166.166#53
Mar 25 15:15:46 hal9000 named[11221]:   validating @0x7f88400a2140: dlv.isc.org SOA: bad cache hit (dlv.isc.org/DNSKEY)
Mar 25 15:15:46 hal9000 named[11221]:   validating @0x7f88400a2140: dlv.isc.org NSEC: bad cache hit (dlv.isc.org/DNSKEY)
Mar 25 15:15:46 hal9000 named[11221]: error (broken trust chain) resolving 'prod.flightaware.com.dlv.isc.org/DLV/IN': 47.55.55.55#53
Mar 25 15:15:46 hal9000 named[11221]: error (broken trust chain) resolving 'prod.flightaware.com/A/IN': 142.166.166.166#53
Mar 25 15:15:49 hal9000 named[11221]: validating @0x7f883c01e150: localhost.stackoverflow.tech A: bad cache hit (localhost.stackoverflow.tech.dlv.isc.org/DLV)
Mar 25 15:15:49 hal9000 named[11221]: error (broken trust chain) resolving 'localhost.stackoverflow.tech/A/IN': 142.166.166.166#53
Mar 25 15:15:50 hal9000 named[11221]:   validating @0x7f883404f190: ipv6.microsoft.com SOA: bad cache hit (ipv6.microsoft.com.dlv.isc.org/DLV)
Mar 25 15:15:50 hal9000 named[11221]: error (broken trust chain) resolving 'teredo.ipv6.microsoft.com/A/IN': 142.166.166.166#53
Mar 25 15:15:51 hal9000 named[11221]: validating @0x7f8840081c90: prod.flightaware.com A: bad cache hit (prod.flightaware.com.dlv.isc.org/DLV)
Mar 25 15:15:51 hal9000 named[11221]: error (broken trust chain) resolving 'prod.flightaware.com/A/IN': 142.166.166.166#53
Mar 25 15:15:54 hal9000 named[11221]: validating @0x7f883c01e150: ssl.empirehost.me A: bad cache hit (ssl.empirehost.me.dlv.isc.org/DLV)
Mar 25 15:15:54 hal9000 named[11221]: error (broken trust chain) resolving 'ssl.empirehost.me/A/IN': 142.166.166.166#53
Mar 25 15:15:54 hal9000 named[11221]:   validating @0x7f8844513730: ipv6.microsoft.com SOA: bad cache hit (ipv6.microsoft.com.dlv.isc.org/DLV)
Mar 25 15:15:54 hal9000 named[11221]: error (broken trust chain) resolving 'teredo.ipv6.microsoft.com/A/IN': 142.166.166.166#53
Mar 25 15:15:56 hal9000 named[11221]:   validating @0x7f883404f190: dlv.isc.org SOA: bad cache hit (dlv.isc.org/DNSKEY)
Mar 25 15:15:56 hal9000 named[11221]:   validating @0x7f883404f190: dlv.isc.org NSEC: bad cache hit (dlv.isc.org/DNSKEY)
Mar 25 15:15:56 hal9000 named[11221]: error (broken trust chain) resolving 'iceteks.com.dlv.isc.org/DLV/IN': 47.55.55.55#53
Mar 25 15:15:56 hal9000 named[11221]: error (broken trust chain) resolving 'iceteks.com/A/IN': 142.166.166.166#53
Mar 25 15:15:56 hal9000 named[11221]: error (broken trust chain) resolving 'iceteks.com/AAAA/IN': 142.166.166.166#53
Mar 25 15:15:56 hal9000 named[11221]: validating @0x7f883c01f160: iceteks.com A: bad cache hit (iceteks.com.dlv.isc.org/DLV)
Mar 25 15:15:56 hal9000 named[11221]: error (broken trust chain) resolving 'iceteks.com/A/IN': 47.55.55.55#53
Mar 25 15:15:56 hal9000 named[11221]:   validating @0x7f883c0008c0: iceteks.com SOA: bad cache hit (iceteks.com.dlv.isc.org/DLV)
Mar 25 15:15:56 hal9000 named[11221]: error (broken trust chain) resolving 'iceteks.com/AAAA/IN': 47.55.55.55#53
Mar 25 15:15:56 hal9000 named[11221]: validating @0x7f883c01e150: iceteks.com A: bad cache hit (iceteks.com.dlv.isc.org/DLV)
Mar 25 15:15:56 hal9000 named[11221]: error (broken trust chain) resolving 'iceteks.com/A/IN': 47.55.55.55#53
Mar 25 15:15:56 hal9000 named[11221]:   validating @0x7f883c041eb0: iceteks.com SOA: bad cache hit (iceteks.com.dlv.isc.org/DLV)
Mar 25 15:15:56 hal9000 named[11221]: error (broken trust chain) resolving 'iceteks.com/AAAA/IN': 47.55.55.55#53
Mar 25 15:15:56 hal9000 named[11221]: validating @0x7f88445320c0: iceteks.com A: bad cache hit (iceteks.com.dlv.isc.org/DLV)
Mar 25 15:15:56 hal9000 named[11221]: error (broken trust chain) resolving 'iceteks.com/A/IN': 47.55.55.55#53
Mar 25 15:15:56 hal9000 named[11221]:   validating @0x7f88445432d0: iceteks.com SOA: bad cache hit (iceteks.com.dlv.isc.org/DLV)
Mar 25 15:15:56 hal9000 named[11221]: error (broken trust chain) resolving 'iceteks.com/AAAA/IN': 47.55.55.55#53
Mar 25 15:15:56 hal9000 named[11221]: validating @0x7f883c01e150: iceteks.com A: bad cache hit (iceteks.com.dlv.isc.org/DLV)
Mar 25 15:15:56 hal9000 named[11221]: error (broken trust chain) resolving 'iceteks.com/A/IN': 47.55.55.55#53
Mar 25 15:15:56 hal9000 named[11221]:   validating @0x7f883c01e150: iceteks.com SOA: bad cache hit (iceteks.com.dlv.isc.org/DLV)
Mar 25 15:15:56 hal9000 named[11221]: error (broken trust chain) resolving 'iceteks.com/AAAA/IN': 47.55.55.55#53
Mar 25 15:15:56 hal9000 named[11221]: validating @0x7f88445432d0: iceteks.com A: bad cache hit (iceteks.com.dlv.isc.org/DLV)
Mar 25 15:15:56 hal9000 named[11221]:   validating @0x7f88444f3380: iceteks.com SOA: bad cache hit (iceteks.com.dlv.isc.org/DLV)
Mar 25 15:15:56 hal9000 named[11221]: error (broken trust chain) resolving 'iceteks.com/A/IN': 47.55.55.55#53
Mar 25 15:15:56 hal9000 named[11221]: error (broken trust chain) resolving 'iceteks.com/AAAA/IN': 47.55.55.55#53
Mar 25 15:15:57 hal9000 named[11221]: validating @0x7f883404e500: uogateway.com A: bad cache hit (uogateway.com.dlv.isc.org/DLV)
Mar 25 15:15:57 hal9000 named[11221]: error (broken trust chain) resolving 'uogateway.com/A/IN': 47.55.55.55#53
Mar 25 15:15:57 hal9000 named[11221]:   validating @0x7f883c01e150: uogateway.com SOA: bad cache hit (uogateway.com.dlv.isc.org/DLV)
Mar 25 15:15:57 hal9000 named[11221]: error (broken trust chain) resolving 'uogateway.com/AAAA/IN': 47.55.55.55#53
Mar 25 15:15:57 hal9000 named[11221]: validating @0x7f883404e500: uogateway.com A: bad cache hit (uogateway.com.dlv.isc.org/DLV)
Mar 25 15:15:57 hal9000 named[11221]: error (broken trust chain) resolving 'uogateway.com/A/IN': 47.55.55.55#53
Mar 25 15:15:57 hal9000 named[11221]:   validating @0x7f883404f190: uogateway.com SOA: bad cache hit (uogateway.com.dlv.isc.org/DLV)
Mar 25 15:15:57 hal9000 named[11221]: error (broken trust chain) resolving 'uogateway.com/AAAA/IN': 47.55.55.55#53
Mar 25 15:15:59 hal9000 named[11221]: validating @0x7f883c0008c0: ssl.empirehost.me A: bad cache hit (ssl.empirehost.me.dlv.isc.org/DLV)
Mar 25 15:15:59 hal9000 named[11221]: error (broken trust chain) resolving 'ssl.empirehost.me/A/IN': 47.55.55.55#53
Mar 25 15:15:59 hal9000 named[11221]:   validating @0x7f8840081c90: ipv6.microsoft.com SOA: bad cache hit (ipv6.microsoft.com.dlv.isc.org/DLV)
Mar 25 15:15:59 hal9000 named[11221]: error (broken trust chain) resolving 'teredo.ipv6.microsoft.com/A/IN': 47.55.55.55#53
Mar 25 15:16:04 hal9000 named[11221]: validating @0x7f883404e500: localhost.stackoverflow.tech A: bad cache hit (localhost.stackoverflow.tech.dlv.isc.org/DLV)
Mar 25 15:16:04 hal9000 named[11221]: error (broken trust chain) resolving 'localhost.stackoverflow.tech/A/IN': 47.55.55.55#53


Am I hacked? This almost sounds like cache poisoning or something. This DNS server does not face the internet though.
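
One way to get an overview of a log like the one above is to tally the errors per upstream server and per queried name, and to check whether the validation failures mention the dlv.isc.org lookaside zone. This is only a rough sketch: the regex assumes the exact line format shown above, and the function names `summarize` and `dlv_related` are hypothetical.

```python
import re
from collections import Counter

# Matches named's "broken trust chain" lines in the format seen in the log above.
BROKEN = re.compile(
    r"error \(broken trust chain\) resolving "
    r"'(?P<name>[^/]+)/(?P<rtype>[^/]+)/IN': (?P<server>[0-9.]+)#\d+"
)


def summarize(lines):
    """Count broken-trust-chain errors per upstream server and per queried name."""
    by_server, by_name = Counter(), Counter()
    for line in lines:
        match = BROKEN.search(line)
        if match:
            by_server[match.group("server")] += 1
            by_name[match.group("name")] += 1
    return by_server, by_name


def dlv_related(lines):
    """True if any 'bad cache hit' validation line mentions dlv.isc.org."""
    return any("bad cache hit" in line and "dlv.isc.org" in line
               for line in lines)


# Three lines copied from the log above.
sample = [
    "Mar 25 15:15:27 hal9000 named[11221]:   validating @0x7f88444f3380: "
    "uogateway.com SOA: bad cache hit (uogateway.com.dlv.isc.org/DLV)",
    "Mar 25 15:15:27 hal9000 named[11221]: error (broken trust chain) "
    "resolving 'uogateway.com/AAAA/IN': 142.166.166.166#53",
    "Mar 25 15:15:56 hal9000 named[11221]: error (broken trust chain) "
    "resolving 'iceteks.com/A/IN': 47.55.55.55#53",
]

by_server, by_name = summarize(sample)
print(by_server)
print(by_name)
print(dlv_related(sample))
```

On the real file, something like `summarize(open("/var/log/messages"))` should work the same way. If every failure goes through a `.dlv.isc.org` name, DNSSEC lookaside validation looks like a more likely suspect than a compromise, though that is a reading of the log and not a confirmed diagnosis.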
 

Red Squirrel

So, no idea if this is what made it work or if it randomly decided to start working again on its own.

I added 8.8.8.8 to the forwarders as the primary. Then it worked. I went back and removed it, just to see, and it still works. Basically, I didn't actually change anything; every change I tried that didn't work, I reverted. At one point I even got rid of the forwarders so it would use the root servers directly, and that did not work either.

So yeah, no idea WTF this was about or how long it will keep working, but it works now...
 

mxnerd

Diamond Member
Jul 6, 2007
6,799
1,101
126
I'm not familiar with BIND DNS, but I think your DNS does face the internet.

I used an online port scanner and whois tools. The scan showed 3 open ports (80, 443 and 53 (DNS)) at 192.99.10.155, and nslookup gives:

C:\Windows\System32>nslookup
Default Server: UnKnown
Address: 192.168.0.1

> server ns1.iceteks.ca
DNS request timed out.
timeout was 2 seconds.
Default Server: ns1.iceteks.ca
Address: 192.99.10.155

> uogateway.com
Server: ns1.iceteks.ca
Address: 192.99.10.155

Name: uogateway.com
Address: 192.99.10.155

==

I don't know if your DNS failure has anything to do with this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=577639 (too long for me to read through).

The online port scanning tool does show that you are running BIND 9.8.2rc1, though.
 

Red Squirrel

Oh that's my web server, and yeah it does DNS and HTTP for all my online domains, as well as mail. That one is running fine. Just so happens I was trying to resolve one of my own websites as a test which is what generated those logs.

It's my internal one at home that was not running well. I'll read up on that bug page, it might actually be that. Oh, actually I landed on that page in my search. I removed the forwarders but it still didn't work; I added the forwarders back and it still didn't work. I added 8.8.8.8 as the first forwarder and suddenly it worked. I removed 8.8.8.8 and it still worked. So yeah... really oddball one.
 

Red Squirrel

In named.conf I found this line:

Code:
bindkeys-file "/etc/named.iscdlv.key";

Is that what this is referring to? I commented it out to see if it does anything. I no longer get that error about failing to umount a folder when I restart named, and it still seems to work.
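
To double-check which of these settings are still live after commenting things out, a small script can list them. This is a sketch under assumptions: only `bindkeys-file` is quoted in this thread; the other option names are the stock DNSSEC options I would expect in a CentOS named.conf, and the sample text is made up, not copied from the real file. It only skips whole-line comments and does not handle multi-line `/* ... */` blocks.

```python
import re

# Option names to watch. Only bindkeys-file comes from this thread; the rest
# are assumed to be present in a stock configuration.
OPTIONS = ("dnssec-enable", "dnssec-validation",
           "dnssec-lookaside", "bindkeys-file")


def active_options(conf_text):
    """Return {option: value} for watched options that are not commented out."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # named.conf accepts shell, C++ and C style comments; skip whole-line ones.
        if line.startswith(("#", "//", "/*")):
            continue
        for opt in OPTIONS:
            match = re.match(rf"{re.escape(opt)}\s+(.+?);", line)
            if match:
                found[opt] = match.group(1)
    return found


# Hypothetical options block for illustration only.
sample_conf = """
options {
    dnssec-enable yes;
    dnssec-validation yes;
    dnssec-lookaside auto;
    // bindkeys-file "/etc/named.iscdlv.key";
};
"""

print(active_options(sample_conf))
```

In this made-up sample the lookaside option would still be active even though the bindkeys line is commented out, which is the kind of leftover worth checking for in the real file.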
 

Red Squirrel

So far so good. It's still odd that it just randomly decided to die, so I'm not even sure this was the issue, but I figured it didn't hurt to comment that stuff out anyway.

CIRA also has some free DNS servers now, so I ended up setting those as my primary forwarders, which in theory should make things faster as they are closer.
 

Red Squirrel

I forgot about this... the problem never came back. Not sure what fixed it, possibly commenting out the bindkeys line. It was still an odd one, considering it only affected a few hostnames and not everything.