DNS server on AIX causing weird pause for Linux clients

Ares2600

Member
May 30, 2000
I'll spare you the messy details of the benchmark topology I have set up, but an interesting issue has cropped up and I'm wondering if you guys might have some ideas. Even new ideas for debugging would be welcome. I'm using BIND to distribute my workload via round-robin DNS. Up to this point we've been in a testing phase, so I just put BIND with my zone definition on a SLES 9 Linux machine. In the 'production' simulation I want BIND to reside on the AIX machine that's hosting my database. It's a nearly identical version of BIND (both 9.3.x) with absolutely identical zone definition files. I'm not doing anything all that interesting with BIND, just a simple round-robin distribution. I've also got some differentiation based on 'match client' clauses in the actual named.boot file, but in general it's a fairly basic BIND config.

The problem is this: my middleware machines are a bunch of blades running Linux. When they look up a host name against the SLES BIND server, no worries. When I point them at my AIX BIND server, there's a pause during the lookup. This isn't just under heavy load; whenever they look up ANY name against the AIX server it pauses, even for a simple ping. The REALLY puzzling part is that the client seems to already have the IP address associated with the name BEFORE the pause occurs. I can tell this from the ping output:

PING d-apex1.apexpublish (192.168.2.151) 56(84) bytes of data.
64 bytes from 192.168.2.151: icmp_seq=1 ttl=64 time=0.714 ms
64 bytes from 192.168.2.151: icmp_seq=2 ttl=64 time=0.086 ms
64 bytes from 192.168.2.151: icmp_seq=3 ttl=64 time=0.086 ms

The first line of that actually comes out BEFORE the pause, and it already has the IP!

Also, the Windows clients I have in this environment don't seem to have the same pause when looking up against the AIX DNS server. So I'm stumped. What kind of interaction could be happening AFTER the IP address is retrieved? I can't even think of a new debug step.

cliffs:
1) Two DNS servers, one AIX-based and one Linux-based.
2) Linux clients pause when using the AIX DNS server to look up a host name, but not with the Linux DNS server.
3) The pause comes AFTER the client seems to have the IP address.
4) Windows clients don't have the same problem.

 

spidey07

No Lifer
Aug 4, 2000
Run a trace on a problem machine to see what it is sending/asking.

My first guess would be something with a reverse lookup not being set up correctly.
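
If the reverse-lookup guess is right, the delay should show up in the PTR (address-to-name) query rather than in the forward query. Here is a minimal Python sketch that times the two separately, assuming Python is available on one of the Linux blades. The helper names `timed` and `check_lookups` are made up for illustration, and the script only goes through the system resolver, so it says nothing about what is actually on the wire.

```python
import socket
import time


def timed(fn, *args):
    """Call fn(*args); return (result_or_None, seconds_elapsed, error_or_None)."""
    start = time.monotonic()
    try:
        result, error = fn(*args), None
    except OSError as exc:
        # socket.gaierror and socket.herror are both OSError subclasses.
        result, error = None, exc
    return result, time.monotonic() - start, error


def check_lookups(hostname):
    """Time the forward lookup, then the reverse lookup of the returned address."""
    ip, fwd_secs, fwd_err = timed(socket.gethostbyname, hostname)
    print(f"forward {hostname} -> {ip} took {fwd_secs:.3f}s error={fwd_err}")
    if ip is None:
        return None

    # gethostbyaddr returns (hostname, aliaslist, ipaddrlist) on success.
    rev, rev_secs, rev_err = timed(socket.gethostbyaddr, ip)
    name = rev[0] if rev else None
    print(f"reverse {ip} -> {name} took {rev_secs:.3f}s error={rev_err}")
    return fwd_secs, rev_secs
```

Calling `check_lookups("d-apex1.apexpublish")` on a blade pointed at the AIX server, and again when pointed at the SLES server, should show which of the two steps is slow. As a quick cross-check, `ping -n` on Linux prints numeric addresses only and skips the reverse lookup, so if the pause disappears with `-n`, that also points at the reverse zone.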
 

cleverhandle

Diamond Member
Dec 17, 2001
When you're testing with the Linux clients, is the AIX server the only DNS server listed in /etc/resolv.conf? Off the top of my head, I'm not sure it should matter in this situation, but it could be that the Linux machines are looking for another DNS server (the SLES one presumably) that's not online.
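
To check that quickly on several blades, a small Python sketch like this could list the servers a client is configured to try, in order. It assumes the standard `nameserver <address>` line format of /etc/resolv.conf; the function name `parse_nameservers` is hypothetical, and any addresses you test it with are placeholders, not the real ones from this setup.

```python
def parse_nameservers(resolv_conf_text):
    """Return nameserver addresses in the order they appear in the file."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def read_nameservers(path="/etc/resolv.conf"):
    """Read the given resolv.conf file and return its nameserver list."""
    with open(path) as handle:
        return parse_nameservers(handle.read())
```

If `read_nameservers()` returns more than one address, or the AIX server is not the first entry, the client may be trying another server before or after it.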
 

Nothinman

Elite Member
Sep 14, 2001
Run 'strace ping d-apex1.apexpublish' and see what syscall it hangs on. You'll need to run that as root, since strace doesn't work on setuid binaries for regular users.
 

Ares2600

Member
May 30, 2000
Why didn't I think of strace? I'm going to do that today. Thanks for the idea. I'll post the results.