Domain replication errors

Red Squirrel

No Lifer
May 24, 2003
70,087
13,536
126
www.anyf.ca
We are getting a lot of replication errors on both our DCs. I've done some research, but what I find basically just repeats what the event log says, which is not very helpful. What would cause this?

Replication does seem to work OK: I created a test user and it showed up on the other DC, and when I deleted it, it was deleted on the other DC as well.

These are the errors we are getting:

Code:
Event Type:	Error
Event Source:	NTDS Replication
Event Category:	Replication 
Event ID:	1864
Date:		2/15/2010
Time:		3:12:56 PM
User:		NT AUTHORITY\ANONYMOUS LOGON
Computer:	TDHDC1
Description:
This is the replication status for the following directory partition on the local domain controller. 
 
Directory partition:
DC=DomainDnsZones,DC=DOMAIN,DC=LOCAL 
 
The local domain controller has not recently received replication information from a number of domain controllers.   The count of domain controllers is shown, divided into the following intervals. 
 
More than 24 hours:
1 
More than a week:
1 
More than one month:
1 
More than two months:
1 
More than a tombstone lifetime:
0 
Tombstone lifetime (days):
180 
 Domain controllers that do not replicate in a timely manner may encounter errors. It may miss password changes and be unable to authenticate. A DC that has not replicated in a tombstone lifetime may have missed the deletion of some objects, and may be automatically blocked from future replication until it is reconciled. 
 
To identify the domain controllers by name, install the support tools included on the installation  CD and run dcdiag.exe. 
You can also use the support tool repadmin.exe to display the replication latencies of the domain controllers in the forest.   The command is "repadmin /showvector /latency <partition-dn>".

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.


Event Type:	Warning
Event Source:	NTDS Replication
Event Category:	Backup 
Event ID:	2089
Date:		2/15/2010
Time:		3:12:56 PM
User:		NT AUTHORITY\ANONYMOUS LOGON
Computer:	TDHDC1
Description:
This directory partition has not been backed up since at least the following number of days. 
 
Directory partition: 
DC=ForestDnsZones,DC=DOMAIN,DC=LOCAL 
 
'Backup latency interval' (days): 
90 
 
It is recommended that you take a backup as often as possible to recover from accidental loss of data. However if you haven't taken a backup since at least the 'backup latency interval' number of days, this message will be logged every day until a backup is taken. You can take a backup of any replica that holds this partition. 
 
By default the 'Backup latency interval' is set to half the 'Tombstone Lifetime Interval'. If you want to change the default 'Backup latency interval', you could do so by adding the following registry key. 
 
'Backup latency interval' (days) registry key: 
System\CurrentControlSet\Services\NTDS\Parameters\Backup Latency Threshold (days) 


For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.


There seems to be a pattern. It's easier to post a pic than to try to explain:

replication_errors.png


It always starts at 3:12:56 PM. Has anyone seen this before? Oh, and the BDC does the same thing, but at 6:01:06. And yes, I say BDC even though it's a 2003 environment. It was migrated from NT4 a long time ago, so it still acts as PDC / BDC. Not sure why. If the primary DC goes down, everything goes down. The primary holds the FSMO roles.
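
For reference, the 90 days in event 2089 lines up with the 180-day tombstone lifetime in event 1864, since the event text says the default backup latency interval is half the tombstone lifetime. A minimal Python sketch of that arithmetic follows. The function name is made up for illustration, and the integer division is an assumption, since the event text only says "half".

```python
def default_backup_latency_days(tombstone_lifetime_days: int) -> int:
    """Default per the event 2089 text: half the tombstone lifetime.

    Integer division is an assumption; the event text does not say
    how an odd tombstone lifetime would be rounded.
    """
    return tombstone_lifetime_days // 2


tombstone_days = 180  # "Tombstone lifetime (days)" from event 1864
backup_latency = default_backup_latency_days(tombstone_days)
print(backup_latency)
```

So the 2089 warning only means no backup has been recorded for this partition within that interval. It is separate from the 1864 replication error.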
 

stash

Diamond Member
Jun 22, 2000
5,468
0
0
Quote:
Quite a while back, we had 3, one died and it was the primary so we had to force over the roles.

Did you do a metadata cleanup to remove the dead server?

Quote:
How do I go about forcing that to an existing DC?

The same way you did the other roles (ntdsutil)?

Quote:
It was migrated from NT4 a long time ago, so it still acts as PDC / BDC. Not sure why. If the primary DC goes down, everything goes down. The primary holds the FSMO roles.

Is your DNS configured correctly?
 

Red Squirrel

Quote:
Did you do a metadata cleanup to remove the dead server?
The same way you did the other roles (ntdsutil)?
Is your DNS configured correctly?

To be quite honest, I'm not sure about metadata cleanup, but thanks for pointing it out. I will google that and read up on it further. I was not 100% involved in the process. I will look at ntdsutil; now that you mention it, I think that's in fact what we used. It's been a few months, so I forgot.

I was pretty sure we had cleared all instances of the dead DC, but I stumbled upon the schema master reference, so I think that should be my primary concern, and it probably IS the cause of the errors.
 

stash

Quote:
I was pretty sure we had cleared all instances of the dead DC, but I stumbled upon the schema master reference, so I think that should be my primary concern, and it probably IS the cause of the errors.

No, the cause of the error is a domain controller that was not removed properly. The error is telling you that it hasn't replicated with one DC in a couple of months. If the DC had been removed properly, the remaining DCs wouldn't be trying to replicate with it.
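
To make that concrete, here is a rough Python sketch that pulls the interval counts out of the event 1864 description and reports the oldest interval with a non-zero count. The function names, and the idea of pasting the event text in as a string, are assumptions for illustration, not part of any Microsoft tool. It also assumes the same DC is counted in every interval it exceeds, which would explain why they all show 1.

```python
import re

# Interval labels as they appear in the event 1864 description, oldest last.
INTERVALS = [
    "More than 24 hours",
    "More than a week",
    "More than one month",
    "More than two months",
    "More than a tombstone lifetime",
]


def parse_1864_counts(description: str) -> dict:
    """Return {interval label: DC count} from an event 1864 description."""
    counts = {}
    for label in INTERVALS:
        # Each label is followed by a colon, then the count on the next line.
        match = re.search(re.escape(label) + r":\s*(\d+)", description)
        if match:
            counts[label] = int(match.group(1))
    return counts


def oldest_stale_interval(counts: dict):
    """Return the oldest interval with a non-zero count, or None."""
    stale = [label for label in INTERVALS if counts.get(label, 0) > 0]
    return stale[-1] if stale else None


# Counts copied from the event posted above.
sample = """More than 24 hours:
1
More than a week:
1
More than one month:
1
More than two months:
1
More than a tombstone lifetime:
0
Tombstone lifetime (days):
180
"""

counts = parse_1864_counts(sample)
print(counts)
print(oldest_stale_interval(counts))
```

For the event posted above, this points at a single source that has been silent for more than two months but less than a tombstone lifetime, which fits a DC that died a few months ago.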
 

Red Squirrel

Quote:
No, the cause of the error is a domain controller that was not removed properly. The error is telling you that it hasn't replicated with one DC in a couple of months. If the DC had been removed properly, the remaining DCs wouldn't be trying to replicate with it.

We were unable to remove it properly (do a demote) because it crashed. One day it just started to freeze up very badly and brought the whole domain down. Once we took it offline, it was responsive; as soon as we put it back on the network, bam. We never figured out the cause. We just had to get up and running, with our crazy IT manager breathing down our necks, so we forced the next good DC to be the primary and hold the FSMO roles. I missed this one, though. The problem I see is that there are still references to the crashed DC, so I need to figure out how to rip those references out. We can't bring it online or it will crap out the domain again.

I am looking into the metadata cleanup now, since that looks like it might do the trick.
 

dphantom

Diamond Member
Jan 14, 2005
4,763
327
126
Quote:
It always starts at 3:12:56 PM. Has anyone seen this before? Oh, and the BDC does the same thing, but at 6:01:06. And yes, I say BDC even though it's a 2003 environment. It was migrated from NT4 a long time ago, so it still acts as PDC / BDC. Not sure why. If the primary DC goes down, everything goes down. The primary holds the FSMO roles.

No, it is not a BDC. Once it was upgraded to Windows 2003, it became just another DC, probably a GC as well. As others have pointed out, the old DC that was not gracefully demoted is still referenced in your AD metadata and needs to be cleaned out.
 

WicKeD

Golden Member
Nov 20, 2000
1,893
0
0
Do you have Windows Support Tools installed? If so, clear out your Event Viewer and run

dcdiag /fix

See what your outcome is.

You may also have to go into ADSI Edit and remove any traces of your old DC.
 

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
As a side note, there are 5 FSMO roles. Make sure you moved all five. Only 2 or 3 of them are obvious, depending on which MMC you're using. Also, NT4 -> 2003 can leave the domain in an intermediate state called "Windows Server 2003 interim". You generally don't want to be in that state.

Do the metadata cleanup, make sure all the FSMO roles are seized properly, and make sure the domain functional level is raised properly.
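
As a quick checklist, here is a small Python sketch that lists all five roles and flags any that are missing or still held by a DC that is no longer alive. The role-to-holder mapping is assumed to be filled in by hand (for example from what `netdom query fsmo` reports). TDHDC1 comes from the event log above; OLDDC and TDHDC2 are made-up placeholder names.

```python
# The five FSMO roles: two are forest-wide, three exist per domain.
FSMO_ROLES = (
    "Schema Master",          # forest-wide
    "Domain Naming Master",   # forest-wide
    "RID Master",             # per domain
    "PDC Emulator",           # per domain
    "Infrastructure Master",  # per domain
)


def roles_needing_attention(holders: dict, live_dcs: set) -> list:
    """Return roles that are missing from `holders` or held by a DC
    that is not in `live_dcs` (names compared in upper case)."""
    problems = []
    for role in FSMO_ROLES:
        holder = holders.get(role)
        if holder is None or holder.upper() not in live_dcs:
            problems.append(role)
    return problems


# Hypothetical inventory: the dead DC still holds the schema master role.
holders = {
    "Schema Master": "OLDDC",
    "Domain Naming Master": "TDHDC1",
    "RID Master": "TDHDC1",
    "PDC Emulator": "TDHDC1",
    "Infrastructure Master": "TDHDC1",
}
live = {"TDHDC1", "TDHDC2"}

print(roles_needing_attention(holders, live))
```

Anything this prints is a role that still needs to be seized onto a live DC.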
 

Red Squirrel

Quote:
As a side note, there are 5 FSMO roles. Make sure you moved all five. Only 2 or 3 of them are obvious, depending on which MMC you're using. Also, NT4 -> 2003 can leave the domain in an intermediate state called "Windows Server 2003 interim". You generally don't want to be in that state.

Do the metadata cleanup, make sure all the FSMO roles are seized properly, and make sure the domain functional level is raised properly.

Yeah, I thought there were only 4. When I originally did my research, an article I read said there were 4, but it failed to mention the schema master.

I will try the metadata cleanup tomorrow. I was getting errors today when I tried it, but it was late in the day and I did not get enough of a chance to look into it. It said something about already having a global connection, or something... If I can't figure it out, I'll post the exact error tomorrow.
 

Red Squirrel

OK, so I did the metadata cleanup, and now I don't see references to the old DC anymore, so I'm getting somewhere. I have a few errors in dcdiag, such as the IsmServ service not running on the primary (it is running on the secondary). On the secondary there are a few more errors, though, such as failing the frsevent and kccevent tests. I will do more research.
 

imagoon

Quote:
OK, so I did the metadata cleanup, and now I don't see references to the old DC anymore, so I'm getting somewhere. I have a few errors in dcdiag, such as the IsmServ service not running on the primary (it is running on the secondary). On the secondary there are a few more errors, though, such as failing the frsevent and kccevent tests. I will do more research.

IsmServ service: this is legacy. Check AD Sites and Services, and if you see 'SMTP' as the transport, you want to flip it to IP or RPC. This might have something to do with Windows 2003 interim, if you're still in that legacy mode.

frsevent: This means there is an error logged in the event log in the last 24 hours. It might be worth taking a copy of your SYSVOL folder in case FRS is messed up, as you can get some weird errors later. Grab the copy from the machine that holds the FSMO domain roles. You might be able to stop FRS on the other machine, delete the folder, and let FRS grab new copies from the main one.

Check the error and see what it is complaining about before you do anything, though. It is in the File Replication Service event log.

kccevent: This is often a symptom, not a problem in itself. Fix the above and this one might 'go away'.

Post the events here if you have questions.
 

Red Squirrel

Thanks for the info. There are the daily replication errors, so I will wait to see if they go away now that I have removed the crapped-out DC. Probably related.

When I check Sites and Services, I do see IP and SMTP under Inter-Site Transports. There's nothing under SMTP, and under IP there is an object called DEFAULTIPSITELINK. Is this how it should be?
 

imagoon

Quote:
Thanks for the info. There are the daily replication errors, so I will wait to see if they go away now that I have removed the crapped-out DC. Probably related.

When I check Sites and Services, I do see IP and SMTP under Inter-Site Transports. There's nothing under SMTP, and under IP there is an object called DEFAULTIPSITELINK. Is this how it should be?

All the servers and clients are on the same subnet?

If yes: What you're seeing is fine.

If no: I would be surprised if it worked right at all.

Edit: Brain fart: In AD Sites and Services, check the 'DefaultSite' and drill down to the servers. Pick one, then drill down to NTDS Settings; you should see some items with a type of Connection. Right-click one > Properties. Transport should be RPC in most cases, IP the rest of the time, and very rarely SMTP. So rarely that I doubt there is a need for it anymore.

Also on that same menu is where you can find the 'Replicate now' command to force the replication and check for errors.
 

Red Squirrel

Yeah, they are on the same subnet as far as I know. The other domains that are part of the trust aren't, or maybe they are... we can ping them and such. They're routable.