Why is rsync sometimes so slow?


Red Squirrel

No Lifer
What type are the filesystems you are syncing?

Mostly ext3/4 over NFS. In this particular example they're both on the same NFS share, just copying to another folder.

I also just double-checked, and I don't have any cron job or anything else that would change permissions. Since Linux does not have permission/owner inheritance, in some situations I need a cron script to fix any permissions that may not be right, but I tend to handle that through NFS now by using all_squash with anonuid and anongid.
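For reference, the relevant export line looks roughly like this (the path, network, and IDs here are placeholders, not my real setup):

Code:
# /etc/exports: all_squash maps every remote UID/GID to anonuid/anongid,
# so everything written over NFS ends up owned by one local account
/srv/share  192.168.1.0/24(rw,sync,all_squash,anonuid=1000,anongid=1000)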
 

Red Squirrel

No Lifer
Well, I ended up having to go back to --checksum mode. The default mode is just too erratic: it seems to copy what it wants when it wants, sometimes skipping files that changed, while at other times deciding to copy a whole group of files that haven't even been touched.

The issue with checksum is that it seems to hang on random files at times; otherwise it's fast enough. Is there a way to stop it from hanging like that?
 

Red Squirrel

No Lifer
I find that without checksum it's too unpredictable: it copies what it wants and fails to copy things that did change.

But with checksum it randomly stalls on random files. Is there a way to at least fix the stalling? It should not take 10 minutes to copy a 100-byte text file. The stalling is very sporadic, too; for example, an offsite backup can manage to saturate my 50 Mbps download, then suddenly it will just stall on one file, then continue. As I'm typing this it's been on the same file for over 15 minutes, and it's just a small shell script, probably under 100 bytes. After it's finally done stalling it then copies more files (bigger ones, to boot) at a much faster rate before it decides to stall again on another file. Why is it doing this?
 

KillerBee

Golden Member
What specific H/W and S/W is running between the source and destination?

e.g. a Dell server running CentOS 7.x, with a NAS attached to it running what version of software?

I remember an old server that always caused trouble rsyncing:
it ran CentOS 5.x with a NAS that ran its own Windows-based storage software.

There was no problem rsyncing the internal server drives, since they were under the control of CentOS,
but rsyncing the NAS attached to it (which ran Windows internally) always had trouble.
 

Red Squirrel

No Lifer
Lots of different machines; I see this problem all the time. In this particular case it's CentOS 6.x (6.5, but I have some 6.6 too, I think) over NFS. The file system is mostly ext4. Most of them are running in VMware.
 

Essence_of_War

Platinum Member
Red Squirrel said:
I find that without checksum it's too unpredictable: it copies what it wants and fails to copy things that did change.

But with checksum it randomly stalls on random files. Is there a way to at least fix the stalling? It should not take 10 minutes to copy a 100-byte text file. ... Why is it doing this?

From the problems we've discussed in this thread, it isn't at all clear to me that the problem is rsync and not your network, or some other piece of hardware in the NFS stack. Or perhaps something else entirely.

Have you tried rsyncing to local directories or over directly attached storage? That could be a useful diagnostic.
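For example, something like this (placeholder paths) would take NFS and the network out of the picture entirely:

Code:
# time a checksum-mode sync between two purely local directories;
# if this never stalls while the NFS jobs do, suspect the NFS/network layer
time rsync -avc /data/src/ /data/dst/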
 

Red Squirrel

No Lifer
I could create dummy jobs to test, but I presume locally it will work fine; it's NFS and remote storage it seems to have trouble with. Some of the jobs also rsync directly over SSH using a key pair, e.g. using user@host for the source or destination. The issues are very sporadic, which makes them harder to troubleshoot. I know it's not network congestion, since when I'm watching the traffic graph during the delays it goes from full saturation to zero for the whole delay period, not to mention that it happens locally too, and there's no way I'm actually congesting my gigabit LAN. The load average of the server is also not that high during the pause, so it's as if it's genuinely waiting on something. Waiting for what, I don't know.

All my data is on my NAS, so the local jobs are to or from the NAS, while the internet jobs go through SSH. I don't tend to notice how well the backup jobs work since they happen overnight, but when running them manually I've noticed these delays as well. If I try without checksum, then at random it will decide it wants to update entire folders and take longer than it needs to, so even with the random glitches, checksum ends up being faster.

Without very expensive temperature-compensated, GPS-based clocking equipment it's impossible to ensure that each system clock is 100% in sync, so I have a feeling slight variations in time could be at play here too when trying to use time-based syncing instead of checksum. Is there perhaps an option to allow a few seconds of variation? Ideally it should compare the source and destination system clocks and compensate for the offset as well; e.g. if the destination clock is 1 second behind, then when checking timestamps it should add 1 second, because otherwise all files will appear 1 second behind and trigger an update when they really should not.

Worse comes to worst, I might just write my own app; I don't imagine it would be that hard to do, really.
 

Essence_of_War

Platinum Member
Red Squirrel said:
Worse comes to worst, I might just write my own app; I don't imagine it would be that hard to do, really.

I suppose you could. A LOT of time, effort, and debugging have gone into rsync, though. I'm not sure why you'd think that you'd necessarily do it better starting from scratch. Programmers regularly think that starting from scratch is "better" because the code is "a mess" and they're almost always wrong.

I've previously mentioned the rsync "mtime" flag. I've used it successfully in several contexts, most recently with a regular sync job from a pair of Mac Pros to a distant CIFS/Samba file server. Adding mtime=1 prevented a bunch of extra data being transmitted due to mtime mismatches (according to the itemized output). It sounds like you were still having problems when using mtime, though, right?
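To be precise about the flag: I believe what's doing the work here is rsync's --modify-window option, which treats timestamps that differ by up to N seconds as equal. A minimal sketch, with placeholder paths:

Code:
# treat mtimes within 1 second of each other as "unchanged",
# which absorbs clock skew and coarse filesystem timestamps
rsync -av --modify-window=1 /data/src/ /data/dst/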

If the total size of the files isn't too big, and local network traffic isn't too much of a constraint, perhaps the right strategy is to bypass the "smart" part of rsync entirely and just copy/overwrite with something simpler like scp?
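Something along these lines (host and paths are placeholders), run from cron:

Code:
# brute-force copy: no quick-check heuristics, no delta transfer;
# -r recurse, -p preserve times/modes, -q quiet for cron logs
scp -rpq /data/src user@backuphost:/data/dst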
 

Red Squirrel

No Lifer
So this started happening again. It thinks a whole bunch of files changed when in reality they didn't.

The reason it shows is FCSTP, which does not make sense, because none of those things have actually changed. Some of these scripts I have not touched since something like 2007. Why does it suddenly think they changed now? And it's not a few files here and there; if it were, I'd just figure it's bit rot or something, but it's entire directory structures.

As a side note, it seems even git thinks the files changed. Why does it do this? The files did not change!
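For reference, this is the kind of dry run I use to see why it thinks each file changed (paths are placeholders; my reading of the letters is based on the --itemize-changes legend in the man page):

Code:
# dry run (-n): report what would transfer, change nothing;
# -i itemizes the per-file reason using rsync's change flags
rsync -nric /data/src/ /data/dst/
# legend (subset): c = checksum differs, s = size,
# t = modification time, p = permissions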
 

Red Squirrel

No Lifer
Nope, it seems to be really sporadic. Most of my backups happen in the background, so I don't pay much attention to them, but it's when I'm watching that I realize just how bad it is.

Part of the issue is not only that it randomly starts being bloody slow, but also that at random it decides an entire directory structure has changed even when it hasn't.
 

KillerBee

Golden Member
It sucks when it's an intermittent problem

Have you tried rsyncing via SSH vs. NFS to help narrow down the problem?

i.e. take one machine where you've seen the problem occur before,
unmount the NFS backup directory,
change your script to rsync via SSH,
and then see if you can make it fail again (something like the sketch below).
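A minimal version of that SSH variant (user, host, and paths are placeholders):

Code:
# same checksum-mode job, but over SSH instead of the NFS mount;
# if the stalls disappear here, the NFS layer is the prime suspect
rsync -avc -e ssh /data/src/ backupuser@backuphost:/data/dst/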
 

Crusty

Lifer
If git is telling you files have changed then they did.

What does the output of 'git diff' look like? FYI, Git will report changes in file permissions as changes to the file.
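A permissions-only change typically shows up like this (the file name is a made-up example):

Code:
$ git diff
diff --git a/backup.sh b/backup.sh
old mode 100644
new mode 100755

If that's all you're seeing, "git config core.fileMode false" tells git to ignore executable-bit changes when comparing.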
 

Red Squirrel

No Lifer
Is there a way to make it not care about permissions then? I just thought of this now: I have various scripts on my file server that keep permissions consistent throughout folders, since Linux does not have inheritance and you end up with all sorts of different permissions depending on who/what writes a file, so the workaround is cron jobs that reset permissions properly. Kinda dirty, but I have not figured out anything better. I wonder if these scripts might be causing files to appear changed even when they haven't (roughly the kind of thing sketched below).
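The fix-up jobs are roughly like this (path, owner, and modes are placeholders):

Code:
# crontab entry: nightly permission "inheritance" fix-up;
# note: a recursive chown/chmod that actually changes anything will
# make rsync -p (and git) see every affected file as changed
0 3 * * * chown -R redsquirrel:users /srv/share && chmod -R u+rwX,g+rX /srv/share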
 

Red Squirrel

No Lifer
So is there a way to make it look ONLY at the actual file content? I don't care about permissions or time or anything like that; I just care whether the file content actually changed.

Though I just double-checked, and that particular folder is not part of the cron job. It only touches my torrents folder and my backups, as the backups run as root (it needs to be able to access everything so it can back it up) and I then want the backups to be accessible by my user. I really wish Linux had a better permission system like NTFS so I would not need to do that, but that's a whole other story.
 

Essence_of_War

Platinum Member
Red Squirrel said:
So is there a way to make it look ONLY at the actual file content? I don't care about permissions or time or anything like that; I just care whether the file content actually changed.

Yes, but I'm not sure if you actually want that either.

rsync has a checksum flag (-c) which compares checksums of the files on both sides to test whether the content has changed, ignoring metadata clues like mtime. rsync doesn't build a database, though, and since it isn't a filesystem it can't stash the checksums in inode metadata or the like for later use, so it has to regenerate the checksums on both sides every time. This isn't so bad if you're talking about a relatively small number of files, or a situation where there is a client running rsync and a server running the rsync daemon, but if you're doing this with a large filesystem (big files, lots of files, possibly both) it can be quite slow.
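To also stop it from touching metadata on the destination, something like this should work (paths are placeholders; a sketch, not a tested job):

Code:
# -r recurse, -c decide "changed" by checksum rather than size+mtime;
# the --no-* flags stop perms/ownership/timestamps from propagating
rsync -rc --no-perms --no-owner --no-group --no-times /data/src/ /data/dst/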
 

Red Squirrel

No Lifer
Checksum is actually what I use. I found that when I use the other options it ALWAYS wants to copy everything, no matter what, while with checksum it does that only every now and then. When I ran it in debug mode it said everything had changed, including the checksum.
 

replica9000

Member
Red Squirrel said:
Is there a way to make it not care about permissions then? ... I wonder if these scripts might be causing files to appear changed even when they haven't.

What about ACLs?

Red Squirrel said:
Checksum is actually what I use. I found that when I use the other options it ALWAYS wants to copy everything, no matter what, while with checksum it does that only every now and then. ...

I find the checksum option really slows things down, especially when I sync over 5 TB of data. Once in a while I find rsync will sync things that haven't changed, but that's usually to a non-native filesystem.
 

thecoolnessrune

Diamond Member
Yeah, after years of dealing with permissions on Windows, Linux, and BSD systems, I found ACLs to be pretty much the definitive solution to getting the permissions of a directory on the same page. The Linux inheritance situation is a giant pain in a lot of cases, and using ACLs to do the inheritance for you (and to keep doing it as files are added, modified, and removed) makes them worth learning.
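A minimal sketch of that default-ACL inheritance (the path and group name are placeholders):

Code:
# grant the group rwx on the directory itself
setfacl -m g:backupusers:rwx /srv/share
# "default" entry: new files/dirs created inside inherit the same ACL
setfacl -d -m g:backupusers:rwx /srv/share
# verify both the access and default entries
getfacl /srv/share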
 

Red Squirrel

No Lifer
The only thing with using ACLs: since they're a third-party add-on and not really part of the Linux core, will they be honoured all the time by all parts of the system? E.g. if I use ACLs locally, will they also apply if I share that directory out with Samba or NFS? I might have to take a look at it.
 

Crusty

Lifer
Red Squirrel said:
The only thing with using ACLs: since they're a third-party add-on and not really part of the Linux core, will they be honoured all the time by all parts of the system? E.g. if I use ACLs locally, will they also apply if I share that directory out with Samba or NFS? ...

I thought ACLs were part of the POSIX standard (the POSIX.1e draft, at least).

If you are using some other system to access those files, it needs to understand the POSIX ACL setup; it's not as if ACLs are third-party and might just disappear one day.
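Enforcement happens on the server in any case, so a rough sketch of checking and exposing them (the share name and path are placeholders, and the smb.conf detail is an assumption about a typical Samba setup):

Code:
# on the server, the kernel enforces POSIX ACLs for local processes,
# and the Samba/NFS daemons are local processes, so shares are covered
getfacl /srv/share

# smb.conf: optionally have Samba honour and propagate the POSIX ACLs
[share]
    path = /srv/share
    inherit acls = yes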