Why is rsync sometimes so slow?


Red Squirrel

No Lifer
What type are the filesystems you are syncing?

Mostly ext3/4 over NFS. In this particular example they're both on the same NFS share, just copying to another folder.

I also just double-checked, and I don't have any cron job or anything else that would change permissions. Since Linux does not have permission/owner inheritance, in some situations I need a cron script to fix any permissions that may not be right, but I tend to handle that through NFS now by using all_squash with anonuid and anongid.
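For reference, the relevant export line looks roughly like this (the path, network, and IDs here are placeholders, not my real setup):

Code:
# /etc/exports: all_squash maps every remote UID/GID to anonuid/anongid,
# so everything written over NFS ends up owned by one local account
/srv/share  192.168.1.0/24(rw,sync,all_squash,anonuid=1000,anongid=1000)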
 

Red Squirrel

No Lifer
Well, I ended up having to go back to --checksum mode. The default mode is just too erratic: it seems to copy what it wants when it wants, sometimes skipping files that changed, while at other times deciding to copy a whole group of files that haven't even been touched.

The issue with checksum is that it seems to hang on random files at times; otherwise it's fast enough. Is there a way to stop it from hanging like that?
 

Red Squirrel

No Lifer
I find that without checksum it's too unpredictable: it copies what it wants and fails to copy things that did change.

But with checksum it randomly stalls on random files. Is there a way to at least fix the stalling? It should not take 10 minutes to copy a 100-byte text file. The stalling is very sporadic, too; for example, an offsite backup can manage to saturate my 50 Mbps download, then suddenly it will just stall on one file, then continue. As I'm typing this it's been on the same file for over 15 minutes, and it's just a small shell script, probably under 100 bytes. After it's finally done stalling it then copies more files (bigger ones, to boot) at a much faster rate before it decides to stall again on another file. Why is it doing this?
 

KillerBee

Golden Member
What specific H/W and S/W is running between the source and destination?

e.g. a Dell server running CentOS 7.x, with a NAS attached to it running what version of software?

I remember an old server that always caused trouble rsyncing:
it ran CentOS 5.x with a NAS that ran its own Windows-based storage software.

There was no problem rsyncing the internal server drives, since they were under the control of CentOS,
but rsyncing the NAS attached to it (which ran Windows internally) always had trouble.
 

Red Squirrel

No Lifer
Lots of different machines; I see this problem all the time. In this particular case it's CentOS 6.x (6.5, but I have some 6.6 too, I think) over NFS. The file system is mostly ext4. Most of them are running in VMware.
 

Essence_of_War

Platinum Member
Red Squirrel said:
I find that without checksum it's too unpredictable: it copies what it wants and fails to copy things that did change.

But with checksum it randomly stalls on random files. Is there a way to at least fix the stalling? It should not take 10 minutes to copy a 100-byte text file. ... Why is it doing this?

From the problems we've discussed in this thread, it isn't at all clear to me that the problem is rsync and not your network, or some other piece of hardware in the NFS stack. Or perhaps something else entirely.

Have you tried rsyncing to local directories or over directly attached storage? That could be a useful diagnostic.
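For example, something like this (placeholder paths) would take NFS and the network out of the picture entirely:

Code:
# time a checksum-mode sync between two purely local directories;
# if this never stalls while the NFS jobs do, suspect the NFS/network layer
time rsync -avc /data/src/ /data/dst/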
 

Red Squirrel

No Lifer
I could create dummy jobs to test, but I presume locally it will work fine; it's NFS and remote storage it seems to have trouble with. Some of the jobs also rsync directly over SSH using a key pair, e.g. using user@host for the source or destination. The issues are very sporadic, which makes them harder to troubleshoot. I know it's not network congestion, since when I'm watching the traffic graph during the delays it goes from full saturation to zero for the whole delay period, not to mention that it happens locally too, and there's no way I'm actually congesting my gigabit LAN. The load average of the server is also not that high during the pause, so it's as if it's genuinely waiting on something. Waiting for what, I don't know.

All my data is on my NAS, so the local jobs are to or from the NAS, while the internet jobs go through SSH. I don't tend to notice how well the backup jobs work since they happen overnight, but when running them manually I've noticed these delays as well. If I try without checksum, then at random it will decide it wants to update entire folders and take longer than it needs to, so even with the random glitches, checksum ends up being faster.

Without very expensive temperature-compensated, GPS-based clocking equipment it's impossible to ensure that each system clock is 100% in sync, so I have a feeling slight variations in time could be at play here too when trying to use time-based syncing instead of checksum. Is there perhaps an option to allow a few seconds of variation? Ideally it should compare the source and destination system clocks and compensate for the offset as well; e.g. if the destination clock is 1 second behind, then when checking timestamps it should add 1 second, because otherwise all files will appear 1 second behind and trigger an update when they really should not.

Worse comes to worst, I might just write my own app; I don't imagine it would be that hard to do, really.
 

Essence_of_War

Platinum Member
Red Squirrel said:
Worse comes to worst, I might just write my own app; I don't imagine it would be that hard to do, really.

I suppose you could. A LOT of time, effort, and debugging have gone into rsync, though. I'm not sure why you'd think that you'd necessarily do it better starting from scratch. Programmers regularly think that starting from scratch is "better" because the code is "a mess" and they're almost always wrong.

I've previously mentioned the rsync "mtime" flag. I've used it successfully in several contexts, most recently with a regular sync job from a pair of Mac Pros to a distant CIFS/Samba file server. Adding mtime=1 prevented a bunch of extra data being transmitted due to mtime mismatches (according to the itemized output). It sounds like you were still having problems when using mtime, though, right?
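To be precise about the flag: I believe what's doing the work here is rsync's --modify-window option, which treats timestamps that differ by up to N seconds as equal. A minimal sketch, with placeholder paths:

Code:
# treat mtimes within 1 second of each other as "unchanged",
# which absorbs clock skew and coarse filesystem timestamps
rsync -av --modify-window=1 /data/src/ /data/dst/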

If the total size of the files isn't too big, and local network traffic isn't too much of a constraint, perhaps the right strategy is to bypass the "smart" part of rsync entirely and just copy/overwrite with something simpler like scp?
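Something along these lines (host and paths are placeholders), run from cron:

Code:
# brute-force copy: no quick-check heuristics, no delta transfer;
# -r recurse, -p preserve times/modes, -q quiet for cron logs
scp -rpq /data/src user@backuphost:/data/dst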
 

Red Squirrel

No Lifer
So this started happening again. It thinks a whole bunch of files changed when in reality they didn't.

The reason it shows is FCSTP, which does not make sense, because none of those things have actually changed. Some of these scripts I have not touched since something like 2007. Why does it suddenly think they changed now? And it's not a few files here and there; if it were, I'd just figure it's bit rot or something, but it's entire directory structures.

As a side note, it seems even git thinks the files changed. Why does it do this? The files did not change!
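For reference, this is the kind of dry run I use to see why it thinks each file changed (paths are placeholders; my reading of the letters is based on the --itemize-changes legend in the man page):

Code:
# dry run (-n): report what would transfer, change nothing;
# -i itemizes the per-file reason using rsync's change flags
rsync -nric /data/src/ /data/dst/
# legend (subset): c = checksum differs, s = size,
# t = modification time, p = permissions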
 

Red Squirrel

No Lifer
Nope, it seems to be really sporadic. Most of my backups happen in the background, so I don't pay much attention to them, but it's when I'm watching that I realize just how bad it is.

Part of the issue is not only that it randomly starts being bloody slow, but also that at random it decides an entire directory structure has changed even when it hasn't.
 

KillerBee

Golden Member
It sucks when it's an intermittent problem

Have you tried rsyncing via SSH vs. NFS to help narrow down the problem?

i.e. take one machine where you've seen the problem occur before,
unmount the NFS backup directory,
change your script to rsync via SSH,
and then see if you can make it fail again (something like the sketch below).
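A minimal version of that SSH variant (user, host, and paths are placeholders):

Code:
# same checksum-mode job, but over SSH instead of the NFS mount;
# if the stalls disappear here, the NFS layer is the prime suspect
rsync -avc -e ssh /data/src/ backupuser@backuphost:/data/dst/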
 

Crusty

Lifer
If git is telling you files have changed then they did.

What does the output of 'git diff' look like? FYI, Git will report changes in file permissions as changes to the file.
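A permissions-only change typically shows up like this (the file name is a made-up example):

Code:
$ git diff
diff --git a/backup.sh b/backup.sh
old mode 100644
new mode 100755

If that's all you're seeing, "git config core.fileMode false" tells git to ignore executable-bit changes when comparing.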
 

Red Squirrel

No Lifer
Is there a way to make it not care about permissions then? I just thought of this now: I have various scripts on my file server that keep permissions consistent throughout folders, since Linux does not have inheritance and you end up with all sorts of different permissions depending on who/what writes a file, so the workaround is cron jobs that reset permissions properly. Kinda dirty, but I have not figured out anything better. I wonder if these scripts might be causing files to appear changed even when they haven't (roughly the kind of thing sketched below).
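The fix-up jobs are roughly like this (path, owner, and modes are placeholders):

Code:
# crontab entry: nightly permission "inheritance" fix-up;
# note: a recursive chown/chmod that actually changes anything will
# make rsync -p (and git) see every affected file as changed
0 3 * * * chown -R redsquirrel:users /srv/share && chmod -R u+rwX,g+rX /srv/share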
 

Red Squirrel

No Lifer
So is there a way to make it look ONLY at the actual file content? I don't care about permissions or time or anything like that; I just care whether the file content actually changed.

Though I just double-checked, and that particular folder is not part of the cron job. It only touches my torrents folder and my backups, as the backups run as root (it needs to be able to access everything so it can back it up) and I then want the backups to be accessible by my user. I really wish Linux had a better permission system like NTFS so I would not need to do that, but that's a whole other story.
 

Essence_of_War

Platinum Member
Red Squirrel said:
So is there a way to make it look ONLY at the actual file content? I don't care about permissions or time or anything like that; I just care whether the file content actually changed.

Yes, but I'm not sure if you actually want that either.

rsync has a checksum flag (-c) which compares checksums of the files on both sides to test whether the content has changed, ignoring metadata clues like mtime. rsync doesn't build a database, though, and since it isn't a filesystem it can't stash the checksums in inode metadata or the like for later use, so it has to regenerate the checksums on both sides every time. This isn't so bad if you're talking about a relatively small number of files, or a situation where there is a client running rsync and a server running the rsync daemon, but if you're doing this with a large filesystem (big files, lots of files, possibly both) it can be quite slow.
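To also stop it from touching metadata on the destination, something like this should work (paths are placeholders; a sketch, not a tested job):

Code:
# -r recurse, -c decide "changed" by checksum rather than size+mtime;
# the --no-* flags stop perms/ownership/timestamps from propagating
rsync -rc --no-perms --no-owner --no-group --no-times /data/src/ /data/dst/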
 

Red Squirrel

No Lifer
Checksum is actually what I use. I found that when I use the other options it ALWAYS wants to copy everything, no matter what, while with checksum it does that only every now and then. When I ran it in debug mode it said everything had changed, including the checksum.
 

replica9000

Member
Red Squirrel said:
Is there a way to make it not care about permissions then? ... I wonder if these scripts might be causing files to appear changed even when they haven't.

What about ACLs?

Red Squirrel said:
Checksum is actually what I use. I found that when I use the other options it ALWAYS wants to copy everything, no matter what, while with checksum it does that only every now and then. ...

I find the checksum option really slows things down, especially when I sync over 5 TB of data. Once in a while I find rsync will sync things that haven't changed, but that's usually to a non-native filesystem.
 

thecoolnessrune

Diamond Member
Yeah, after years of dealing with permissions on Windows, Linux, and BSD systems, I found ACLs to be pretty much the definitive solution to getting the permissions of a directory on the same page. The Linux inheritance situation is a giant pain in a lot of cases, and using ACLs to do the inheritance for you (and to keep doing it as files are added, modified, and removed) makes them worth learning.
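A minimal sketch of that default-ACL inheritance (the path and group name are placeholders):

Code:
# grant the group rwx on the directory itself
setfacl -m g:backupusers:rwx /srv/share
# "default" entry: new files/dirs created inside inherit the same ACL
setfacl -d -m g:backupusers:rwx /srv/share
# verify both the access and default entries
getfacl /srv/share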
 

Red Squirrel

No Lifer
The only thing with using ACLs: since they're a third-party add-on and not really part of the Linux core, will they be honoured all the time by all parts of the system? E.g. if I use ACLs locally, will they also apply if I share that directory out with Samba or NFS? I might have to take a look at it.
 

Crusty

Lifer
Red Squirrel said:
The only thing with using ACLs: since they're a third-party add-on and not really part of the Linux core, will they be honoured all the time by all parts of the system? E.g. if I use ACLs locally, will they also apply if I share that directory out with Samba or NFS? ...

I thought ACLs were part of the POSIX standard (the POSIX.1e draft, at least).

If you are using some other system to access those files, it needs to understand the POSIX ACL setup; it's not as if ACLs are third-party and might just disappear one day.
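Enforcement happens on the server in any case, so a rough sketch of checking and exposing them (the share name and path are placeholders, and the smb.conf detail is an assumption about a typical Samba setup):

Code:
# on the server, the kernel enforces POSIX ACLs for local processes,
# and the Samba/NFS daemons are local processes, so shares are covered
getfacl /srv/share

# smb.conf: optionally have Samba honour and propagate the POSIX ACLs
[share]
    path = /srv/share
    inherit acls = yes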