
git diff with huge files remotely?

Elixer

Lifer
I know that normally you can do git diff <remote>/branch..branch, but that requires a fetch.
The thing is, the files are huge; we are talking >40GB each.
(Yes, someone had the bright idea of slamming data sets into git so "we can track them easier!", and they converted the binary data to text.)
What's worse, their new location only has a ~6Mbps DSL connection now, so the fetch would take a really, really long time and blow their data caps out of the water.
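Depending on what you actually need from the diff, a partial clone may help: if the host's server side allows it (Git 2.19+ with uploadpack.allowFilter enabled on their end — an assumption, check with them), you can fetch commits and trees without any file contents and at least see which paths changed. A sketch, with placeholder URL and branch names:

```shell
# Clone only commit and tree objects, skipping file contents entirely.
# Requires the server to allow partial clone (uploadpack.allowFilter).
git clone --filter=blob:none --no-checkout <url> repo
cd repo

# List changed paths between two branches using tree data only.
# --no-renames matters: rename detection compares blob contents,
# which would trigger on-demand blob downloads.
git diff --no-renames --name-status origin/main..origin/feature
```

This only gets you changed paths and object IDs; asking git for the actual content diff would start downloading blobs on demand, which is exactly what you're trying to avoid.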

It doesn't seem possible with git to read in X chunks of data at a time, compare them, and then move on to the next chunk.
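Right, git diffs whole blobs. But outside git you can approximate chunked comparison yourself: each side hashes the file in fixed-size chunks and you exchange only the (tiny) hash lists, so at least you learn which chunks differ without moving the data. A sketch with hypothetical file names, assuming standard coreutils on both ends:

```shell
# On each side: split the big file into fixed-size chunks and hash them.
# The hash list is tiny compared to the data, so it's cheap to exchange.
split -b 64m dataset.txt chunk.
sha256sum chunk.* > hashes.local.txt

# After receiving the other side's list, see which chunks differ:
diff hashes.local.txt hashes.remote.txt
```

Both sides must use the same chunk size so the chunk names line up. This is roughly what rsync's rolling-checksum algorithm does automatically, so if both ends could run rsync over SSH, that would be the readier tool.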

The best thing I can think of is to have someone download the files for them and send them a USB stick or whatever.

Anyone know of a better way?
 
Can you do a unidiff with a known common version to get a patch file you can send?
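For reference, if there were a common base present on both sides (a hypothetical tag v1.0 here), that patch-by-mail workflow would look like:

```shell
# On the side that has the new data: diff against the shared base
git diff v1.0 HEAD > changes.patch

# Send changes.patch (email/USB), then on the other side:
git apply --stat changes.patch   # preview what it would touch
git apply changes.patch          # apply to the working tree
```

One caveat: for binary data converted to text, the patch itself can still be enormous if the datasets churn a lot.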
 
I would just sneakernet your .git folder if downloading over the Internet is out of the question. Make sure the person who added the data is footing the bill for the registered-mail envelope, and then scold them; that's almost as bad as committing build targets/outputs to your source repository, IMO.
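If it does come to sneakernet, git bundle is a bit tidier than copying the raw .git folder: it packs refs and objects into a single file the receiving side can clone or fetch from. A sketch (the last-shipped tag for incremental updates is hypothetical):

```shell
# On the machine next to the data: pack the whole repo into one file
git bundle create repo.bundle --all

# On the receiving machine, once the drive arrives:
git clone repo.bundle repo

# Later shipments can be incremental: only commits since a known point
git bundle create update.bundle last-shipped..main
```

The receiver can `git fetch` from an incremental bundle just like from a remote, so subsequent drives only carry the new history.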
 
Since I am working remotely, no, a unidiff isn't possible; they uploaded everything to the repo before they moved locations, so they have nothing local.
The servers are in Sweden.

What is more troubling, they wanted to use an iPad to hook up to these massive datasets and peruse them on their own time.
I told them Apple devices don't have enough RAM; they said the Apple guy told them they could use the cloud. I said: not with git, it doesn't work like that.
The person who set up this monstrosity was fresh out of MIT, and supposedly she didn't want to move to the new location. I think she left because she finally saw this was the stupidest way possible to do this.
She didn't even use git-lfs. Talk about being brain dead.

The only good news was that when I called the server hosting company, they were kind enough to copy all the data to a few hard drives and ship them to the company.
 
How often do you have to do the diffs? I mean, the simple answer is "ssh into a system that's local to the data and do your work there." But it's not pretty.

You could install a web front end (cgit, or GitLab, or whatever somebody else has cooked up since I last looked). Viewing diffs through a web browser might be painful, depending on the browser you use and how much divergence there is.

I'll echo the comments that git is not appropriate for files of this size and type, though. Really, this work should happen locally, with a proper diff tool like Beyond Compare.
 
They have no shell access to the server at all... and it seems they want diffs generated on demand.

But, as I said, since they are now getting the data shipped to them via FedEx, we don't have to worry about this insane plan of using git for these huge files anymore.
 