Large files are corrupt when downloaded

bgstcola

Member
Aug 30, 2010
I'm trying to make some 400MB files available for download on my site. It works fine for me, but two of my friends get randomly incomplete downloads (for example, the transfer stops at 80MB).

The files are made available with a plain <a href> link tag.

Any ideas what could be wrong? Is the only possibility that the host somehow handles big files badly? It's just weird that I can download them myself fine o_O

Is there a more reliable way to send the files? I've tried forcing the download with a PHP header, but with the same results.
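
(For reference, a forced download via PHP headers, which is what the post above describes trying, typically looks something like the minimal sketch below. The path and file name are placeholders, not the actual setup.)

<?php
// Minimal sketch of a PHP forced download; the path is a placeholder.
$file = '/path/to/protected/bigfile.zip';

if (!is_file($file)) {
    header('HTTP/1.1 404 Not Found');
    exit('File not found.');
}

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . basename($file) . '"');
header('Content-Length: ' . filesize($file));

// readfile() streams the whole file to the client in one call.
readfile($file);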
 

Cogman

Lifer
Sep 19, 2000
HTTP file transfers of large files pretty frequently result in corruption. Probably the best way to do it reliably is by using a torrent.
 

bgstcola

Member
Aug 30, 2010
I can't use a torrent since it's protected content. Isn't there another way to do it?

I have a PayPal shop on my site, and it works by including the name of the secret folder that stores my files in the PayPal return URL. So I'm looking for an option that would work in that context. Isn't there some way to improve transfer reliability with PHP?
 

Cogman

Lifer
Sep 19, 2000
bgstcola said:
I can't use a torrent since it's protected content. Isn't there another way to do it?

I have a PayPal shop on my site, and it works by including the name of the secret folder that stores my files in the PayPal return URL. So I'm looking for an option that would work in that context. Isn't there some way to improve transfer reliability with PHP?

I would suggest dropping the "secret folder" approach. Security through obscurity is no security.

If you want the download to work every time, you are going to need to give up distribution through a web server. I suggested a torrent because it automatically breaks the file into chunks and checks a hash on each chunk (which is what gives it its excellent data integrity).

You might be able to simulate the same thing by writing a quick app that does just that; see the sketch below.
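
(As an illustration of that idea, here is a minimal PHP sketch that splits a file into chunks and records a SHA-256 hash for each one. The file name and chunk size are placeholders, not anything from this thread; a downloader on the client side would re-hash each chunk after transfer and re-fetch only the chunks whose hashes don't match.)

<?php
// Minimal sketch: split a file into chunks and record a SHA-256 hash per chunk.
// The file name and chunk size are placeholders.
$source    = 'bigfile.zip';
$chunkSize = 4 * 1024 * 1024; // 4 MB per chunk
$manifest  = array();

$in = fopen($source, 'rb');
$i  = 0;
while (!feof($in)) {
    $data = fread($in, $chunkSize);
    if ($data === false || $data === '') {
        break;
    }
    $chunkName = $source . '.part' . $i;
    file_put_contents($chunkName, $data);
    $manifest[$chunkName] = hash('sha256', $data);
    $i++;
}
fclose($in);

// Publish the manifest so the client can verify each chunk and
// re-download only the ones that fail the hash check.
file_put_contents($source . '.manifest', json_encode($manifest));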

Your only other option is to just have the person redownload the file if an error happens.
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
www.markbetz.net
Would FTP be any better?

I was going to suggest the same thing. Most web browsers handle FTP about as well as they handle HTTP transfers.

I was a little surprised at Cogman's comments about corruption and unreliability with large binary HTTP transfers. I haven't hosted any really large files on a web server, but thinking about how the protocol works, I can't see why it should be so troublesome.

The OP's report isn't about corruption but about non-completion. Sounds to me like the web server just stopped writing bytes to the response stream. What class of hosting package is this running on?
 

Cogman

Lifer
Sep 19, 2000
Markbnj said:
I was going to suggest the same thing. Most web browsers handle FTP about as well as they handle HTTP transfers.

I was a little surprised at Cogman's comments about corruption and unreliability with large binary HTTP transfers. I haven't hosted any really large files on a web server, but thinking about how the protocol works, I can't see why it should be so troublesome.

The OP's report isn't about corruption but about non-completion. Sounds to me like the web server just stopped writing bytes to the response stream. What class of hosting package is this running on?

Non-completion is a form of corruption ;-).

I've seen actual corruption over HTTP all the time (mind you, I did live in Idaho). HTTP relies heavily on TCP/IP for its error handling, and that is just a simple checksum on each segment. While that covers most cases, it doesn't catch them all. With large files you start to see the failure case more often (most visibly when there are a lot of hops between you and the server).

Think about it: who really distributes large amounts of data over plain HTTP any more? If data integrity is important, they usually have their own special downloading program doing exactly what I described (Steam, Windows Update, Blizzard's patch distribution, which really is just BitTorrent, etc.).

FTP will have the exact same issues.

The problem will be exacerbated if the client has an unstable hard drive.
 

bgstcola

Member
Aug 30, 2010
Thanks for the replies. I guess this isn't that easy to solve. But do you think the reliability would improve if I changed the host of the site?
 

Cogman

Lifer
Sep 19, 2000
bgstcola said:
Thanks for the replies. I guess this isn't that easy to solve. But do you think the reliability would improve if I changed the host of the site?

Depends on what the problem is. If it is caused by random electrical noise that just happens to corrupt the data in a way that still passes the checksums, then no; the issue is bigger than the host.

If it is caused by some sort of bad network setup: maybe, depending on whether the data still has to hop through the same crappy network.

If it is caused by a server-side error, yes.

If it is caused by a client-side error, no.

The fact that you can download the file just fine suggests to me that the problem isn't going to be fixed by changing hosts.
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
www.markbetz.net
Cogman said:
Non-completion is a form of corruption ;-).

Good point! And yeah, it's definitely the case that HTTP wasn't designed to ensure the delivery of large amounts of binary data.

Cogman said:
Think about it: who really distributes large amounts of data over plain HTTP any more?

Over HTTP, probably relatively few, at least among the high-volume cases. Over TCP? Probably the majority at this point. BitTorrent being a peer-to-peer mesh protocol kind of rules it out for anything where channel security is important, doesn't it?

Cogman said:
FTP will have the exact same issues.

We've moved a lot of large (500+ MB) files using FTP and SFTP, and I don't recall any cases of corruption, but then I work in a major metropolitan area, as do most of the recipients of the data. We gots good intarnetz.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
Cogman said:
I would suggest dropping the "secret folder" approach. Security through obscurity is no security.
So encrypt the file first, then torrent it. :)

Edit: Anybody familiar enough with the command line to use "wget -c http://path.to.file"?
 

Cogman

Lifer
Sep 19, 2000
Markbnj said:
Good point! And yeah, it's definitely the case that HTTP wasn't designed to ensure the delivery of large amounts of binary data.

Over HTTP, probably relatively few, at least among the high-volume cases. Over TCP? Probably the majority at this point. BitTorrent being a peer-to-peer mesh protocol kind of rules it out for anything where channel security is important, doesn't it?
TCP is used for the transfer (because it DOES catch errors, and its built-in retransmission makes transfers more stable than they would be otherwise), but it isn't used alone. Generally, the data is chunked and then hashed with a strong, non-CRC hash, which makes it MUCH less likely that the universe aligns in a way that lets the file download incorrectly. That also provides stability over unstable connections.

Markbnj said:
We've moved a lot of large (500+ MB) files using FTP and SFTP, and I don't recall any cases of corruption, but then I work in a major metropolitan area, as do most of the recipients of the data. We gots good intarnetz.
:) You probably aren't moving the data far. In some areas I never experience corruption; in others it is almost common (dang Idaho). Whether you get data corruption or not depends heavily on the routers your data has to pass through to get to you.

Do you do any secondary hashing with your file transfers? (e.g., md5sum)
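
(For anyone wanting that kind of secondary check on a plain web host, a minimal PHP sketch along these lines would publish an MD5 checksum file next to the download; the file name is a placeholder. The client then runs md5sum on the finished download and compares.)

<?php
// Minimal sketch: write an md5sum-compatible checksum file next to the
// download so clients can verify the transfer. The file name is a placeholder.
$file = 'bigfile.zip';
$md5  = md5_file($file);

// md5sum's output format is "<hash>  <filename>"
file_put_contents($file . '.md5', $md5 . '  ' . basename($file) . "\n");
echo "Wrote " . $file . ".md5 (" . $md5 . ")\n";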
 

Fallen Kell

Diamond Member
Oct 9, 1999
Take a cluestick from the newsgroup guys. Split the file into a multi-part RAR archive and create a PAR2 recovery volume.
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
www.markbetz.net
Cogman said:
Do you do any secondary hashing with your file transfers? (e.g., md5sum)

Yes, MD5 for the most part, but the incidence of retries is extremely low as far as I know.

As for how far we move the data, almost all of it moves around a triangle of data centers on the East Coast, in the Midwest, and on the West Coast.
 

sourceninja

Diamond Member
Mar 8, 2005
bgstcola said:
I'm trying to make some 400MB files available for download on my site. It works fine for me, but two of my friends get randomly incomplete downloads (for example, the transfer stops at 80MB).

The files are made available with a plain <a href> link tag.

Any ideas what could be wrong? Is the only possibility that the host somehow handles big files badly? It's just weird that I can download them myself fine o_O

Is there a more reliable way to send the files? I've tried forcing the download with a PHP header, but with the same results.

Sounds like an Apache timeout issue. Unless you can edit the Apache configs (i.e., you're not on shared hosting), you're SOL (if you want to use HTTP).
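
(If the PHP route is kept, one mitigation worth trying, assuming the host permits overriding the script time limit, is to stream the file in chunks rather than relying on a single readfile() call. Apache's own Timeout setting can still cut the connection, so this is only a partial fix. A minimal sketch, with a placeholder path and chunk size:)

<?php
// Minimal sketch: stream a large file in chunks from PHP.
// The path and chunk size are placeholders; this only helps if the host
// allows set_time_limit(), and Apache's Timeout directive can still apply.
$file = '/path/to/protected/bigfile.zip';

set_time_limit(0); // lift PHP's max_execution_time, if permitted

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . basename($file) . '"');
header('Content-Length: ' . filesize($file));

$fp = fopen($file, 'rb');
while (!feof($fp)) {
    echo fread($fp, 8192); // send 8 KB at a time
    flush();               // push bytes to the client instead of buffering
}
fclose($fp);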