File compression that removes duplicated content across files?

Jul 26, 2006
143
2
81
Out of curiosity, are there any compression programs out there that can compress a set of files and remove content that is duplicated between them?

Example: you have 10 video files, each 1 GB, each containing the exact same 500 MB intro. Is there any compression that would give me a combined archive of roughly 5.5 GB (the shared intro stored once, plus each file's unique half) instead of 10 GB?

I tried this with 7zip and winrar; both failed to reduce the file size by more than a few percentage points, even though all seven of my clips had the same intro, ending, and miscellaneous parts in the middle (it should have given me about 50% compression; instead I got less than 5%).
 

Gooberlx2

Lifer
May 4, 2001
15,381
6
91
You're basically asking for deduplication. This is achieved with "solid" archives; 7zip, for example, has options for this. Linux has done this forever with compressed tarballs (tar.gz). Basically, the files are put into one data stream before being sent to the compression algorithm, so the compressor can spot data it has already seen.
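A rough way to see the solid-archive effect, using Python's zlib as a stand-in for the archiver's compressor (the "files" here are made up for illustration):

```python
import os
import zlib

# Three fake "files": a shared 4 KB intro plus 1 KB of unique data each.
intro = os.urandom(4096)                        # incompressible on its own
files = [intro + os.urandom(1024) for _ in range(3)]

# Non-solid: each file is compressed on its own, so every copy of the
# intro gets stored again.
non_solid = sum(len(zlib.compress(f)) for f in files)

# Solid: the files are joined into one stream first, so the compressor
# can refer back to the intro it already emitted.
solid = len(zlib.compress(b"".join(files)))

print(non_solid, solid)  # the solid result is noticeably smaller
```

Note this only works while the duplicate data is within the compressor's window (32 KB for zlib), which is why archivers expose dictionary/block-size settings for this.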

With regard to your video example, however, unless those intros are bit-for-bit the same, you won't see a lot of compression. Most frames in a video depend on the frames before and after them to determine which pixels actually need to change, and by how much. That's how encoders like x264 achieve their own compression, resulting in small video files relative to the quality. It's also why fast action movies don't compress as well as slow dramas and animation.

Factor in differences like source quality, transfer quality, encoding parameters, etc. If you were to take the exact same frame of, say, the Universal Studios intro from two different movies and do a difference subtraction in GIMP or Photoshop, the remaining non-black pixels would be the actual difference between those frames. It'd probably be surprisingly substantial.
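The byte-level version of that point can be sketched in Python (zlib standing in for the archiver, random bytes standing in for encoded video): a byte-identical duplicate nearly vanishes, while a re-encode of the "same" intro shares essentially no bytes with the original and saves nothing.

```python
import os
import zlib

intro = os.urandom(4096)       # stands in for one encode of the intro
reencode = os.urandom(4096)    # same footage, different encode: different bytes

# Two files with a byte-identical intro: the second copy is nearly free.
identical = len(zlib.compress(intro + intro))

# Two files whose intros only *look* the same: no shared bytes, no savings.
different = len(zlib.compress(intro + reencode))

print(identical, different)
```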
 
Last edited:

jolancer

Senior member
Sep 6, 2004
469
0
0
As goober pointed out, that wouldn't work anyway because of the subtle compression differences from the encoder or source. An alternative, like I do on some of my vids: if they have the exact same intro or outro, just split them. If they're encoded correctly, so that a key frame starts almost immediately after the scene change from the intro, you can split them right there (a split has to start at a key frame, but it can end anywhere). Then just keep one copy of the intro for the start of each playlist to share, and you never know the difference and save all that space without compression.
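That trick amounts to storing the intro once and gluing things back together at play time with a playlist. A minimal sketch in Python (the file names are hypothetical), building plain M3U playlists that each point at the one shared intro:

```python
def make_playlist(intro: str, body: str) -> str:
    # Plain M3U is just one media path per line, played in order.
    return f"{intro}\n{body}\n"

# Hypothetical clips that have already had the shared intro split off.
bodies = ["episode1_body.mp4", "episode2_body.mp4", "episode3_body.mp4"]

playlists = {body.replace("_body.mp4", ".m3u"): make_playlist("intro.mp4", body)
             for body in bodies}

print(playlists["episode1.m3u"])
```

Each playlist plays intro.mp4 and then the unique part, so only one copy of the intro is ever stored on disk.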
 

masteryoda34

Golden Member
Dec 17, 2007
1,399
3
81
Gooberlx2 said: You're basically asking for deduplication. This is achieved with "solid" archives. [...]

Great answer. I was hoping someone was going to tackle this.
 

Chiefcrowe

Diamond Member
Sep 15, 2008
5,056
199
116
How do you set up 7zip to do deduplication? I never realized you could do that.



Gooberlx2 said: You're basically asking for deduplication. This is achieved with "solid" archives. [...]
 

sm625

Diamond Member
May 6, 2011
8,172
137
106
7zip apparently does this by default, when you right click a folder and choose "7-zip --> Add to myfolder.7z". I zipped up a folder that had 3 copies of the same 11MB file, and it zipped up to 222KB. I upped the count from 3 copies to 10 copies and the .7z output file only increased to 230KB! So it used a mere 8K to hold those 7 extra copies.
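That result is easy to reproduce outside 7-Zip. Here's the same experiment in Python with the lzma module (the same LZMA algorithm 7z uses), on smaller made-up data:

```python
import lzma
import os

blob = os.urandom(256 * 1024)   # one 256 KB "file" of incompressible data

three = len(lzma.compress(blob * 3))    # archive with 3 copies
ten = len(lzma.compress(blob * 10))     # archive with 10 copies

print(three, ten, ten - three)  # the 7 extra copies cost very little
```

All ten copies fit inside LZMA's default dictionary here, so each extra copy is encoded as a back-reference rather than stored again, just like in the 7z test above.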
 

Gooberlx2

Lifer
May 4, 2001
15,381
6
91
How do you set up 7zip to do deduplication? I never realized you could do that.

The relevant option is highlighted. I don't know exactly what changing the block size does; maybe it improves compression/decompression speed at the cost of compressibility?

[screenshot: 7-Zip archive settings dialog with the relevant option highlighted]


Dunno what dictionary size or word size do.
edit: here's a good explanation of things
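For what it's worth, the dictionary/block sizes bound how far back the compressor can "see" a duplicate: if two copies of the same data are farther apart than the dictionary, they won't deduplicate. A Python sketch with the lzma module (dict_size being the LZMA2 analogue of 7-Zip's dictionary-size setting):

```python
import lzma
import os

blob = os.urandom(256 * 1024)          # incompressible chunk
gap = os.urandom(2 * 1024 * 1024)      # 2 MB of unrelated data in between
data = blob + gap + blob               # the duplicate sits over 2 MB back

def compressed_size(dict_size):
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))

small = compressed_size(1 * 1024 * 1024)   # 1 MB dict: duplicate out of reach
large = compressed_size(8 * 1024 * 1024)   # 8 MB dict: duplicate is found

print(small, large)
```

With the 1 MB dictionary the second copy of blob has to be stored in full; with the 8 MB dictionary it collapses to back-references, so the output is about 256 KB smaller.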
 
Last edited: