What are some of the methods used to compress audio?

Soccerman

Elite Member
Oct 9, 1999
6,378
0
0
besides general compression (like Zipping a wav file!).

I'm wondering because I thought of this one thing.. first off, to get you to understand my way of thinking :)

in a 16-bit, 44100 Hz sample rate audio stream, each sample is composed of 16 bits of data (so 16 1's or 0's). this allows for 65,536 possible positions that the pressure of the air could be at. ok, now, there is a neutral position, placed precisely at the middle of the max number that 16 bits can make, which is 65535 / 2 = ~32767 (I think)

ok, so say that your audio stream has absolutely no audio in it (in other words, it's devoid of sound). instead of having all 16 bits represent ~32767, couldn't you replace that with the number 1/2 (which is 2 bits in length), which the computer immediately turns back into x/65535?

I'm guessing this is already implemented everywhere, but I wanted to see.. just in case! anyway, you can follow this to much greater detail, so that if you had the pressure at say 1/8th of ~32767, that's 1/16th of 65535, which can be turned into a binary number with FEWER bits than it would take as x/65535.
 

Lore

Diamond Member
Oct 24, 1999
3,624
1
76
Are you talking about methods _other_ than mp3s and the codecs used for cell phones and video conferencing?
 

AndyHui

Administrator Emeritus, Elite Member, AT FAQ M
Oct 9, 1999
13,141
16
81
Variable Bit Rate (VBR) encoding for MP3s uses this idea...
 

Budman

Lifer
Oct 9, 1999
10,980
0
0
Open up your Control Panel, then go to Multimedia, then the Devices tab, and look at the audio compression codecs; those are all the different codecs you can use to compress audio in Windows.
 

Becks2k

Senior member
Oct 2, 2000
391
0
0
I tried to find out how some audio compression works and couldn't find any good info :(

I think monkeyaudio has/had the best lossless audio compression, and it's about 50%
 

Soccerman

Elite Member
Oct 9, 1999
6,378
0
0
"Well, you know how mp3s work, right?"

no I don't.. not in its entirety anyway. I've had some explanations, but I don't know how true they are..
 

Modus

Platinum Member
Oct 9, 1999
2,235
0
0
The compression of audio streams, like any other data compression, falls into two categories: lossy and lossless.

Lossless compression shrinks a set of data but retains the ability to extract the same data set later, with zero loss of information. This is what we do when we use binary compression software like ZIP and RAR. The mechanics of how it's done are explained very technically in AnandTech's All About Compression.

Compression algorithms employ all sorts of clever tricks to reduce the size of a given data set. The most general tricks are Huffman encoding, Lempel Ziv encoding, and Run Length Encoding (RLE). Huffman and LZ are explained in the article above but may be difficult to grasp without programming experience. RLE, however, is a powerful and simple concept that applies to all compression schemes.

RLE compresses data by greatly reducing the space occupied by long streams of identical data. For instance, a web advertisement probably contains a transparent background that is more or less continuous in certain parts of the image. RLE would take a line of that image that was all white or some other color, and instead record only the length of the color stream and the color used. The potential space gains are enormous.
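A tiny sketch of RLE in Python (just illustrative, not taken from the article or any particular program):

def rle_encode(data):
    # Collapse runs of identical values into (value, count) pairs.
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(v, c) for v, c in runs]

def rle_decode(runs):
    # Expand (value, count) pairs back into the original sequence.
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

# A mostly-white scanline collapses to just three pairs.
line = ["white"] * 200 + ["orange"] * 3 + ["white"] * 197
packed = rle_encode(line)
print(packed)                       # [('white', 200), ('orange', 3), ('white', 197)]
assert rle_decode(packed) == line   # lossless: we get the exact line back

Real compressors combine this kind of trick with Huffman and LZ coding rather than using it on its own.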

Data compression software steadily improved until about ten years ago. Although there are some interesting mathematical breakthroughs yet to be seen, most experts agree that general lossless compression algorithms are almost as efficient as they can be.

Some types of data are more agreeable to lossless compression than others. English text, for instance, usually compresses to around 20% of its original size. Multimedia sound and video, however, do not compress nearly as well. This is where lossy compression comes in.

Lossy compression works on the assumption that much of the data in multimedia files is redundant or not noticeable by typical human perception. This data is removed and intelligently "smoothed over" to make the file easier to compress.

Imagine a photograph of a sunset with a long, horizontal, orange streak. Since this is nature, the very middle of the streak happens to have a slightly darker shade of orange, not quite detectable by the human eye, but captured nevertheless by the camera. When stored digitally, the slight "imperfection" in the orange streak really gets in the way of an RLE encoding scheme, because it cuts the nice orange stream in half and actually triples the size of the resultant compressed data. Lossy compression algorithms like JPG will actually go and make the whole streak the exact same color of orange. Your eyes likely will not notice, but your patience certainly will when the download comes in twice as fast.

The principle for MP3 audio compression is essentially the same: smooth out the undetectable "peaks and valleys" of natural sound to make it easier to compress.

Modus
 

Soccerman

Elite Member
Oct 9, 1999
6,378
0
0
ok, 90% of that I already knew.

second, I have to wonder if Zip or RAR files can compress the file in the way I was talking about (it's possible).

so does anyone else have anything to say? thanks for the info on Anand's article, MODUS! I'll go read it, see if I can dig any more dirt up on this..

ahh ok I've read it, and it doesn't mention audio compression. the compression it DOES mention, I already knew (the repetitive stuff gets represented by a symbol), and the other method, which is more complex, doesn't do the same as what I'm saying..

anyone else with this info, help would be greatly appreciated.
 

Modus

Platinum Member
Oct 9, 1999
2,235
0
0
I still don't understand the method you describe in the original post. Maybe rephrase it and elaborate, and I can tell you if that kind of thing is done.

Modus
 

StrangeRanger

Golden Member
Oct 9, 1999
1,316
0
0
soccerman...check this out: SoftSound
This is a company that makes a lossless and lossy compression package for audio files. Excellent package and it looks like this page will give you some insights.
 

Soccerman

Elite Member
Oct 9, 1999
6,378
0
0
ok I will try.. it's very hard to describe without a picture of some sort..

each piece of audio is made up of 16 bits (in a 16-bit, 44100 Hz audio stream, for example). the maximum value those 16 bits can give you is 65535. (1+2+4+8+16+32+64+128+256+512+1024+2048+4096+8192+16384+32768) = 65535

that's one way you can find the maximum value of the binary number you have, OR what you can do is go to the bit after the last one (in this case a 17th bit, which would be worth 65536) and subtract 1 to find the value of 1111111111111111. it's like doing 25-1=24, only with binary you have to know what decimal number each bit represents.

anywho, back on topic. normally for each piece of audio you have 16 bits, giving a number, which corresponds to the air pressure at that single moment in time in the song (I'm looking at this from an oscilloscope view).

instead of writing 32767/65535 as the middle, neutral pressure (0), you could just write 1/2, which would require 4 bits, rather than 16.

your decoder then comes along and sees that this isn't x over 65535 (x/65535), and therefore converts it automatically, by multiplying the fraction by the maximum number of 16 bits, thereby getting the exact same (actually more accurate, because 1/2 of 65535 is a decimal) answer.
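in rough Python terms, the idea looks something like this (the function names and the denominator limit are just made up to pin it down):

MAX = 65535  # largest value a 16-bit sample can hold

def try_as_fraction(sample, max_denominator=16):
    # Look for a small fraction n/d that lands exactly on this sample value.
    # Returns (n, d) if one exists, otherwise None (keep the raw 16 bits).
    for d in range(2, max_denominator + 1):
        for n in range(1, d):
            if round(MAX * n / d) == sample:
                return (n, d)
    return None

def from_fraction(n, d):
    # Decoder side: multiply the fraction back out to a 16-bit value.
    return round(MAX * n / d)

print(try_as_fraction(32768))   # (1, 2)  -- the neutral midpoint
print(try_as_fraction(2787))    # None    -- most samples aren't simple fractions
print(from_fraction(1, 2))      # 32768   -- decodes back to the same value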
 

Modus

Platinum Member
Oct 9, 1999
2,235
0
0
OK, I see where you're coming from. However, a little insight into how computers store fractions and decimals (non-integers) will show you that the method you're describing wouldn't really save space.

To store a whole number in binary is fairly simple: you convert it from base 10 to base 2. 2 becomes 10, 3 becomes 11, 4 becomes 100, 5 becomes 101, etc. The problem is, a fractional number (floating point decimal) is not so easy to store in binary. For instance, representing 1/2 in binary is not so bad: it's 1/10. But another arbitrary fraction, say 2787/65535 is much more difficult to represent. For that kind of fraction, which only reduces to 929/21845, storing it as a binary fraction actually takes MORE space than just storing 2787.
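To put rough numbers on that, here's a quick Python check (nothing official, just the standard fractions module):

from fractions import Fraction

sample = 2787
f = Fraction(sample, 65535)                # reduces to 929/21845
as_int  = sample.bit_length()              # 12 bits as a plain integer
as_frac = f.numerator.bit_length() + f.denominator.bit_length()
print(f, as_int, as_frac)                  # 929/21845  12  25 -- the "fraction" costs more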

Now, you could also use a floating point value to hold the sound data, but it still wouldn't be as space-efficient as a simple integer, because floating point values must use a kind of binary scientific notation. The method you describe is only useful when the value being stored is a nice round fraction like 32768/65536, which reduces greatly. You could code an encoder that would detect these kinds of values and store them as fractions, but it would probably be more trouble than it's worth, not to mention the fact that the decoder would have to slow down to process it.

It's an interesting thought, though. I've always been fascinated by compression technology. I once thought I had come up with a compression method that could reduce ANY data set by a GUARANTEED exactly 33%. Before I even sat down to try and code it, I realized that such a thing is impossible in this universe! If you could compress ANY data by 33%, all the time, then you could in turn compress the compressed data, which is itself just data, and gain another 33%. You could theoretically compress any given data down to a single bit. Obviously, this is ridiculous because a single bit can by definition only hold 2 values!
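You can see the counting problem with a couple of lines of Python:

# There are fewer possible "shorter" outputs than inputs, so no lossless
# scheme can shrink every input -- at least two inputs would have to collide.
n = 16
inputs  = 2 ** n                            # 65,536 possible 16-bit messages
shorter = sum(2 ** k for k in range(n))     # every message shorter than 16 bits: 65,535
print(inputs, shorter)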

Always be wary of any new lossless compression scheme that claims to give guaranteed space gains. Guaranteed lossless compression rates for generic data simply don't exist.

Modus
 

Soccerman

Elite Member
Oct 9, 1999
6,378
0
0
yeah, actually I thought about the fraction problem.. it possibly won't be too hard to sit down and write this. first of all, you have to be able to figure out which bits of the data should be compressed using this method. some of these values will no doubt be a lot tougher than others, and will thereby be skipped.

1/2 can be represented by .5 right? it takes 2 bits to represent that right?

1/4 can be represented by .25.. as long as the original sample is larger than the "compressed" one, the program would keep the compressed version, and continue on.

however, because there are so many possibilities, it's impossible to cover everything (like you said, if you're going to have x/65535, that's pointless; in fact, doesn't that take up twice the number of bits?). I'm guessing x/32767 would probably not be good either.

also, if your value does not use the 32768 bit (the 16th bit), you could knock off a bit for every sample that doesn't use it. so in extremely loud sections of the piece of audio, you get the maximum compression from that alone. it's kind of like the opposite of VBR, where more bitrate is given where it's needed.
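here's a toy sketch of that bit-dropping part in Python (block-based so the decoder knows the width, which isn't exactly what I described, and not how any real codec does it):

def pack_block(samples):
    # Store a block using just enough bits per sample for its largest value.
    bits = max(s.bit_length() for s in samples) or 1
    packed = "".join(format(s, "0{}b".format(bits)) for s in samples)
    return bits, packed

block = [3, 7, 1, 0, 5, 2, 6, 4]      # samples that happen to be small numbers
bits, packed = pack_block(block)
print(bits, len(packed))              # 3 bits each: 24 bits total instead of 8 * 16 = 128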

as you can see, you cannot compress everything, but every little bit helps right? it's still something to consider!
 

Soccerman

Elite Member
Oct 9, 1999
6,378
0
0
also, you could have a lossy version of this too.

this has the 16-bit file, with 65,536 possible positions that the pressure can be in. and when you do that normal algorithm, instead of having the compressed version be exactly the same as the original, you could round it off; however, I personally wouldn't like to do that (hey, what I've said is enough!)

BTW, I'm not into programming, and I barely remember any of my binary, so please tell me when I'm wrong!

the reason I posted this here, is so that it could stand the test of time! and to see if it was already done..

and of course, in the unlikely scenario that it turns out as an original idea, it's MY idea! so you can't have it! :p

actually I don't know what I'd do if it turned out to be a unique thing...
 

HaVoC

Platinum Member
Oct 10, 1999
2,223
0
0
Interesting concept, Soccerman. I think I see what you are saying. I think you would like to just store a perfect 1/2 value as simply a half. I suspect in reality digital audio picks up so much noise from the environment that a value like 2^8 out of 2^16 occurs VERY infrequently in the actual wav data stream. So, your method would in reality probably not compress much, since few sample values are in the form of 2^n where n is 2, 4, 8, or 16.

Absolutely devoid of sound means a perfect 0 sample which many compression schemes probably already eliminate using RLE.

Now, here's another idea I've pondered. What if you took that 16-bit raw PCM data and ENCODED IT INTO A PICTURE? Like every sample becomes a pixel, and those samples are arranged into an array, say 640x480 at a 16-bit color depth. Then apply BITMAP compression techniques to the resulting file and see what happens. Maybe patterns could more easily be detected? Maybe JPEG could compress it fairly well? Arrange each picture (which is some segment of sound) into a video and use MPEG2 on it? Or even MPEG4, which is pretty much the most advanced video compression standard out there now.

I think MP3 uses several compression algorithms. The main compression technique is based on a PSYCHOACOUSTIC MODEL that makes assumptions about what the human ear can and can't hear. Many encoders cut off the higher frequencies past 16 kHz to 20 kHz on the assumption that most people can't hear them. There is also an effect called frequency masking, where a louder tone masks a much quieter tone at a nearby frequency; MP3 discards the quiet one. Different bitrates affect how much information is kept and how much is discarded. This is my understanding of how it works and others are free to correct or append.
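Just to illustrate the frequency-cutoff part, here is a numpy sketch (real MP3 psychoacoustics is far more involved than this):

import numpy as np

rate = 44100
t = np.arange(rate) / rate                              # one second of audio
signal = np.sin(2 * np.pi * 440 * t) + 0.2 * np.sin(2 * np.pi * 18000 * t)

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), 1 / rate)
spectrum[freqs > 16000] = 0                             # drop content most ears won't miss
filtered = np.fft.irfft(spectrum, n=len(signal))        # the 18 kHz tone is gone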
 

Soccerman

Elite Member
Oct 9, 1999
6,378
0
0
ahh yes, you are correct, 1/2 probably by itself won't accomplish much.

like I said in my above post, you have more than just 1/2.. 1/3, 1/4, 1/5, 1/6, etc., until the value becomes equal in size to the original, in which case you just use the original. all this combined compression can be quite helpful (I can't come up with any numbers as to HOW helpful it would be).

and finally, the more complicated the sound file, the better, because the wave will pass through a value (such as 1/2) more often. high frequency noise would compress better than songs with lots of bass, however.

yeah I understand that hidden sound removal concept..

interesting idea about the picture, can you elaborate more, so I can get a more accurate idea on whether or not it would work?
 

HaVoC

Platinum Member
Oct 10, 1999
2,223
0
0
Let's say you have a sample file that is really tiny...just four samples of 16-bit 44.1 kHz audio.

They are 0, 512, 1024, 2048. (Small sound values but I'd like to keep the numbers small for simplicity)

You could arrange that as a tiny 2x2 bitmap on a 16bit grayscale color depth. (one color with 16 bits per pixel gives 2^16 possible grays)

It would look like this: ('%' = pixel)
% %
% %
with grayscale values of:

0 512
1024 2048

Then use JPG compression on that file. You would arrange the next samples as another bitmap frame. You'd have to make a decision on how many samples go into each picture. Basically you adjust the number of samples in a picture to give you a good picture size (say 320x240 or 640x480) and just add more and more frames to create a "video of sound". See? Video MPG compression is pretty darn good at eliminating spatial and temporal redundancy. I guess you could add 3 color channels (by allocating each 16-bit sample into 5-6-5 RGB pixels) so that the resulting picture has colors instead of grayscale. I wonder how the resulting file would sound when replayed?

I'm sure this idea has been posited before. I'd like to see someone code a program to implement this algorithm in some form.
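Here's a rough sketch of the idea, assuming numpy and Pillow are available; PNG stands in for the bitmap-compression step (JPEG can't hold 16-bit samples losslessly), and the samples are just made-up numbers:

import os
import numpy as np
from PIL import Image

# Pretend these are 16-bit PCM samples; real code would read them from a WAV.
rng = np.random.default_rng(0)
samples = rng.integers(0, 65536, 320 * 240, dtype=np.uint16)

# Split each 16-bit sample into two 8-bit pixels so a plain grayscale
# image can hold the data exactly, then let PNG (lossless) have a go at it.
pixels = samples.view(np.uint8).reshape(240, 640)
Image.fromarray(pixels, mode="L").save("sound_frame.png")

print(len(samples.tobytes()), os.path.getsize("sound_frame.png"))
# Random noise won't shrink at all; a real, smoother waveform should fare better.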
 

Soccerman

Elite Member
Oct 9, 1999
6,378
0
0
interesting.. I'm sorry I don't quite understand exactly what you mean, but I think I get the basic meaning..

so does anyone have anything to say about my compression technique?
 

HaVoC

Platinum Member
Oct 10, 1999
2,223
0
0
Hmm...I don't think that there will be that many perfect 1/2, 1/4, 1/16, etc...samples. Might be an interesting experiment to see what kind of numbers are present on average in a typical wave file. I have a feeling it is a lot of integers that don't form reducible fractions with 65536.
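That experiment is easy enough to try with Python's built-in wave module ("test.wav" is just a placeholder filename, and this assumes 16-bit samples):

import struct
import wave
from fractions import Fraction

# Count how many samples in a 16-bit WAV land exactly on a "simple" fraction
# of the full range (denominator 16 or less, matching the idea above).
with wave.open("test.wav", "rb") as w:
    raw = w.readframes(w.getnframes())

samples = struct.unpack("<{}h".format(len(raw) // 2), raw)
simple = sum(1 for s in samples if Fraction(s + 32768, 65536).denominator <= 16)
print(simple, len(samples), simple / len(samples))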
 

Becks2k

Senior member
Oct 2, 2000
391
0
0
Wouldn't have to reduce PERFECTLY, but pretty close.

Take any fraction, multiply by 2^16, and you're gonna get a number. Now all you have to do is round it to the nearest integer: .5 or greater you round up, less than .5 you go down. So every fraction would work. However, they won't all be less than 16 bits.

Hmm, I dunno how right that is; I'm thinking about this now

Hmm... this could work pretty well. I have no idea how this stuff actually works, I'm 18 and haven't done anything, so excuse me if I show my ignorance....

Music is just a series of 16-bit numbers... so you could make it so that when a 15-bit or smaller number came up, it would automatically divide 65536 by it.
0 0 <-- this one would be special :p
1 65536
2 32768
3 21845
4 16384
5 13107
6 10922
100 655
300 218
500 131
501 131

Now as the numbers get bigger they don't all work, as 500 and 501 give the same number. All numbers up to 512 would take 9 or fewer bits, more than 50% smaller on average. However, even if every one of those 512 gave a different number, they would only account for 512/2^16 = 0.8% of the numbers. So 0.8% of the wav file would be compressed 50%, which is nothing. :(
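You can check where those collisions kick in with a few lines of Python:

# Reproduce the table above for n = 1..512 and see how many results collide.
seen = {}
for n in range(1, 513):
    v = round(65536 / n)
    seen.setdefault(v, []).append(n)

collisions = {v: ns for v, ns in seen.items() if len(ns) > 1}
print(len(seen), len(collisions))      # distinct results vs. results shared by 2+ inputs
print(collisions.get(131))             # [499, 500, 501, 502] -- includes the 500/501 case above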
 

Soccerman

Elite Member
Oct 9, 1999
6,378
0
0
"So 0.8% of the wav file would be compressed 50%, which is nothing."

heh, well hold on a second with that statement. first of all, a WAV file with 16-bit stereo at 44100 Hz requires ~170 kiloBYTES/second. everything helps!

second of all, I personally would LOVE to see how well this runs on a normal audio file. I mean, the lossless form of this, with the reduction of bit counts (using 15 bits when the 16th bit isn't needed, etc.), not the lossy version (cause MP3 is already bad enough!).

can you run through some numbers at least to tell me in real-life terms how much this would compress? compressing all the way until the resulting "compressed" value takes up as much space as the original..