Audio and video (and image) lossy compression:

CSMR

Golden Member
Apr 24, 2004
Technical question which intrigues me:
Normally when video or audio is compressed lossily, the compressed file still has a frame rate or sampling rate, a resolution (in the case of video), and perhaps a bit depth. Compressed images also have a resolution.

Why is this?

Let's assume perfect sources and perfect display/playback technology, so that inputs have infinite sampling rate, resolution and bit depth, and so do outputs.

In this hypothetical situation would you still want to use technologies that compress at a particular sampling rate and resolution?

Say the original is described by a function o: T -> X, where T is the continuous time interval and X is the continuous thing being represented (an image, a sample) at each point in time.

Why would you take finite approximations of T and X, call them Tf and Xf, project o to a function o_f: Tf -> Xf, and then approximate o_f with a compression algorithm within Tf -> Xf?
Why don't the best algorithms compress within T->X?
 

CycloWizard

Lifer
Sep 10, 2001
Conversion to digital media requires specifying a sampling rate. I'm more familiar with image processing, so I'll use that as an example. The true, real-world image is analog, with a (virtually) continuous color spectrum. To record this data digitally, it must be stored using discrete variables. A still image is generally stored as an array: an RGB image is an n-by-m-by-3 array, where n and m are the resolutions in the x and y dimensions, and the 3 arises because each (x, y) pair has a red, green, and blue intensity associated with it. So a single frame from a digital video is a discrete sampling of an analog signal, further degraded by discretizing each color channel into a finite number of bits for storage. A video is simply many of these images strung together, adding another dimension to the array.
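For concreteness, here is a minimal numpy sketch of that storage scheme; the 480 x 640 resolution, the 8-bit depth, and the frame count are arbitrary values chosen for illustration, not anything prescribed in the post.

```python
import numpy as np

# One still RGB frame as an n-by-m-by-3 array. The 480 x 640 resolution and
# 8-bit depth are arbitrary choices for illustration.
n, m = 480, 640
frame = np.zeros((n, m, 3), dtype=np.uint8)

frame[100, 200] = (255, 128, 0)   # one pixel: its red, green and blue intensities
red_plane = frame[:, :, 0]        # the full red channel as an n-by-m array

# A video is many such frames strung together: one more (time) dimension.
num_frames = 24                   # arbitrary
video = np.zeros((num_frames, n, m, 3), dtype=np.uint8)
video[0] = frame                  # the frame at t = 0
```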

So, I think there are a couple of ways to look at your question. The first is the most obvious (at least to me). Compressing within T -> X as you suggest would require completely determining the Fourier series coefficients for each point in the image, which is computationally infeasible, since each is an infinite series. To make it feasible, the Fourier series could be truncated, but information is lost as a result. Adding a Fourier series in time on top of that would add another level of computational insanity. While it could theoretically improve the quality of the film, the added storage and computation would probably outweigh any benefits.
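As a rough illustration of the truncation being described, here is a sketch that keeps only the lowest-frequency Fourier coefficients of a sampled 1-D signal (the sample count and cutoff are arbitrary); the reconstruction is close, but no longer exact.

```python
import numpy as np

# Sample a "continuous" signal at a finite rate -- already the first approximation.
N = 1024                                   # number of samples, arbitrary
t = np.linspace(0.0, 1.0, N, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t) + 0.3 * np.sin(2 * np.pi * 60 * t)

# Full (discrete) Fourier representation of those samples.
coeffs = np.fft.rfft(signal)

# Truncate the series: keep only the K lowest-frequency coefficients.
K = 16                                     # arbitrary cutoff
truncated = np.zeros_like(coeffs)
truncated[:K] = coeffs[:K]

# The reconstruction is close, but the fine (60 Hz) detail is gone -- lossy.
approx = np.fft.irfft(truncated, n=N)
print("max reconstruction error:", float(np.max(np.abs(signal - approx))))
```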

The second is that information is lost no matter how compression is achieved. I don't know much of anything about compression except that the compressed version of something must always have less information than the uncompressed, otherwise it would require the same number of bits for storage. That is, unless one can find a superior storage method, but I don't think this would necessarily be considered compression.

Finally, I think the reason current methods have been so successful is the limitations of the human senses. The eye can only perceive so many frames per second, above which the motion appears continuous. Increasing the frame rate has some effect on the perceived image quality, but the effect is generally minimal, since the brain does everything I described in the second paragraph in real time (more or less).
 

Markbnj

Elite Member, Moderator Emeritus
Sep 16, 2005
www.markbetz.net
Originally posted by: CycloWizard
The second is that information is lost no matter how compression is achieved. I don't know much of anything about compression except that the compressed version of something must always have less information than the uncompressed, otherwise it would require the same number of bits for storage. That is, unless one can find a superior storage method, but I don't think this would necessarily be considered compression.

Excellent explanation. Just wanted to point out that lossless compression is possible, and many early file compression methods were lossless. .ZIP compression (DEFLATE, which combines LZ77 dictionary coding with Huffman coding) is lossless, obviously, since all the bits have to come back. Lossless compression relies on building dictionaries of repeating patterns (strings) that can then be stored once and indexed. Unfortunately, it starts to fail miserably on data that is very unordered, like a frame of video or an outdoor snapshot. It works well enough on images with blocks of homogeneous color, but even where a color looks solid, it usually isn't. Thus the invention of lossy methods that trade detail against data size.
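A small sketch of that behaviour, using Python's zlib (a DEFLATE implementation, in the same family of dictionary coders as ZIP) as a stand-in: ordered, repetitive data compresses dramatically, noise-like data barely at all, and the round trip is always bit-exact.

```python
import os
import zlib

# Highly ordered data: one short pattern repeated over and over.
ordered = b"RGBRGBRGB" * 10_000

# Very unordered data: random bytes, standing in for a noisy frame or snapshot.
noisy = os.urandom(len(ordered))

for name, data in (("ordered", ordered), ("noisy", noisy)):
    packed = zlib.compress(data, 9)
    print(f"{name}: {len(data)} -> {len(packed)} bytes "
          f"({len(packed) / len(data):.1%} of original)")

# Lossless round trip: every bit comes back exactly.
assert zlib.decompress(zlib.compress(noisy)) == noisy
```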
 

CSMR

Golden Member
Apr 24, 2004
Thanks, I understand this, but I think my question remains:

Take video, for example. It is often represented as a collection of cubes in X*Y*T, each cube having a particular colour. Not always - vector graphics, for example, represents shapes in X*Y that are not squares - but usually.

In a high-fidelity source these cubes will be extremely small.

When compressed, the file will often represent larger cubes, or else the same-size cubes with distorted colours.

Why should the compressed file be in the form of cubes at all?

Perhaps someone here has studied the theory of compression and the metrics used?
 

Mark R

Diamond Member
Oct 9, 1999
Originally posted by: CSMR
Why should the compressed file be in the form of cubes at all?

Perhaps someone here has studied the theory of compression and the metrics used?

They may not be in the form of cubes.

If we take the example of image compression using the JPEG algorithm, the compressed representation consists of the spatial-frequency components of the image, as generated by the discrete cosine transform (DCT) - hence a single data value in the compressed file represents a pattern of ripples spanning an area of the image (an 8x8 block of pixels in baseline JPEG).
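A rough sketch of that idea on a single 8x8 block, using scipy's DCT; the block values and the coefficient threshold below are made up for illustration and are not the actual JPEG quantization tables.

```python
import numpy as np
from scipy.fft import dctn, idctn

# A smooth, made-up 8x8 block of grayscale values (natural image blocks are
# mostly smooth, which is why the DCT packs their energy into few coefficients).
x = np.arange(8, dtype=float)
block = 128 + 40 * np.sin(np.pi * x[:, None] / 8) + 20 * (x[None, :] / 7)

# Forward 2-D DCT: each coefficient weights one cosine "ripple" pattern that
# spans the whole block, rather than describing a single pixel.
coeffs = dctn(block, norm="ortho")

# Crude stand-in for JPEG quantization: drop the small (least perceptible)
# coefficients. The 10.0 threshold is an arbitrary choice, not a JPEG table.
kept = np.where(np.abs(coeffs) < 10.0, 0.0, coeffs)

# The inverse DCT reconstructs a close approximation of the block.
approx = idctn(kept, norm="ortho")
print("coefficients kept:", np.count_nonzero(kept), "of 64")
print("max pixel error:  ", float(np.max(np.abs(block - approx))))
```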

In the case of JPEG2000, a wavelet decomposition algorithm is used, where each value in the compressed file represents a specific wavelet at a specific region of the image.

Same thing with MPEG audio - the samples are split into frequency bands, and that frequency representation is what is stored.

In the case of all these algorithms the frequency data is processed and those values that are least perceptible (based upon physiological and psychological research) are removed.

Because the representations of these compressed files are in the frequency domain, as opposed to the spatial or time domain, they could theoretically be regenerated at an arbitrary resolution during reconstruction. In practice, it's easier to reconstruct at the native resolution and then resample as required.
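A 1-D sketch of that last point, with the orthonormal DCT-II basis written out by hand so the idea stays visible: once the coefficients are known, the underlying cosines can be evaluated on any grid, not just the native sample positions. The signal and grid sizes here are arbitrary.

```python
import numpy as np
from scipy.fft import dct

def dct2_basis(k, positions, N):
    # Orthonormal DCT-II basis function k of an N-sample signal, evaluated at
    # (possibly fractional) sample positions.
    scale = np.sqrt(1.0 / N) if k == 0 else np.sqrt(2.0 / N)
    return scale * np.cos(np.pi * k * (2.0 * positions + 1.0) / (2.0 * N))

N = 16                                        # native resolution, arbitrary
x = np.arange(N)
samples = np.cos(2 * np.pi * x / N) + 0.5     # a smooth made-up signal

coeffs = dct(samples, norm="ortho")           # the frequency-domain representation

# Evaluate the same cosine patterns on a 4x denser grid: reconstruction at a
# resolution the stored samples never had.
fine_positions = np.linspace(0, N - 1, 4 * N)
upsampled = sum(coeffs[k] * dct2_basis(k, fine_positions, N) for k in range(N))
print("reconstructed", upsampled.size, "points from", N, "coefficients")

# Sanity check: at the native positions this reproduces the original samples.
native = sum(coeffs[k] * dct2_basis(k, x, N) for k in range(N))
print("max error at native positions:", float(np.max(np.abs(native - samples))))
```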
 

hellokeith

Golden Member
Nov 12, 2004
For example, 1920 x 1080 (progressive) video could be stored with array coordinates running from [0, 0, 0, T0] to [1919, 1079, 2, T1], where the first two indices are the pixel position, the third selects the red, green or blue value stored in that element, and T0 --> T1 is the frame number (or timestamp).

Now you could rearrange that however you like, such as lining up all the reds first, then the greens, then the blues, or using some alternating scheme of frame storage, so long as the process is lossless and you know how to retrieve the data. But typically the purpose of video is real-time display, so it becomes necessary to be able to quickly retrieve any particular frame. Other storage methods, while they might make some kinds of analysis or processing easier, would eventually have to be converted back to a form that allows real-time frame retrieval.
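A small numpy sketch of that coordinate scheme and of a lossless rearrangement of it; the frame count and the 8-bit depth are assumptions made for illustration.

```python
import numpy as np

# [x, y, channel, frame] layout from the post: 1920 x 1080 pixels, 3 channels.
# The 8 frames and the uint8 bit depth are arbitrary assumptions for illustration.
width, height, channels, num_frames = 1920, 1080, 3, 8
video = np.zeros((width, height, channels, num_frames), dtype=np.uint8)

# Each element holds one channel intensity of one pixel in one frame:
video[0, 0, 0, 0] = 255                    # red value of pixel (0, 0) in frame 0
video[1919, 1079, 2, num_frames - 1] = 7   # blue value of the last pixel, last frame

# Real-time playback wants a whole frame at time t, which is a cheap slice here:
t = 3
frame_t = video[:, :, :, t]                # 1920 x 1080 x 3 view, no copying

# A lossless rearrangement, e.g. all reds first, then greens, then blues:
planar = np.ascontiguousarray(np.moveaxis(video, 2, 0))   # bytes reordered in memory

# Converting back for frame-by-frame display recovers the original exactly.
assert np.array_equal(np.moveaxis(planar, 0, 2), video)
```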