To understand the wavelet transform, you first have to understand the Fourier transform - the two are closely related.  The Fourier transform uses sines and cosines (an orthogonal set of basis functions) to represent an arbitrary function (this glosses over a lot, but it lays the groundwork for what I'll say next).  A wavelet transform follows the same idea, but instead of sines and cosines it uses a different set of basis functions: scaled and shifted copies of a single short "mother" wavelet.  Because each wavelet is localized in time, a coefficient tells you roughly where in the signal a feature occurs as well as at what scale, whereas sines and cosines extend across the whole signal.  "Orthogonal" means that the inner product of any two distinct basis functions is zero, and "basis" means that the functions can be combined to represent any signal in the space; a good wavelet reference will point you to well-defined families with these properties.  Two common ones are Haar and Daubechies.
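To make the basis-function idea concrete, here is a minimal sketch of one level of the Haar transform, the simplest of the wavelet families. The function names (haar_step, haar_step_inverse) and the sample signal are my own, made up for illustration; a real wavelet library will be organized differently.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform (input length must be even)."""
    s = 1.0 / math.sqrt(2.0)
    # Scaled pairwise sums: a coarse, half-length version of the signal.
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    # Scaled pairwise differences: the local detail lost in the coarse version.
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_step_inverse(approx, detail):
    """Undo haar_step by recombining each (approx, detail) pair."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out

# Hypothetical sample data, chosen only for illustration.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_step(signal)
restored = haar_step_inverse(approx, detail)

# With an orthonormal basis the transform preserves total energy
# (sum of squares) and can be inverted exactly, up to rounding.
energy_in = sum(x * x for x in signal)
energy_out = sum(x * x for x in approx + detail)
```

Each detail coefficient depends on only two neighbouring samples, which is the time localization that sines and cosines lack. Applying haar_step again to approx gives the next, coarser level.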
As for applications - again, think Fourier transform first.  The discrete cosine transform (DCT) is closely related to the discrete Fourier transform (roughly, it is the DFT of a signal that has been mirrored to make it even-symmetric, which leaves only cosine terms), and the DCT is the heart of JPEG encoding.  So one idea is to replace the DCT with a DWT (discrete wavelet transform) and perhaps get a better representation.  The FBI fingerprint database uses this concept to compress its images, and there are some very good technical articles on how it is done.  Other applications include general spectral analysis, radiosity (a lighting-simulation technique, used for example to precompute level lighting in games such as Quake II), and others you can find on the web.
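The "better representation" idea can be sketched in a few lines: transform the signal, keep only the largest coefficients, and reconstruct. This is a toy 1-D illustration using a multi-level Haar transform, not how JPEG or the FBI format actually works (those operate on 2-D images and add quantization and entropy coding). The helper names and the test signal are assumptions of mine.

```python
import math

S = 1.0 / math.sqrt(2.0)

def haar_forward(signal):
    """Full multi-level orthonormal Haar DWT; length must be a power of two."""
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) * S for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) * S for i in range(half)]
        # Coarse part goes in front; it is transformed again on the next pass.
        coeffs[:n] = approx + detail
        n = half
    return coeffs

def haar_inverse(coeffs):
    """Invert haar_forward, rebuilding from the coarsest level outward."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged.append((a + d) * S)
            merged.append((a - d) * S)
        out[:2 * n] = merged
        n *= 2
    return out

def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients (crude compression)."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

# Hypothetical test signal: flat, with a single jump in the middle.
signal = [2.0] * 8 + [9.0] * 8
coeffs = haar_forward(signal)
compressed = keep_largest(coeffs, 2)
approx_signal = haar_inverse(compressed)
max_error = max(abs(a - b) for a, b in zip(signal, approx_signal))
```

For this particular signal, only two of the sixteen Haar coefficients are nonzero, so keeping two loses nothing. A sharp jump like this is the kind of feature a localized basis tends to capture compactly; real images are messier, so treat this as a best-case demonstration.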
There are a good number of online resources, so poke around on Google - that's basically where I found all this information, albeit several years ago.