This cannot be summarized well in a single post. If you want the real answers to those questions, you'd probably need a graduate degree in electrical engineering with a focus on digital communications.
Having said that, bobsmith's explanation is a good qualitative summary of the issues. In general, a communications system works as a chain of blocks that perform specific functions. You typically have some type of data encoding (usually involving forward error correction, i.e., you transmit extra symbols so that the receiver can use those to correct errors in the data), then a modulation scheme, after which you send the data over a channel. The channel will add some noise (what type of noise depends on the channel). On the receiving end, you'll demodulate the signal and decode the original bits from the symbols coming out of the demodulator. Note that here we're talking about what's usually called channel encoding/decoding (i.e., coding used for error correction), which is distinct from source encoding/decoding, a term that typically refers to compression schemes (so you have to send less data over the channel).
Now, if you know the properties of just the channel, you can actually define something called the channel capacity. I'll link the Wikipedia article here:
http://en.wikipedia.org/wiki/Channel_capacity
The idea behind this is that given certain noise characteristics of a channel, you can still transmit data with an arbitrarily small error probability at a non-zero rate. This result isn't obvious if you think about it: with a noisy channel, there's no immediately apparent reason you should be able to guarantee arbitrarily small errors while keeping the rate above zero (if you could tolerate rates approaching zero, it'd be trivial, since you could just repeat yourself all the time, but the channel would be useless then). The flip side is that, from a theoretical perspective, you can't beat the channel capacity: no modulation/coding scheme can reliably transmit above it. The idea is that given an ideal modulation/coding scheme, you could theoretically achieve the channel capacity, and with modern schemes we can get pretty close.
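For the common case of an additive white Gaussian noise channel, the capacity even has a closed form, C = B * log2(1 + SNR). A quick sketch (the bandwidth and SNR numbers here are made up purely for illustration):

```python
import math

def channel_capacity(bandwidth_hz, snr_linear):
    """Shannon capacity of an AWGN channel, in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# e.g. a 1 MHz channel at a linear SNR of 15 (about 11.8 dB):
print(channel_capacity(1e6, 15))  # 4 Mbit/s, since log2(1 + 15) = 4
```

Note that more bandwidth or more signal power (higher SNR) both raise the ceiling, but the SNR sits inside a logarithm, which is why doubling power doesn't double capacity.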
What bobsmith is talking about is mostly modulation schemes. So when you send symbols over a channel, you have to choose some modulation scheme. A very simple modulation scheme is BPSK. The Wikipedia article for phase shift keying (PSK) is
http://en.wikipedia.org/wiki/Phase-shift_keying
If you scroll down to BPSK, you can see a constellation of two points. If you imagine a sequence of bits, you can modulate it into an analog signal by taking a sinusoid and shifting its phase by 180 degrees whenever the bit value changes from 0 to 1 or vice versa. This scheme is relatively robust because the receiver only has to pick between two points. One way to think of it: suppose at the output you measure some voltage. If that voltage is +1, you can assume the bit is a 1; if it's -1, the bit is 0. What can happen, though, is that noise gets added when the modulated signal is transmitted. This means that when you demodulate, rather than just having some value x (which should be exactly +1 or -1), you have x plus some noise n, or x + n.
Now we could devise some simple decision scheme given that we have noise: if x + n > 0 we have a 1, and if x + n < 0 we have a 0. However, let's say x = -1 but n = +1.5. Since x = -1 the bit should be a 0, but since x + n = 1.5 > 0, we actually make the decision that the bit is a 1. This is an error. Very often you'll hear about something called BER, or bit error rate. Basically this is a number indicating the rate at which we have bit errors (it's often something like 1/10000 or 1/100000).
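You can watch this decision rule produce errors in a toy simulation (a minimal sketch: the noise levels and bit count are arbitrary, and a real analysis would sweep Eb/N0 rather than pick two standard deviations out of a hat):

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def bpsk_ber(num_bits, noise_std):
    """Estimate the bit error rate of BPSK over an AWGN channel."""
    errors = 0
    for _ in range(num_bits):
        bit = random.randint(0, 1)
        x = 1.0 if bit else -1.0          # BPSK mapping: 1 -> +1, 0 -> -1
        n = random.gauss(0.0, noise_std)  # additive Gaussian noise sample
        decided = 1 if x + n > 0 else 0   # threshold decision at zero
        errors += decided != bit
    return errors / num_bits

print(bpsk_ber(100_000, 0.5))  # mild noise: errors are rare
print(bpsk_ber(100_000, 2.0))  # heavy noise: errors are frequent
```

With noise_std = 0.5 the two Gaussians around +1 and -1 barely overlap, so the estimated BER is small; with noise_std = 2.0 they overlap heavily and the BER approaches the useless coin-flip regime.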
If you take another look at the BPSK constellation, think about placing a 2D Gaussian at each of the constellation points. If you're having trouble imagining a 2D Gaussian, it's something like this:
http://en.wikipedia.org/wiki/File:Gaussian_2d.png
So basically a little hill around each constellation point. This represents the ideal received value plus some noise. The more noise we add, the fatter these Gaussians are; the less noise we have, the sharper they are. Note that the more these Gaussians overlap, the higher our probability of making an error. So you can see that noise, by spreading the Gaussians out, causes more errors. Using a higher-order modulation scheme (such as QPSK, shown on the same Wikipedia page as BPSK) allows us to send more bits per symbol (2 bits per symbol versus 1 for BPSK), but the constellation points are also closer together. Thus, we incur more errors as a trade-off for a higher data rate. Take a look at 16-QAM:
http://en.wikipedia.org/wiki/File:16QAM_Gray_Coded.svg
You can see there are lots of constellation points and they are much closer together than in BPSK. Again, you get an increase in data rate but less robustness to noise. Now, one way to combat noise is to increase the power of the signal. What this effectively does is shift all of the constellation points radially away from the origin, spacing them farther apart. Naturally this means the Gaussians at each point will overlap less, so our error probability goes down.
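You can make the "points get closer" part concrete by computing the minimum distance between constellation points when each constellation is scaled to the same average power (the coordinates below are the usual unit-average-power normalizations; that's an assumption about scaling, not something forced on you):

```python
import math
from itertools import product

def min_distance(points):
    """Smallest Euclidean distance between any pair of constellation points."""
    return min(math.dist(a, b)
               for i, a in enumerate(points) for b in points[i + 1:])

# Each constellation normalized to unit average power:
bpsk = [(-1.0, 0.0), (1.0, 0.0)]
qpsk = [(x / math.sqrt(2), y / math.sqrt(2))
        for x, y in product((-1, 1), repeat=2)]
qam16 = [(x / math.sqrt(10), y / math.sqrt(10))
         for x, y in product((-3, -1, 1, 3), repeat=2)]

print(min_distance(bpsk))   # 2.0
print(min_distance(qpsk))   # ~1.414
print(min_distance(qam16))  # ~0.632
```

At the same transmit power, the nearest neighbors in 16-QAM are roughly a third as far apart as the two BPSK points, which is exactly why the Gaussians overlap more and the BER is worse for the same noise level.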
So that covers the idea behind modulation. On top of that you have channel coding, which I mentioned previously as a form of error correction. There are many channel coding schemes. Some very popular ones include convolutional coding and Viterbi decoding, Turbo codes, and LDPC codes. Describing these would take tons of time, so just look them up on Wikipedia if you're interested.
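None of those schemes fit in a forum post, but the simplest possible illustration of the idea (a toy repetition code, not one of the codes above) shows how redundant symbols let the receiver fix channel errors:

```python
def encode(bits, r=3):
    """Repetition code: transmit each data bit r times."""
    return [b for b in bits for _ in range(r)]

def decode(symbols, r=3):
    """Majority vote over each group of r received bits."""
    return [1 if sum(symbols[i:i + r]) > r // 2 else 0
            for i in range(0, len(symbols), r)]

bits = [1, 0, 1, 1, 0]
coded = encode(bits)   # 15 coded bits for 5 data bits: rate 1/3
coded[4] ^= 1          # flip one coded bit to simulate a channel error
print(decode(coded) == bits)  # True: the single error is corrected
```

The price is obvious: you sent three times as many symbols as data bits. Real codes like Turbo and LDPC get far better protection per redundant bit, which is how they approach the channel capacity.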
The gist of it, though, is that you combine channel coding and modulation to achieve the best data rate you can given your channel characteristics (noise, bandwidth, etc.). This means your questions can't be answered in isolation: a single user on 2100 MHz could have unbounded throughput on a noiseless channel, the maximum number of concurrent users for a scheme with zero data rate is infinite, and changing frequencies doesn't necessarily have any impact on throughput.