IPv4 addresses are made up of 32 bits, divided into 4 octets of 8 bits each.
For Class A networks, the first octet is used for network identification.
The first octet must start with a 0 (for the first bit), so the maximum first number is 127.
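This leaves you with the last three octets for host identification. This is 24 bits. The maximum number of hosts this can allow is 2^24 - 2 (16,777,214). You subtract 2 because you can't use host 0.0.0 (all host bits 0, which is the network address) or host 255.255.255 (all host bits 1, which is the broadcast address).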
For Class B networks, the first two octets are used for network identification.
The first octet must start with a 10 (for the first two bits), so the minimum is 128 and maximum is 191.
This leaves you with the last two octets for host identification. This is 16 bits. The maximum number of hosts this can allow is 2^16 - 2 (65,534), since you can't use host 0.0 or host 255.255.
For Class C networks, the first three octets are used for network identification.
The first octet must start with a 110 (for the first three bits), so the minimum is 192 and maximum is 223.
This leaves you with the last octet for host identification. This is 8 bits. The maximum number of hosts this can allow is 2^8 - 2 (254), since you can't use host 0 or host 255.
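If it helps to see the class rules above in code, here is a minimal Python sketch. The function name classful_info is made up for this example, and it only covers classes A, B, and C as described above.

```python
def classful_info(first_octet: int):
    """Return (class letter, network bits, usable hosts) for a first octet.

    classful_info is a hypothetical helper name used only for illustration.
    """
    if not 0 <= first_octet <= 255:
        raise ValueError("an octet must be between 0 and 255")
    if first_octet >> 7 == 0b0:        # leading bit 0    -> 0..127
        cls, network_bits = "A", 8
    elif first_octet >> 6 == 0b10:     # leading bits 10  -> 128..191
        cls, network_bits = "B", 16
    elif first_octet >> 5 == 0b110:    # leading bits 110 -> 192..223
        cls, network_bits = "C", 24
    else:
        raise ValueError("224 and above are not class A, B, or C")
    host_bits = 32 - network_bits
    # Subtract 2 for the all-zeros and all-ones host values.
    return cls, network_bits, 2 ** host_bits - 2


for octet in (10, 172, 192):
    print(octet, classful_info(octet))
# 10 ('A', 8, 16777214)
# 172 ('B', 16, 65534)
# 192 ('C', 24, 254)
```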
Subnetting was created because giving out entire networks is dumb. Say you have a network with 2^16 + 1 hosts. This is only just over the class B host limit (2^16 - 2, or 65,534), but you still have to use a class A network. A class A network can provide up to 2^24 - 2 hosts (16,777,214). You need 2^16 + 1 hosts (65,537). That's a waste of 16,711,677 possible addresses. That sucks. Subnetting fixes this.
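The arithmetic in that example can be checked with a few lines of Python. This is just a sketch of the numbers above; usable_hosts is a name invented for the example.

```python
def usable_hosts(host_bits: int) -> int:
    """Usable hosts for a given number of host bits (hypothetical helper)."""
    return 2 ** host_bits - 2


needed = 2 ** 16 + 1           # 65,537 hosts in the example
class_b = usable_hosts(16)     # 65,534
class_a = usable_hosts(24)     # 16,777,214

print(needed > class_b)        # True: a class B is too small
print(class_a - needed)        # 16711677 addresses wasted in a class A
```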
With subnetting, you could take a class A network and borrow 7 bits from the second octet. This leaves you with 32 - 8 - 7 = 17 bits for host identification. That allows for 2^17 - 2 hosts, or 131,070 hosts. You can see that this wastes only about 65,000 addresses, instead of 16 million. You should always make these subnets as small as you possibly can. For example, if you needed 25,000 hosts, you would know that you need 15 bits for this number of hosts (2^15 - 2 = 32,766). 2^14 - 2 would only give you 16,382 hosts, which isn't enough, so 15 is the lowest number of host bits that you need. Knowing that, you see that you have 32 - 15 = 17 bits for the network identification. This gives you a class B network (which takes 16 bits) borrowing 1 bit.
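Here is a rough Python sketch of the "as small as you possibly can" rule: keep adding host bits until 2^bits - 2 covers the hosts you need. The name smallest_subnet and the 30-bit upper limit are assumptions made for this example.

```python
def smallest_subnet(hosts_needed: int):
    """Return (host bits, network bits, usable hosts) for the smallest fit.

    smallest_subnet is a hypothetical helper name used only for illustration.
    """
    if hosts_needed < 1:
        raise ValueError("need at least one host")
    host_bits = 2                       # 2 bits is the smallest useful size
    while 2 ** host_bits - 2 < hosts_needed:
        host_bits += 1
    if host_bits > 30:                  # assumed cap for this sketch
        raise ValueError("too many hosts for one 32-bit network")
    return host_bits, 32 - host_bits, 2 ** host_bits - 2


print(smallest_subnet(25_000))   # (15, 17, 32766)
print(smallest_subnet(65_537))   # (17, 15, 131070)
```

The first call matches the 25,000-host example (15 host bits, 17 network bits), and the second matches the class A network borrowing 7 bits.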
This is a very basic guide; just let me know if you need anything else.