OK, starting with the basics:
The subnet mask defines the boundary between the "network address" and the "host address." When the boundary is moved into the host address (adding bits to the network portion of the mask), the bits added are the "subnet" bits. If you started with a class B address (like 172.16.1.0) with a "natural mask" of 255.255.0.0, you'd have 16 host bits, or 65,536 (64K) addresses on the host side. If you move the network boundary one octet into the host address space (255.255.255.0), you get 256 potential subnets of 256 addresses each. BY CONVENTION, you lose a couple of addresses in each subnet (the ".0" for the network ID and the ".255" for the broadcast address), which leaves 254 usable hosts per subnet. The third octet is the subnet portion of the address.
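To make the arithmetic concrete, here's a small sketch using Python's standard `ipaddress` module. It shows where each mask puts the network/host boundary for the class B example, and what you get after moving the boundary one octet to the right. The `describe` helper is just an illustration, not part of any library.

```python
import ipaddress

def describe(cidr: str) -> dict:
    """Show where the mask puts the network/host boundary for an IPv4 block."""
    net = ipaddress.ip_network(cidr)
    return {
        "netmask": str(net.netmask),
        "network_bits": net.prefixlen,
        "host_bits": net.max_prefixlen - net.prefixlen,
        "addresses": net.num_addresses,
        # By convention the all-zeros (network ID) and all-ones (broadcast)
        # host addresses are not assigned to hosts.
        "usable_hosts": net.num_addresses - 2,
    }

# Class B block with its natural 16-bit mask.
print(describe("172.16.0.0/16"))
# {'netmask': '255.255.0.0', 'network_bits': 16, 'host_bits': 16, 'addresses': 65536, 'usable_hosts': 65534}

# Move the boundary one octet to the right: 8 subnet bits, 8 host bits.
subnets = list(ipaddress.ip_network("172.16.0.0/16").subnets(new_prefix=24))
print(len(subnets), describe(str(subnets[1])))
# 256 {'netmask': '255.255.255.0', 'network_bits': 24, 'host_bits': 8, 'addresses': 256, 'usable_hosts': 254}
```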
If your entire network used this address and mask, there'd be no problems.
If you have some point-to-point links, it's a waste of addresses to assign an entire subnet for just two endpoint interfaces. To conserve addresses, many organizations give those links a 30-bit mask (255.255.255.252), which defines a subnet of only four addresses: the network ID, the broadcast, and ONLY two usable host addresses, one for each end of the link. That way they can cover a bunch of p2p links (64 of them) with what would usually be one 255.255.255.0 subnet block. Frequently, the "zero" subnet is used for this, since (by the older convention) the zero subnet would otherwise be a throw-away.
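Here's a sketch of carving one 255.255.255.0 block into 30-bit point-to-point subnets with Python's `ipaddress` module. Using 172.16.0.0/24 (the zero subnet of the class B example) is an assumption for illustration; any unused block would work the same way.

```python
import ipaddress

# Assumed block reserved for p2p links: the "zero" subnet of 172.16.0.0/16.
p2p_block = ipaddress.ip_network("172.16.0.0/24")

# Split it into /30 (255.255.255.252) subnets, one per link.
links = list(p2p_block.subnets(new_prefix=30))
print(len(links))  # 64

for link in links[:3]:
    # hosts() skips the network ID and broadcast, leaving the two endpoints.
    endpoints = [str(h) for h in link.hosts()]
    print(link, "endpoints:", endpoints, "broadcast:", link.broadcast_address)
# 172.16.0.0/30 endpoints: ['172.16.0.1', '172.16.0.2'] broadcast: 172.16.0.3
# 172.16.0.4/30 endpoints: ['172.16.0.5', '172.16.0.6'] broadcast: 172.16.0.7
# 172.16.0.8/30 endpoints: ['172.16.0.9', '172.16.0.10'] broadcast: 172.16.0.11
```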
The same organization may need more than 254 devices on a subnet...so they adjust the mask for more hosts (maybe 255.255.248.0, which leaves 11 host bits, or 2,046 usable host addresses).
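A quick check of what 255.255.248.0 buys you, again with the standard `ipaddress` module. The network 172.16.8.0 is an arbitrary, hypothetical choice inside the 172.16.0.0 class B block; note that `ipaddress` accepts the dotted-decimal mask directly.

```python
import ipaddress

# 255.255.248.0 is a 21-bit mask; the dotted form is accepted as-is.
big_lan = ipaddress.ip_network("172.16.8.0/255.255.248.0")
print(big_lan)                        # 172.16.8.0/21
print(big_lan.num_addresses - 2)      # 2046 usable hosts
# Index 1 is the first host; index -1 is the broadcast, so -2 is the last host.
print(big_lan[1], "-", big_lan[-2])   # 172.16.8.1 - 172.16.15.254
```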
Now the problem comes up: some routing protocols (like RIP v1) don't carry mask information in their updates, so the receiving router has to assume a mask (typically the natural one). What might be a host address on one subnet could be a network address on another subnet. Unless the routers understand that the mask may differ from one subnet to another, they may not route the traffic properly.
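Here's a sketch of that ambiguity using Python's `ipaddress` module: the same address is a network ID under one mask and an ordinary host under another. The last part only approximates what a classful protocol does when no mask arrives with the route; it simply applies the natural class B mask.

```python
import ipaddress

addr = ipaddress.ip_address("172.16.9.0")

# Under a 255.255.255.0 mask, 172.16.9.0 is the network ID of its subnet.
as_24 = ipaddress.ip_network("172.16.9.0/24")
print(addr == as_24.network_address)   # True

# Under 255.255.248.0, the same address is just a host inside 172.16.8.0/21.
as_21 = ipaddress.ip_interface("172.16.9.0/21").network
print(as_21, addr in as_21, addr == as_21.network_address)  # 172.16.8.0/21 True False

# With no mask in the update, falling back on the natural class B mask
# lumps everything into the major network.
classful = ipaddress.ip_interface("172.16.9.0/16").network
print(classful)   # 172.16.0.0/16
```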
By telling the router that "variable length subnet masking" is being used, you force it to pay attention to the mask in effect on an interface-by-interface basis. To use VLSM, you must also either route statically (manual entries) or use a "mask-aware" routing protocol, like RIPv2 or OSPF, so the adjacent routers update their tables with the correct subnet information.
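To see why each route needs its own mask, here's a minimal sketch of a routing table whose entries carry their prefix length (as static routes or a mask-aware protocol would supply them), with lookups done by longest-prefix match. The table contents and next-hop names are hypothetical; real routers use far more efficient structures than a linear scan.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table: every entry carries its own mask.
ROUTES = {
    ipaddress.ip_network("172.16.0.0/16"): "core-router",  # whole class B block
    ipaddress.ip_network("172.16.8.0/21"): "lan-router",   # large user subnet
    ipaddress.ip_network("172.16.0.4/30"): "wan-link-2",   # point-to-point link
}

def next_hop(destination: str) -> Optional[str]:
    """Return the next hop of the matching route with the longest mask."""
    dest = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if dest in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]

print(next_hop("172.16.9.0"))    # lan-router  (/21 beats /16)
print(next_hop("172.16.0.6"))    # wan-link-2  (/30 beats /16)
print(next_hop("172.16.200.1"))  # core-router (only the /16 matches)
print(next_hop("10.0.0.1"))      # None
```

Without the per-entry masks, 172.16.0.6 and 172.16.9.0 would both just look like "somewhere in 172.16.0.0," and the router couldn't tell which link to use.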
If your network uses the same mask from end to end, you don't need VLSM. You need it only when the same network block is used and the mask changes within that block.
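That rule of thumb can be expressed as a small check: group your subnets by their classful major network and see whether any group uses more than one mask length. Both helpers here (`classful_prefix`, `needs_vlsm`) are hypothetical illustrations built on Python's `ipaddress` module, and the classful lookup only handles class A, B, and C addresses.

```python
import ipaddress
from collections import defaultdict

def classful_prefix(addr: ipaddress.IPv4Address) -> int:
    """Natural mask length for a class A, B, or C address."""
    first_octet = int(addr) >> 24
    if first_octet < 128:
        return 8
    if first_octet < 192:
        return 16
    if first_octet < 224:
        return 24
    raise ValueError(f"{addr} is not a class A, B, or C address")

def needs_vlsm(subnets: list[str]) -> bool:
    """True if any major network block is carved with more than one mask."""
    masks_by_block = defaultdict(set)
    for cidr in subnets:
        net = ipaddress.ip_network(cidr)
        major = net.supernet(new_prefix=classful_prefix(net.network_address))
        masks_by_block[major].add(net.prefixlen)
    return any(len(masks) > 1 for masks in masks_by_block.values())

print(needs_vlsm(["172.16.1.0/24", "172.16.2.0/24", "172.16.3.0/24"]))  # False: one mask throughout
print(needs_vlsm(["172.16.0.4/30", "172.16.8.0/21", "172.16.1.0/24"]))  # True: three masks in one block
print(needs_vlsm(["172.16.1.0/24", "192.168.1.0/30"]))                   # False: one mask per block
```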
Using VLSM instead of using another address block (like 192.168.X.X) allows for address summarization (supernetting) and smaller routing tables...more efficient routing and less CPU time.
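Here's a sketch of why keeping everything in one block helps summarization, using the standard `ipaddress` module. The subnet layout is hypothetical: every VLSM subnet still fits under the single class B summary route, and contiguous subnets collapse into one shorter prefix.

```python
import ipaddress

# Hypothetical VLSM layout, all carved from the same class B block.
vlsm_subnets = [
    ipaddress.ip_network("172.16.0.0/30"),  # p2p link
    ipaddress.ip_network("172.16.0.4/30"),  # p2p link
    ipaddress.ip_network("172.16.1.0/24"),  # ordinary LAN
    ipaddress.ip_network("172.16.8.0/21"),  # large LAN
]

# One summary route can be advertised for all of them.
summary = ipaddress.ip_network("172.16.0.0/16")
print(all(net.subnet_of(summary) for net in vlsm_subnets))  # True

# Contiguous subnets collapse into the smallest covering prefix.
contiguous = [ipaddress.ip_network(f"172.16.{i}.0/24") for i in range(4)]
print([str(n) for n in ipaddress.collapse_addresses(contiguous)])  # ['172.16.0.0/22']
```

Had the p2p links been numbered out of 192.168.X.X instead, the neighbors would need at least one extra route for that block.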
This is the short story, but perhaps it's enough to give you the hint you need to lock on.
FWIW
Scott