A cache is part of a computer's memory hierarchy, which consists of several "levels" of memory. The lowest levels are the smallest in size but have the fastest access times; the higher levels are larger but slower. Each level holds a subset of the data in the level above it. For instance, the L1 and L2 caches hold subsets of the data found in main memory.
Computers have caches because memory accesses in a program exhibit what is called locality of reference. There are two types of locality: temporal and spatial. Temporal locality means that if a program accesses a data element in memory, it is likely to access that same element again in the near future. Spatial locality means that if a program accesses a data element in memory, it is likely to access nearby elements in the near future.
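Spatial locality is why caches move data in fixed-size lines rather than single bytes. The following is a minimal sketch of the idea, assuming 64-byte cache lines and 8-byte array elements (common values, but purely illustrative): a sequential scan reuses each line for several elements, while a stride that jumps a full line ahead touches a new line on every access.

```python
# Illustrate spatial locality: a sequential scan touches each cache
# line once per 8 elements, while line-sized strides touch a new
# line on every access. (Assumes 64-byte lines, 8-byte elements.)
LINE_SIZE = 64      # bytes per cache line (a common value)
ELEM_SIZE = 8       # bytes per array element

def lines_touched(indices):
    """Return how many distinct cache lines the accesses hit."""
    return len({(i * ELEM_SIZE) // LINE_SIZE for i in indices})

n = 1024
sequential = range(n)            # neighbors: strong spatial locality
strided = range(0, n * 8, 8)     # jumps one full line per access

print(lines_touched(sequential))   # 128 lines for 1024 accesses
print(lines_touched(strided))      # 1024 lines for 1024 accesses
```

The sequential scan gets eight useful elements out of every line fetched; the strided scan pays for a whole line fetch per element, which is why access *pattern*, not just access *count*, determines cache performance.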
Because of this, the lowest-level (and fastest) cache, the L1, can be very small, yet more than 90% of the time the data a program needs will be found there. If the data is not in the L1, the CPU looks for it in the L2; if it is not in the L2, the CPU looks in the next level up, and so on, until it reaches main memory.
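The lookup cascade above can be sketched as a toy simulation. This is a hypothetical structure for illustration only: real caches operate on fixed-size lines in hardware, but the fall-through-and-promote behavior is the same.

```python
# Sketch of a multi-level lookup: check L1, then L2, then main
# memory, copying the data into the faster levels on a miss so
# that the next access hits. (Illustrative only; real caches
# work on fixed-size lines and are implemented in hardware.)
l1, l2 = {}, {}
memory = {addr: addr * 10 for addr in range(100)}  # fake backing store

def load(addr):
    if addr in l1:
        return l1[addr], "L1 hit"
    if addr in l2:
        l1[addr] = l2[addr]           # promote to L1
        return l1[addr], "L2 hit"
    value = memory[addr]              # fall through to main memory
    l2[addr] = value                  # fill both cache levels
    l1[addr] = value
    return value, "miss"

print(load(5))   # (50, 'miss')    first access goes to memory
print(load(5))   # (50, 'L1 hit') temporal locality pays off
```

The second `load(5)` hits in the L1 precisely because of temporal locality: the first access left a copy there.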
Caches have many design parameters: how large to make each entry (the line or block size), how memory addresses map to cache entries (the associativity), which entry to evict when new data must be brought in (the replacement policy), and how writes are handled (the write policy).