Optimum cache size depends on workload and external memory performance.
From Hennessy/Patterson, the average memory hierarchy access throughput is:
p * cache_throughput + (1 - p) * mem_throughput, where p is the cache hit ratio.
The same formula is used to calculate the average memory access latency.
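As a minimal sketch of the latency form of that formula (the cycle counts below are illustrative assumptions, not measured values):

```python
def avg_latency(p, cache_latency, mem_latency):
    """Average access latency for hit ratio p (0..1):
    hits are served at cache_latency, misses at mem_latency."""
    return p * cache_latency + (1 - p) * mem_latency

# e.g. a 3-cycle cache, 100-cycle memory, 95% hit ratio:
print(avg_latency(0.95, 3, 100))  # → 7.85 cycles
```

Note how the miss term dominates: even a 5% miss ratio contributes 5 of the 7.85 cycles, which is why hit ratio matters so much.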
Since relative memory latency increases (and relative throughput decreases) with each
successive CPU generation, a better hit ratio is needed. As a rule of thumb,
doubling the cache size cuts the cache miss ratio roughly in half. Higher associativity also improves the hit rate.
Cache size is also dictated by die size constraints, since a bigger die dramatically increases production costs.
Bottom line: for current desktop applications a 256 KB 8-way set-associative cache looks sufficient; however, cache size requirements for future CPUs may well increase.