I agree, an XX90 series GPU is pricey AF. Few are going to afford the up-front cost on that.
Yes, you can run larger models, and some are amenable to being split across multiple video cards, but then you're getting into the weeds for 90%+ of the audience and building a highly customized setup. This stuff is targeted at the crowd that buys Dell laptops in bulk and might have a small team that gets workstation-class gear. It would be one thing to order a bunch of OptiPlex desktops with a high-end GPU in them to run an LLM on, but if you have to do that in an organization that spans thousands of computers, that outlay is horrendous. Better to scale a "service" that you pay for that is FAR less expensive than the people it replaces.
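To put rough numbers on why that outlay is horrendous, here's a back-of-envelope sketch. Every figure in it is a hypothetical assumption, not real vendor pricing:

```python
# Back-of-envelope sketch (all numbers hypothetical) comparing a fleet
# GPU refresh against paying per-seat for a hosted service.

SEATS = 5_000                # assumed org size
GPU_COST = 1_800             # assumed high-end consumer GPU, per desktop
SERVICE_PER_SEAT_MONTH = 25  # assumed per-seat subscription price

fleet_outlay = SEATS * GPU_COST
service_annual = SEATS * SERVICE_PER_SEAT_MONTH * 12

print(f"Up-front GPU outlay:  ${fleet_outlay:,}")    # $9,000,000
print(f"Service, year one:    ${service_annual:,}")  # $1,500,000
print(f"Years of service per outlay: {fleet_outlay / service_annual:.1f}")  # 6.0
```

Under those assumptions the up-front hardware bill buys six years of the service, before you even count power, support, or the cards going obsolete.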
My point remains that they all knew what they were doing: buy up as much as you can so that no one else can, including competitors and your own customers. If there's one gold mine in the world and only a handful of shovel manufacturers, then to control supply you buy up all the shovels you can and book all of the manufacturers' production. It takes time to build another shovel factory, so for a long time only you will be able to mine most of the gold. Yes, a few may use spades for small flakes here and there, but you control the vast majority.
These companies aren't stupid. This was always a known second-order effect.
They can buy up all the memory for as long as they want to. I'm fine with that; it's their choice. But eventually production will catch up, there will be oversupply, and prices will really drop. This won't take as long as you might think, either. We've been through these cycles before, like when Windows 95 came out and required double the memory of 3.1 to run properly.
Yes, the big online models are very good, and as I said I'm paying $20/month currently, but if the price goes up there is certainly a point where I tap out. Or if the local models get better or GPUs come down in price, I'll be out. The online AI game will be a loser in my opinion: we, the users, always figure out a way around it, because the amount of hardware you can get for your dollar keeps growing, and it always does.
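That "tap out" point is easy to ballpark: it's roughly where the subscription, over the useful life of a card, costs more than just buying the card. A minimal sketch, with both numbers being assumptions on my part:

```python
# Rough "tap-out" math (hypothetical numbers): at what monthly price does
# a subscription cost more than buying a GPU and running models locally?

GPU_PRICE = 2_000        # assumed price of a capable consumer card
USEFUL_LIFE_MONTHS = 36  # assumed 3-year useful life

tap_out = GPU_PRICE / USEFUL_LIFE_MONTHS
print(f"Break-even subscription price: ${tap_out:.0f}/month")  # ~$56/month
```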
As I wrote earlier, the smaller models are more to the point with answers and generally produce less slop. My initial "OMG, this is amazing" love affair with AI has long since passed. It constantly repeats itself (even the long-context online models), states the obvious, and misses obvious points. It has no motivation to be right. It's really a glorified calculator unless you're a low-level person and need it to figure out what percent of 32 is 72 or something ridiculous like that.
30 years ago they said we'd never be able to edit MPEG-2 video on our computers without dedicated hardware. We were doing it 2 years later. Then I had huge discussions with people saying that tape would "never go away" when it comes to video recording. 3 years later... gone. Trust me, I've been around for too long. We'll be running really, really large models locally soon enough. Jeez, with two 4090s you could almost run ChatGPT. As I said, inference is easy compared to running the thing backward for training. They are keeping real AI processors out of our hands for just this reason. Sell a 96GB 5060 for $1000 and the game is up. It's not the compute that is the problem (for a user or 3) but having all of the parameters "at your fingertips" and a decent amount of memory left over for context.
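Here's the memory math behind that claim, as a minimal sketch. The bytes-per-parameter figures are standard for those quantization levels; the flat context overhead is my own assumption:

```python
# Minimal sketch of why parameter count, not compute, is the bottleneck
# for local inference. The overhead allowance is a rough assumption.

def vram_gb(params_b: float, bytes_per_param: float,
            context_overhead_gb: float = 4.0) -> float:
    """Rough VRAM needed: weights plus a flat allowance for KV cache/activations."""
    return params_b * bytes_per_param + context_overhead_gb

# A 70B-parameter model at different quantization levels:
for label, bpp in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"70B @ {label}: ~{vram_gb(70, bpp):.0f} GB")
# fp16  -> ~144 GB (out of reach for consumer cards)
# 8-bit -> ~74 GB  (would fit that hypothetical 96GB card)
# 4-bit -> ~39 GB  (spans two 24GB cards, i.e. two 4090s)
```

The compute to push tokens through those weights is modest for a handful of users; it's holding them all in VRAM at once that the consumer lineup is priced to prevent.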
The Holy Grail for vendors is continuous "pay to play." They love it, we hate it, and we continually vote it down. I will literally run my forever version of Photoshop Elements... forever! It's how I say no to subscription models. Now that we're on 64-bit and will be there for a long time, these apps will run in compatibility mode for a long, long time.
Now don't get me wrong, subscription models have their place. If you use an application every day for work and need instant updates, instant support, etc., it can be worth it.