More or less as expected given the theoretical BW, but I would expect another 10-15% to be achievable, which would bring it close to 80% of theoretical BW. Text generation seems sadly slow.
It will also be better with recent Qwen MoE models, where you can get 20 tk/s on CPU from dual-channel DDR5 at 6000 MT/s with Q6.
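
For anyone who wants to sanity-check numbers like these, here is a rough back-of-the-envelope sketch. It assumes token generation is purely memory-bandwidth bound and that every active weight is read once per token. The 64-bit channel width, the ~3B active parameters and the ~6.5 bits per weight for Q6 are my own illustrative assumptions, not measured values, so treat the result as a ceiling rather than a prediction.

```python
# Rough ceiling on tokens/s for bandwidth-bound text generation.
# All model numbers below are illustrative assumptions, not measurements.

def peak_bandwidth_gbs(mt_per_s, channels, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s.

    bus_bytes=8 assumes a 64-bit wide memory channel.
    """
    return mt_per_s * 1e6 * channels * bus_bytes / 1e9


def max_tokens_per_s(bandwidth_gbs, active_params, bits_per_weight):
    """Upper bound on tokens/s if each active weight is read once per token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gbs * 1e9 / bytes_per_token


peak = peak_bandwidth_gbs(6000, channels=2)   # dual-channel DDR5-6000
usable = 0.8 * peak                           # the ~80% of theoretical BW target

# Hypothetical MoE model: ~3B active parameters per token, ~6.5 bits/weight.
ceiling = max_tokens_per_s(usable, active_params=3e9, bits_per_weight=6.5)

print(f"peak:    {peak:.1f} GB/s")
print(f"usable:  {usable:.1f} GB/s")
print(f"ceiling: {ceiling:.1f} tok/s")
```

Under those assumptions the peak works out to 96 GB/s and the ceiling to roughly 31 tok/s, so 20 tk/s on CPU is plausible once you account for KV cache reads, attention compute and other overhead. A dense model of the same total size would have to read all of its weights per token, which is why MoE helps so much here.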