I decided to mess around with LM Studio on my 64C/128T Zen 2 server, just to see if having this many cores would help with performance. It also has a 1080 Ti installed to keep it company. Here's what I found:
LM Studio has a hard limit of 32 physical cores, and it ignores virtual (SMT) threads entirely. So if you have a 16C/32T CPU, you only get to use 16 threads for inference.
Decided to try a "preview" BF16 model, 65 GB in size: FUSEO1-Deepseek-Qwen_33B_something. Went with BF16 because it's supposed to be the most accurate format. The GPU did not like that format at all and slowed to a crawl (several seconds per token), so I had to take it out of the equation. The CPU cores didn't mind it as much, but the speed was still barely 2 tokens per second. It "thought" and showed its reasoning on how it arrived at the final solution after 57 minutes and 6 seconds. I don't have a compiler installed, so I can't judge how good the solution actually is. Need to test this model with a better GPU to see if it can be sped up.
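Back-of-envelope, that 57:06 run at roughly 2 tokens per second works out to a few thousand tokens of chain-of-thought. A quick sanity check (the 2 tok/s figure is my rough observation, not a measured average):

```python
# Rough estimate of how many tokens the 57:06 "thinking" run produced.
elapsed_s = 57 * 60 + 6      # 57 minutes 6 seconds = 3426 seconds
tok_per_s = 2                # approximate observed CPU throughput
total_tokens = elapsed_s * tok_per_s
print(total_tokens)          # → 6852, i.e. ~7k tokens of reasoning output
```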
Went scurrying back to Deepseek V2 Coder Lite at "only" 16 GB in size with 8-bit quantization. The problem with this model is that it throws a lot of assumptions in the user's face, even when you give it documentation to work with. This time the GPU seemed a lot happier at around 11 tokens per second, but the solution was crap. And even though the CPU was supposed to share the workload, it didn't; it remained mostly idle while the GPU thought at 95%+ utilization.
Tried one more 33B Q8 model, Everyone_coder_V2, 35 GB in size. It's supposed to be a mixture of three different models that work collaboratively. The GPU again couldn't handle it. Turned to the CPU, which ran at least three times faster, but still only at something like 1 token per second. It tells me that what I'm asking it to do is impossible, since the Windows API and the Standard C++ library supposedly don't provide any function to get the frequencies of all CPU cores at once. It needs a better and more positive attitude to be useful.
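For the record, the model's claim isn't quite right: while the C++ standard library indeed has nothing for this, the Windows power API can fill an array of per-core frequency records in a single call via `CallNtPowerInformation` with the `ProcessorInformation` level. A sketch using Python's ctypes (guarded so it only actually calls the API on Windows; struct layout follows the documented `PROCESSOR_POWER_INFORMATION`):

```python
import ctypes
import os
import platform

class PROCESSOR_POWER_INFORMATION(ctypes.Structure):
    # Layout of the documented PROCESSOR_POWER_INFORMATION struct (winnt/powerbase).
    _fields_ = [("Number", ctypes.c_ulong),
                ("MaxMhz", ctypes.c_ulong),
                ("CurrentMhz", ctypes.c_ulong),
                ("MhzLimit", ctypes.c_ulong),
                ("MaxIdleState", ctypes.c_ulong),
                ("CurrentIdleState", ctypes.c_ulong)]

def per_core_mhz():
    """Return {core_number: current_mhz} for all logical cores in one API call."""
    n = os.cpu_count()
    buf = (PROCESSOR_POWER_INFORMATION * n)()
    # 11 == ProcessorInformation in the POWER_INFORMATION_LEVEL enum.
    status = ctypes.windll.powrprof.CallNtPowerInformation(
        11, None, 0, buf, ctypes.sizeof(buf))
    if status != 0:  # non-zero NTSTATUS means failure
        raise OSError(f"CallNtPowerInformation failed: {status:#x}")
    return {p.Number: p.CurrentMhz for p in buf}

if platform.system() == "Windows":
    print(per_core_mhz())
```

So all core frequencies really do come back at once; no polling one core at a time.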
Gonna need to check AnythingLLM next and see if it can use the CPU better than LM Studio does.