
The AI discussion thread

So, you assume that whatever a PhD in astrophysics says is true. No, that wasn't you, but I have to think that since physics is in chaos, astrophysics can't be a settled realm at this point. Astrophysics uses the laws of physics; since those are all in doubt, so is astrophysics.
Eh... macrophysics and microphysics are two different realms. We can calculate the age, size, density, energy, distribution, and makeup of the universe, galactic clusters, galaxies, star systems, and stellar bodies within a few percentage points. They all work within a realm of physics that is relatively simple and quite well understood.

Subatomic physics is another realm entirely, and is frankly voodoo bullshit.
 
Wired article, not in depth though:



DeepSeek had to come up with more efficient methods to train its models. “They optimized their model architecture using a battery of engineering tricks—custom communication schemes between chips, reducing the size of fields to save memory, and innovative use of the mix-of-models approach,” says Wendy Chang, a software engineer turned policy analyst at the Mercator Institute for China Studies. “Many of these approaches aren’t new ideas, but combining them successfully to produce a cutting-edge model is a remarkable feat.”

DeepSeek has also made significant progress on Multi-head Latent Attention (MLA) and Mixture-of-Experts, two technical designs that make DeepSeek models more cost-effective by requiring fewer computing resources to train. In fact, DeepSeek's latest model is so efficient that it required one-tenth the computing power of Meta's comparable Llama 3.1 model to train, according to the research institution Epoch AI.
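The Mixture-of-Experts idea mentioned above can be sketched in miniature: a router picks a few "experts" per token, so only a fraction of the total parameters is touched on each forward pass. Everything here (sizes, names, the router) is a toy illustration, not DeepSeek's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 16, 8, 2  # hidden size, expert count, experts used per token

# Each "expert" is a tiny linear layer; the router decides which ones run.
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)

def moe_layer(x):
    """Route token x to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]        # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the chosen experts only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(D)
y = moe_layer(x)
# Only TOP_K of N_EXPERTS weight matrices were used for this token: 2/8 of the
# FLOPs of a dense layer holding the same total parameter count.
```

That ratio is the whole trick: parameter count (capacity) grows with the number of experts while per-token compute stays roughly fixed, which is why MoE models are cheaper to train at a given quality.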
 
DeepSeek is interesting. If all their claims pan out, this could definitely kill OpenAI; the capital-intensive AI models of OpenAI are not sustainable, IMO. We'll have to see how OpenAI responds. DeepSeek is open source, so OpenAI could possibly use these innovations to improve its own models.
 
Very interesting, and it prompted the biggest single-day loss of market value ever for a company: Nvidia....



Even though China's DeepSeek is of course biased regarding Tiananmen and Taiwan, you can also see the same kind of bias in ChatGPT and Western AI toward other matters
 
Even though China's DeepSeek is of course biased regarding Tiananmen and Taiwan, you can also see the same kind of bias in ChatGPT and Western AI toward other matters
Such as?
 
Even though China's DeepSeek is of course biased regarding Tiananmen and Taiwan, you can also see the same kind of bias in ChatGPT and Western AI toward other matters
But isn't that just the data it's trained on? If the model is open source, then anyone can use it to train on other data and build a less China-biased model?
 
I admit I know of those things only indirectly from reading Reddit, but googling "chatgpt censoring" gives enough results. The problem is it's evolving, and things it didn't censor before, it now does: breastfeeding, political topics, etc. The new Trump dynasty may even prompt changes to the responses allowed about him, his family, and his actions. I wouldn't put it past them, especially since he loves authoritarianism so much
 
I admit I know of those things only indirectly from reading Reddit, but googling "chatgpt censoring" gives enough results. The problem is it's evolving, and things it didn't censor before, it now does: breastfeeding, political topics, etc. The new Trump dynasty may even prompt changes to the responses allowed about him, his family, and his actions. I wouldn't put it past them, especially since he loves authoritarianism so much
Like J6
 
No current publicly accessible AI models (as far as I'm aware) have access to live data. They're all working off a dataset that's x months or years old. They might be able to tell you the current day, or maybe the weather if they have hooks for it, but ask what the most recent subvariant of COVID is and the answer will give you a rough idea of how old the training data is.
 
Are any of the AI video generators not complete shit? I totally understand it's about the prompts, but they all get multiple things wrong, such as:
Results that vary wildly based on the prompt
Needing multiple requests to get something moderately good
Concealing their pricing, e.g. how many images/videos can actually be made with 20 credits
Being constantly "busy" during the free trial
Needing apps to function, and those apps tend to be made by someone else, with review scores that swing quite a lot.
 
But isn't that just the data it's trained on? If the model is open source, then anyone can use it to train on other data and build a less China-biased model?
The model weights are open and free. The model architecture is open and free too, so open-source back ends can and have implemented it, meaning the weights can be run on anyone's hardware.

The training data is not free and open, though. You need trillions of tokens processed during training, on tens of thousands of GPUs, over periods of weeks to months.

You can fine-tune the publicly available weights to remove blind spots or censorship. This is done commonly enough, since some folks like to run AIs locally that are good at generating smut reading material tailored to the owner's fetishes 🙂
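The fine-tuning loop described here can be sketched in miniature: start from "released" weights, run a few gradient steps on a small new dataset, and the model's behavior shifts toward the new objective. Everything below is a toy stand-in (a 3-parameter logistic model, made-up data), not an actual LLM fine-tune.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for published model weights: a tiny linear "model".
weights = np.array([1.0, -2.0, 0.5])

def predict(x, w):
    return 1.0 / (1.0 + np.exp(-(x @ w)))  # sigmoid output in (0, 1)

# Small fine-tuning set encoding the behavior we want instead.
X = rng.standard_normal((32, 3))
y_target = (X[:, 0] > 0).astype(float)     # new objective: respond to feature 0

lr = 0.5
for _ in range(200):                        # plain gradient descent on log loss
    p = predict(X, weights)
    grad = X.T @ (p - y_target) / len(X)
    weights -= lr * grad

acc = np.mean((predict(X, weights) > 0.5) == y_target)
```

Real fine-tunes work the same way in principle, just with billions of parameters, which is why techniques like low-rank adapters exist: they update only a small add-on to the released weights instead of all of them.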
 

lol, lmao

this is part of the reason, isn't it?

https://stratechery.com/2025/deepseek-faq/

more detailed

Here’s the thing: a huge number of the innovations I explained above are about overcoming the lack of memory bandwidth implied in using H800s instead of H100s. Moreover, if you actually did the math on the previous question, you would realize that DeepSeek actually had an excess of computing; that’s because DeepSeek actually programmed 20 of the 132 processing units on each H800 specifically to manage cross-chip communications. This is actually impossible to do in CUDA. DeepSeek engineers had to drop down to PTX, a low-level instruction set for Nvidia GPUs that is basically like assembly language. This is an insane level of optimization that only makes sense if you are using H800s.

Lastly, we emphasize again the economical training costs of DeepSeek-V3, summarized in Table 1, achieved through our optimized co-design of algorithms, frameworks, and hardware. During the pre-training stage, training DeepSeek-V3 on each trillion tokens requires only 180K H800 GPU hours, i.e., 3.7 days on our cluster with 2048 H800 GPUs. Consequently, our pre-training stage is completed in less than two months and costs 2664K GPU hours. Combined with 119K GPU hours for the context length extension and 5K GPU hours for post-training, DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.
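The arithmetic in that excerpt checks out, and it's quick to reproduce (the $2/GPU-hour rental price is the paper's stated assumption, not a market fact):

```python
# Reproducing the cost arithmetic from the DeepSeek-V3 excerpt above.
GPUS = 2048
PRICE_PER_GPU_HOUR = 2.00  # assumed H800 rental price, per the paper

pretrain_per_trillion_tokens = 180_000          # GPU hours per trillion tokens
days_per_trillion_tokens = pretrain_per_trillion_tokens / (GPUS * 24)

total_gpu_hours = 2_664_000 + 119_000 + 5_000   # pre-train + context ext. + post-train
total_cost = total_gpu_hours * PRICE_PER_GPU_HOUR

print(round(days_per_trillion_tokens, 1))       # 3.7  (days per trillion tokens)
print(total_gpu_hours)                          # 2788000 GPU hours, i.e. 2.788M
print(f"${total_cost / 1e6:.3f}M")              # $5.576M
```

Note the figure everyone quotes is just GPU-hours times an assumed rental rate; as the paper itself says, it excludes all the prior research, ablations, and hardware the company already owned.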
 