The AI discussion thread


Kaido

Elite Member & Kitchen Overlord
Feb 14, 2004
51,468
7,218
136
Hunyuan Image 3.0 enters the image-generation fray!



FREE & OPEN SOURCE!! Beating out Nano Banana!


September 28, 2025 — Tencent HunYuan today announced and open-sourced HunYuanImage 3.0, a native multimodal image generation model with 80B parameters. HunYuanImage 3.0 is the first open-source, industrial-grade native multimodal text-to-image model and currently the best-performing and largest open-source image generator, benchmarking against leading closed-source systems.

Users can try HunYuanImage 3.0 on the desktop version of the Tencent HunYuan website, and Tensor.Art (https://tensor.art) will soon support online generation! The model will also roll out on Yuanbao. Model weights and accelerated builds are available on GitHub and Hugging Face; both enterprises and individual developers may download and use them free of charge.

HunYuanImage 3.0 brings commonsense and knowledge-based reasoning, high-accuracy semantic understanding, and refined aesthetics that produce high-fidelity, photoreal images. It can parse thousand-character prompts and render long text inside images—delivering industry-leading generation quality.
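For anyone wanting to poke at it locally, here's a minimal sketch of pulling the weights from Hugging Face. The repo id ("tencent/HunyuanImage-3.0") and local path are assumptions on my part; check the model card for the actual repo name and inference entry point, since an 80B multimodal model won't just drop into a stock off-the-shelf pipeline.

```python
# Minimal sketch: download the released weights from Hugging Face.
# Assumptions: repo id "tencent/HunyuanImage-3.0" (check the model card),
# `pip install huggingface_hub`, and enough disk for ~160 GB of weights.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="tencent/HunyuanImage-3.0",   # assumed repo id
    local_dir="./HunyuanImage-3.0",
)
print(f"Weights downloaded to {local_path}")
# Generation itself then goes through the inference code shipped with the
# GitHub repo / model card, not a generic pipeline.
```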

More reading on the model:


 

RnR_au

Platinum Member
Jun 6, 2021
2,689
6,140
136
Yeah the Chinese models you can run at home are capitalism's worst nightmare :)

edit: ...just saw the memory requirements for this model: "GPU Memory: ≥3×80GB (4×80GB recommended for better performance)"
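Back-of-the-envelope on why it wants 3-4 datacenter cards (assuming the checkpoint is bf16; activations, the text encoder and framework overhead all come on top):

```python
# Rough memory math for an 80B-parameter model, assuming bf16 (2 bytes/param).
params = 80e9
bytes_per_param = 2
weights_gib = params * bytes_per_param / 1024**3
print(f"Weights alone: ~{weights_gib:.0f} GiB")   # ~149 GiB

# An "80 GB" card holds ~74.5 GiB, so the weights alone saturate two cards
# before any activations or working buffers -- hence the >=3x80GB requirement.
```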
 


Kaido

Elite Member & Kitchen Overlord
Feb 14, 2004
51,468
7,218
136
So here's an interesting look into the future: Real-time interactive simulation

1. Multi-modal input means we can use video-to-video style transfer, such as AI upscaling & generative quality improvement.
2. Real-time reskinning now exists, just not at a high-quality level. This can be applied to both existing video games AND live video! (Rough sketch of the frame loop after the list below.)

Imagine real-time AI upscaling in the future over older & low-poly games:

Decart has the base real-time reskinning technology up & running, with VR support and camera pass-thru:

Sooooo many applications:

1. Live video streaming, Zoom meetings, Facetime effects, Snapchat filters, etc.
2. Post-production video processing
3. Older game uprezzing
4. Low-poly, low-GPU-demand upscaling & reskinning
5. VR games & video pass-thru enhancement
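The plumbing for all of these is basically the same per-frame loop. A rough sketch is below; restyle_frame() is a stand-in for whatever model or API you'd actually call (Decart's service, a local style-transfer model, etc.), not a real library function:

```python
# Minimal sketch of a real-time video-to-video restyling loop with OpenCV.
# restyle_frame() is a placeholder for the actual model/API call.
import cv2

def restyle_frame(frame, prompt):
    # Placeholder: hand the frame (plus a style prompt) to your model of choice
    # and return the restyled frame. Its latency decides whether this is "real time".
    return frame

cap = cv2.VideoCapture(0)   # webcam; could just as well be a game capture or video file
while True:
    ok, frame = cap.read()
    if not ok:
        break
    styled = restyle_frame(frame, "low-poly game reskinned as photoreal")
    cv2.imshow("restyled", styled)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```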


 

Kaido

Elite Member & Kitchen Overlord
Feb 14, 2004
51,468
7,218
136
One of my biggest interests with AI is in LIDAR, for 2 applications:

1. Self-driving cars
2. Archeological mapping under forest canopy (airborne lidar)

Waymo & other companies are doing some neat stuff with LIDAR & AI. But what's even more fun is aerial LIDAR-mapping of hidden historical archeological structures, especially in South America. Great background story here:


The first commercial lidar sensors became available in the mid-1990s. Unlike traditional photographic sensors, airborne lidar had the unique capability to be used day or night, penetrate vegetation canopies and map underlying structures. Since then, significant improvements in technology have resulted in lidar becoming an essential exploratory tool for archaeologists worldwide.
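The "see under the canopy" step mostly comes down to keeping only the ground-classified returns from each lidar tile and gridding them into a bare-earth elevation model you can hillshade. A rough sketch with laspy/numpy, assuming a LAS tile whose points already carry ASPRS classification codes (2 = ground) and a placeholder filename:

```python
# Rough sketch: build a bare-earth DEM from ground-classified lidar returns.
# Assumes "tile.las" (placeholder) with standard ASPRS classifications (2 = ground).
import laspy
import numpy as np

las = laspy.read("tile.las")
ground = np.asarray(las.classification) == 2      # drop vegetation/building returns

x = np.asarray(las.x)[ground]
y = np.asarray(las.y)[ground]
z = np.asarray(las.z)[ground]

# 1 m grid: keep the lowest ground return in each cell as the bare-earth surface
res = 1.0
cols = ((x - x.min()) / res).astype(int)
rows = ((y - y.min()) / res).astype(int)
dem = np.full((rows.max() + 1, cols.max() + 1), np.inf)
np.minimum.at(dem, (rows, cols), z)
dem[np.isinf(dem)] = np.nan                       # cells with no ground returns

# `dem` can now be hillshaded to make structures hidden under the jungle pop out.
```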

Great book on LIDAR & South America:

"The Lost City of the Monkey God: A True Story" by Douglas Preston

Some additional reading on new discoveries over the past few years:






Data from various sites throughout South America:

Research Just Showed That The Maya Population Was Much Larger Than Experts Thought And May Have Included 16 Million People

Researchers used LiDAR scans of Maya urban centers to conclude that, when the population was at its peak circa 600-900 C.E., this civilization was much larger than experts once thought.

“This discovery has proven there was an equivalent of Rome in Amazonia,” Rostain said. “The people living in these societies weren’t semi-nomadic people lost in the rainforest looking for food. They weren’t the small tribes of the Amazon we know today. They were highly specialised people: earthmovers, engineers, farmers, fishermen, priests, chiefs or kings. It was a stratified society, a specialised society, so there is certainly something of Rome."

...

"“Using airborne laser-scanning technology (Lidar), Rostain and his colleagues discovered a long-lost network of cities extending across 300sq km in the Ecuadorean Amazon, complete with plazas, ceremonial sites, drainage canals and roads that were built 2,500 years ago and had remained hidden for thousands of years."

But LiDAR, said Estrada-Belli, “has revolutionized our ability to map.” The technology has enabled archaeologists to cover around 7,000 square kilometers as of 2019, and to recognize virtually every structure, “even small things you couldn’t see even if you were standing right in front of it—but also very large things because their size is obscured by the jungle itself.”

In just six months, archaeologists have managed to scan an area ten times larger than what five years’ worth of standard pedestrian surveys had covered. Technology can help collect data and obtain the variations, elevations, and mysteries behind the Mayan landscape, such as information about population density and cultivation practices.

In addition, the data can be presented to appeal to a wide variety of audiences; day and night views, 3-D enhancement, and thermal panoramas are just a few of the filters users can choose to help make the data come to life.


 
  • Like
Reactions: igor_kavinski