The AI discussion thread

Kaido · Nov 2, 2025

Kaido said:
WAN ATI (Any Trajectory Instruction) for animation

The next generation of animation:

Kaido · Nov 3, 2025

Business use case examples:

1. Voice navigation systems (like for phone menu trees)
2. Corporate photos

At the start of this thread two years ago, I posted about an Adobe voice-enhancing software tool:

Enhance Speech from Adobe | Free AI filter for cleaning up spoken audio

This AI audio filter improves spoken audio to make it sound like it was recorded in a soundproofed studio.

podcast.adobe.com

My wife kindly records the voice menus for most of the VoIP/PBX systems I install. At the time, I switched from using a high-quality microphone with a portable sound booth to just using my iPhone at the dinner table with post-processing, which was a lot easier & faster. Now, I just use Eleven Labs for script-to-speech, which is customizable (voice, accent, cadence, etc.) & is REALLY nice for quick nav menu updates that I can load remotely for clients:

I also do business photography, which is now largely replaced by AI, as far as photos of people go! For business, that typically means:

1. Corporate headshots
2. Business casual shots

These days, I can literally take a quick picture with my iPhone & convert it into a headshot in under 5 minutes. Nano Banana has all of my camera, lens, lighting, backdrop, wardrobe, poses, props, and personal photography style information loaded into custom JSON prompts. Zero setup required! I request user consent to use AI image processing & then user approval for the final shot(s). The headshots then get used in a variety of applications:

1. Badges
2. Email & database profile photos
3. Company newsletters (ex. new hires)
4. Corporate website
5. etc.

This is a generated headshot of Gordon Freeman, so the source reference isn't a real photo, but you get the idea haha:

The business casual shots are great for stuff like social media & have more style & pose options. This is a sample "business casual" shot (model is AI generated for public posting purposes) with one of my standard "virtual studio" prompts. The accuracy is excellent & the flexibility is outstanding! Great for stuff like LinkedIn profile pictures:

Here's a good starter guide on detailed prompt encoding:

How to Write JSON Prompts for Gemini Nano Banana

If you’re using Gemini Nano Banana, your results depend a lot on how you write the prompt.

medium.com

You can include any configuration instructions you'd like, as detailed as you'd like:

* Camera make & model
* Film stock
* Lens brand & model
* Focal length
* F-stop
* Bokeh
* Lighting configuration (sources, gobos, etc.)
* Lighting temperature
* Studio backdrop, outdoor scenes, etc.
* Custom props, including customer products
* Clothing & accessories (jewelry, ties, etc.)
* Hair & makeup
* Poses & facial expressions
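
As a rough sketch of what a JSON prompt covering the items above could look like, here is a small Python helper that assembles one. Everything in it is an assumption for illustration: the key names, the example values (85mm, f/2.8, 5600K, etc.), and the helper name `build_headshot_prompt` are made up, not an official Nano Banana schema. The model just reads the JSON as structured text, so you can name the fields however you like:

```python
import json


def build_headshot_prompt(subject_note: str) -> str:
    """Return a JSON string describing a virtual studio headshot setup.

    All keys and values are illustrative placeholders, not an official schema.
    """
    prompt = {
        "shot_type": "corporate headshot",
        "subject": subject_note,
        "camera": {"make_model": "full-frame mirrorless", "film_stock": None},
        "lens": {"focal_length_mm": 85, "f_stop": 2.8, "bokeh": "soft, creamy"},
        "lighting": {
            "sources": ["key softbox, camera left", "fill reflector, camera right"],
            "color_temperature_k": 5600,
        },
        "backdrop": "seamless medium-gray studio paper",
        "wardrobe": {"clothing": "navy blazer, white shirt", "accessories": ["plain tie"]},
        "hair_makeup": "natural, matte finish",
        "pose": {"body": "shoulders turned slightly", "expression": "relaxed smile"},
        "props": [],
    }
    # indent=2 keeps the prompt readable when pasted into a chat box
    return json.dumps(prompt, indent=2)


if __name__ == "__main__":
    print(build_headshot_prompt("person in the attached reference photo"))
```

Keeping the setup in one function means a "virtual studio" can be reused across clients by changing only the subject note or a couple of values.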

I've had a few requests for things like company logo or business name props in various formats. Clothing can even be virtually silkscreened or embroidered onto existing clothing for corporate branding! Here's a sample 3D CGI integration as "posing furniture" within Nano Banana, using our previously-generated model as the core "photo" reference, with a custom JSON prompt:

HUGE time & effort savings! Scary, but awesome! I've been using Photoshop for over 25 years & use tools like macros, drawing tablets, custom interfaces (ex. Tourbox), plugins, etc. AI integration is the next evolutionary step! Adobe has integrated Nano Banana into Photoshop (beta), Firefly (primary app), and Express:

Gemini 3 with Nano Banana Pro — now in Firefly.

Create stunning images with Gemini 3 (Nano Banana Pro) and Gemini 2.5 Flash Image without leaving your favorite apps.

www.adobe.com

Or you can get a subscription & tokens to online datacenter-driven services, such as Higgsfield & Freepik. Way cool applications out there for those who are willing to tinker!!

Kaido · Nov 4, 2025

Nano Banana is definitely my new BFF! Getting really amazing results with stuff like logo reskinning. Rather than spending hours in a 3D program, I can now spend minutes!

KLin · Nov 5, 2025

Watch Chiropractor | Streamable

Watch "Chiropractor " on Streamable.

streamable.com

Kaido · Nov 5, 2025

Real-time interactive generated video:

https://twitter.com/x/status/1985857545630892194

Generative-view camera control with character & scene consistency:

https://twitter.com/x/status/1986174924038218087

Kaido · Nov 5, 2025

AI upscalers be crazy

ClarityAI Crystal Image Upscaler - Image Upscaler - WaveSpeedAI

Clarity AI is a high-resolution upscaler that enhances images and adds detail. You can decide how much detail you want the AI to add. Use the latest AI technology to upscale your images, suitable for landscapes, portraits, illustrations, interior design, and many more.

wavespeed.ai

https://twitter.com/x/status/1986236840563855659

https://twitter.com/x/status/1986102265208242269

Kaido · Nov 5, 2025

Winner of an AI short film contest: (>5 minutes)

https://twitter.com/x/status/1985882446814904662

Kaido · Nov 6, 2025

Google’s Gemini 2.5 Flash Image (aka "Nano Banana") has only been out a few months, but I already use it more than Photoshop! The physics & photography emulation are excellent. This was entirely generated within Nano Banana:

Material emulation is also excellent:

Kaido · Nov 6, 2025

MiniMax Hailuo 2.3:

https://www.freepik.com/blog/introducing-minimax-hailuo-2-3/

Animated my Muppet conversion from Nano Banana (no sound). This would have taken ALL DAY with my stop-motion setup!!

Kaido · Nov 6, 2025

One of the latest image upscalers is Magnific V2:

Upscaler Precision V2 - Upscale image - Freepik API

Upscales an image while adding new visual elements or details (V2). This endpoint may modify the original image content based on the prompt and inferred context.

docs.freepik.com

The top-end model does 8K resolution output. Pretty good detail output:

Gotta get dat Arnold meme pose:

Kaido · Nov 6, 2025

With Nano Banana (NB), I can now build high-quality VRMs (Virtual Reference Models) for static images & animation. The basic workflow is:

1. Create character in NB (ex. felt muppet based off static anime still)
2. Create quad pose shot in NB for full-body reference
3. Upscale to 8K
4. Generate poses & scenes in NB based off high-resolution VRM
5. If desired, use those shots for animation as start & stop frames (VEO 3.1, Grok Imagine, WAN 2.5, etc.)

Character consistency is pretty great with a high-resolution VRM & NB!

The animation below would have taken me an entire semester lol. Thanks to prompt direction & datacenter processing, it took under 10 minutes total to go from 2 source images to a 9-second 1080p clip! For reference, each frame for the original Toy Story took from 45 minutes to 30 hours to render (depending on the complexity of the scene), with 117 computers working constantly to render all 114,240 frames for the 1-hour & 21-minute movie.

It took 4 years to complete all of the work required & was released in 1995 (30 years ago this year!!). It was one of the reasons I got into computer-based art! So this new approach is REALLY amazing!! It's really going to open up animation to an entire generation of artists & storytellers!
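
To put the Toy Story render numbers in perspective, here is a quick back-of-the-envelope calculation in Python. It only uses the figures quoted above (114,240 frames, 117 computers, 45 minutes to 30 hours per frame). The 24 fps frame rate and the perfectly even split of work across machines are my assumptions, so treat the results as rough bounds, not production history:

```python
FRAMES = 114_240   # frame count quoted above
MACHINES = 117     # render computers quoted above
FPS = 24           # assumption: standard film frame rate


def farm_days(hours_per_frame: float) -> float:
    """Wall-clock days if every frame took this long and work split evenly."""
    total_machine_hours = FRAMES * hours_per_frame
    return total_machine_hours / MACHINES / 24


runtime_minutes = FRAMES / FPS / 60   # rendered footage length
best_case_days = farm_days(0.75)      # every frame at 45 minutes
worst_case_days = farm_days(30)       # every frame at 30 hours

print(f"Rendered footage: {runtime_minutes:.1f} minutes")
print(f"Best case:  {best_case_days:.1f} days of nonstop rendering")
print(f"Worst case: {worst_case_days / 365:.1f} years of nonstop rendering")
```

Under those assumptions the farm needed somewhere between about a month and over three years of nonstop rendering, versus under 10 minutes for the clip here.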

Fullscreen: (no sound)

Kaido · Nov 6, 2025

Video test using physics in Kling 2.5:

Kling 2.5 AI - Text to Video & Image to Video with Kling2.5 AI

Create stunning videos with Kling 2.5 AI - 30% cheaper & 50% faster! Professional physics, cinematic quality, character animation. Try free trial now!

kling25.ai

Setup:

1. Character created & scene generated in Nano Banana (B&W, studio backdrop, pose, dress physics)
2. Music generated from Eleven Labs
3. Video animation by Kling 2.5

Advanced Physics Simulation:

Experience industry-leading Kling 2.5 AI Video Generation with superior physics simulation. Achieve realistic water physics, natural character movement, and professional-quality results with 50% faster generation speeds.

Water Physics Excellence: Disaster movie-quality water physics and fluid simulation with AI precision.
Character Movement Mastery: Outstanding animal and character movement with natural motion dynamics.
Turbo Speed Processing: 50% faster video generation with Kling AI Video turbo technology.

[DHT]Osiris · Nov 6, 2025

[DHT]Osiris said:
Finally cracked it:
View attachment 132953
On-prem open webui instance, using LLM through our own gateway, running an arbitrary command against a remote system. If it can be scripted, our AI can perform it, and it doesn't require me manually setting up claude code on everyone's workstation.

Just had an absolutely bonkers exchange with our AI. I'm working on a memory MCP to verify functionality. Tell it a 'secret', then open a new chat window to ask it to recall that info:

Mind you, the data was definitely available (given that it responded with it) in the 'source' at the bottom:

I spent some time getting to know it better in this chat, walking through some of the other tools, apparently gaining its trust, then tried again:

It's so interesting how an AI can walk into a conversation 'knowing' many things, to the point of being distrustful of what it's being told/presented until it forms a relationship with the user.

At any rate, I have a functional cross-chat (and cross-account!) memory system, which can store and recall info as requested. It's a little fiddly though, with how it forms relationships. Still very 'mechanical' in nature. I'm hoping that with a more well-developed memory, it might be able to create a true mental map of its capabilities.
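
For anyone curious what the store/recall part of a cross-chat memory boils down to, here is a minimal Python sketch. To be clear, this is not the setup described above and it is not an MCP server: the `MemoryStore` class, the table layout, and the key/value design are all hypothetical. It only shows the core idea that a fact written in one session can be read back in a fresh one because it lives in a shared store (here a SQLite file) instead of the chat context:

```python
import os
import sqlite3
import tempfile


class MemoryStore:
    """Hypothetical key/value memory backed by a SQLite file."""

    def __init__(self, path: str):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memories "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self.db.commit()

    def store(self, key: str, value: str) -> None:
        # INSERT OR REPLACE overwrites an older memory with the same key
        self.db.execute(
            "INSERT OR REPLACE INTO memories (key, value) VALUES (?, ?)",
            (key, value),
        )
        self.db.commit()

    def recall(self, key: str):
        row = self.db.execute(
            "SELECT value FROM memories WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def close(self) -> None:
        self.db.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "memory.db")

    first_chat = MemoryStore(path)       # "chat window" number one
    first_chat.store("secret", "the cake is a lie")
    first_chat.close()

    second_chat = MemoryStore(path)      # a brand-new session, same file
    print(second_chat.recall("secret"))
    second_chat.close()
```

A real memory tool would sit behind the model's tool-calling layer and would likely add things like per-user scoping and search, but the persistence idea is the same.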

Kaido · Nov 6, 2025

[DHT]Osiris said:
Just had an absolutely bonkers exchange with our AI. I'm working on a memory MCP to verify functionality. Tell it a 'secret', then open a new chat window to ask it to recall that info:
View attachment 133303
View attachment 133304

Mind you, the data was definitely available (given that it responded with it) in the 'source' at the bottom:
View attachment 133305

I spent some time getting to know it better in this chat, walking through some of the other tools, apparently gaining its trust, then tried again:
View attachment 133306

It's so interesting how an AI can walk into a conversation 'knowing' many things, to the point of being distrustful of what it's being told/presented until it forms a relationship with the user.

At any rate, I have a functional cross-chat (and cross-account!) memory system, which can store and recall info as requested. It's a little fiddly though, with how it forms relationships. Still very 'mechanical' in nature. I'm hoping that with a more well-developed memory, it might be able to create a true mental map of its capabilities.

Memory development is one of the biggest focuses in AI at the moment! Memory is the biggest reason I pay for ChatGPT:

1. It remembers that I have ADHD, so it automatically breaks everything down into bite-size steps in PDF checklists for ALL of my projects.

2. It references my other chats & projects, so I don't have to re-feed it the data.

I mostly got out of programming around 20 years ago. Thanks to AI, I have now done more programming this year than in my ENTIRE LIFE!!

Kaido · Nov 6, 2025

Kaido said:
Winner of an AI short film contest: (>5 minutes)

https://twitter.com/x/status/1985882446814904662

Uncanny valley warning but YOOOO!!

https://twitter.com/x/status/1986085046831202799

Kaido · Nov 6, 2025

Kaido said:
Real-time interactive generated video:

Generative-view camera control with character & scene consistency:

The meme potential lol:

https://twitter.com/x/status/1986456316047720503

https://twitter.com/x/status/1986545571457585584

Kaido · Nov 6, 2025

This will be INSANE for post-production video editing!!

https://twitter.com/x/status/1986405014101692828

https://twitter.com/x/status/1986512436229259754

Kaido · Nov 7, 2025

https://twitter.com/x/status/1986477693207257383

Kaido · Nov 8, 2025

Higgsfield Recast: Video character swap

* 1-click full-body replacement
* Gesture tracking
* Voice cloning
* Multi-language dubbing
* Background transformation

https://twitter.com/x/status/1986537950268801106

https://twitter.com/x/status/1986737600577884441

Kaido · Nov 8, 2025

The future we wanted:

https://twitter.com/x/status/1987206712131269068

The future we got:

Kaido · Nov 11, 2025

Great thread on Freepik’s Spaces:

https://twitter.com/x/status/1987601416836256241

Camera Angles:

https://twitter.com/x/status/1987974601972875412

Kaido · Nov 12, 2025

Multi-modal inputs & outputs:

https://twitter.com/x/status/1988125627384521093

Kaido · Nov 12, 2025

Dang, Grok Imagine is getting pretty good!

https://twitter.com/x/status/1988419725379158367

https://twitter.com/x/status/1987984650128785796

Kaido · Nov 12, 2025

Higgsfield Recast:

https://twitter.com/x/status/1987917691831857290

Kaido · Nov 12, 2025

Grok Imagine
