The AI discussion thread

Page 68

Kaido

Elite Member & Kitchen Overlord
Feb 14, 2004
51,954
7,410
136
Business use case examples:

1. Voice navigation systems (like for phone menu trees)
2. Corporate photos

At the start of this thread two years ago, I posted about an Adobe voice-enhancing software tool:


My wife kindly records the voice menus for most of the VOIP/PBX systems I install. At the time, I switched from using a high-quality microphone with a portable sound booth to just using my iPhone at the dinner table with post-processing, which was a lot easier & faster. Now, I just use Eleven Labs for text-to-speech, which is customizable (voice, accent, cadence, etc.) & is REALLY nice for quick nav menu updates that I can load remotely for clients:


I also do business photography, which AI has now largely replaced, at least for photos of people! For business, that typically means:

1. Corporate headshots
2. Business casual shots

These days, I can literally take a quick picture with my iPhone & convert it into a headshot in under 5 minutes. Nano Banana has all of my camera, lens, lighting, backdrop, wardrobe, poses, props, and personal photography style information loaded into custom JSON prompts. Zero setup required! I request user consent to use AI image processing & then user approval for the final shot(s). The headshots then get used in a variety of applications:

1. Badges
2. Email & database profile photos
3. Company newsletters (ex. new hires)
4. Corporate website
5. etc.

This is a generated headshot of Gordon Freeman, so the source reference isn't a real photo, but you get the idea haha:

[Image: generated Gordon Freeman headshot]

The business casual shots are great for stuff like social media & have more style & pose options. This is a sample "business casual" shot (model is AI generated for public posting purposes) with one of my standard "virtual studio" prompts. The accuracy is excellent & the flexibility is outstanding! Great for stuff like LinkedIn profile pictures:

[Image: sample "business casual" portrait]

Here's a good starter guide on detailed prompt encoding:


You can include any configuration instructions you'd like, as detailed as you'd like:

* Camera make & model
* Film stock
* Lens brand & model
* Focal length
* F-stop
* Bokeh
* Lighting configuration (sources, gobos, etc.)
* Lighting temperature
* Studio backdrop, outdoor scenes, etc.
* Custom props, including customer products
* Clothing & accessories (jewelry, ties, etc.)
* Hair & makeup
* Poses & facial expressions
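A minimal Python sketch of what such a JSON preset might look like, covering the settings listed above. The field names and default values are hypothetical, not the actual preset from this post; the image model simply receives the serialized JSON as prompt text alongside the reference photo.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StudioPrompt:
    """Hypothetical "virtual studio" preset; field names are illustrative only."""

    subject: str
    camera: str = "full-frame digital camera"
    lens: str = "85mm portrait prime"
    aperture: str = "f/2.8"
    bokeh: str = "soft, creamy background blur"
    lighting: list[str] = field(
        default_factory=lambda: [
            "key: large octabox, camera left",
            "fill: white reflector, camera right",
        ]
    )
    color_temperature_k: int = 5600
    backdrop: str = "mottled gray canvas"
    wardrobe: str = "navy suit, white shirt"
    hair_makeup: str = "natural, lightly groomed"
    pose: str = "three-quarter turn, relaxed smile"
    props: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize every setting so the same studio style can be reused per client.
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    prompt = StudioPrompt(
        subject="the person in the attached reference photo",
        props=["company logo embroidered on the jacket"],
    )
    print(prompt.to_json())
```

Keeping the preset in code means one "virtual studio" style can be reused with only the subject and props swapped per shoot.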

I've had a few requests for things like company logo or business name props in various formats. Logos can even be virtually silkscreened or embroidered onto existing clothing for corporate branding! Here's a sample 3D CGI integration as "posing furniture" within Nano Banana, using our previously generated model as the core "photo" reference, with a custom JSON prompt:

[Image: 3D CGI "posing furniture" sample]

HUGE time & effort savings! Scary, but awesome! I've been using Photoshop for over 25 years & use tools like macros, drawing tablets, custom interfaces (ex. Tourbox), plugins, etc. AI integration is the next evolutionary step! Adobe has integrated Nano Banana into Photoshop (beta), Firefly (primary app), and Express:


Or you can get a subscription & tokens for online datacenter-driven services, such as Higgsfield & Freepik. Way cool applications out there for those who are willing to tinker!!
 

Kaido

Nano Banana is definitely my new BFF! Getting really amazing results with stuff like logo reskinning. Rather than spending hours in a 3D program, I can now spend minutes!

[Image: logo reskinning example]
 


Kaido

Google’s Gemini 2.5 Flash Image (aka "Nano Banana") has only been out a few months, but I already use it more than Photoshop! The physics & photography emulation are excellent. This was entirely generated within Nano Banana:

[Image: scene generated entirely in Nano Banana]

Material emulation is also excellent:

[Image: material emulation sample]
 

Kaido

One of the latest image upscalers is Magnific V2:


The top-end model outputs at 8K resolution. Pretty good detail:

[Image: Magnific V2 upscale sample]

Gotta get dat Arnold meme pose:

[Image: Arnold meme pose]
 

Kaido

With Nano Banana (NB), I can now build high-quality VRMs (Virtual Reference Models) for static images & animation. The basic workflow is:

1. Create character in NB (ex. felt muppet based on a static anime still)
2. Create quad pose shot in NB for full-body reference
3. Upscale to 8K
4. Generate poses & scenes in NB based on the high-resolution VRM
5. If desired, use those shots for animation as start & stop frames (VEO 3.1, Grok Imagine, WAN 2.5, etc.)
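Step 3 is mostly arithmetic. Here's a small Python sketch that assumes "8K" means a 7,680-pixel long edge (the width of an 8K UHD frame) and assumes an upscaler that only offers power-of-two presets. Both are assumptions, so check the actual options in whichever upscaler you use:

```python
import math

TARGET_LONG_EDGE_8K = 7680  # long edge of an 8K UHD (7680x4320) frame


def upscale_factor(width: int, height: int, target_long_edge: int = TARGET_LONG_EDGE_8K) -> float:
    """Scale factor that brings the image's longer side up to the target."""
    return target_long_edge / max(width, height)


def preset_factor(width: int, height: int, target_long_edge: int = TARGET_LONG_EDGE_8K) -> int:
    """Smallest power-of-two preset (1x, 2x, 4x, 8x, ...) that reaches the target.

    Assumes the upscaler only offers power-of-two presets; yours may differ.
    """
    needed = upscale_factor(width, height, target_long_edge)
    return max(1, 2 ** math.ceil(math.log2(needed)))


def upscaled_size(width: int, height: int, factor: float) -> tuple[int, int]:
    """Output dimensions after scaling both sides by the same factor."""
    return round(width * factor), round(height * factor)
```

For example, a 1024x1024 source needs 7.5x to hit a 7,680-pixel edge, so the 8x preset would give 8192x8192.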

[Images: VRM workflow examples]

Character consistency is pretty great with a high-resolution VRM & NB!

[Image: character consistency example]

The animation below would have taken me an entire semester lol. Thanks to prompt direction & datacenter processing, it took under 10 minutes total to go from 2 source images to a 9-second 1080p clip! For reference, each frame of the original Toy Story took between 45 minutes and 30 hours to render (depending on the complexity of the scene), with 117 computers working constantly to render all 114,240 frames for the 1-hour, 21-minute movie.

Toy Story took 4 years to complete & was released in 1995 (30 years ago this year!!). It was one of the reasons I got into computer-based art! So this new approach is REALLY amazing!! It's really going to open up animation to an entire generation of artists & storytellers!

Fullscreen: (no sound)
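The Toy Story numbers make for a fun back-of-the-envelope check. This Python sketch uses only the figures quoted above. The best-case total assumes every frame took the minimum 45 minutes and all 117 machines stayed busy, so the real total was higher. The clip's 24 fps is an assumption, since the post only gives its length.

```python
FRAMES = 114_240
RUNTIME_S = (1 * 60 + 21) * 60  # 1 h 21 min = 4,860 s of screen time
MACHINES = 117
FASTEST_FRAME_H = 45 / 60       # 45 minutes for the simplest frames

# Average frames per second of screen time (film is normally 24 fps).
avg_fps = FRAMES / RUNTIME_S

# Lower bound: every frame at the fastest time, perfectly spread over all machines.
best_case_machine_hours = FRAMES * FASTEST_FRAME_H
best_case_days = best_case_machine_hours / MACHINES / 24

# The AI clip from the post, assuming 24 fps (the frame rate isn't stated).
CLIP_FRAMES = 9 * 24

print(f"average frame rate: {avg_fps:.1f} fps")
print(f"best case: {best_case_machine_hours:,.0f} machine-hours, ~{best_case_days:.1f} days on {MACHINES} machines")
print(f"9-second clip: {CLIP_FRAMES} frames")
```

Even in that best case, the render alone works out to about 30 days of nonstop computing across the whole farm.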

 

Kaido

Video test using physics in Kling 2.5:


Setup:

1. Character created & scene generated in Nano Banana (B&W, studio backdrop, pose, dress physics)
2. Music generated from Eleven Labs
3. Video animation by Kling 2.5

Advanced Physics Simulation:

Experience industry-leading Kling 2.5 AI Video Generation with superior physics simulation. Achieve realistic water physics, natural character movement, and professional-quality results with 50% faster generation speeds.

Water Physics Excellence: Disaster movie-quality water physics and fluid simulation with AI precision.
Character Movement Mastery: Outstanding animal and character movement with natural motion dynamics.
Turbo Speed Processing: 50% faster video generation with Kling AI Video turbo technology.

 

[DHT]Osiris

Lifer
Dec 15, 2015
17,456
16,777
146
Finally cracked it:
[Screenshot attachment]
On-prem Open WebUI instance, using an LLM through our own gateway, running an arbitrary command against a remote system. If it can be scripted, our AI can perform it, and it doesn't require me to manually set up Claude Code on everyone's workstation.
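The post doesn't show how the tool is wired up. As a hypothetical Python sketch, one way to let a model trigger scripted commands on a remote box is to map a few named actions to fixed commands and run them over ssh. This deliberately swaps "arbitrary command" for an allowlist. The action names, the `svc-ai` account, and the helper functions are all made up, and hooking this into Open WebUI or a gateway is left out.

```python
import shlex
import subprocess

# Hypothetical allowlist: the only scripted actions the model may trigger.
ALLOWED_ACTIONS = {
    "uptime": ["uptime"],
    "disk_usage": ["df", "-h"],
    "running_services": ["systemctl", "list-units", "--type=service", "--state=running"],
}


def build_remote_argv(host: str, action: str, ssh_user: str = "svc-ai") -> list[str]:
    """Build an ssh command line for an allowlisted action, or raise ValueError."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action {action!r} is not allowlisted")
    if not host or host.startswith("-") or any(c.isspace() for c in host):
        raise ValueError(f"invalid host {host!r}")
    remote_cmd = shlex.join(ALLOWED_ACTIONS[action])  # quoted for the remote shell
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", f"{ssh_user}@{host}", remote_cmd]


def run_remote(host: str, action: str, timeout_s: int = 30) -> str:
    """Run the action and return stdout, or stderr if the command failed."""
    result = subprocess.run(
        build_remote_argv(host, action),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.stdout if result.returncode == 0 else result.stderr
```

Passing ssh an argument list (rather than one shell string) keeps the local shell out of the picture, and rejecting hosts that start with `-` stops a host value from being read as an ssh option.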
Just had an absolutely bonkers exchange with our AI. I'm working on a memory MCP to verify functionality. Tell it a 'secret', then open a new chat window to ask it to recall that info:
[Screenshots: telling it the 'secret', then asking for it in a new chat]

Mind you, the data was definitely available (given that it responded with it) in the 'source' at the bottom:
[Screenshot: the stored 'secret' listed in the source at the bottom]

I spent some time getting to know it better in this chat, walking through some of the other tools, apparently gaining its trust, then tried again:
[Screenshot: second recall attempt]

It's so interesting how an AI can walk into a conversation 'knowing' many things, to the point of being distrustful of what it's being told/presented until it forms a relationship with the user.

At any rate, I have a functional cross-chat (and cross-account!) memory system, which can store and recall info as requested. It's a little fiddly though, with how it forms relationships. Still very 'mechanical' in nature. I'm hoping that with a more well-developed memory, it might be able to create a true mental map of its capabilities.
 

Kaido

Memory development is one of the biggest focuses in AI at the moment! Memory is the biggest reason I pay for ChatGPT:

1. It remembers that I have ADHD, so it automatically breaks everything down into bite-size steps in PDF checklists for ALL of my projects.

2. It references my other chats & projects, so I don't have to re-feed it the data.

I mostly got out of programming around 20 years ago. Thanks to AI, I have now done more programming this year than in my ENTIRE LIFE!!
 