Discussion about this post

Chris Anderson

I'm curious how much GPT-4o integrates diffusion into its image generation vs. using a completely different technique (LLM). I haven't dug into the modeling distinction. I know there are a bunch of changes for audio-in, audio-out without text generation in the middle...

Have you looked into the modeling differences between the GPT-4o-style approach and diffusion models?

