ChatGPT's Studio Ghibli filter goes viral
It's a charming novelty. But LLMs just ate diffusion models.
I’m not sure exactly how OpenAI adapted their GPT‑4o model to generate images, but the output, particularly reworking photos to look like Studio Ghibli stills, has unleashed full-on mimesis.
The combination of novelty and familiarity, along with a broadly popular style, surely helped, just as Instagram was buoyed in its early days by fun filters that drove distribution on Facebook.
The fad will pass. And while some are skeptical that LLMs are the final form for AI, they do seem to have eaten diffusion models.
Some are saddened by what it’s doing to art. Should Studio Ghibli be reproduced this way? I have a few thoughts on that, tied to my “size of the firm” questions, which I’ll address in the coming weeks.
I'm curious how much GPT-4o integrates diffusion into the model versus using a completely different technique (an autoregressive LLM) for its image generation. I haven't dug into the modeling distinction. I know there are a bunch of changes for audio-in, audio-out without text generation in the middle...
Have you looked into the modeling differences between the GPT-4o-style approach and diffusion models?
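The distinction I have in mind is roughly the shape of the sampling loop. Here's a toy sketch, purely schematic with random stand-ins for the models and not a claim about how GPT-4o is actually built: an autoregressive model emits discrete image tokens one at a time, each conditioned on what came before, while a diffusion model starts from noise and denoises a whole image over many steps.

```python
# Toy contrast between the two generation paradigms; not OpenAI's code.
# Both "models" here are random stand-ins, so only the control flow matters.
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 1024   # hypothetical codebook of discrete image tokens
NUM_TOKENS = 64     # e.g. an 8x8 grid of patch tokens
IMAGE_DIM = 256     # flattened pixel vector for the diffusion toy

def autoregressive_image(prompt_tokens):
    """LLM-style generation: sample discrete image tokens one at a time,
    each step conditioned on the prompt plus everything generated so far."""
    tokens = list(prompt_tokens)
    for _ in range(NUM_TOKENS):
        # A real model would get logits from a transformer forward pass;
        # here we draw uniformly just to show the sequential structure.
        next_token = rng.integers(VOCAB_SIZE)
        tokens.append(int(next_token))
    return tokens[len(prompt_tokens):]  # the generated token grid is the image

def diffusion_image(num_steps=50):
    """Diffusion-style generation: start from pure noise and repeatedly
    denoise the whole image at once over a fixed number of steps."""
    x = rng.normal(size=IMAGE_DIM)
    for _ in range(num_steps):
        # A real model would predict the noise with a U-Net or transformer;
        # here the "denoiser" just shrinks toward zero to show the loop shape.
        predicted_noise = x * 0.1
        x = x - predicted_noise
    return x

print(len(autoregressive_image(prompt_tokens=[1, 2, 3])))  # 64 tokens
print(diffusion_image().shape)                             # (256,) "pixels"
```

If GPT-4o's image generation really is token-by-token like the first loop, that's what people mean when they say LLMs ate diffusion models; the second loop is the paradigm that Stable Diffusion and friends use.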