The jagged edge of AI
After a solid two days of development, here's what works and what does not
I’ve been heads down doing development work with various gen AI tools for the last few days, and I now have a better Fingerspitzengefühl for what works and what does not.
For coding, Gemini and Cursor really are excellent. The ability to load documentation into context means they do well at languages and formats that lack training data. And all models do well with text where association is good enough.
On the topic of associative text, while the line-by-line is fine, they struggle with structure, narrative beats, and style. They do better if you signpost and spell out narrative beats explicitly, but they tend to be a little rigid or ridiculous if you push them for any kind of voice. The great stylists are safe (for now).
Image generation is a different story. While individual images look good, character consistency is a problem. None of the tools has a good way of adjusting an individual person. I think Photoshop illustrations can be handled well, but something like a comic book, where you use the same character sequentially, cannot. Even backgrounds, which you would think would be easy, are quite hard. The model has no world concept, so if you turn around, everything changes.
The difficulties with images mirror the struggles with text. Fundamentally these are associative models, and the specific articulation and dress of a body impose axiomatic constraints, just as mathematics does on symbols.
If anyone has ideas on how to solve character consistency with images, please drop me a line!