Veo 3 does not solve the character consistency problem
Yes, I spoke too soon earlier. Check out the trainwreck.
After struggling with complicated pipelines and Leonardo AI, I was thrilled when Veo 3 offered the functionality I needed out of the box. I declared “Veo 3 (probably) solves the character consistency problem”. I spoke too soon.
Today, I tried to generate Chad’s interlocutor, Willow, using the same prompts that gave me such good results earlier. The output was disappointing.
The first prompt was copied word for word; I just replaced the description of Chad with one of Willow, and it generated this:
Not only is she rendered as an Amazon from a superhero comic, but the camera is too close to begin with and zooms in further as she goes through the poses. The art style is also different from Chad’s, so the two would look mismatched side by side.
I updated the prompt to make the dress more appropriate, the model less Amazonian, and the frame still, showing the full figure. The hair and clothing improved, but Veo chose to render her in a strange cel-shading style that reminded me of Borderlands. The camera didn’t improve either.
I had one more try, so I encouraged a Studio Ghibli look and gave very explicit camera instructions. This time the art style was OK, but the background had become worse, and the camera was still too zoomed in (at least it stayed still).
I could probably salvage the final output by having an LLM generate the legs, but the amount of human input needed remains unacceptably high, and basic instructions like “I need the whole figure” aren’t followed reliably.
For my next attempt, I’ll see whether generating both figures in a single run helps. That would force consistency in art style and rendering. Even the best Willow attempt has different lighting from Chad’s, and I fear the two would not look right in the same frame.