In Wolfram’s excellent introduction to LLMs, he discusses “temperature” and marvels at how incorporating some randomness into the process actually makes these models work:
The fact that there’s randomness here means that if we use the same prompt multiple times, we’re likely to get different essays each time. And, in keeping with the idea of voodoo, there’s a particular so-called “temperature” parameter that determines how often lower-ranked words will be used, and for essay generation, it turns out that a “temperature” of 0.8 seems best. (It’s worth emphasizing that there’s no “theory” being used here; it’s just a matter of what’s been found to work in practice. And for example the concept of “temperature” is there because exponential distributions familiar from statistical physics happen to be being used, but there’s no “physical” connection—at least so far as we know.)
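To make that concrete, here is a minimal sketch (not any particular model’s implementation) of how temperature typically enters next-token sampling: the model’s score for each candidate token is divided by the temperature before a softmax, so higher temperatures give lower-ranked tokens more probability mass. The logit values below are invented purely for illustration.

import numpy as np

def sample_next_token(logits, temperature=0.8, rng=np.random.default_rng(0)):
    """Sample a token index from a temperature-scaled softmax over candidate scores."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                          # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()   # Boltzmann-style exp(score / T)
    return rng.choice(len(probs), p=probs)

# Made-up scores for three candidate tokens, best-ranked first.
logits = [3.0, 2.0, 0.5]
picks = [sample_next_token(logits) for _ in range(10_000)]
print(np.bincount(picks, minlength=3) / 10_000)     # roughly [0.75, 0.22, 0.03] at T = 0.8

The “exponential distributions familiar from statistical physics” that Wolfram alludes to are exactly this exp(score / T) form, which is where the name “temperature” comes from.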
In the spirit of engineering, here’s a table suggesting different kinds of “temperature” for different kinds of writing:
Personally, I think this approach is wrongheaded — but it may still work! Temperature was added to token prediction because it made the language seem more natural. Why have any randomness in code generation at all?
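For what it’s worth, the sampling knob already contains the deterministic answer: as the temperature goes to zero, the scaled softmax sharpens until all the probability sits on the top-ranked token, which is exactly “no randomness.” A quick sketch, again with the same invented scores:

import numpy as np

def token_probs(logits, temperature):
    """Temperature-scaled softmax over candidate-token scores."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()
    return np.exp(scaled) / np.exp(scaled).sum()

logits = [3.0, 2.0, 0.5]                 # same made-up scores as above
for t in (1.2, 0.8, 0.2, 0.01):
    print(t, np.round(token_probs(logits, t), 3))
# As T drops, the distribution sharpens; in the T -> 0 limit the top token
# gets all the mass, i.e. greedy, fully repeatable decoding.

In practice most serving stacks expose this directly as a temperature setting (often alongside top-k or top-p cutoffs), so “no randomness for code” is a one-parameter choice rather than an architectural one.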
LLMs tend to perform best when prompts are clear and well-structured. Even so, language is inherently associative, so I understand how temperature might help on the input side. Still, why it’s needed on the output side — especially for code — is a mystery.