Beginner: Understanding Text Generation
Large Language Models (LLMs) generate text by predicting the next word in a sequence. In the example below, we use a Text Input Module and a Fill Mask Module to demonstrate how an LLM predicts a missing word in a sentence.
For instance, when prompted with a sentence like "The sky is [MASK]," the model predicts the top 5 most likely words to fill the blank: "clear," "blue," "dark," "gray," and "black." These predictions are ranked by likelihood, with "clear" being the most probable, forming the sentence "The sky is clear."
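A fill-mask step like this can be sketched as ranking candidate words by probability. The words and probabilities below are hypothetical, chosen only to mirror the example above; a real model would compute them from the sentence.

```python
# Toy illustration (not a real LLM): a hand-written probability
# distribution over candidate words for "The sky is [MASK]."
candidates = {
    "clear": 0.31, "blue": 0.27, "dark": 0.15,
    "gray": 0.10, "black": 0.08, "falling": 0.02,
}

def top_k(probs, k=5):
    """Return the k most likely words, highest probability first."""
    return sorted(probs, key=probs.get, reverse=True)[:k]

print(top_k(candidates))  # ['clear', 'blue', 'dark', 'gray', 'black']
```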
If we take the predicted word, append it to the sentence, and feed the result back into the model to predict the next word, we can build up a longer, somewhat coherent sentence like: "The sky is clear now, too bright outside here."
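The feed-back loop above can be sketched with a toy lookup table standing in for the model (the words and probabilities are invented for illustration): at each step we take the most likely next word and append it to the sentence.

```python
# Hypothetical next-word probabilities, keyed by the previous word.
# A real LLM conditions on the whole sentence, not just one word.
bigram = {
    "is":    {"clear": 0.6, "blue": 0.4},
    "clear": {"now": 0.7, "today": 0.3},
    "now":   {".": 0.9, "and": 0.1},
}

def generate(start, steps):
    """Greedy generation: always pick the most probable next word."""
    words = [start]
    for _ in range(steps):
        options = bigram.get(words[-1])
        if not options:
            break
        words.append(max(options, key=options.get))
    return " ".join(words)

print(generate("is", 3))  # "is clear now ."
```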
However, if we instead choose the least likely word from the model's predictions at each step, the sentence quickly becomes nonsensical. This shows how repeatedly selecting improbable words leads to a breakdown in sentence structure.
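Flipping the selection rule is a one-line change. Using the same kind of hypothetical distribution as before, picking the minimum instead of the maximum surfaces the awkward continuation:

```python
# Hypothetical candidate probabilities for "The sky is [MASK]."
options = {"clear": 0.31, "blue": 0.27, "dark": 0.15, "falling": 0.02}

most_likely = max(options, key=options.get)    # natural continuation
least_likely = min(options, key=options.get)   # odd continuation

print(most_likely, least_likely)  # clear falling
```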
The randomness of word selection can be adjusted using a parameter called temperature, which controls how willing the model is to choose less probable words. At low temperatures the model strongly favors its top candidates; at very high temperatures the distribution flattens until all words become nearly equally likely, and the output grows more random. Different models will rank words somewhat differently, but on a simple fill-mask task like this their behavior is broadly similar.
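Temperature is typically applied by dividing the model's raw scores (logits) by the temperature before converting them to probabilities with a softmax. The logit values below are made up; only the effect of the temperature on the resulting distribution matters.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                  # hypothetical word scores
low = softmax_with_temperature(logits, 0.5)    # sharp: top word dominates
mid = softmax_with_temperature(logits, 1.0)    # unchanged distribution
high = softmax_with_temperature(logits, 10.0)  # flat: nearly uniform
```

At temperature 0.5 the top word takes most of the probability mass; at 10.0 the three probabilities are almost equal, which is why high-temperature sampling reads as random.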