How to Use Stop Sequences in LLMs?
- Metric Coders
- Mar 29
When working with Large Language Models (LLMs), especially for tasks like chatbots, structured output generation, or API-based completions, stop sequences are an essential tool. They control when the model should stop generating text, which saves compute, cleans up responses, and keeps outputs structured.

In this post, we’ll explore what stop sequences are, why they matter, and how to effectively use them in your applications.
✋ What is a Stop Sequence?
A stop sequence is a string (or a list of strings) that tells the language model to halt generation when that sequence is encountered. It’s like saying: “Stop completing once you see this pattern.”
For example, in OpenAI’s API:
```json
{
  "prompt": "Name three colors:\n1.",
  "stop": ["\n"]
}
```
In this case, the model might output:
```
Red
```
and stop before generating the next line, so you get just the first item of the list.
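Here is what that request might look like in code. This is a minimal sketch using the official openai Python SDK; the model name is illustrative, and your platform's parameters may differ:
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask for a numbered list, but stop at the first newline so only
# the first item comes back.
response = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # illustrative model name
    prompt="Name three colors:\n1.",
    stop=["\n"],
    max_tokens=20,
)

print(response.choices[0].text.strip())  # e.g. "Red"
```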
🧠 Why Are Stop Sequences Important?
Here’s why you should use stop sequences:
✅ Prevent run-on text: Keep output limited to the desired portion (e.g., one sentence or item).
📦 Structure control: Ensure outputs match expected formatting in APIs or UIs.
🔒 Security/safety: Cut off generation before the model spills into unwanted or harmful areas.
🧹 Post-processing ease: Simplify parsing by reducing unexpected tokens.
🛠️ How to Use Stop Sequences in Practice
1. In Prompt Engineering
Let’s say you’re building a Q&A system:
```json
{
  "prompt": "Q: What is the capital of France?\nA:",
  "stop": ["\n", "Q:"]
}
```
Now the model stops either after it finishes the answer (at the newline) or as soon as it starts generating a new "Q:" on its own, whichever comes first.
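As a concrete sketch, again assuming the openai Python SDK (the helper function and model name here are illustrative, not part of any API):
```python
from openai import OpenAI

client = OpenAI()

def ask(question: str) -> str:
    """Hypothetical helper: answer a single question, stopping at the
    end of the line or before the model invents a follow-up "Q:"."""
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",  # illustrative
        prompt=f"Q: {question}\nA:",
        stop=["\n", "Q:"],
        max_tokens=50,
    )
    return response.choices[0].text.strip()

print(ask("What is the capital of France?"))  # e.g. "Paris"
```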
2. With JSON Output
Imagine you want JSON from the model:
```json
{
  "prompt": "Return a JSON object with name and age:\n",
  "stop": ["}"]
}
```
With } as a stop sequence, generation halts at the end of the object instead of trailing off into extra text. Since most APIs exclude the stop sequence from the returned text, the completion will typically look like:
```
{
  "name": "Alice",
  "age": 30
```
You can append the closing } manually afterward before parsing.
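A quick sketch of that post-processing step in plain Python; the completion string stands in for whatever your API call returned, and the assumption that the platform stripped the } is stated in the comments:
```python
import json

# Raw completion as returned by the API, with the "}" stop sequence
# already stripped off by the platform (assumed behavior).
completion = '{\n  "name": "Alice",\n  "age": 30\n'

# Re-attach the brace the stop sequence consumed, then parse.
data = json.loads(completion + "}")
print(data["name"], data["age"])  # Alice 30
```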
🧪 Best Practices
Use specific stop sequences: Avoid overly common characters unless you want early stops.
Escape properly: When working with newlines (\n) or special characters, ensure they're correctly formatted in your code.
Multiple sequences: Most APIs allow you to define multiple stop sequences; the model will stop on the first match.
Test with variations: Generation is non-deterministic, so the same prompt can yield different outputs; test edge cases to confirm your stop sequences fire where you expect.
⚠️ Limitations
Doesn’t "cut off" instantly: The model stops only after it has generated the stop sequence, and some platforms include it in the returned text, so you may need to trim it from the final result.
Tokenization matters: Stop sequences are matched at the token level rather than on raw strings, so complex or unusual sequences can produce partial matches or unexpected behavior.
Not supported by all models/APIs: Always check the documentation of the LLM platform you're using.
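If your platform does return the stop sequence (or you apply stop sequences client-side), a simple trim step helps. This is a generic sketch in plain Python and assumes nothing about any particular SDK:
```python
def trim_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut model output at the first occurrence of any stop sequence.

    Useful when a platform includes the stop sequence in the returned
    text, or when you enforce stop sequences client-side.
    """
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(trim_stop("Paris\nQ: What about Spain?", ["\n", "Q:"]))  # Paris
```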
🔚 Final Thoughts
Stop sequences are a simple but powerful way to control the output of LLMs. Whether you're generating short answers, structured data, or avoiding hallucinations, using stop sequences effectively makes your LLM application more predictable, efficient, and safe.