H
    Hestur
    Back to Blog

    How to Reduce AI Hallucinations

    5 min read

    AI hallucinations — confident but incorrect outputs — are reduced through three primary techniques: retrieval-augmented generation (RAG), structured prompting with constraints, and output verification. No technique eliminates hallucinations entirely, but production systems can reduce them to under 2% of responses with the right architecture.

    AI hallucinations — confident but factually incorrect outputs — are reduced through three primary techniques: retrieval-augmented generation (RAG) to ground responses in verifiable sources, structured prompting with constraints to narrow the model’s output space, and output verification to catch errors before they reach users. No technique eliminates hallucinations entirely, but production systems achieve under 2% hallucination rates with the right architecture.

    Why LLMs hallucinate

    LLMs predict the next token based on statistical patterns from training data. When asked about something outside their training distribution — your internal data, recent events, very specific factual details — they sometimes generate plausible-sounding but incorrect content. The model doesn’t know it’s wrong; it’s doing what it was trained to do, generating coherent text.

    Hallucinations are more common when:

    • The model is asked about topics not well-represented in training data
    • The question is ambiguous or the expected answer is very specific
    • The model is asked to recall precise numbers, dates, or names
    • The temperature (randomness) setting is too high

    Technique 1: RAG (retrieval-augmented generation)

    RAG is the most effective hallucination reduction technique for domain-specific applications. Instead of asking the LLM to recall facts from training data, you retrieve the relevant documents from your knowledge base and include them in the prompt.

    Without RAG: “What are our payment terms?” → LLM guesses based on common business practices → potentially wrong

    With RAG: System retrieves your actual contract template → LLM reads it → answers accurately with source citation

    Hallucination reduction: RAG typically reduces domain-specific hallucinations by 60–80% compared to a base LLM on the same questions.

    Key implementation details:

    • Always cite the source document: “Based on [document name]…”
    • Set a confidence threshold: if retrieval score is too low, say “I don’t have reliable information on that” rather than guessing
    • Use hybrid search (keyword + semantic) to improve retrieval recall

    Technique 2: Structured prompting and constraints

    Temperature = 0. For factual, deterministic tasks, set temperature to 0. This makes the model deterministic — it always picks the highest-probability next token. Less creative, but far fewer hallucinations.

    Explicit scope constraints. Tell the model what it cannot discuss:

    You answer questions ONLY using the provided context.

    If the answer is not in the context, say exactly:

    “I don’t have reliable information about that in my knowledge base.”

    Do NOT speculate or use general knowledge.

    Structured output schemas. When you need specific data extracted, use JSON schema constraints (available in GPT-4o and Claude via structured output mode). The model fills a defined schema rather than generating free text — dramatically reducing fabricated fields.

    Chain-of-thought prompting. For complex reasoning, ask the model to show its reasoning steps before giving the answer. This surfaces when the model is uncertain and often catches hallucinations before they appear in the final answer.

    Technique 3: Output verification

    Self-consistency checking. Ask the model the same question 3–5 times with slight prompt variations. If the answers disagree, flag for human review. If they agree, confidence is higher.

    Fact verification layer. For critical facts (numbers, dates, proper nouns), run a second LLM call specifically to verify the claim against the source documents.

    Human-in-the-loop for high-stakes outputs. For medical, legal, or financial applications, route outputs that include specific claims (dosages, legal citations, dollar amounts) to a human reviewer before delivery.

    Citation validation. If your system produces citations (“as stated in document X, section 3…”), verify programmatically that the cited section actually contains the stated information.

    Architecture comparison: hallucination rates by approach

    | Approach | Typical hallucination rate | Notes |

    |—|—|—|

    | Base LLM, no grounding | 15–30% on domain questions | High risk for specific factual queries |

    | Base LLM + good prompting | 8–15% | Better but still risky |

    | RAG (basic) | 5–10% | Depends heavily on retrieval quality |

    | RAG (hybrid + reranking) | 2–5% | Production-grade for most use cases |

    | RAG + verification layer | Under 2% | Required for high-stakes applications |

    What you can’t eliminate

    Some hallucination risk always remains:

    • Retrieval misses — if the right document isn’t returned, the model may speculate
    • Ambiguous questions — where multiple interpretations are valid
    • Edge cases in reasoning — complex multi-hop reasoning chains can fail
    • Model confidence calibration — LLMs don’t always know what they don’t know

    For critical applications (medical, legal, financial), build for the residual risk: add human review for high-stakes outputs, implement explicit confidence scoring, and make the system’s uncertainty visible to users rather than hiding it.

    Practical checklist for production systems

    • [ ] RAG with hybrid search (not vector-only)
    • [ ] Reranking layer
    • [ ] Temperature = 0 for factual queries
    • [ ] Explicit out-of-scope response instruction in system prompt
    • [ ] Source citations on all factual claims
    • [ ] Evaluation set of 100+ question-answer pairs with known correct answers
    • [ ] Regular accuracy measurement against evaluation set
    • [ ] Human review workflow for low-confidence outputs

    Hestur AI builds RAG systems targeting 90–95% accuracy with explicit hallucination monitoring. Book a scoping call.

    [@portabletext/react] Unknown block type "image", specify a component for it in the `components.types` prop

    Enjoyed this article?

    Subscribe to our newsletter for more AI automation insights.

    Back to Blog