
What Is Context Poisoning?

Context poisoning is a failure mode of AI systems in which a false or misleading piece of information enters the model's context and is then treated as true for the rest of the interaction - and often beyond. Because language models reason from whatever is in their context, a single poisoned fact does not stay contained: the model builds on it, repeats it, and lets it shape every downstream answer. A small error becomes a compounding one.

The term was popularized as part of a wider taxonomy of context failures described by Drew Breunig in 2025, and it has been observed in real systems. A notable example comes from Google DeepMind's Gemini 2.5 technical report, in which hallucinations by a Pokémon-playing Gemini agent poisoned the "goals" section of its context and steered its behavior for the rest of the run. As enterprises ground AI in their own data and let agents accumulate context over long tasks, context poisoning has become one of the most consequential reliability risks - and one that data governance is well placed to prevent.

TL;DR

Context poisoning happens when a hallucinated or erroneous fact enters an AI's context and is subsequently treated as ground truth, compounding over time. It is one of four classic context-failure modes, alongside context distraction, context confusion, and context clash - and a contributor to overall context rot. It is dangerous because the model cannot tell a poisoned fact from a real one; only the fact's source can establish that. The defense is governance: grounding AI in trusted, traceable data with clear provenance rather than unverified context. Dawiso's context layer serves governed, single-source-of-truth context to agents via MCP, so what enters the context window is verified, not poisoned.

Context Poisoning Defined

Context poisoning occurs when incorrect information - a model hallucination, a stale fact, a wrong retrieved document, or a maliciously injected statement - is added to the context and then accepted as a premise for further reasoning. The defining characteristic is persistence and compounding: unlike a one-off wrong answer, a poisoned fact stays in the context and contaminates everything built on top of it. The model, reasoning faithfully from a false premise, produces confidently wrong output and may even reference the poisoned "fact" as justification.

The root problem is that a language model has no innate way to distinguish a true statement in its context from a false one. It treats the context window as given. Whatever is placed there carries the authority of fact - which is exactly why what you put in the context, and whether it can be trusted, matters more than how cleverly you prompt.

One of Four Context Failures

Context poisoning is one of four widely cited ways a context window can fail. They often appear together, and all of them degrade reliability.

Figure: Four ways a context window fails (poisoning, distraction, confusion, clash). The model cannot tell a poisoned fact from a real one - only the source can.
  • Poisoning - a false fact enters and is treated as true, compounding over the interaction.
  • Distraction - so much accumulated context that the model over-weights its own history instead of reasoning freshly.
  • Confusion - irrelevant information in the context influences the answer.
  • Clash - parts of the context contradict each other, and the model reconciles them badly.

Poisoning is the most insidious of the four because it disguises itself as fact and survives across turns. Together, these failures are a major driver of context rot - the broad degradation of output quality as context grows larger and messier.

How It Happens

Context gets poisoned through several common routes:

  • Self-poisoning by hallucination. The model invents a fact, that output is written back into memory or a scratchpad, and the agent then treats its own fabrication as established truth.
  • Bad retrieval. A RAG pipeline pulls a stale, wrong, or out-of-context document, and the model accepts it as authoritative.
  • Stale or ungoverned data. The context is grounded in data with no agreed definition or freshness guarantee, so an outdated value enters as if current.
  • Prompt injection. A malicious instruction or false statement is planted in content the agent ingests, deliberately poisoning its context.

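In every case the downstream reasoning is sound given its premise; the premise is what is false. Even when the poison began as the model's own hallucination, the lasting damage comes from letting that output re-enter the context unchecked. The failure is upstream, in what was allowed into the context and whether its trustworthiness was ever established.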
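The compounding effect can be made concrete with a toy example. The sketch below is purely illustrative: the `Scratchpad` class, the `run_agent` function, and all the numbers are hypothetical and do not come from any real agent framework. It assumes an agent that stores facts in long-lived memory without provenance and derives later values from earlier ones.

```python
from dataclasses import dataclass, field


@dataclass
class Scratchpad:
    """Hypothetical long-lived agent memory: facts are stored with no provenance."""
    facts: dict[str, float] = field(default_factory=dict)

    def remember(self, key: str, value: float) -> None:
        # Nothing here checks whether the value is true.
        self.facts[key] = value


def run_agent(pad: Scratchpad, units_sold: int) -> dict[str, float]:
    """Each step computes correctly, but only from what the scratchpad holds."""
    revenue = pad.facts["unit_price"] * units_sold
    pad.remember("revenue", revenue)          # derived value is written back
    forecast = pad.facts["revenue"] * 1.10    # illustrative 10% growth assumption
    pad.remember("forecast", forecast)
    return dict(pad.facts)


# Clean run: the scratchpad starts from the correct price (assumed to be 10.0).
clean = Scratchpad()
clean.remember("unit_price", 10.0)
clean_result = run_agent(clean, units_sold=1_000)

# Poisoned run: a hallucinated price of 12.0 is written into memory first.
poisoned = Scratchpad()
poisoned.remember("unit_price", 12.0)
poisoned_result = run_agent(poisoned, units_sold=1_000)

for key in ("unit_price", "revenue", "forecast"):
    error = poisoned_result[key] - clean_result[key]
    print(f"{key}: clean={clean_result[key]:.2f} "
          f"poisoned={poisoned_result[key]:.2f} error={error:+.2f}")
```

Every calculation in the poisoned run is arithmetically correct, yet the absolute error grows at each step because each derived value inherits the false premise and is then stored as if it were established.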

How to Prevent It

Because the model cannot verify facts itself, prevention has to happen at the source of context:

  • Ground in trusted data. Feed AI from governed, authoritative sources with agreed definitions, not unverified documents or ad hoc copies.
  • Track provenance. Keep lineage so every fact in the context can be traced to a known, trustworthy origin - and untrusted content can be flagged or excluded.
  • Validate retrieval. Check that retrieved context is current, relevant, and from an approved source before it reaches the model.
  • Isolate and refresh. Prevent unverified model output from silently re-entering long-lived context, and refresh context against the source of truth rather than letting it drift.

All of these reduce to one principle: govern what enters the context. Clever prompting cannot fix a poisoned premise - only trustworthy, traceable inputs can.
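As a rough illustration of governing what enters the context, the sketch below gates candidate context items on provenance before they reach the model. It is a minimal sketch under stated assumptions, not a reference implementation: the `ContextItem` fields, the `admit` function, the approved source names, and the 30-day freshness limit are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ContextItem:
    text: str
    source: str                # where the fact came from (provenance)
    fetched_at: datetime       # when it was last confirmed against the source
    origin: str = "retrieved"  # "retrieved" or "model_output"


# Hypothetical policy values, chosen only for this example.
APPROVED_SOURCES = {"glossary", "catalog"}
MAX_AGE = timedelta(days=30)


def admit(
    items: list[ContextItem],
    now: datetime,
    approved: set[str] = APPROVED_SOURCES,
    max_age: timedelta = MAX_AGE,
) -> tuple[list[ContextItem], list[tuple[ContextItem, str]]]:
    """Split items into those allowed into the context and those rejected, with a reason."""
    accepted: list[ContextItem] = []
    rejected: list[tuple[ContextItem, str]] = []
    for item in items:
        if item.origin == "model_output":
            # Isolate: model output may not silently re-enter long-lived context.
            rejected.append((item, "unverified model output"))
        elif item.source not in approved:
            # Ground in trusted data: only approved sources are admitted.
            rejected.append((item, "unapproved source"))
        elif now - item.fetched_at > max_age:
            # Validate retrieval: stale items must be refreshed first.
            rejected.append((item, "stale"))
        else:
            accepted.append(item)
    return accepted, rejected


now = datetime(2025, 6, 30, tzinfo=timezone.utc)
candidates = [
    ContextItem("Churn = cancelled / active at period start", "glossary",
                now - timedelta(days=5)),
    ContextItem("Churn = cancelled / total signups", "wiki_export",
                now - timedelta(days=2)),
    ContextItem("Q1 churn was 4.1%", "catalog", now - timedelta(days=90)),
    ContextItem("Churn is probably around 9%", "agent_scratchpad",
                now, origin="model_output"),
]

accepted, rejected = admit(candidates, now)
for item in accepted:
    print(f"ADMIT  [{item.source}] {item.text}")
for item, reason in rejected:
    print(f"REJECT [{item.source}] {item.text} ({reason})")
```

Only the glossary definition is admitted here; the other three items are turned away for an unapproved source, staleness, and unverified model output respectively. Keeping the rejection reason alongside each item means untrusted content is flagged rather than silently dropped, which is the point of tracking provenance.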

How Dawiso Helps

Dawiso helps prevent context poisoning by making the context AI consumes governed and traceable by default. The Context Layer connects your glossary, catalog, and lineage into a single source of truth, and serves it to any agent through the Dawiso MCP Server. Instead of grounding in whatever document a retrieval step happened to surface, an agent draws on authoritative definitions and trusted data with known provenance - and when a fact is used, its origin is traceable. Poisoning thrives on unverified context; a governed context layer is exactly the verified, single-source-of-truth foundation that starves it.

Conclusion

Context poisoning turns one wrong fact into a chain of confident, compounding errors, because a model treats everything in its context as true. It is among the most damaging context failures precisely because it hides as fact and persists across turns. The cure is not a better prompt but a trustworthy source: govern what enters the context, track where it came from, and keep it fresh. Ground your AI in a governed context layer, and the false premises that poison reasoning never make it into the window in the first place.
