Context Engineering: A Primer
Context engineering reframes building with AI from creative guessing to systematic software practice.
As large language models move from novelties to core components of modern software, the discipline for building with them is maturing. Early terms like “prompt engineering” are proving inadequate for the task of creating sophisticated, reliable applications. A more accurate and powerful term has emerged: context engineering. This is the essential discipline for any practitioner or team serious about building with generative AI. It shifts the focus from crafting clever prompts to the systematic engineering of the entire informational environment the model uses to perform its work.
This article outlines a coherent framework for understanding and practising context engineering. You’ll learn the right mental models for working with LLMs, master the core techniques for assembling effective contexts, and gain a methodology you can apply consistently.
Thinking About Context Engineering
To build effectively with LLMs, it is helpful to adopt the right mental models. Popular analogies such as chatting with an assistant or “programming in English” are seductive but often misleading, and they can hinder the development of robust systems by creating false expectations about how AI works.
A Better Mental Model: The LLM as a Universal Function
It is tempting to anthropomorphise LLMs — to treat them as junior assistants that can “think”, “understand”, or “get confused”. This is a fundamental mistake from an engineering perspective. An LLM does not possess beliefs or intentions; it is an intelligent text generator.
A more useful and powerful model for an engineer is to think of an LLM as a universal, non-deterministic function.
LLM = f(context) -> generation
This function takes a single input — a string of text called the context — and produces a single output: a newly generated string of text.
- It is “universal” because, given the right context, it can perform a vast and open-ended range of tasks, from translation and summarisation to code generation and analysis, without being explicitly programmed for each one.
- It is “non-deterministic” because the same context can produce slightly different outputs on different runs. This is an inherent feature of its probabilistic nature, not a bug to be fixed.
- It is “stateless” because the function has no memory of past calls. To maintain a conversation or a multi-step workflow, the entire relevant history must be explicitly included in the context of the next call.
This functional abstraction is powerful because it cleanly separates the stable, pre-trained capabilities of the model from the dynamic inputs we control. The engineering challenge is not to change the function itself, but to harness its “API”: the context window. Everything we do to guide the model must be encoded into that single input string. This framing forces us to think in terms of systems, data pipelines, and rigorous testing, rather than trying to reason with an opaque black box.
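This mental model can be made concrete in code. The sketch below is illustrative only: `call_llm` is a hypothetical stand-in for any provider's completion API, and the fake body simply demonstrates the three properties named above (one string in, one string out; non-deterministic; stateless).

```python
import random

def call_llm(context: str) -> str:
    """Stand-in for a real LLM API: a universal, non-deterministic,
    stateless function from one string to another.

    A real call would be something like client.completions.create(...);
    here we fake non-determinism with a random template choice.
    """
    templates = ["Summary: {n} chars of input.", "Input was {n} characters long."]
    return random.choice(templates).format(n=len(context))

# Stateless: every fact the model needs must be inside `context`.
# There is no hidden memory between calls.
context = "You are a concise assistant.\nUser: How long is this message?"
generation = call_llm(context)
print(generation)
```

Everything that follows in this article is, in effect, about engineering the string passed to that single function call.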
From Prompting to Systemic Assembly
The term “prompt engineering” has become popular, but it is too narrow and often misleading. It suggests that success lies in finding a perfect, static “incantation” or a short snippet of text. This is rarely the case in production systems. This view is brittle — a prompt that works today may fail with the next task, dataset, or model update — and it fails to capture how the context often includes large amounts of dynamically retrieved data, not just static instructions.
Context engineering is a more general and inclusive term. It recognises that the work involves designing and building an entire system that assembles the context, often dynamically, from multiple sources. It is a shift from text composition to systems design.
| | Prompt Engineering | Context Engineering |
|---|---|---|
| Primary Goal | To elicit a specific response from a model. | To build a reliable and scalable application powered by a model. |
| Core Activity | Crafting a static string of text, often by hand. | Designing a system that dynamically assembles a context. |
| Metaphor | An incantation or a command. | An API call to a universal, non-deterministic function. |
| Scope | The final set of instructions. | The entire pipeline: instructions, state, retrieved data, and format. |
| Core Skills | Creative writing, familiarity with a model’s quirks. | Systems design, data engineering, information retrieval, evaluation. |
Adopting the term and mindset of context engineering moves the focus from linguistic tricks to a repeatable engineering process that can be managed, measured, and scaled.
The Practice of Context Engineering
Context engineering is where the conceptual model meets implementation. It is a practical and experimental discipline composed of a toolkit of architectural patterns and a rigorous methodology for applying them.
The Toolkit for Building Context
The various architectures and techniques used in the industry — RAG, agents, and so on — should not be seen as separate disciplines. They are all tools and strategies within the broader practice of context engineering. Their sole purpose is to get the right information into the context window, in the right format, at the right time.
Instructions: This is the directive part of the context. It might include setting a role (e.g., “You are an expert financial analyst”), providing clear, step-by-step instructions, and giving examples of the desired output format (“few-shot” examples). This is the classic domain of “prompting” and remains a key component.
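A minimal sketch of assembling this directive layer, assuming a hypothetical `build_instruction_context` helper (the role, task, and examples shown are invented for illustration):

```python
def build_instruction_context(role: str, task: str,
                              examples: list[tuple[str, str]],
                              user_input: str) -> str:
    """Assemble the directive part of a context: role, task,
    few-shot examples, and the input awaiting an answer."""
    parts = [f"You are {role}.", task, ""]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}\n")
    parts.append(f"Input: {user_input}\nOutput:")
    return "\n".join(parts)

context = build_instruction_context(
    role="an expert financial analyst",
    task="Classify each headline's sentiment as POSITIVE or NEGATIVE.",
    examples=[
        ("Shares surge on record earnings", "POSITIVE"),
        ("Regulator fines bank over disclosures", "NEGATIVE"),
    ],
    user_input="Startup lands major funding round",
)
print(context)
```

Ending the context with a dangling `Output:` is a common pattern: it invites the model to complete the pattern established by the examples.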
State Management: In interactive applications, the context may include state-relevant information, such as the history of the interaction. In chat applications, for example, this means accumulating user and model messages over time. Because the context window is finite, it may also require complementary strategies such as summarising older parts of the conversation or using a sliding window that keeps only the most recent messages. In other stateful applications, the context may need to include a representation of the relevant aspects of the application state. State management and representation are therefore foundational context-engineering tasks for any stateful application.
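One of the strategies above, the sliding window, can be sketched in a few lines. The `ChatState` class here is a hypothetical illustration; real systems often combine a window like this with summarisation of the dropped turns.

```python
from dataclasses import dataclass, field

@dataclass
class ChatState:
    """Accumulates conversation turns; includes only the most
    recent ones when serialising to a context string."""
    system: str                 # standing instructions, always included
    window: int = 4             # max number of turns kept in the context
    turns: list[tuple[str, str]] = field(default_factory=list)

    def add(self, speaker: str, text: str) -> None:
        self.turns.append((speaker, text))

    def to_context(self) -> str:
        # Sliding window: drop everything but the last `window` turns.
        recent = self.turns[-self.window:]
        lines = [self.system] + [f"{speaker}: {text}" for speaker, text in recent]
        return "\n".join(lines)

state = ChatState(system="You are a helpful assistant.", window=4)
for i in range(6):
    state.add("user" if i % 2 == 0 else "model", f"message {i}")
print(state.to_context())  # system line plus the last four turns only
```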
Retrieval-Augmented Generation (RAG): This is an important architectural pattern for building knowledge-intensive applications. RAG grounds the LLM in external, up-to-date, or proprietary information, which dramatically reduces factual errors (“hallucinations”) and allows the model to work with data it was never trained on. RAG retrieves information from data sources via search and includes it in the context.
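The retrieve-then-assemble shape of RAG can be shown with a deliberately naive sketch. The keyword-overlap `retrieve` below is a toy stand-in: production systems typically use vector search over embeddings, but the context-assembly step is structurally the same.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query.
    A real system would use vector search, BM25, or a hybrid."""
    query_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_context(query: str, documents: list[str]) -> str:
    """Ground the model: retrieved passages go into the context,
    with an instruction to answer only from them."""
    passages = retrieve(query, documents)
    grounding = "\n".join(f"- {passage}" for passage in passages)
    return (f"Answer using only the sources below.\n"
            f"Sources:\n{grounding}\n\nQuestion: {query}\nAnswer:")

docs = [
    "paris is the capital of france",
    "the moon orbits the earth",
    "python is a programming language",
]
print(build_rag_context("what is the capital of france", docs))
```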
Advanced RAG and Agentic Systems: As a natural evolution, these systems use AI to make the context-building process itself more intelligent. They are not a different discipline, but a more sophisticated form of context engineering that goes beyond simple retrieve-and-generate patterns. Agentic systems use an LLM-powered “agent” to construct the context iteratively. Instead of a single retrieval step, the agent can use a set of tools (like web search, database queries, or API calls) to reason about what information is missing and actively go and find it before assembling the final, comprehensive context for the generation step.
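The agentic decide–act–observe loop can be sketched as follows. This is a skeleton, not a real agent: the `decide` function is a hard-coded stand-in for the LLM call that would normally reason about which tool to invoke next, and the tool names are invented for illustration.

```python
from typing import Callable, Optional

Tools = dict[str, Callable[[str], str]]

def decide(question: str, observations: list[str],
           tools: Tools) -> Optional[tuple[str, str]]:
    """Stand-in for the agent's LLM reasoning step. This toy policy
    calls each tool once with the question, then declares the context
    complete. A real agent would let the model choose."""
    used = {obs.split("]", 1)[0].lstrip("[") for obs in observations}
    for name in tools:
        if name not in used:
            return name, question
    return None  # nothing missing: stop gathering

def run_agent(question: str, tools: Tools, max_steps: int = 5) -> str:
    """Iteratively gather evidence with tools, then assemble the
    final context for the generation step."""
    observations: list[str] = []
    for _ in range(max_steps):
        action = decide(question, observations, tools)
        if action is None:
            break
        name, tool_input = action
        observations.append(f"[{name}] {tools[name](tool_input)}")
    evidence = "\n".join(observations)
    return f"Gathered evidence:\n{evidence}\n\nQuestion: {question}\nAnswer:"

tools: Tools = {
    "search": lambda q: "web result for " + q,
    "db": lambda q: "rows matching " + q,
}
print(run_agent("q1", tools))
```

The point of the structure is that context construction becomes a loop with a stopping condition, rather than a single retrieval pass.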
All these techniques are simply different answers to the same fundamental question: “How do we build the most effective context for this LLM call?”
A Methodology for an Empirical Science
Because LLMs are non-deterministic, you cannot know in advance precisely how a change to the context will affect the output. Therefore, context engineering must be an empirical discipline, driven by systematic experimentation and measurement. While intuition helps form hypotheses, only data from experiments provides ground truth.
The core skill for a context engineer is not writing clever phrases, but the ability to design experiments, measure outcomes, and methodically improve a system’s performance. A methodology for this can be described in two phases.
Plan from the End Backwards
Start with the desired result and work backwards to define the system’s requirements. This ensures every component you build serves a clear purpose.
- Define the Desired Output. Be precise. What information must the final generation contain? What is its required format? What is the target tone? Write down the ideal answer.
- Define the Ideal Context. Looking at the ideal answer, what information must logically be present in the context for the LLM to produce it? This includes data, instructions, user information, and retrieved knowledge. This defines the “product” your context pipeline must deliver.
- Define the Assembly Pipeline. Design the system needed to produce this context. Does it require RAG? If so, what kind? Simple vector search or an agentic loop? How will state be managed?
- Define the Data Sources. Identify the underlying documents, databases, or APIs the assembly pipeline needs to access.
Build from the Beginning Forwards
With a clear plan, you implement the system. The key here is to build and optimise each part of your context-assembly pipeline in isolation before connecting them. This is a standard practice in complex systems engineering and is essential here to avoid compounding failures and manage the complexity of the development process.
- Build and Optimise Data Capture. Ensure you can reliably access and process the source data you need. Test this ingestion component on its own.
- Build and Optimise Retrieval. Test the retrieval system independently. For a set of test queries, does it retrieve the correct information? Use metrics like precision (how much of what was retrieved is relevant?) and recall (did we retrieve all the relevant information?). Perfect this before moving on.
- Build and Optimise Context Assembly. Test the logic that combines instructions, state, and retrieved data into the final context string. Is it formatted correctly? Is it token-efficient?
- Test the End-to-End System. Only when each upstream component is validated should you test the full pipeline. Now, the evaluation can focus purely on the quality of the final LLM generation, knowing the context it received was correct.
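The retrieval metrics named in the steps above are simple to compute once you have a test set of queries with known relevant documents. A minimal sketch, assuming documents are identified by ID:

```python
def precision_recall(retrieved: set[str],
                     relevant: set[str]) -> tuple[float, float]:
    """Precision: what fraction of retrieved items are relevant?
    Recall: what fraction of relevant items were retrieved?"""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# One test query: the retriever returned three docs,
# but only one of the two truly relevant docs.
p, r = precision_recall({"doc1", "doc2", "doc3"}, {"doc1", "doc4"})
print(p, r)
```

In practice you would average these scores over a whole set of test queries and track them across changes to the retrieval component.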
This methodical, component-wise approach, driven by a continuous loop of experimentation and evaluation, is the heart of the discipline. It transforms building with LLMs from an art into an engineering science.
Conclusion
Understanding and embracing context engineering is a critical step towards building the capacity to create truly valuable and reliable AI products. It moves a team away from ad-hoc tinkering and towards a structured, scalable discipline that delivers dependable results.
- Adopt the right mental model: Treat the LLM as a universal, non-deterministic function. Your control lever is the context you provide. This is an engineering problem, not a conversation.
- Think in systems: The real work is not writing the prompt, but engineering the entire pipeline that assembles the context. All the modern architectures — RAG, agents, and beyond — are merely tools for achieving this.
- Practise empirically: Your team’s core competency must be the ability to experiment, measure, and iterate methodically. Plan your systems from the goal backwards, then build and optimise them piece by piece, forwards.
The quality of an LLM-powered application is determined not just by the magic of the model or the promise of a single tool or technique, but by the quality and precision of the context. Mastering the discipline of context engineering is how you will build AI systems that work, and it is a core competency of AI engineering.
Key Takeaways
- Context engineering is the discipline of designing, assembling, and optimising everything that goes into the LLM’s context window.
- Treat the LLM as a universal, non-deterministic function: your only lever is the context you provide.
- Move beyond prompt tinkering — focus on building robust, testable systems for context assembly.
- Practise empirically: experiment, measure, and iterate to improve results.
- The quality of your context determines the quality of your AI application.