When to Use RAG Instead of Putting the Whole Document in the Prompt

Written by Madalina Turlea
15 Jan 2026
Imagine an expense agent like the one Ramp built. A company uploads its expense policy, and employees text the AI to ask whether something can be expensed. The agent answers based on that company's policy. One way to build this is to put the whole policy document into the prompt every time. The question is when that stops being a good idea and you should switch to RAG (retrieval-augmented generation), which sends only the relevant part of the policy instead.
The case for keeping the whole policy in the prompt
For a single, reasonably sized policy, putting the whole thing in the prompt is simple and it works. The model has everything in front of it and can answer questions about any part of the policy.
When it makes sense to switch
Two situations push you toward RAG.
The first is when you have multiple policies for different parts of the company. Each department or role can have its own rules. Sales teams, for example, tend to have more expenses, and some roles come with a company car. Once you have many separate policies, sending all of them every time becomes wasteful, because any one question is usually governed by only one of them.
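The multiple-policy case can be sketched in a few lines of Python. This is only an illustration: the `POLICIES` dictionary, the department names, the policy texts, and both function names are hypothetical placeholders, not how Ramp or any real agent stores policies.

```python
# Hypothetical policies keyed by department; the texts are placeholders.
POLICIES = {
    "sales": "Sales policy: client dinners can be expensed; company car mileage is covered.",
    "engineering": "Engineering policy: conference travel needs manager approval.",
    "default": "General policy: receipts are required for every expense.",
}


def build_prompt_all(question: str) -> str:
    """Baseline: every policy goes into every prompt."""
    all_policies = "\n\n".join(POLICIES.values())
    return f"Policies:\n{all_policies}\n\nQuestion: {question}"


def build_prompt_for(department: str, question: str) -> str:
    """Send only the policy that applies to the asker's department."""
    # Fall back to the general policy when the department is unknown.
    policy = POLICIES.get(department, POLICIES["default"])
    return f"Policy:\n{policy}\n\nQuestion: {question}"
```

With many departments, `build_prompt_all` grows with every policy added, while `build_prompt_for` stays the size of a single policy.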
The second is when a single policy simply becomes too big, as in a large corporation with different rules for different scenarios. At that point it can make sense to send only the relevant section rather than the entire document.
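As a rough sketch of what "send only the relevant section" could look like, the Python below splits a policy on blank lines and picks the section that shares the most words with the question. Word overlap is a deliberately simple stand-in used here for illustration; production RAG systems typically rank sections by embedding similarity instead. The sample policy text and all function names are assumptions.

```python
import re


def split_sections(policy: str) -> list[str]:
    """Split a policy into sections on blank lines."""
    return [s.strip() for s in re.split(r"\n\s*\n", policy) if s.strip()]


def words(text: str) -> set[str]:
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def most_relevant_section(policy: str, question: str) -> str:
    """Return the section sharing the most words with the question."""
    q = words(question)
    return max(split_sections(policy), key=lambda s: len(q & words(s)))


# Hypothetical three-section policy, for illustration only.
POLICY = """Meals: team lunches can be expensed with a receipt.

Travel: flights must be booked in economy class.

Equipment: laptops are ordered through the IT team."""

section = most_relevant_section(POLICY, "Do flights have to be in economy class?")
# Only `section` would go into the prompt, not the whole POLICY.
```

Here the question overlaps most with the travel section, so that is the only part of the policy the model would receive.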
The cost reason underneath it
The reason this matters is input tokens. When you put the whole policy in the prompt, you send all of it to the model on every single request. That uses a lot of input tokens, and input tokens cost money. A RAG approach sends only the relevant part, so each request carries fewer input tokens.
A reasonable way to decide is to run it first with the whole policy and look at the input token usage. If you see that the policy is large enough that you are sending a lot of input tokens every time, that is your signal that pulling out only the relevant section could be worth building.
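To make that check concrete, here is a back-of-the-envelope sketch in Python. Every number in it is a made-up placeholder (policy size, request volume, price per million input tokens), and the four-characters-per-token ratio is only a rough rule of thumb for English text. For real decisions, use the input token counts your provider reports and its current pricing.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate only; real counts come from the provider's usage data."""
    return round(len(text) / chars_per_token)


def monthly_input_cost(
    tokens_per_request: int,
    requests_per_month: int,
    usd_per_million_tokens: float,
) -> float:
    """Input-token cost for one month of requests, in USD."""
    return tokens_per_request * requests_per_month * usd_per_million_tokens / 1_000_000


# Hypothetical figures, chosen only to show the arithmetic.
full_policy_tokens = 20_000      # whole policy sent on every request
relevant_section_tokens = 1_000  # only the retrieved section
requests_per_month = 50_000
price = 1.00                     # USD per million input tokens (placeholder)

full_cost = monthly_input_cost(full_policy_tokens, requests_per_month, price)
rag_cost = monthly_input_cost(relevant_section_tokens, requests_per_month, price)
print(f"whole policy: ${full_cost:,.2f}  relevant section: ${rag_cost:,.2f}")
```

Under these assumed numbers the whole-policy approach costs 20 times more in input tokens. With a small policy or low request volume, the gap may be too small to justify building retrieval.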
So the rule of thumb is straightforward: keep it simple while the policy is small and singular, and move to RAG when you have many policies or one policy that has grown too big to keep sending in full.