When to Use RAG Instead of Putting the Whole Document in the Prompt

By Madalina Turlea

15 Jan 2026

Imagine an expense agent like the one Ramp built. A company uploads its expense policy, and employees text the AI to ask whether something can be expensed. The agent answers based on that company's policy. One way to build this is to put the whole policy document into the prompt on every request. The question is when that stops being a good idea and you should switch to RAG (retrieval-augmented generation), which sends only the relevant part of the policy instead.

The case for keeping the whole policy in the prompt

For a single, reasonably sized policy, putting the whole thing in the prompt is simple and it works. The model has everything in front of it and can answer questions about any part of the policy.
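As a minimal sketch of this approach, the function below pastes the entire policy ahead of the employee's question. The function name, the delimiter lines, and the sample policy text are all made up for illustration; the actual call to a model is left out because it depends on the provider you use.

```python
def build_full_policy_prompt(policy_text: str, question: str) -> str:
    """Place the entire policy in the prompt, followed by the question."""
    return (
        "You answer expense questions using only the company policy below.\n\n"
        "--- POLICY START ---\n"
        f"{policy_text}\n"
        "--- POLICY END ---\n\n"
        f"Employee question: {question}\n"
    )


# Hypothetical two-line policy, used only to show the shape of the prompt.
policy = "Meals: up to $50 per day while traveling.\nTaxis: allowed after 9pm."
question = "Can I expense a taxi at 10pm?"
prompt = build_full_policy_prompt(policy, question)
```

Every request built this way carries the full policy, whatever the question is about.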

When it makes sense to switch

Two situations push you toward RAG.

The first is when you have multiple policies for different parts of the company. Each department or role can have its own rules. Sales teams, for example, tend to have more expenses, and some employees might get a company car. Once you have many separate policies, sending all of them on every request becomes wasteful.
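The simplest way to avoid sending every policy is a plain lookup by department, sketched below. This is routing rather than retrieval by similarity, and the department names, policy texts, and fallback are hypothetical. It also assumes you already know which department the employee belongs to.

```python
# Hypothetical per-department policies; real ones would be full documents.
POLICIES = {
    "sales": "Sales: client dinners up to $150. Company car fuel is reimbursed.",
    "engineering": "Engineering: conference tickets need manager approval.",
}

# Hypothetical fallback for departments without a policy of their own.
DEFAULT_POLICY = "General: meals up to $50 per day while traveling."


def select_policy(department: str) -> str:
    """Return only the policy that applies to the employee's department."""
    return POLICIES.get(department.lower(), DEFAULT_POLICY)


sales_policy = select_policy("Sales")
fallback_policy = select_policy("Legal")
```

Only the selected policy then goes into the prompt, instead of all of them.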

The second is when a single policy simply becomes too big, as in a large corporation with different rules for different scenarios. At that point it can make sense to send only the relevant section rather than the entire document.
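To make "send only the relevant section" concrete, here is a toy sketch that splits a policy on blank lines and picks the section sharing the most words with the question. Word overlap is a deliberately simple stand-in for the embedding-based search a real RAG system would more likely use. The function names and the sample policy are assumptions made for this example.

```python
import re


def split_sections(policy_text: str) -> list[str]:
    """Treat each blank-line-separated paragraph as one policy section."""
    return [s.strip() for s in re.split(r"\n\s*\n", policy_text) if s.strip()]


def words(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_section(policy_text: str, question: str) -> str:
    """Return the section with the largest word overlap with the question."""
    question_words = words(question)
    return max(
        split_sections(policy_text),
        key=lambda section: len(words(section) & question_words),
        default="",  # empty policy: nothing to retrieve
    )


# Hypothetical three-section policy.
policy = (
    "Meals: employees may expense meals up to $50 per day while traveling.\n\n"
    "Taxis: a taxi ride may be expensed after 9pm or when carrying equipment.\n\n"
    "Software: subscriptions need approval from a manager."
)
best = retrieve_section(policy, "Can I expense a taxi ride home at 10pm?")
```

Here `best` is the taxi section, so the prompt would carry one paragraph instead of the full policy. Word overlap breaks down quickly on real documents (synonyms, long sections), which is where embeddings or a proper search index come in.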

The cost reason underneath it

The reason this matters is input tokens. When you put the whole policy in the prompt, you send all of it to the model on every single request. That uses a lot of input tokens, and input tokens cost money. A RAG approach sends only the relevant part, so each request carries fewer input tokens.
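A quick back-of-the-envelope calculation shows how the difference adds up. All numbers below are placeholders chosen to make the arithmetic easy, not real prices or measurements. Substitute your own token counts, request volume, and your provider's input price.

```python
def monthly_input_cost(
    tokens_per_request: int,
    requests_per_month: int,
    price_per_million_tokens: float,
) -> float:
    """Input-token cost per month, in the same currency as the price."""
    total_tokens = tokens_per_request * requests_per_month
    return total_tokens * price_per_million_tokens / 1_000_000


# Placeholder assumptions: 50,000 requests a month at 1.00 per million tokens.
REQUESTS = 50_000
PRICE = 1.0

full_policy_cost = monthly_input_cost(20_000, REQUESTS, PRICE)  # whole policy
rag_cost = monthly_input_cost(1_000, REQUESTS, PRICE)  # one section only

print(full_policy_cost)  # 1000.0
print(rag_cost)          # 50.0
```

With these made-up figures, sending a 20,000-token policy costs twenty times as much as sending a 1,000-token section. The ratio depends only on the token counts, since volume and price cancel out.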

A reasonable way to decide is to run it first with the whole policy and look at the input token usage. If you see that the policy is large enough that you are sending a lot of input tokens every time, that is your signal that retrieving only the relevant section could be worth building.
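One way to turn that check into something you can run is sketched below. It estimates tokens with the rough rule of thumb of about four characters per token for English text, which is an approximation. The input token count your model provider reports for real requests is the better number if you have it. The `token_budget` threshold is a hypothetical value you would set from your own cost tolerance.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token (assumption)."""
    return max(1, len(text) // 4)


def should_consider_rag(policy_text: str, token_budget: int = 4_000) -> bool:
    """Flag the policy when its estimated size exceeds the per-request budget."""
    return estimate_tokens(policy_text) > token_budget


# Synthetic stand-ins for a short policy and a very long one.
small_policy = "x" * 400
large_policy = "x" * 40_000

small_flag = should_consider_rag(small_policy)
large_flag = should_consider_rag(large_policy)
```

The short text stays under the budget and the long one goes over it, which mirrors the signal described above: measure first, and build retrieval only once the numbers say it is worth it.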

So the rule of thumb is straightforward: keep it simple while the policy is small and singular, and move to RAG when you have many policies or one policy that has grown too big to keep sending in full.