Tokens, Explained: What You Are Actually Paying For When You Build With LLMs

By Madalina Turlea·15 Jan 2026

Large language models work on tokens. When you build with them, tokens are the thing you pay for and the only number you can really rely on, so it is worth understanding what they are.

What a token is

Originally, a token meant a character or a word. Over time, models adopted their own definitions of a token, so the number of tokens in a given word varies from model to model.

A token is a group of characters that the tokenizer treats as one unit because that sequence of characters is highly probable. Take a URL as an example. OpenAI's tokenizer splits it into ten tokens. It groups characters into the most probable sequences, so "https" is common enough to become a single token, and the rest is split according to how common each remaining sequence is.
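To make the grouping idea concrete, here is a toy sketch. It is not how OpenAI's tokenizer works internally: real tokenizers learn their vocabulary from training data, while the vocabulary, the URL, and the `toy_tokenize` function below are made up for illustration. The sketch only shows the principle that known, common sequences become one token and everything else falls apart into smaller pieces.

```python
# Toy illustration only. Real tokenizers learn their vocabulary from data;
# this vocabulary and the example URL are invented for the sketch.
TOY_VOCAB = {"https", "://", "www", ".", "example", "com", "/"}


def toy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match split; unknown characters become single tokens."""
    tokens = []
    max_len = max(len(piece) for piece in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


print(toy_tokenize("https://www.example.com/zq", TOY_VOCAB))
# ['https', '://', 'www', '.', 'example', '.', 'com', '/', 'z', 'q']
```

The common pieces ("https", "example", "com") each stay whole, while the uncommon ending "zq" costs one token per character.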

Each model has its own token definition, and they are not the same.

Why the same prompt costs different amounts on different models

Because each model counts tokens its own way, the exact same prompt can produce different input token counts across models. The text is identical, but the count is not.

Language matters too. English tends to be very efficient, because a lot of the data the models were trained on is in English, so many words and parts of words are high-probability sequences that map to a single token. Less common languages, and especially non-Latin ones, are more expensive: their sequences are less common, so the same amount of text is split into more tokens.

Output token counts depend on what the model generates, so you only know them after the call.
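A rough sketch of why this happens, under a loud assumption: the tokenizer below is a toy with a made-up two-entry vocabulary (`COMMON_PIECES`) and a fallback of one token per UTF-8 byte for anything it does not know. Real tokenizers have vocabulary for many languages, so the real gap is smaller than this toy suggests, but the direction is the same.

```python
# Made-up vocabulary of frequent English pieces, stored as bytes.
# Anything not covered falls back to one token per UTF-8 byte (an assumption
# of this toy, not a description of any specific model's tokenizer).
COMMON_PIECES = [b"hello", b" world"]


def count_tokens_with_byte_fallback(text: str) -> int:
    data = text.encode("utf-8")
    count = 0
    i = 0
    while i < len(data):
        for piece in COMMON_PIECES:
            if data.startswith(piece, i):
                i += len(piece)  # a known piece counts as one token
                break
        else:
            i += 1  # unknown sequence: one token for this single byte
        count += 1
    return count


print(count_tokens_with_byte_fallback("hello world"))  # 2
print(count_tokens_with_byte_fallback("привет мир"))   # 19
```

The second string is the same greeting in Russian. Each Cyrillic letter takes two bytes in UTF-8 and none of it matches a known piece, so the toy count jumps from 2 to 19.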

The hidden tokens

Output tokens are not only the text you see in the answer. They also include the tokens the model spends thinking through the task.

This showed up clearly in an experiment where the model had to extract a few fields from a document, like a name, address, phone number, and job title. That was the text output every model was supposed to return. But when you looked at actual token usage, some of the frontier models, like GPT-5, used ten, twenty, or a hundred times more tokens, because they were doing a lot of reading and reasoning behind the scenes before outputting those few items.

You can estimate the text output. You generally cannot estimate the reasoning tokens. Only by actually testing on your use case do you learn this, and it greatly impacts the cost.
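One way to measure this in your own tests is to compare the output tokens a call reports with the tokens in the text you actually received. The sketch below assumes the reported output count includes the reasoning tokens, as described above. The function name and the numbers are hypothetical, not taken from the experiment.

```python
def reasoning_overhead(reported_output_tokens: int, visible_tokens: int) -> tuple[int, float]:
    """Return (hidden tokens, ratio of reported output tokens to visible tokens)."""
    if visible_tokens <= 0:
        raise ValueError("visible_tokens must be positive")
    hidden = reported_output_tokens - visible_tokens
    return hidden, reported_output_tokens / visible_tokens


# Hypothetical run: 40 tokens of extracted fields, 4,000 output tokens reported.
hidden, ratio = reasoning_overhead(4000, 40)
print(hidden, ratio)  # 3960 100.0
```

Running this across the models you are comparing, on your real inputs, shows which ones carry a large hidden share before you commit to one.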

The cost you see is an estimate

When you make an API call, you get back the answer plus information about what it did, mostly the tokens: how many input tokens, how many output tokens, and sometimes the total. The response does not include a cost.

The cost is calculated from the model's pricing, for example a certain number of dollars per million input tokens and per million output tokens. As a reference point, one model was listed at five dollars per million input tokens and twenty-five dollars per million output tokens, which makes one thousand input tokens cost half a cent and one thousand output tokens cost two and a half cents. But that pricing can change, so any cost figure is an estimate until the bill comes. The only thing you can actually rely on is the tokens, input and output.
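The arithmetic is simple enough to sketch. The function below is an illustration, not any provider's billing logic. It takes the token counts from a response and the listed prices as arguments, because the prices are the part that can change.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one call from its token counts and listed prices."""
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost


# Reference prices from the text: $5 per million input, $25 per million output.
print(estimate_cost_usd(1000, 0, 5.0, 25.0))  # 0.005 (half a cent)
print(estimate_cost_usd(0, 1000, 5.0, 25.0))  # 0.025 (two and a half cents)
```

If you log the token counts rather than the computed cost, you can recalculate later when the pricing changes.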

Web search is not free

Giving a model access to the web also costs tokens. When it searches, it goes to the website, scrapes the content, and treats that as a message, as if you had copy-pasted the whole page in as input, then processes it. That is why web search is still quite expensive, with one model listed at ten dollars per thousand web searches.
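A back-of-the-envelope sketch of what that can add up to. It assumes you pay a per-search fee and that the scraped page is also billed as input tokens. Whether both apply, and at what rates, depends on the provider. The page size and the pairing of the two reference prices are assumptions for illustration.

```python
def estimate_search_cost_usd(
    searches: int,
    page_tokens_per_search: int,
    search_price_per_thousand: float,
    input_price_per_million: float,
) -> float:
    """Per-search fee plus the scraped page content billed as input tokens."""
    search_fee = searches * search_price_per_thousand / 1000
    token_cost = searches * page_tokens_per_search * input_price_per_million / 1_000_000
    return search_fee + token_cost


# Hypothetical request: 3 searches, about 8,000 tokens of page content each,
# at $10 per thousand searches and $5 per million input tokens.
cost = estimate_search_cost_usd(3, 8000, 10.0, 5.0)
print(f"{cost:.2f}")  # 0.15
```

In this made-up case the scraped content (12 cents) costs more than the search fee itself (3 cents), which is why the page size matters as much as the number of searches.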
