In the world of generative artificial intelligence, the kind we use at Novatium to
automate processes and assist users, one keyword comes up constantly: token.
But what is a token? And why should you pay attention to how tokens are consumed?
What is a token?
A token is a unit of text that language models use to process information. It's
not exactly a word: it can be a syllable, a whole word, or even a symbol. For example:
- The word "intelligence" breaks down into 2 tokens.
- "Hello" is 1 token.
- A phrase like "How are you?" can take between 4 and 6 tokens, depending on
the model.
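If you want to see how a model actually splits text, one quick way is the open-source tiktoken library, which ships the tokenizers used by OpenAI models. Here is a minimal sketch, assuming tiktoken is installed (pip install tiktoken); other providers use their own tokenizers, so their counts will differ.

```python
import tiktoken

# Tokenizer used by the GPT-4o family of models.
enc = tiktoken.get_encoding("o200k_base")

for text in ["Hello", "intelligence", "How are you?"]:
    tokens = enc.encode(text)  # list of integer token IDs
    print(f"{text!r} -> {len(tokens)} token(s): {tokens}")
```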
Every interaction with an AI model (such as ChatGPT, Claude, or Gemini) consumes tokens
on both the input (what you ask it) and the output (the answer it gives you).
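Most provider APIs report this consumption back to you with every call. As a rough illustration, this is how the usage fields can be read with the OpenAI Python SDK; the model name is an illustrative choice, and the snippet assumes the openai package is installed and an API key is configured. Other providers expose equivalent fields.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice of an economical model
    messages=[{"role": "user", "content": "How are you?"}],
)

usage = response.usage
print("input tokens: ", usage.prompt_tokens)      # what you sent
print("output tokens:", usage.completion_tokens)  # what the model answered
print("total billed: ", usage.total_tokens)
```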
Why do tokens matter for cost?
AI model providers (such as OpenAI, Anthropic, Mistral, Google, etc.) charge based on
the number of tokens processed. Thus, the price a company pays for using artificial
intelligence doesn't depend on the number of questions asked, but rather on the total
amount of information processed.
For example:
Model | Estimated price per 1,000 tokens | What it represents
--- | --- | ---
GPT-4o | ~0.005 USD (input) | Approx. 750 words
Claude 3 Opus | ~0.015 USD (input) | Very long context
Gemini 1.5 Pro | ~0.010 USD (input) | High compression capacity
This means that a very long query, or a long answer, can cost 10 times more than a
simple one.
How much does a typical use cost?
Let's say a company uses a chatbot that answers frequently asked questions. A typical
interaction can consume between 100 and 500 tokens, depending on the level of
detail in the response.
- 1,000 simple queries per day → ~100,000 tokens
- Daily cost (economical model): ~0.50 USD
- Monthly cost: ~15 USD
But if those responses include multiple documents, tables, or advanced customization,
the consumption can multiply. In more complex solutions, such as internal assistants or
legal text analysis, it can exceed 2 million tokens per month, and that does have a
budgetary impact.
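The arithmetic behind those figures is straightforward. Here is a back-of-the-envelope sketch using the illustrative prices from the table above; real prices vary by provider, model, and whether the tokens are input or output.

```python
# Illustrative figures only: real prices vary by provider and model.
price_per_1k_tokens = 0.005   # USD per 1,000 tokens (economical model)
tokens_per_query = 100        # simple question + short answer
queries_per_day = 1_000

daily_tokens = tokens_per_query * queries_per_day         # 100,000 tokens
daily_cost = daily_tokens / 1_000 * price_per_1k_tokens   # ~0.50 USD
monthly_cost = daily_cost * 30                            # ~15 USD

print(f"daily tokens: {daily_tokens:,}")
print(f"daily cost:   {daily_cost:.2f} USD")
print(f"monthly cost: {monthly_cost:.2f} USD")
```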
Tips to optimize token usage
- Write concise prompts: the clearer and shorter the request, the fewer tokens are
consumed.
- Control the context size: avoid loading all available data when it isn't needed.
- Choose the right model: more powerful models (such as GPT-4 or Claude Opus) cost
more per token, but they are not always necessary.
- Summarize responses: configure the model to return concise results, especially in
automations (see the sketch after this list).
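Several of these tips can be applied directly in the API call. Below is a minimal sketch with the OpenAI Python SDK; the model name, system instruction, and token cap are illustrative choices, not fixed recommendations. It combines a smaller, cheaper model, an instruction that asks for brevity, and max_tokens as a hard ceiling on the length of the answer.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # smaller, cheaper model for routine FAQ answers
    messages=[
        {"role": "system", "content": "Answer in two sentences or fewer."},
        {"role": "user", "content": "What are your support hours?"},
    ],
    max_tokens=100,  # hard cap on output tokens, so answers stay short
)

print(response.choices[0].message.content)
```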
Write to us. At
Novatium, support is just the beginning.