Tokens, Context Windows, and How to Count Them (With Real Examples You Can Run Today)

If you spend enough time playing with large language models, you eventually meet two quiet troublemakers: tokens and context windows.
They don’t shout. They don’t glow. But they decide how much you can send to an AI model, how much it can return, and sometimes how much you pay.

Let’s break everything down in a friendly, human way—and then I’ll show you exactly how to count tokens on your own computer using OpenAI, Gemini, and Claude.


What Exactly Is a Token?

Think of tokens as the small pieces that models read and write—like Lego bricks for language.

A token is not always a full word.
Sometimes it’s a whole word (“dog”),
sometimes part of a word (“run”, “ning”),
sometimes punctuation (“.”).

Models don’t read sentences.
They read tokens.

When you send a prompt, it gets chopped into tokens: a handful for a short question, thousands for a long document.
The model processes them, generates more tokens, and sends those back.

So if an LLM feels slow, expensive, or stops responding halfway…
you’re usually looking at a token problem.
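To make the "pieces of words" idea concrete, here is a toy sketch. The vocabulary and the toy_tokenize function are invented for this illustration. Real tokenizers learn a vocabulary of many thousands of pieces from data and use more careful rules than this greedy longest-match loop, so treat it as a picture of the idea, not as how any model actually splits text.

```python
# Toy vocabulary, invented for this sketch (real vocabularies are learned from data)
TOY_VOCAB = {"dog", "run", "ning", "s", ".", " "}


def toy_tokenize(text: str, vocab: set[str] = TOY_VOCAB) -> list[str]:
    """Greedy longest-match split; unknown characters become one-character tokens."""
    tokens = []
    max_len = max(len(piece) for piece in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink until something matches
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


print(toy_tokenize("dogs running."))
# ['dog', 's', ' ', 'run', 'ning', '.']
```

Two words and a period turn into six tokens here, which is why token counts are usually higher than word counts.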


What Is a Context Window?

Every AI model has a maximum number of tokens it can hold in memory at once.
That limit is called the context window.

For example:

  • GPT-4o → ~128k tokens

  • Gemini 2.5 Flash → about 1 million input tokens

  • Claude 3.5 Sonnet → 200k tokens

Think of it like a desk:

  • A small desk fits a notebook.

  • A big desk fits textbooks, a laptop, and maybe a half-assembled drone.

A larger context window lets the model:

  • read longer documents

  • remember earlier parts of a conversation

  • handle bigger codebases

  • summarize entire PDFs

  • maintain complex multi-step reasoning

But even with a huge window, the prompt, the conversation history, and the response must all fit inside it.
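The arithmetic behind that rule is simple enough to sketch. The function names below are made up for this example, and the sketch assumes a single shared window for input and output. Some models also enforce a separate cap on output tokens, which this ignores.

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window


def remaining_for_output(prompt_tokens: int, context_window: int) -> int:
    """Tokens left for the response once the prompt is on the desk."""
    return max(context_window - prompt_tokens, 0)


WINDOW = 128_000  # the ~128k figure quoted above for GPT-4o

print(fits_in_context(120_000, 4_000, WINDOW))  # True
print(fits_in_context(126_000, 4_000, WINDOW))  # False
print(remaining_for_output(126_000, WINDOW))    # 2000
```

If the second check fails, you either shorten the prompt or ask for a shorter response.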

Tokens matter. Context matters.
Now let’s learn how to count them yourself.


How to Count Tokens on Your Computer

Below are real, working examples for the three major LLM ecosystems:

  • OpenAI (ChatGPT / GPT-4o)

  • Google Gemini (2.5 Flash, 2.5 Pro)

  • Anthropic Claude (Sonnet / Opus / Haiku)

Everything shown runs from your own machine on Windows, macOS, or Linux. The Gemini and Claude examples also need an API key and a network connection.


1. Counting Tokens with OpenAI (Using tiktoken)

OpenAI gives us a fast local tokenizer called tiktoken.
You don’t need an API key to use it. It downloads the encoding file the first time you use it, caches it, and works offline after that.

Install it:

 
pip install tiktoken

 

Create a file: token_test.py

 

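import tiktoken

# Look up the encoding that gpt-4o uses (needs a recent tiktoken release)
enc = tiktoken.encoding_for_model("gpt-4o")

text = "Hello! I'm testing tokenization with OpenAI."

# encode() returns a list of integer token IDs
tokens = enc.encode(text)

print("Tokens:", tokens)
print("Token count:", len(tokens))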

  

Run it:

python token_test.py

 

Example Output — OpenAI (tiktoken)

Tokens: [9906, 11, 314, 825, 1774, 2448, 13]

Token count: 7

(Illustrative output: the exact IDs and count depend on the encoding, so your numbers for this sentence will differ.)

 

 

This tells you how OpenAI models will tokenize your text before you send anything to the API. Chat requests add a few formatting tokens per message on top of this count.

Perfect for estimating cost, cleaning prompts, or chunking large documents.
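Since chunking came up, here is one way it could look. This is a sketch, not an official recipe: chunk_token_ids is a name invented for this post, and it works on any list of token IDs, so it runs without tiktoken installed. The overlap parameter is an optional extra that repeats a few tokens between neighbouring chunks.

```python
from typing import Iterator, Sequence


def chunk_token_ids(token_ids: Sequence[int], max_tokens: int, overlap: int = 0) -> Iterator[list[int]]:
    """Yield consecutive slices of at most max_tokens IDs, optionally overlapping."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")
    step = max_tokens - overlap
    for start in range(0, len(token_ids), step):
        yield list(token_ids[start:start + max_tokens])
        if start + max_tokens >= len(token_ids):
            break  # the last slice already reached the end


# Stand-in IDs so the sketch runs anywhere
print(list(chunk_token_ids(list(range(10)), max_tokens=4, overlap=1)))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]

# With tiktoken installed, the same function could be used like this:
#   enc = tiktoken.encoding_for_model("gpt-4o")
#   chunks = [enc.decode(ids) for ids in chunk_token_ids(enc.encode(long_text), 1000)]
```

One caveat: cutting at an arbitrary token boundary can split a sentence, or even a multi-byte character, in the middle, so a production splitter would probably prefer paragraph or sentence boundaries.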


2. Counting Tokens with Google Gemini (Using API Usage Metadata)

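Unlike the OpenAI example, this one doesn’t use an offline tokenizer.
Instead, you count tokens after an API call by reading usage_metadata. The SDK can also count a prompt before you generate anything, through client.models.count_tokens.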

Install Gemini SDK:

pip install google-genai

 

Set your API key:

For Windows (PowerShell or Command Prompt)

setx GEMINI_API_KEY "your_key_here"

(setx only affects new terminals, so close this one and open a new one.)

For Linux or macOS

export GEMINI_API_KEY="your_key_here"

(export applies to the current terminal session only.)

 

 

Create gemini_test.py:

 

from google import genai

# The client reads GEMINI_API_KEY from the environment
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Hello Gemini! Please introduce yourself.",
)

print("\n--- RESPONSE ---")
print(response.text)

# usage_metadata holds the token counts for this request
print("\n--- TOKEN USAGE ---")
print(response.usage_metadata)

 

Run it:

 
python gemini_test.py

 

Example Output  — Google Gemini (usage_metadata)

--- RESPONSE ---

Hello! Great to meet you. Here's a fun fact: Linux runs on most of the world's servers.

--- TOKEN USAGE ---

prompt_token_count: 12

candidates_token_count: 28

total_token_count: 40

 

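Gemini returns token usage in fields like:

  • prompt_token_count

  • candidates_token_count

  • total_token_count

These numbers tell you how many tokens the request consumed, for both billing and context. On thinking models such as 2.5 Flash, the total may also include thinking tokens, so it can be larger than prompt + candidates.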


3. Counting Tokens with Claude (Using messages.count_tokens)

Claude gives you both:

  • API token usage (after a request)

  • Pre-counting tokens (before a request!)

Very helpful when planning large workloads.

Install Anthropic SDK:

 
pip install anthropic

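Set your API key:

For Windows (PowerShell or Command Prompt)

setx ANTHROPIC_API_KEY "your_key_here"

(setx only affects new terminals, so close this one and open a new one.)

For Linux or macOS

export ANTHROPIC_API_KEY="your_key_here"

(export applies to the current terminal session only.)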


A) Count tokens after sending a message

Create claude_test.py:

 

import anthropic

# The client reads ANTHROPIC_API_KEY from the environment
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=200,  # upper limit on output tokens
    messages=[
        {"role": "user", "content": "Give me one fun fact about Linux."},
    ],
)

print("\n--- RESPONSE ---")
print(message.content[0].text)

# usage reports the tokens counted for this request
print("\n--- TOKEN USAGE ---")
print("Input tokens :", message.usage.input_tokens)
print("Output tokens:", message.usage.output_tokens)

 

Run:

 
python claude_test.py

 

 

Example Output — Claude (token usage + count_tokens)

--- RESPONSE ---

Here's one fun fact about Linux: The kernel was originally released in 1991 by Linus Torvalds.

--- TOKEN USAGE ---

Input tokens : 11

Output tokens: 27

 


B) Count tokens before sending (Claude’s pre-counter)

Create claude_count.py:

 

import anthropic

client = anthropic.Anthropic()

# count_tokens takes the same model and messages as create(), but generates nothing
result = client.messages.count_tokens(
    model="claude-3-5-sonnet-latest",
    messages=[
        {"role": "user", "content": "How many tokens will this message use?"},
    ],
)

print("Token count:", result.input_tokens)

 

 

This is priceless when:

  • splitting long documents

  • debugging token overflows

  • estimating cost ahead of time

  • designing multi-step workflows

Claude’s token API is one of the cleanest in the industry.
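For the document-splitting case, a pre-counter can drive a simple packing loop. The sketch below is one possible approach, with names invented for this post. It takes any count_tokens function, so you could plug in a wrapper around messages.count_tokens, and it uses a word counter as a stand-in so it runs without an API key.

```python
from typing import Callable


def pack_paragraphs(text: str, count_tokens: Callable[[str], int], max_tokens: int) -> list[str]:
    """Greedily group paragraphs into chunks that each stay within max_tokens."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if count_tokens(candidate) <= max_tokens:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single oversized paragraph is kept whole here; split it further if needed
            current = para
    if current:
        chunks.append(current)
    return chunks


def word_count(s: str) -> int:
    """Stand-in counter for the demo: counts words, not real tokens."""
    return len(s.split())


doc = "a b c\n\nd e\n\nf g h i"
print(pack_paragraphs(doc, word_count, max_tokens=5))
# ['a b c\n\nd e', 'f g h i']

# A Claude-backed counter could look like this (one API call per check):
#   def claude_count(s):
#       return client.messages.count_tokens(
#           model="claude-3-5-sonnet-latest",
#           messages=[{"role": "user", "content": s}],
#       ).input_tokens
```

Because every check with a real counter is a network round trip, you may want to estimate roughly first and only confirm the final chunks with the API.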


Why Tokens and Context Windows Matter

Tokens matter because they determine:

1. Cost

You pay per token for most APIs.

2. Speed

Longer prompts → more tokens → slower responses.

3. Model Memory

The model cannot exceed its max context window.

4. Prompt Engineering Quality

Good prompt design = fewer tokens + clearer instructions.

5. Chunking & Summarization

If you feed large documents, you must split them into token-safe chunks.

Understanding tokens is like understanding fuel in a car:
you drive better when you know the limits.
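To put a number on the cost point, here is a rough estimator. The prices in it are placeholders chosen for the example, not anyone's current rates, and Pricing and estimate_cost are names made up for this post. Check your provider's pricing page before trusting any figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price per one million tokens, in dollars (placeholder values below)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost in dollars for one request, given its token usage."""
    total = input_tokens * pricing.input_per_million + output_tokens * pricing.output_per_million
    return total / 1_000_000


# Placeholder prices, assumed only for this example
EXAMPLE_PRICING = Pricing(input_per_million=3.00, output_per_million=15.00)

# Token counts taken from the Claude example output earlier in the post
print(f"${estimate_cost(11, 27, EXAMPLE_PRICING):.6f}")
# $0.000438
```

A single small request costs almost nothing. The numbers start to matter when you multiply by thousands of requests or send very long prompts.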


A Quick Summary Table

Model          | Offline Token Counting | After-Request Token Usage | Pre-Request Token Check      | Difficulty
OpenAI GPT-4o  | ✔ tiktoken             | ✔ Yes                     | ✔ Via tiktoken               | Easiest
Gemini 2.5     | ❌ Not covered here     | ✔ usage_metadata          | ✔ client.models.count_tokens | Medium
Claude 3.5     | ❌ No offline tools     | ✔ Yes                     | ✔ messages.count_tokens      | Very easy

In Plain English

  • OpenAI → great offline tools

  • Gemini → count tokens from API

  • Claude → count tokens before and after API calls

And now you can do all three on your own machine.


Conclusion

Tokens and context windows aren’t scary—they’re predictable.
Once you know how many tokens you're working with, everything becomes easier:

  • prompts behave consistently

  • costs become manageable

  • large documents become tractable

  • failures become debuggable