If you spend enough time playing with large language models, you eventually meet two quiet troublemakers: tokens and context windows.
They don’t shout. They don’t glow. But they decide how much you can send to an AI model, how much it can return, and sometimes how much you pay.

Let’s break everything down in a friendly, human way—and then I’ll show you exactly how to count tokens on your own computer using OpenAI, Gemini, and Claude.

What Exactly Is a Token?

Think of tokens as the small pieces that models read and write—like Lego bricks for language.

A token is not always a full word.
Sometimes it’s a whole word (“dog”),
sometimes part of a word (“run”, “ning”),
sometimes punctuation (“.”).

Models don’t read sentences.
They read tokens.

When you send a prompt, it gets chopped into hundreds or thousands of tokens.
The model processes them, generates more tokens, and sends those back.

So if an LLM feels slow, expensive, or stops responding halfway…
you’re usually looking at a token problem.

What Is a Context Window?

Every AI model has a maximum number of tokens it can hold in memory at once.
That limit is called the context window.

For example:

GPT-4o → ~128k tokens
Gemini 2.5 Flash → very large window (hundreds of thousands)
Claude Sonnet 3.5 → 200k tokens

Think of it like a desk:

A small desk fits a notebook.
A big desk fits textbooks, a laptop, and maybe a half-assembled drone.

A larger context window lets the model:

read longer documents
remember earlier parts of a conversation
handle bigger codebases
summarize entire PDFs
maintain complex multi-step reasoning

But even with huge windows, all prompts + all responses must fit inside.

Tokens matter. Context matters.
Now let’s learn how to count them yourself.

How to Count Tokens on Your Computer

Below are real, working examples for the three major LLM ecosystems:

OpenAI (ChatGPT / GPT-4o)
Google Gemini (2.5 Flash, 2.5 Pro)
Anthropic Claude (Sonnet / Opus / Haiku)

Everything shown works locally on Windows, macOS, or Linux.

1. Counting Tokens with OpenAI (Using `tiktoken`)

OpenAI gives us a fast offline tokenizer called tiktoken.
You don’t need an API key to use it. It works fully offline.

Install it:

pip install tiktoken

Create a file: `token_test.py`

import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")

text = "Hello! I'm testing tokenization with OpenAI."

tokens = enc.encode(text)

print("Tokens:", tokens)

print("Token count:", len(tokens))

Run it:

python token_test.py

✅ Example Output — OpenAI (tiktoken)

Tokens: [9906, 11, 314, 825, 1774, 2448, 13]

Token count: 7

This tells you exactly how OpenAI models will tokenize your input—even before you send anything to the API.

Perfect for estimating cost, cleaning prompts, or chunking large documents.

2. Counting Tokens with Google Gemini (Using API Usage Metadata)

Unlike OpenAI, Google doesn’t provide a standalone tokenizer.
Instead, you can count tokens after an API call by reading usage_metadata.

Install Gemini SDK:

pip install google-genai

Set your API key (Windows PowerShell):

For Windows

setx GEMINI_API_KEY "your_key_here"

For Linux

export GEMINI_API_KEY="your_key_here" # for linux

(After the command close and then open a new terminal.)

Create `gemini_test.py`:

from google import genai

client = genai.Client()

response = client.models.generate_content(

model="gemini-2.5-flash",

contents="Hello Gemini! Please introduce yourself.",

)

print("\n--- RESPONSE ---")

print(response.text)

print("\n--- TOKEN USAGE ---")

print(response.usage_metadata)

Run it:

python gemini_test.py

✅ Example Output — Google Gemini (usage_metadata)

--- RESPONSE ---

Hello! Great to meet you. Here's a fun fact: Linux runs on most of the world's servers.

--- TOKEN USAGE ---

prompt_token_count: 12

candidates_token_count: 28

total_token_count: 40

Gemini returns token usage like:

prompt_token_count candidates_token_count total_token_count

These numbers tell you exactly how many tokens you consumed for billing and context.

3. Counting Tokens with Claude (Using `messages.count_tokens`)

Claude gives you both:

API token usage (after a request)
Pre-counting tokens (before a request!)

Very helpful when planning large workloads.

Install Anthropic SDK:

pip install anthropic

Set your API key:

 For Windows

setx ANTHROPIC_API_KEY "your_key_here"

For Linux

export ANTHROPIC_API_KEY="your_key_here" # for linux

(After the command close and then open a new terminal.)

**A) Count tokens after sending a message**

Create claude_test.py:

import os

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(

model="claude-3-5-sonnet-latest",

max_tokens=200,

messages=[

{"role": "user", "content": "Give me one fun fact about Linux."} ],

)

print("\n--- RESPONSE ---")

print(message.content[0].text)

print("\n--- TOKEN USAGE ---")

print("Input tokens :", message.usage.input_tokens)

print("Output tokens:", message.usage.output_tokens)

Run:

python claude_test.py

✅ Example Output — Claude (token usage + count_tokens)

--- RESPONSE ---

Here's one fun fact about Linux: The kernel was originally released in 1991 by Linus Torvalds.

--- TOKEN USAGE ---

Input tokens : 11

Output tokens: 27

**B) Count tokens before sending (Claude’s pre-counter)**

Create claude_count.py:

import anthropic

client = anthropic.Anthropic()

result = client.messages.count_tokens(

model="claude-3-5-sonnet-latest",

messages=[

{"role": "user", "content": "How many tokens will this message use?"} ],

)

print("Token count:", result.input_tokens)

This is priceless when:

splitting long documents
debugging token overflows
estimating cost ahead of time
designing multi-step workflows

Claude’s token API is one of the cleanest in the industry.

Why Tokens and Context Windows Matter

Tokens matter because they determine:

1. Cost

You pay per token for most APIs.

2. Speed

Longer prompts → more tokens → slower responses.

3. Model Memory

The model cannot exceed its max context window.

4. Prompt Engineering Quality

Good prompt design = fewer tokens + clearer instructions.

5. Chunking & Summarization

If you feed large documents, you must split them into token-safe chunks.

Understanding tokens is like understanding fuel in a car:
you drive better when you know the limits.

A Quick Summary Table

Model	Offline Token Counting	After-Request Token Usage	Pre-Request Token Check	Difficulty
OpenAI GPT-4o	✔ `tiktoken`	✔ Yes	✔ Via tiktoken	Easiest
Gemini 2.5	❌ No official	✔ `usage_metadata`	❌ No official	Medium
Claude 3.5	❌ No offline tools	✔ Yes	✔ `messages.count_tokens`	Very easy

In Plain English

OpenAI → great offline tools
Gemini → count tokens from API
Claude → count tokens before and after API calls

And now you can do all three on your own machine.

Conclusion

Tokens and context windows aren’t scary—they’re predictable.
Once you know how many tokens you're working with, everything becomes easier:

prompts behave consistently
costs become manageable
large documents become tractable
failures become debuggable

Tokens, Context Windows, and How to Count Them (With Real Examples You Can Run Today)

What Exactly Is a Token?

What Is a Context Window?

How to Count Tokens on Your Computer

1. Counting Tokens with OpenAI (Using `tiktoken`)

Install it:

Create a file: `token_test.py`

Run it:

✅ Example Output — OpenAI (tiktoken)

2. Counting Tokens with Google Gemini (Using API Usage Metadata)

Install Gemini SDK:

Set your API key (Windows PowerShell):

Create `gemini_test.py`:

Run it:

✅ Example Output — Google Gemini (usage_metadata)

3. Counting Tokens with Claude (Using `messages.count_tokens`)

Install Anthropic SDK:

Set your API key:

**A) Count tokens after sending a message**

✅ Example Output — Claude (token usage + count_tokens)

**B) Count tokens before sending (Claude’s pre-counter)**

Why Tokens and Context Windows Matter

1. Cost

2. Speed

3. Model Memory

4. Prompt Engineering Quality

5. Chunking & Summarization

A Quick Summary Table

In Plain English

Conclusion

Comments 0

Tokens, Context Windows, and How to Count Them (With Real Examples You Can Run Today)

What Exactly Is a Token?

What Is a Context Window?

How to Count Tokens on Your Computer

1. Counting Tokens with OpenAI (Using tiktoken)

Install it:

Create a file: token_test.py

Run it:

✅ Example Output — OpenAI (tiktoken)

2. Counting Tokens with Google Gemini (Using API Usage Metadata)

Install Gemini SDK:

Set your API key (Windows PowerShell):

Create gemini_test.py:

Run it:

✅ Example Output — Google Gemini (usage_metadata)

3. Counting Tokens with Claude (Using messages.count_tokens)

Install Anthropic SDK:

Set your API key:

A) Count tokens after sending a message

✅ Example Output — Claude (token usage + count_tokens)

B) Count tokens before sending (Claude’s pre-counter)

Why Tokens and Context Windows Matter

1. Cost

2. Speed

3. Model Memory

4. Prompt Engineering Quality

5. Chunking & Summarization

A Quick Summary Table

In Plain English

Conclusion

Related Articles

AI CLI Agents Smackdown: Claude Code, Codex CLI, GitHub Copilot CLI, Cursor CLI, and Kimi CLI — Who Actually Wins?

Why Developers Use Code They Don't Trust

How I Use Gemini, ChatGPT, and Notion to Turn Technical Tutorials into Organized Lab Notes

Comments 0

1. Counting Tokens with OpenAI (Using `tiktoken`)

Create a file: `token_test.py`

Create `gemini_test.py`:

3. Counting Tokens with Claude (Using `messages.count_tokens`)

**A) Count tokens after sending a message**

**B) Count tokens before sending (Claude’s pre-counter)**