Monish Keswani | Optimizing Claude Token Usage: A Practical Guide to Context Management

1. Understanding Claude’s Context Stack

When Claude responds, it is not just reading your latest message.

It processes an entire context stack.

Every response draws on a fixed-size context window, which is divided as follows:

Context Window = System Prompt + System Tools + Memory + Skills + Messages + Free Space (Remaining)

Let’s break this down.

System Prompt

Core instructions that define Claude’s behaviour.

System Tools

Capabilities Claude can access automatically.

Examples:

NotebookEdit
Plan Mode
Agent
WebSearch
Code Execution

These are loaded into context even if unused.

Memory

Persisted knowledge about your workflow or project.

Skills

Reusable capabilities Claude dynamically loads.

Messages

Your conversation history.

⚠️ This is the only component that continuously grows.

Free Space

Remaining capacity before hitting Claude’s context limit (typically ~200K tokens).
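To make the stack concrete, here is a minimal sketch of the budget arithmetic. The `ContextStack` class and every token count in it are hypothetical values chosen for illustration, not measurements from Claude; only the ~200K limit comes from the text above.

```python
from dataclasses import dataclass

CONTEXT_LIMIT = 200_000  # assumed window size, matching the ~200K figure above


@dataclass
class ContextStack:
    """Hypothetical token counts for each layer of the context stack."""
    system_prompt: int
    system_tools: int
    memory: int
    skills: int
    messages: int

    def used(self) -> int:
        """Tokens occupied by every component of the stack."""
        return (self.system_prompt + self.system_tools
                + self.memory + self.skills + self.messages)

    def free_space(self, limit: int = CONTEXT_LIMIT) -> int:
        """Tokens left before the window is full (never negative)."""
        return max(limit - self.used(), 0)


# Illustrative numbers only.
stack = ContextStack(system_prompt=3_000, system_tools=12_000,
                     memory=2_000, skills=1_000, messages=60_000)
print(stack.used(), stack.free_space())  # 78000 122000
```

Only `messages` changes from turn to turn in this model, which is why the rest of this guide focuses on it.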

2. The Core Problem: Messages Grow Forever

Unlike system components, messages accumulate.

Every prompt-response pair adds:

Input Tokens + Output Tokens → Stored in History

Over time:

Messages → eat into Free Space

This leads to:

Slower responses
Reduced reasoning quality
Higher cost
Truncation risks
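A rough way to see how quickly history eats into free space is to count how many prompt-response pairs fit in the window. The sketch below assumes a constant number of tokens per pair and a fixed overhead for the other layers; both figures are made up for illustration.

```python
def turns_until_full(fixed_overhead: int, tokens_per_turn: int,
                     limit: int = 200_000) -> int:
    """Count how many prompt-response pairs fit before history fills the window.

    fixed_overhead: tokens used by system prompt, tools, memory and skills.
    tokens_per_turn: input + output tokens added to history by one pair.
    """
    if tokens_per_turn <= 0:
        raise ValueError("tokens_per_turn must be positive")
    available = limit - fixed_overhead
    return max(available // tokens_per_turn, 0)


# Assumed: 18K of fixed overhead, 3.5K tokens added per prompt-response pair.
print(turns_until_full(18_000, 3_500))  # 52
```

Real turns vary widely in size (a pasted stack trace or a large file read can cost tens of thousands of tokens), so treat the result as an upper bound rather than a forecast.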

3. Understanding Turns, KV Cache, and Why Claude Re-Reads Context

To truly optimize Claude Code usage, you must understand one core concept:

Claude is priced on what it reads — and it re-reads everything every turn.

What is a Turn?

A turn is one full interaction cycle:

User prompt → Claude reads full context → Claude generates response

That entire loop = 1 turn

What Constitutes a Turn?

Each turn includes Claude reprocessing:

System Prompt
Tools
Memory
Conversation History
Your New Input

Even if you didn’t repeat anything, Claude still reprocesses all of it.

Turn Examples

Interaction	Turn Count
You ask: “Fix this bug”	1
You follow up: “Optimize performance”	2
You say: “Add logging”	3

Even though this feels like one conversation, Claude performs 3 full inference passes.

Mental Model

Think of Claude like this:

Each turn = Claude re-reading the entire book before answering the next question.

It does not continue thinking from where it left off.
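The "re-reading the book" model can be written out as arithmetic. This sketch assumes every turn has the same prompt and response size and ignores any caching discount a provider may apply; the function name and the numbers are illustrative.

```python
def cumulative_tokens_read(fixed_overhead: int, prompt_tokens: int,
                           response_tokens: int, turns: int) -> int:
    """Total input tokens processed across a conversation of `turns` turns."""
    history = 0      # tokens accumulated from earlier prompt-response pairs
    total_read = 0
    for _ in range(turns):
        # Each turn reads the fixed layers, all prior history, and the new prompt.
        total_read += fixed_overhead + history + prompt_tokens
        # The finished pair is then stored and re-read on every later turn.
        history += prompt_tokens + response_tokens
    return total_read


# Three turns, as in the table above: 10,500 + 12,500 + 14,500 tokens read.
print(cumulative_tokens_read(10_000, 500, 1_500, 3))  # 37500
```

Because each turn re-reads everything before it, the total grows roughly with the square of the number of turns, not linearly.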

Why KV Cache Doesn’t Help Across Turns

Many assume:

“KV cache should store past context — so why re-read?”

KV cache only works within one response generation.

During token-by-token generation:

Token → Token → Token

KV cache avoids recomputing attention. But once Claude finishes answering:

That inference ends
KV cache is discarded

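Next turn = fresh inference over the full context

Prompt caching, where it is available, can make a repeated prefix cheaper and faster to process, but the cached tokens still count against the context window.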

Why Claude Re-Reads Every Turn

Claude is stateless between turns by design. This allows:

Tool updates
File edits
Memory changes
Plan revisions
Safety guarantees
Deterministic behavior

If the KV cache were persisted across turns without invalidation, Claude could reason from stale state: tool outputs could become inconsistent, and edits to earlier parts of the conversation would not be reflected. That would lock Claude into past assumptions and make tool-driven agent workflows unreliable.

KV cache = scratchpad while writing an answer. Each new question requires re-reading the notebook, not reusing scratchpad thoughts.

4. The Solution: `/compact`

Claude provides a built-in command:

/compact

What `/compact` Does

It summarizes conversation history while preserving:

Key decisions
Constraints
Instructions
Project goals

It replaces:

Long message chain → Compressed semantic memory

Example:

Before:

Messages = 120K tokens

After:

Messages = ~8K tokens
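Because history is re-read on every turn, the saving from compaction compounds. The sketch below compares the history tokens read over the next few turns with and without compacting, using the 120K and 8K figures from the example; the growth per turn and the number of turns are assumptions.

```python
def future_read_cost(history_tokens: int, growth_per_turn: int, turns: int) -> int:
    """History tokens read over the next `turns` turns, starting from `history_tokens`."""
    return sum(history_tokens + k * growth_per_turn for k in range(turns))


BEFORE, AFTER = 120_000, 8_000   # figures from the example above
TURNS, GROWTH = 10, 2_000        # assumed: 10 more turns, 2K tokens added per turn

without_compact = future_read_cost(BEFORE, GROWTH, TURNS)
with_compact = future_read_cost(AFTER, GROWTH, TURNS)
print(without_compact, with_compact, without_compact - with_compact)
# 1290000 170000 1120000
```

The sketch leaves out the one-time cost of producing the summary itself, which is roughly one read of the old history plus the summary output.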

5. How `/compact` Affects Context

Component	Before	After
Messages	Huge	Minimal
Memory	Preserved	Preserved
System Prompt	Unchanged	Unchanged
Skills	Unchanged	Unchanged

Claude does not forget the substance of your work, although fine-grained details can be lost in the summary.

It retains:

Intent
Architecture
Constraints

But removes:

Redundant iterations
Debugging loops
Dead discussions

6. Automatic Compaction

Claude may automatically run compaction when it detects:

(Input + Output) > Remaining Context Window

Since the context window is typically:

~200K tokens

Automatic compaction prevents failure, but it happens late.

Manual compaction = better performance.
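The trigger condition above, and an earlier manual threshold, can be sketched as two small checks. The function names and the 60% threshold are hypothetical choices for illustration, not documented Claude behaviour.

```python
def needs_compaction(used_tokens: int, next_input: int, expected_output: int,
                     limit: int = 200_000) -> bool:
    """Mirror the rule above: (Input + Output) > Remaining Context Window."""
    remaining = limit - used_tokens
    return next_input + expected_output > remaining


def should_compact_early(used_tokens: int, limit: int = 200_000,
                         threshold: float = 0.6) -> bool:
    """Hypothetical manual policy: compact once usage passes a chosen fraction."""
    return used_tokens / limit >= threshold


print(needs_compaction(190_000, 6_000, 5_000))  # True: 11K needed, 10K left
print(should_compact_early(130_000))            # True: 65% of the window is used
```

The second check fires much earlier than the first, which is the point of compacting manually: you choose a natural breakpoint instead of waiting for the window to run out.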

7. Task Isolation: Use New Windows

Claude carries history forward across a session.

So if you switch from:

RAG Debugging → Writing a Blog

Claude still loads:

Code discussions
Architecture decisions
Tool usage
Previous iterations

Even if irrelevant.

Result:

Token waste
Slower reasoning

Best Practice

Use a new Claude window for a new task

This removes:

Unnecessary Message History

and maximizes:

Free Space

8. Tool Optimization via `/tools`

Claude loads many tools by default.

Check active tools using:

/tools

You may see:

NotebookEdit
Plan Mode
Agent
WebSearch

Each tool consumes tokens in the system layer.

Example

If your project has no .ipynb files:

👉 NotebookEdit is wasted context.

If you’re not browsing live data:

👉 WebSearch adds unnecessary overhead.

If you’re not running autonomous workflows:

👉 Agent may be irrelevant.

Each unused tool acts as:

Token Tax

Removing unused tools improves:

Available context
Response efficiency
Cost efficiency
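The "token tax" is easy to estimate once you know roughly how large each tool definition is. The per-tool token counts below are invented for illustration; actual sizes depend on the tool descriptions loaded in your setup.

```python
def tool_tax(tool_tokens: dict[str, int], unused: set[str], turns: int) -> int:
    """Tokens spent re-reading unused tool definitions over `turns` turns."""
    per_turn = sum(tokens for name, tokens in tool_tokens.items() if name in unused)
    return per_turn * turns


# Hypothetical definition sizes, in tokens.
tool_tokens = {"NotebookEdit": 900, "WebSearch": 1_200, "Agent": 1_500, "Plan Mode": 600}

# A project with no .ipynb files and no live browsing, over a 40-turn session.
print(tool_tax(tool_tokens, {"NotebookEdit", "WebSearch"}, turns=40))  # 84000
```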

9. Summarization vs `/compact`

Since Claude re-reads every turn, optimization means reducing what it has to read. The two most powerful tools are summarization and /compact — but they work differently.

Summarization is semantic compression. You manually convert a long conversation into a structured memory document:

100K history → 20K structured memory

This keeps decisions, constraints, and current state — and removes exploration, iterations, and dead paths.

/compact is structural pruning. Claude removes redundant reasoning, tool chatter, and repetition — but does not rewrite meaning.

Feature	Summarization	`/compact`
Type	Semantic	Structural
Rewrites knowledge?	Yes	No
Removes clutter?	Yes	Yes
Keeps conclusions?	Depends on prompt	Automatically
Requires prompt?	Yes	No
Control level	High	Low

Summarization = Rewrite the story

/compact = Remove the noise

Impact on Usage

Strategy	Context Size	Total Usage Trend
No optimization	Large	Explodes
Only `/compact`	Medium	Moderate
Only summarization	Small	Efficient
Summarization + `/compact`	Smallest	Optimal

Practical Workflow

Explore → Decide → Summarize → /compact → Continue

This converts recurring cost into one-time cost.
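Whether the one-time cost pays off depends on how many turns follow. As a rough sketch, assume summarizing costs one read of the old history plus the tokens of the summary, and that each later turn then saves the difference between the two. The function name is hypothetical, and input and output tokens are treated as equally expensive for simplicity.

```python
import math


def break_even_turns(history_tokens: int, summary_tokens: int) -> int:
    """Number of later turns after which summarizing has paid for itself."""
    one_time_cost = history_tokens + summary_tokens    # read history, write summary
    saving_per_turn = history_tokens - summary_tokens  # tokens no longer re-read
    if saving_per_turn <= 0:
        raise ValueError("summary must be smaller than the history it replaces")
    return math.ceil(one_time_cost / saving_per_turn)


# The 100K -> 20K example from above.
print(break_even_turns(100_000, 20_000))  # 2
```

Under these assumptions the summary pays for itself within two turns, so it is worth doing whenever the conversation is expected to continue.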

10. Optimization Checklist

Action	Benefit
Run `/compact` regularly	Shrinks message history
Summarize after exploration phases	Semantic compression of expensive history
Use new window per task	Eliminates irrelevant history
Check `/tools`	Remove unused tool overhead
Avoid multi-project threads	Prevent silent context growth
Compact before major prompts	Maximize reasoning capacity
Combine summary + compact	Smallest possible context footprint

Final Insight

Claude performance is rarely limited by intelligence.

It is limited by:

Context hygiene.

Managing:

Message growth
Tool load
Task isolation
Summarization cadence

is the difference between:

⚡ Fast, high-quality responses vs 🐌 Slow, degraded reasoning

Claude re-reads context every turn because it is stateless by design.

KV cache helps within responses — not across turns.

So optimization is not about faster thinking…

It’s about giving Claude less to remember.

Optimize context → Optimize Claude.
