The gap between an experienced Claude Code user and a newcomer isn't prompt skill. It's whether you know "what shape of input Claude Code was built to digest." Match your prompts, memory, and sessions to that shape, and you get two or three times the output for the same cost.
This isn't a book about the code itself. The code-centric version lives in the same repo at DEEP-DIVE.md. This book reverse-engineers "why was it built this way" from the internals, then lays out how to use it as a user, in alignment with that intent.
For example: why delegating long explorations to subagents keeps the main conversation alive much longer. Why editing CLAUDE.md frequently drives up costs. Why "fix X" gets better results than "could you help with this?" Why manually running /compact is more reliable than letting it run automatically. Why a hook returning block sometimes lets the next turn proceed anyway. All of it follows from design intent.
Authors' Minds
Everything is a Stream
The premise "the user must be able to interrupt at any time" governs the entire architecture.
Design Intent
A single turn in Claude Code is not a function call — it's an async generator. When user input arrives, every LLM response token, tool execution progress update, file change notification, and stop hook result flows through a single stream.
QueryEngine.ts · line 209
export class QueryEngine {
async *submitMessage(
prompt: string | ContentBlockParam[],
options?: { uuid?: string; isMeta?: boolean },
): AsyncGenerator<SDKMessage, void, unknown>
}
The authors' reason for choosing this structure reduces to one thing. The longer the LLM response, the more users want to change direction mid-flight. Traditional CLIs use a complete-then-display model — interrupt and everything done so far is lost. Claude Code is the opposite. Interrupt and files already written stay on disk; if a tool was halfway through execution, that half is recorded.
Consumers iterate the generator with for await, so cancellation, streaming, and backpressure come for free.

What Changes for You
The First 5 Seconds Rule
On long tasks, watch the first 5–10 seconds without fail. If the model starts reading the wrong files, hit Ctrl+C right then. The later you interrupt, the more you end up with a "half right, half wrong" state that contaminates the context. A contaminated context spreads to the next turn.
Streaming makes mid-run intervention free. With traditional CLIs, once you fire you wait — which meant crafting careful prompts upfront. Claude Code flips that. Instead of spending 30 seconds perfecting a prompt, just fire it and judge after watching the first two or three tool_use calls. Spend less time on prompt engineering; spend more time observing.
Example: "Fix the email validation in auth.ts:42 to RFC 5322." If the first Read hits the right file, keep going; otherwise interrupt and redirect.

Safe by Default
"When in doubt, stop and ask." The biggest risk in an LLM agent isn't mistakes — it's silent mistakes.
Design Intent
Every tool default is set toward the conservative side. Create a new tool and it's automatically judged "not read-only," "not concurrency-safe," "not destructive." File edits require a Read first. Bash must pass through four independent permission layers.
Tool.ts · TOOL_DEFAULTS
const TOOL_DEFAULTS = {
isEnabled: () => true,
// When in doubt: serialize, assume writable, assume nondestructive — permissions check catches the rest
isConcurrencySafe: () => false,
isReadOnly: () => false,
isDestructive: () => false,
checkPermissions: async () => ({ behavior: 'allow' }),
toAutoClassifierInput: () => '',
}
BashTool is especially well-defended. Rather than cramming everything into one file, the authors split it across 7 independent files. Each layer is unaware of the others — if one is breached, the rest still hold.
What Changes for You
Trust the default permission system, then layer more restrictions on top with hooks. Many users run --dangerously-skip-permissions as their default or keep acceptEdits mode on throughout a session. That's a choice that works against the authors' intent. Use PreToolUse hooks instead.
You don't need to write "don't git push --force" in your prompt over and over. One line in settings.json hooks blocks it permanently.
This is exactly why the authors built hooks — so users can draw their own lines around unexpected risk.
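As a sketch, a PreToolUse guard in settings.json can look like this. The script path is hypothetical; the idea is that the script receives the tool-call JSON on stdin and exits nonzero with a message to block the command:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "~/.claude/hooks/block-force-push.sh" }
        ]
      }
    ]
  }
}
```

One entry like this outlives every prompt you would otherwise have repeated it in.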
Tokens are a Scarce Resource
Context economics. Four mechanisms for saving tokens and how to use them.
Design Intent
Claude Code has four layers of token-saving mechanisms built in.
First, Deferred Tool Loading. Of dozens of commands and 20+ built-in tools, only the core set is loaded into the system prompt. The rest have their schemas loaded only when the model searches for them via ToolSearch. This alone shaves thousands of tokens from the initial prompt.
Second, 4-stage auto-compaction. It starts light, works through read-only segment summarization, and escalates to full LLM summarization only as needed. Immediately after compaction, the 5 most recently modified files are re-injected to preserve the context of what was just done.
Third, prompt caching discounts. When the same prefix repeats, cache reads cost 10% of the base price, while cache writes (the first load) cost 125%. Only the first call is slightly more expensive — every call after that is 90% off, so effective cost falls as conversations grow longer.
Fourth, independent subagent budgets. Running 4 agents in parallel costs 4x, but each has its own independent budget — so the main conversation never gets cut short because a subagent investigation used up the shared pool.
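The third mechanism's discount compounds quickly. A back-of-envelope sketch, counting cost in token-equivalents at the 125%/10% multipliers above:

```shell
# 10,000-token stable prefix reused across 20 calls in one session.
base=10000
# First call writes the cache at 125% of base; the next 19 read it at 10%.
cached=$(( base * 125 / 100 + 19 * base * 10 / 100 ))
uncached=$(( 20 * base ))
echo "cached: $cached vs uncached: $uncached"   # → cached: 31500 vs uncached: 200000
```

Roughly a 6x saving, which is why keeping the prefix stable matters more than any single prompt trick.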
What Changes for You
Don't Edit CLAUDE.md Frequently
CLAUDE.md enters the system prompt as a stable prefix — it's the body of prompt caching. Change one sentence and the cache breaks; once the cache breaks, every subsequent call pays full price. Touch the global file no more than once a week. Anything you change often belongs in a project-specific .claude/CLAUDE.md.
Front-load context into the first message
If you put "here's what I'm working on, here are the constraints, here's the completion criteria" all in the first message, the whole thing lands in cache. Adding things piecemeal later — "oh and also this" — puts them outside the cached prefix, so the discount doesn't apply. If an image sits in the middle of the prefix, it busts the cache for everything after it. Always attach images at the end of your prompt.
Opus for judgment, Sonnet for implementation, Haiku for repetition
Running everything on a single model is throwing money away. You can swap models per subagent — send the exploration agent to Sonnet, reserve Opus for final decisions.
How to Structure CLAUDE.md
There's a structure that captures both cache efficiency and practicality at the same time.
Global (~/.claude/CLAUDE.md): Fixed facts about you only. Coding style, commit rules, off-limits instructions. Modify less than once a month. Under 20 lines.
Project (.claude/CLAUDE.md): Rules specific to this repo. Build commands, test methods, architecture summary. Changing it frequently is fine — it's a project-scoped cache, so it doesn't affect the global cache.
.claude/CLAUDE.md → caches stay independent

~/.claude/CLAUDE.md · global example
# Coding style
TypeScript first. React + Next.js + Tailwind.
Functional components + hooks. No classes.
camelCase variables, PascalCase components.
Comments explain "why" only.
# Commits
Conventional Commits (feat/fix/refactor/docs/test/chore).
No console.log debug code left in commits.
# Off-limits
Don't explain things I didn't ask about.
Don't add unnecessary dependencies to package.json.
.claude/CLAUDE.md · project example
# This project
Next.js 15 + App Router. /src/app based.
DB: Supabase (Postgres). ORM: Drizzle.
# Build
pnpm dev → localhost:3000
pnpm test → vitest
pnpm lint → eslint + prettier
# Watch out
Don't touch /src/lib/auth.ts (auth logic being stabilized).
API routes go under /src/app/api/ only.
Subagents are Isolated Minds
Protect the main mind. Long explorations go in isolated minds; the main does only judgment.
Design Intent
The most interesting thing about how AgentTool creates subagents is the isolation model. Subagents inherit all of the parent's permissions and tools, but cannot write to the main AppState. Only shared infrastructure like the task registry is an exception. Everything else follows the "parent is read-only; results come back as messages only" rule.
setAppState is a no-op inside subagents.

Why does this matter? LLM quality degrades as context grows longer. Model vendors brag about "1 million token support," but in practice a conversation at 500K tokens gives hazier answers than one at 200K. The authors knew this. So keeping the main conversation as clean as possible was the top priority, and all "burns lots of tokens but produces a short result" work — exploration, investigation, verification — was pushed into isolated subagents.
What Changes for You
The 3-File Rule for the Main Thread
If the main thread is directly reading three or more files, stop. You're already off track. Spin up a Task tool or Explore agent and say "read these three files and summarize X." Only a one-paragraph summary comes back to the main context. Doing the same work in the main thread piles up thousands of tokens.
Don't be afraid to run subagents in parallel. They're isolated — no state collisions. Just be explicit about write scope per agent. "Agent A touches /services only, Agent B touches /components only." Without this in the prompt, two agents can hit the same file simultaneously and clobber each other.
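A write-scope clause per agent can be as short as this (directory names are placeholders for your own layout):

```text
Agent A: implement the cache layer. Write only under /src/services.
Agent B: add loading states. Write only under /src/components.
Neither agent edits shared config files.
```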
Always verify with a separate agent. Asking the implementing agent "did you get it right?" always gets "yes." Spin up a code-reviewer agent to read the same output with fresh eyes. The reason the authors isolated AgentTool is precisely this fresh perspective.
Hooks are the Escape Hatch
"Go beyond what we expected." The door the authors left open for users.
Design Intent
The hook system lets users attach scripts to events like PreToolUse, PostToolUse, SessionStart, UserPromptSubmit, and Stop. The most powerful feature is PreToolUse's updatedInput. A hook can rewrite the tool input the model just called, and return the modified version to be executed.
Hook response schema
{
"continue": true,
"hookSpecificOutput": {
"hookEventName": "PreToolUse",
"permissionDecision": "allow",
// Hook rewrites the model's called input before execution
"updatedInput": {
"command": "git commit -m 'fix' --signoff"
},
"additionalContext": "signoff added automatically"
}
}
For example: the model calls git commit -m "fix". The hook intercepts it, rewrites it to git commit -m "fix" --signoff, and passes that to execution. The model thinks its original command went through. The user silently applied their own policy.
What Changes for You
Stop repeating instructions in prompts — encode them in hooks. Writing "don't git push --force," "don't rm -rf," "don't leave console.log" in every prompt is waste. Build it as a hook once and it works forever. Saves tokens. Prevents mistakes.
The distinction between hooks, skills, and commands is covered in Chapter 10. This chapter focuses on the hook internal mechanism only.
Hooks can fail silently
If you only return permissionDecision: 'deny', the tool call is blocked — but the model can immediately try the next tool. Add additionalContext with "this command is forbidden" so the model doesn't repeat the attempt. And always add logging to hook scripts.
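Following the response shape shown earlier in this chapter, a deny that also steers the model might look like this (values illustrative):

```json
{
  "continue": true,
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "deny",
    "additionalContext": "git push --force is forbidden by policy. Do not retry it; ask the user instead."
  }
}
```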
Planning and Executing are Different States
Exploration and modification use the brain differently. The authors made this a physical separation.
Design Intent
EnterPlanModeTool takes no input. It simply flips the session state to "plan mode." In this mode, all file-editing tools are locked. Only read-only tools survive — Read, Grep, Glob.
Why make this a separate state? Exploration requires "see as many possibilities as broadly as possible" thinking; modification requires "execute exactly one thing" thinking. Both LLMs and humans produce lower quality when they try to do both simultaneously.
Entry Criteria
Changes spanning five or more files. Unclear requirements. Irreversible operations — DB migrations, file deletion, git reset. Environment changes like framework upgrades. First exploration of an unfamiliar codebase. If any of these apply, entering plan mode first is the safer path.
Flow After Entry
Read broadly, organize questions, write a plan, critique the plan, exit with ExitPlanMode and execute. Skipping any of the five steps defeats the purpose of the plan. The most important step is the fourth: critique the plan. Asking the model "what are the weaknesses of this plan?" and "what cases are missing?" just once surfaces the gaps.
"This also needs fixing" — twice
If you hear "this also needs fixing" two or more times during a refactor, stop what you're doing and return to plan mode. The moment scope starts expanding is the signal that the plan has already gone off track.
Memory is Just Files
Not a database — markdown. An architecture of trust, because users must be able to open and edit it themselves.
Design Intent
Claude Code's memory isn't a database. It's stored as markdown files under ~/.claude/projects/<project>/memory/. MEMORY.md is the index; details live in individual files under topics/. The index has a 25KB / 200-line limit — exceed it and it's truncated with a warning.
What Changes for You
Don't write body text into MEMORY.md
This is an index file. Fill it with one-line entries only. Details go in individual files under topics/. Ignore this and when you hit 200 lines and it truncates, important information disappears.
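An index held to one line per topic might look like this (topic names are hypothetical):

```text
# MEMORY.md
- auth: token refresh gotchas → topics/auth.md
- build: vitest flakiness workaround → topics/build.md
- deploy: staging environment quirks → topics/deploy.md
```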
Default uses by type:
- user: long-term facts about you. Set once, rarely touched.
- feedback: recurring mistake corrections. Grows every time you catch a mistake.
- project: project structure. Be careful with things that change frequently.
- reference: external link collections.
Version-control it with git. Turn ~/.claude/projects/<slug>/memory/ into a git repo and you can track "when was this memory added and by what." When a memory corruption incident happens — the model remembering a wrong fact — you can roll back.
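A minimal sketch of that setup. MEMDIR would normally point at ~/.claude/projects/&lt;slug&gt;/memory; here it defaults to a throwaway demo directory so the commands run anywhere:

```shell
# Point MEMDIR at your real memory directory; defaults to a demo dir.
MEMDIR="${MEMDIR:-$(mktemp -d)}"
cd "$MEMDIR"
[ -f MEMORY.md ] || echo "# index" > MEMORY.md        # demo seed file
git init -q
git add -A
git -c user.email=me@example.com -c user.name=me commit -qm "memory snapshot"
# Later: trace when each memory entry was added or changed.
git log --oneline -- MEMORY.md
```

Commit after meaningful sessions and a wrong remembered fact becomes a one-command rollback.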
How Do You Use It?
The Shape of a Prompt
Claude Code digests imperative single sentences in a 3-part structure with explicit files and line numbers.
Claude Code's system prompt already lays in "solve it with tools." So the user prompt needs to be an instruction, not an explanation. Ask for an answer and it gives up on using tools. Ask for long exploration and it delegates to a subagent.
Example: "Fix the email validation in LoginForm.tsx:42 to RFC 5322. Don't touch the UI. Add 3 tests. No changes to package.json."

3-Part Structure · Goal + Constraints + Verification
[goal] Fix the email validation in the login form to RFC 5322.
[files] src/components/LoginForm.tsx, src/utils/validation.ts
[constraints] Don't touch UI markup. Keep existing props signature.
[verification] npm test must pass. Add 3 new test cases.
[off-limits] Don't touch any other form components.
Turn this into a copy-paste template and thinking time shrinks. With only a goal, it breaks constraints arbitrarily. With only constraints, the goal blurs. Without verification, it stops at "roughly done."
Shapes That Don't Work
Requests that start with "actually I was thinking…" Putting an unfinished thought process into the prompt blurs the model too. Decide first, then ask.
Multiple tasks in one prompt. "Do A and B and C" almost always gets B done sloppily. One turn, one task.
"If possible…" Conditional requests move the LLM toward avoiding the condition judgment. If you mean yes, say yes. If you mean no, say no.
Units of Work — Turn · Session · Worktree
Confuse the three and you'll suffer for it.
| Unit | Size | Criterion |
|---|---|---|
| Turn | 1 submitMessage call | One change |
| Session | Full resumable conversation | One feature · one topic |
| Worktree | Isolated git directory | High-risk experiment |
Feature complete means start a new session. Experimental change means enter a worktree. Several small fixes means multiple turns in one session. Start each day by deciding whether to resume yesterday's session or start fresh.
When to Use a Worktree
When you want to experiment without touching the existing branch. EnterWorktree creates a copy of the current repo in a temporary directory; you work freely inside it. If the result looks good, merge. If not, throw it away. The main code stays clean — no experiment residue.
Resume a Session or Start Fresh?
Resume when: follow-up work on the same feature. Adding tests after a bug fix. Incorporating review feedback. The context is still valid.
Start fresh when: switching to a different domain. The session has grown long and the model is starting to forget early instructions. You've run /compact twice and the context is still fuzzy. In these cases, cut the session and open a new one with "just finished X, next is Y."
The decision criterion is simple. "Does this session still know what I want?" If not, start fresh.
NDJSON is intentional
A broken session can be opened directly at session-<id>.jsonl. NDJSON means one line per message. Trim the last few lines and resume — you're back to the state just before the incident. Choosing NDJSON over plain JSON was a deliberate author decision: when something goes wrong, you can salvage the session by cutting from the end.
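A sketch of the salvage on a stand-in file (your real path is the session-&lt;id&gt;.jsonl mentioned above):

```shell
# Stand-in session file whose final line was cut off mid-write.
f=session-demo.jsonl
printf '%s\n' '{"type":"message","uuid":"a1"}' \
              '{"type":"message","uuid":"b2"}' > "$f"
printf '%s' '{"type":"mess' >> "$f"     # damaged tail, no trailing newline
# Drop only the broken last line; every earlier turn survives intact.
sed '$d' "$f" > "${f}.fixed"
wc -l < "${f}.fixed"                    # → 2
```

Resume from the .fixed file and you are back to the state just before the incident.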
Skill · Command · Hook
Multiple extension methods get confusing. A simple decision tree.
Examples in Practice
"Publish a blog post to 3 platforms simultaneously." A frequent natural-language trigger. Make it a Skill. Write "blog publishing" in whenToUse and the model invokes it automatically.
"Always run lint before git push." Needs to fire automatically every time. Make it a Hook. Attach a matcher to the PreToolUse event for Bash; if the command is git push, run lint first. Don't write this in every prompt.
Hook internal mechanics (updatedInput, blocking branches) were covered in Chapter 05.
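The command side of a lint-before-push hook might look like this sketch. Treat the details as assumptions to verify against the current hooks docs: stdin carries the tool-call JSON, and exit code 2 is conventionally a blocking error whose stderr is shown to the model. LINT_CMD is a stand-in for your real lint invocation:

```shell
#!/usr/bin/env sh
# PreToolUse hook body (sketch): stdin carries the tool-call JSON.
LINT_CMD="${LINT_CMD:-npm run lint}"
input=$(cat)
case "$input" in
  *"git push"*)
    # Run lint first; if it fails, block the push and tell the model why.
    $LINT_CMD >&2 || { echo "lint failed; push blocked" >&2; exit 2; }
    ;;
esac
exit 0
```

A crude substring match is enough here; parse the JSON properly (e.g. with jq) if you need precision.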
"I need a summary of this project's current state." A shortcut I run when I want it. Command. Type /status and get the summary.
"I need to pull tasks from Notion." External system connection. MCP Server. Connect an MCP server wrapping the Notion API. Call the MCP server's tools from inside a skill or command.
Connecting MCP Servers
MCP (Model Context Protocol) is the standard for connecting Claude Code to external systems. It lets you use services like Notion, Gmail, Linear, and Slack as tools.
Configure in .claude/mcp.json.
.claude/mcp.json{
"mcpServers": {
"github": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {
"GITHUB_TOKEN": "ghp_..."
}
},
"filesystem": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/dir"]
}
}
}
stdio vs SSE. Spawning a local process uses stdio. Connecting to a remote server uses sse with a URL. Most community servers are stdio.
Note. When an MCP server is registered, its tools are added to ToolSearch as deferred tools. The model searches for and calls them on demand. They don't load into the system prompt on every turn, so the token cost stays low.
When to Delegate
The boundary between what the main does and what subagents do.
Delegation Signals
You need to read 3 or more files. The output is a summary or report, not a file modification. The same pattern of work needs to happen independently multiple times. The result doesn't need to persist in the main context. You don't want to contaminate the current context. Any of these — delegate to a subagent.
Don't Delegate Signals
The opposite: stay in main when this task's result is the premise for the next turn. When you need to adjust things interactively with the user. When interactive decisions are required. When the domain is unclear enough that you can't write a spec for the subagent.
For write-scope partitioning and agent count limits in parallel delegation, see Chapter 04. This chapter focuses on when to decide to delegate.
Delegation Cost vs Explanation Cost
If writing the prompt for a subagent takes longer than doing it yourself, don't delegate. Writing a 10-line prompt to fix 3 lines is a net loss. Delegation pays off only when the exploration scope is wide or there are 2+ independent domains.
Efficiency
10 Habits That Cut Your Bill in Half
Follow the design intent and the costs naturally fall.
Don't put examples in the system prompt
Pasting long "do it like this" examples into CLAUDE.md inflates input tokens on every turn. Put examples in the first prompt once; keep rules only in CLAUDE.md.
Narrow the scope when Read results are long
Use the offset and limit parameters to read only the lines you need. Reading an entire 2,000-line file dumps all of it into context.
Grep > Bash grep
The built-in Grep tool is more token-efficient than running grep or rg through Bash. Dedicated tools return structured results; Bash output is raw text that eats into context.
Run /compact manually
More predictable than auto. Right after finishing a feature is the best time.
TodoWrite vs TaskCreate
Using prompts to manage to-do lists costs tokens on every turn. TaskCreate/TaskUpdate manages state in a separate store, keeping it out of context.
Block unnecessary tool calls with hooks
When the model accidentally tries to read a large file, a hook that blocks it saves tokens.
Don't paste the entire error log
Dumping a 500-line stack trace makes the model miss the signal. Paste only the first error message and the 5–10 relevant lines. For the rest: "full log is at /tmp/error.log" is enough.
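For instance, with a long log at a stand-in path, grep's context flags pull out just the signal:

```shell
# Build a stand-in 401-line failure log.
{ seq 200 | sed 's/^/noise line /'
  echo "Error: connection refused"
  seq 200 | sed 's/^/  at frame /'; } > /tmp/error.log
# Paste only these 6 lines; point at /tmp/error.log for the rest.
grep -m1 -A5 "Error:" /tmp/error.log
```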
Control auto-continuation
Explicitly lower the token budget to reduce incidents where the model just keeps running and burns through the budget.
Kill failing sessions early
If the model goes off the rails twice in one session, /clear. Continuing with a contaminated context just burns money.
Make /cost a routine
Check it a few times a day. You can't find where the leaks are if you don't know how much you're spending.
What to Check When Things Break
Symptom → cause → fix cheat sheet.
| Symptom | Cause | Fix |
|---|---|---|
| Acts like it doesn't know a file it just edited | autocompact restoration limit (5 files/50KB) | Explicitly instruct it to Read that file again |
| Repeating the same mistake | No feedback memory | Explicitly request a save, then verify the file |
| Forgets something from 5 minutes ago | Context near 90% | /compact or /clear |
| Hook has no effect | preventContinuation missing | Set both fields in the hook response |
| Subagent talking nonsense | Insufficient context | Fatten up the delegation prompt |
| Cost spike | Cache miss | Stabilize CLAUDE.md |
| Parallel agents overwriting each other | Write scope not partitioned | Specify scope per agent |
| Plan keeps growing | Scope drift | Re-enter plan mode |
| Session is tangled | Multiple topics mixed | /clear then separate |
| Accidental git push | No guard | Deny in PreToolUse hook |
Files to Open Directly
When something breaks, open these in order. ~/.claude/sessions/<proj>/session-<id>.jsonl has the raw current session. ~/.claude/history.jsonl has global history. ~/.claude/projects/<slug>/memory/MEMORY.md is the index memory; detailed memories are under topics/ below it. Hook, MCP, and LSP logs are under ~/.claude/logs/ (if present).
Your First 30 Minutes in an Unfamiliar Repo
The actual sequence someone who knows the design intent follows when picking up a new project.
0–5 min: Session Setup
Open a new terminal, start Claude Code from the project root. Front-load goal, constraints, and verification criteria into the first prompt. That block becomes the cache prefix.
[goal] Understand this repo's architecture and beef up the README.
[constraints] Don't modify code. Read only.
[verification] Must be able to explain the main entry points, data flow, and dependency structure.
5–15 min: Parallel Explore Agents
Don't ask "figure out the project structure" directly. Instead, send three questions to subagents in parallel:
- "Where are the entry points? Find main, index, and app files."
- "What are the core data models? Find type definition files."
- "What are the external dependencies? Read package.json / go.mod / requirements.txt."
While waiting for results, the main thread reads README.md and CLAUDE.md if they exist. The 3-file rule — main reads only; heavy exploration goes to subs.
15–25 min: Drawing the Map
When subagent results come back, synthesize in the main thread. "Entry points are X, data goes through Y to Z, external dependencies are A, B, C." That one paragraph is the mental model for this repo.
At this point, run /compact once. Exploration results have piled up in context — clean up now so the next work runs efficiently.
25–30 min: First Task
With the mental model in place, start the actual work (writing the README, fixing a bug, adding a feature). From here, delegate to an implementation agent (Sonnet) and let the main thread do judgment only.
The key is spending the first 15 minutes on exploration. Jump straight to editing code and the context gets contaminated with exploration residue. Isolate exploration in subagents and the main stays clean for focused implementation.
The Authors Left 15 Design Intents
Answers to "why did they do it this way" — extracted from source analysis.
AsyncGenerator throughout
"The user must be able to interrupt at any time." LLM responses are long. Long means wanting to change direction. A complete-then-display model doesn't allow interruption.
→ The yield* chain inside queryLoop runs five levels deep. Cut it anywhere — everything up to the previous yield is preserved.
All tool defaults are conservative
"Don't break things quietly." isReadOnly, isConcurrencySafe, isDestructive all default to false. The biggest risk with LLM agents is silent destruction.
→ BashTool explicitly declares isConcurrencySafe: false — shell commands running in parallel create race conditions.
Schema lazy-loading via ToolSearch
"Schemas are tokens too." Loading all 100 tools into the system prompt costs thousands of tokens. Load only what's needed; search for the rest.
→ The deferred tool list is exposed by name only in the system-reminder block; the full JSON Schema joins the context only after a ToolSearch call.
Subagent isolation via AgentTool
"Protect the main mind." Context length degrades quality. Long explorations go in isolated minds; results come back as summaries only.
→ Subagents don't inherit the parent's allowedTools — if you don't specify them at spawn time, the subagent runs with only default tools.
memdir on the filesystem
"Let users look at it." AI memory stored in a database is hard to trust. Trust comes from transparency.
→ Filenames under ~/.claude/memory/ are session-ID-based — you can trace back which session wrote which memory entry.
Compact 4 stages + 3-failure circuit breaker
"No infinite loops." After repeated failure, give up. A silent infinite loop is as scary as silent destruction.
→ After 3 failures, it throws CompactionError and flips the session to read-only — blocking further writes to a broken state.
Hook's updatedInput
"Go beyond what we expected." block/allow alone isn't enough. Let hooks rewrite what the model wrote.
→ updatedInput is only valid in PreToolUse hooks — returning it from PostToolUse is silently ignored.
Plan mode as a separate state
"Exploration and execution use different minds." Force users to physically acknowledge the mode switch.
→ In plan mode, tool calls with isDestructive: true are automatically blocked — preventing accidental file changes at the source.
PKCE OAuth + Keychain storage
"Don't put API keys in text files." A rejection of the culture of hardcoding keys in .env.
→ Token refresh is handled by refreshToken logic; if Keychain write fails, the session persists via in-memory fallback.
VCR fixture system
"LLMs must be testable too." A test infrastructure that caches responses by message hash.
→ Fixture files are stored under __fixtures__/ with SHA-256 hash names; collisions get a sequential suffix.
Ink-based React terminal UI
"The terminal is an app." React instead of ncurses. Terminal programs deserve a modern web dev experience.
→ The useInput hook subscribes to key events, so signals like Ctrl+C are processed inside the React event cycle.
Sessions stored as NDJSON
"Must be recoverable." With plain JSON, a damaged file tail makes the whole file unreadable. With NDJSON, you can salvage by trimming the end.
→ Each line holds a {"type":"message","uuid":"...","timestamp":...} structure — one jq command extracts any specific turn.
Session recovery (trimming the last few lines and resuming) is covered in Chapter 09.
Stop hook blocking/non-blocking split
"Response speed matters more than completeness." Background work like memory extraction is fire-and-forget.
→ Whether a type: "stop" hook blocks is controlled by the blocking field in settings.json; the default is false.
Build-time elimination via feature flags
"Keep external builds lean." Internal experiment features become dead code in production builds.
→ Flags branch on a single IS_INTERNAL_BUILD env var; esbuild's define option tree-shakes the false branch.
Ask when permission is uncertain
"Slow is better than wrong." The cost of an LLM agent making a mistake exceeds the cost of bothering the user one more time.
→ A tool with needsPermission: true requires approval once per session — the same path pattern is auto-allowed after that.
Closing
This book's central claim is one sentence. Using Claude Code well means following its internal structure.
Prompt engineering tips, cheat sheets, magic words — those are surface. Surfaces change when versions update. Design intent doesn't change easily. From the moment the authors decided "stream everything through a single async generator," through today, and going forward — the habits of someone who uses Claude Code well will remain aligned with that decision.
Everything else is application.