The gap between an experienced Claude Code user and a newcomer isn't prompt skill. It's whether you know "what shape of input Claude Code was built to digest." Match your prompts, memory, and sessions to that shape, and you get two or three times the output for the same cost.
This isn't a book about the code itself. The code-centric version lives in the same repo at DEEP-DIVE.md. This book reverse-engineers "why was it built this way" from the internals, then lays out how to use it as a user, in alignment with that intent.
For example: why delegating long explorations to subagents keeps the main conversation alive much longer. Why editing CLAUDE.md frequently drives up costs. Why "fix X" gets better results than "could you help with this?" Why manually running /compact is more reliable than letting it run automatically. Why a hook returning block sometimes lets the next turn proceed anyway. All of it follows from design intent.
Authors' Minds
Everything is a Stream
The premise "the user must be able to interrupt at any time" governs the entire architecture.
Design Intent
A single turn in Claude Code is not a function call — it's an async generator. When user input arrives, every LLM response token, tool execution progress update, file change notification, and stop hook result flows through a single stream.
QueryEngine.ts · line 209
export class QueryEngine {
async *submitMessage(
prompt: string | ContentBlockParam[],
options?: { uuid?: string; isMeta?: boolean },
): AsyncGenerator<SDKMessage, void, unknown>
}
The authors' reason for choosing this structure reduces to one thing. The longer the LLM response, the more users want to change direction mid-flight. Traditional CLIs use a complete-then-display model — interrupt and everything done so far is lost. Claude Code is the opposite. Interrupt and files already written stay on disk; if a tool was halfway through execution, that half is recorded.
Consumers iterate the generator with for await, so cancellation, streaming, and backpressure come for free.

What Changes for You
The First 5 Seconds Rule
On long tasks, watch the first 5–10 seconds without fail. If the model starts reading the wrong files, hit Ctrl+C right then. The later you interrupt, the more you end up with a "half right, half wrong" state that contaminates the context. A contaminated context spreads to the next turn.
Streaming makes mid-run intervention free. With traditional CLIs, once you fire you wait — which meant crafting careful prompts upfront. Claude Code flips that. Instead of spending 30 seconds perfecting a prompt, just fire it and judge after watching the first two or three tool_use calls. Spend less time on prompt engineering; spend more time observing.
Example: "Fix the email validation in auth.ts:42 to RFC 5322." If the first Read hits the right file, keep going; otherwise interrupt and redirect.

Safe by Default
"When in doubt, stop and ask." The biggest risk in an LLM agent isn't mistakes — it's silent mistakes.
Design Intent
Every tool default is set toward the conservative side. Create a new tool and it's automatically judged "not read-only," "not concurrency-safe," "not destructive." File edits require a Read first. Bash must pass through four independent permission layers.
Tool.ts · TOOL_DEFAULTS
const TOOL_DEFAULTS = {
isEnabled: () => true,
// When in doubt: serialize, assume writable, assume nondestructive — permissions check catches the rest
isConcurrencySafe: () => false,
isReadOnly: () => false,
isDestructive: () => false,
checkPermissions: async () => ({ behavior: 'allow' }),
toAutoClassifierInput: () => '',
}
BashTool is especially well-defended. Rather than cramming everything into one file, the authors split it across 7 independent files. Each layer is unaware of the others — if one is breached, the rest still hold.
What Changes for You
Trust the default permission system, then layer more restrictions on top with hooks. Many users run --dangerously-skip-permissions as their default or keep acceptEdits mode on throughout a session. That's a choice that works against the authors' intent. Use PreToolUse hooks instead.
You don't need to write "don't git push --force" in your prompt over and over. One line in settings.json hooks blocks it permanently.
This is exactly why the authors built hooks — so users can draw their own lines around unexpected risk.
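As a sketch, a PreToolUse guard in settings.json can look like this. The script path is hypothetical; the idea is that the script receives the tool-call JSON on stdin and exits nonzero with a message to block the command:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "~/.claude/hooks/block-force-push.sh" }
        ]
      }
    ]
  }
}
```

One entry like this outlives every prompt you would otherwise have repeated it in.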
Tokens are a Scarce Resource
Context economics. Four mechanisms for saving tokens and how to use them.
Design Intent
Claude Code has four layers of token-saving mechanisms built in.
First, Deferred Tool Loading. Of dozens of commands and 20+ built-in tools, only the core set is loaded into the system prompt. The rest have their schemas loaded only when the model searches for them via ToolSearch. This alone shaves thousands of tokens from the initial prompt.
Second, 4-stage auto-compaction. It starts light, works through read-only segment summarization, and escalates to full LLM summarization only as needed. Immediately after compaction, the 5 most recently modified files are re-injected to preserve the context of what was just done.
Third, prompt caching discounts. When the same prefix repeats, cache reads cost 10% of the base price, while cache writes (the first load) cost 125%. Only the first call is slightly more expensive — every call after that is 90% off, so effective cost falls as conversations grow longer.
Fourth, independent subagent budgets. Running 4 agents in parallel costs 4x, but each has its own independent budget — so the main conversation never gets cut short because a subagent investigation used up the shared pool.
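The third mechanism's discount compounds quickly. A back-of-envelope sketch, counting cost in token-equivalents at the 125%/10% multipliers above:

```shell
# 10,000-token stable prefix reused across 20 calls in one session.
base=10000
# First call writes the cache at 125% of base; the next 19 read it at 10%.
cached=$(( base * 125 / 100 + 19 * base * 10 / 100 ))
uncached=$(( 20 * base ))
echo "cached: $cached vs uncached: $uncached"   # → cached: 31500 vs uncached: 200000
```

Roughly a 6x saving, which is why keeping the prefix stable matters more than any single prompt trick.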
What Changes for You
Don't Edit CLAUDE.md Frequently
CLAUDE.md enters the system prompt as a stable prefix — it's the body of prompt caching. Change one sentence and the cache breaks; once the cache breaks, every subsequent call pays full price. Touch the global file no more than once a week. Anything you change often belongs in a project-specific .claude/CLAUDE.md.
Front-load context into the first message
If you put "here's what I'm working on, here are the constraints, here's the completion criteria" all in the first message, the whole thing lands in cache. Adding things piecemeal later — "oh and also this" — puts them outside the cached prefix, so the discount doesn't apply. If an image sits in the middle of the prefix, it busts the cache for everything after it. Always attach images at the end of your prompt.
Opus for judgment, Sonnet for implementation, Haiku for repetition
Running everything on a single model is throwing money away. You can swap models per subagent — send the exploration agent to Sonnet, reserve Opus for final decisions.
How to Structure CLAUDE.md
There's a structure that captures both cache efficiency and practicality at the same time.
Global (~/.claude/CLAUDE.md): Fixed facts about you only. Coding style, commit rules, off-limits instructions. Modify less than once a month. Under 20 lines.
Project (.claude/CLAUDE.md): Rules specific to this repo. Build commands, test methods, architecture summary. Changing it frequently is fine — it's a project-scoped cache, so it doesn't affect the global cache.
.claude/CLAUDE.md → caches stay independent

~/.claude/CLAUDE.md · global example
# Coding style
TypeScript first. React + Next.js + Tailwind.
Functional components + hooks. No classes.
camelCase variables, PascalCase components.
Comments explain "why" only.
# Commits
Conventional Commits (feat/fix/refactor/docs/test/chore).
No console.log debug code left in commits.
# Off-limits
Don't explain things I didn't ask about.
Don't add unnecessary dependencies to package.json.
.claude/CLAUDE.md · project example
# This project
Next.js 15 + App Router. /src/app based.
DB: Supabase (Postgres). ORM: Drizzle.
# Build
pnpm dev → localhost:3000
pnpm test → vitest
pnpm lint → eslint + prettier
# Watch out
Don't touch /src/lib/auth.ts (auth logic being stabilized).
API routes go under /src/app/api/ only.
Subagents are Isolated Minds
Protect the main mind. Long explorations go in isolated minds; the main does only judgment.
Design Intent
The most interesting thing about how AgentTool creates subagents is the isolation model. Subagents inherit all of the parent's permissions and tools, but cannot write to the main AppState. Only shared infrastructure like the task registry is an exception. Everything else follows the "parent is read-only; results come back as messages only" rule.
setAppState is a no-op inside subagents.

Why does this matter? LLM quality degrades as context grows longer. Model vendors brag about "1 million token support," but in practice a conversation at 500K tokens gives hazier answers than one at 200K. The authors knew this. So keeping the main conversation as clean as possible was the top priority, and all "burns lots of tokens but produces a short result" work — exploration, investigation, verification — was pushed into isolated subagents.
What Changes for You
The 3-File Rule for the Main Thread
If the main thread is directly reading three or more files, stop. You're already off track. Spin up a Task tool or Explore agent and say "read these three files and summarize X." Only a one-paragraph summary comes back to the main context. Doing the same work in the main thread piles up thousands of tokens.
Don't be afraid to run subagents in parallel. They're isolated — no state collisions. Just be explicit about write scope per agent. "Agent A touches /services only, Agent B touches /components only." Without this in the prompt, two agents can hit the same file simultaneously and clobber each other.
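A write-scope clause per agent can be as short as this (directory names are placeholders for your own layout):

```text
Agent A: implement the cache layer. Write only under /src/services.
Agent B: add loading states. Write only under /src/components.
Neither agent edits shared config files.
```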
Always verify with a separate agent. Asking the implementing agent "did you get it right?" always gets "yes." Spin up a code-reviewer agent to read the same output with fresh eyes. The reason the authors isolated AgentTool is precisely this fresh perspective.
Hooks are the Escape Hatch
"Go beyond what we expected." The door the authors left open for users.
Design Intent
The hook system lets users attach scripts to events like PreToolUse, PostToolUse, SessionStart, UserPromptSubmit, and Stop. The most powerful feature is PreToolUse's updatedInput. A hook can rewrite the tool input the model just called, and return the modified version to be executed.
Hook response schema
{
"continue": true,
"hookSpecificOutput": {
"hookEventName": "PreToolUse",
"permissionDecision": "allow",
// Hook rewrites the model's called input before execution
"updatedInput": {
"command": "git commit -m 'fix' --signoff"
},
"additionalContext": "signoff added automatically"
}
}
For example: the model calls git commit -m "fix". The hook intercepts it, rewrites it to git commit -m "fix" --signoff, and passes that to execution. The model thinks its original command went through. The user silently applied their own policy.
What Changes for You
Stop repeating instructions in prompts — encode them in hooks. Writing "don't git push --force," "don't rm -rf," "don't leave console.log" in every prompt is waste. Build it as a hook once and it works forever. Saves tokens. Prevents mistakes.
The distinction between hooks, skills, and commands is covered in Chapter 10. This chapter focuses on the hook internal mechanism only.
Hooks can fail silently
If you only return permissionDecision: 'deny', the tool call is blocked — but the model can immediately try the next tool. Add additionalContext with "this command is forbidden" so the model doesn't repeat the attempt. And always add logging to hook scripts.
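Following the response shape shown earlier in this chapter, a deny that also steers the model might look like this (values illustrative):

```json
{
  "continue": true,
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "deny",
    "additionalContext": "git push --force is forbidden by policy. Do not retry it; ask the user instead."
  }
}
```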
Planning and Executing are Different States
Exploration and modification use the brain differently. The authors made this a physical separation.
Design Intent
EnterPlanModeTool takes no input. It simply flips the session state to "plan mode." In this mode, all file-editing tools are locked. Only read-only tools survive — Read, Grep, Glob.
Why make this a separate state? Exploration requires "see as many possibilities as broadly as possible" thinking; modification requires "execute exactly one thing" thinking. Both LLMs and humans produce lower quality when they try to do both simultaneously.
Entry Criteria
Changes spanning five or more files. Unclear requirements. Irreversible operations — DB migrations, file deletion, git reset. Environment changes like framework upgrades. First exploration of an unfamiliar codebase. If any of these apply, entering plan mode first is the safer path.
Flow After Entry
Read broadly, organize questions, write a plan, critique the plan, exit with ExitPlanMode and execute. Skipping any of the five steps defeats the purpose of the plan. The most important step is the fourth: critique the plan. Asking the model "what are the weaknesses of this plan?" and "what cases are missing?" just once surfaces the gaps.
"This also needs fixing" — twice
If you hear "this also needs fixing" two or more times during a refactor, stop what you're doing and return to plan mode. The moment scope starts expanding is the signal that the plan has already gone off track.
Memory is Just Files
Not a database — markdown. An architecture of trust, because users must be able to open and edit it themselves.
Design Intent
Claude Code's memory isn't a database. It's stored as markdown files under ~/.claude/projects/<project>/memory/. MEMORY.md is the index; details live in individual files under topics/. The index has a 25KB / 200-line limit — exceed it and it's truncated with a warning.
What Changes for You
Don't write body text into MEMORY.md
This is an index file. Fill it with one-line entries only. Details go in individual files under topics/. Ignore this and when you hit 200 lines and it truncates, important information disappears.
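An index held to one line per topic might look like this (topic names are hypothetical):

```text
# MEMORY.md
- auth: token refresh gotchas → topics/auth.md
- build: vitest flakiness workaround → topics/build.md
- deploy: staging environment quirks → topics/deploy.md
```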
Default uses by type:
- user: long-term facts about you. Set once, rarely touched.
- feedback: recurring mistake corrections. Grows every time you catch a mistake.
- project: project structure. Be careful with things that change frequently.
- reference: external link collections.
Version-control it with git. Turn ~/.claude/projects/<slug>/memory/ into a git repo and you can track "when was this memory added and by what." When a memory corruption incident happens — the model remembering a wrong fact — you can roll back.
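A minimal sketch of that setup. MEMDIR would normally point at ~/.claude/projects/&lt;slug&gt;/memory; here it defaults to a throwaway demo directory so the commands run anywhere:

```shell
# Point MEMDIR at your real memory directory; defaults to a demo dir.
MEMDIR="${MEMDIR:-$(mktemp -d)}"
cd "$MEMDIR"
[ -f MEMORY.md ] || echo "# index" > MEMORY.md        # demo seed file
git init -q
git add -A
git -c user.email=me@example.com -c user.name=me commit -qm "memory snapshot"
# Later: trace when each memory entry was added or changed.
git log --oneline -- MEMORY.md
```

Commit after meaningful sessions and a wrong remembered fact becomes a one-command rollback.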
How Do You Use It?
The Shape of a Prompt
Claude Code digests imperative single sentences in a 3-part structure with explicit files and line numbers.
Claude Code's system prompt already lays in "solve it with tools." So the user prompt needs to be an instruction, not an explanation. Ask for an answer and it gives up on using tools. Ask for long exploration and it delegates to a subagent.
Example: "Fix the email validation in LoginForm.tsx:42 to RFC 5322. Don't touch the UI. Add 3 tests. No changes to package.json."

3-Part Structure · Goal + Constraints + Verification
[goal] Fix the email validation in the login form to RFC 5322.
[files] src/components/LoginForm.tsx, src/utils/validation.ts
[constraints] Don't touch UI markup. Keep existing props signature.
[verification] npm test must pass. Add 3 new test cases.
[off-limits] Don't touch any other form components.
Turn this into a copy-paste template and thinking time shrinks. With only a goal, it breaks constraints arbitrarily. With only constraints, the goal blurs. Without verification, it stops at "roughly done."
Shapes That Don't Work
Requests that start with "actually I was thinking…" Putting an unfinished thought process into the prompt blurs the model too. Decide first, then ask.
Multiple tasks in one prompt. "Do A and B and C" almost always gets B done sloppily. One turn, one task.
"If possible…" Conditional requests move the LLM toward avoiding the condition judgment. If you mean yes, say yes. If you mean no, say no.
Units of Work — Turn · Session · Worktree
Confuse the three and you'll suffer for it.
| Unit | Size | Criterion |
|---|---|---|
| Turn | 1 submitMessage call | One change |
| Session | Full resumable conversation | One feature · one topic |
| Worktree | Isolated git directory | High-risk experiment |
Feature complete means start a new session. Experimental change means enter a worktree. Several small fixes means multiple turns in one session. Start each day by deciding whether to resume yesterday's session or start fresh.
When to Use a Worktree
When you want to experiment without touching the existing branch. EnterWorktree creates a copy of the current repo in a temporary directory; you work freely inside it. If the result looks good, merge. If not, throw it away. The main code stays clean — no experiment residue.
Resume a Session or Start Fresh?
Resume when: follow-up work on the same feature. Adding tests after a bug fix. Incorporating review feedback. The context is still valid.
Start fresh when: switching to a different domain. The session has grown long and the model is starting to forget early instructions. You've run /compact twice and the context is still fuzzy. In these cases, cut the session and open a new one with "just finished X, next is Y."
The decision criterion is simple. "Does this session still know what I want?" If not, start fresh.
NDJSON is intentional
A broken session can be opened directly at session-<id>.jsonl. NDJSON means one line per message. Trim the last few lines and resume — you're back to the state just before the incident. Choosing NDJSON over plain JSON was a deliberate author decision: when something goes wrong, you can salvage the session by cutting from the end.
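A sketch of the salvage on a stand-in file (your real path is the session-&lt;id&gt;.jsonl mentioned above):

```shell
# Stand-in session file whose final line was cut off mid-write.
f=session-demo.jsonl
printf '%s\n' '{"type":"message","uuid":"a1"}' \
              '{"type":"message","uuid":"b2"}' > "$f"
printf '%s' '{"type":"mess' >> "$f"     # damaged tail, no trailing newline
# Drop only the broken last line; every earlier turn survives intact.
sed '$d' "$f" > "${f}.fixed"
wc -l < "${f}.fixed"                    # → 2
```

Resume from the .fixed file and you are back to the state just before the incident.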
Skill · Command · Hook
Multiple extension methods get confusing. A simple decision tree.
Examples in Practice
"Publish a blog post to 3 platforms simultaneously." A frequent natural-language trigger. Make it a Skill. Write "blog publishing" in whenToUse and the model invokes it automatically.
"Always run lint before git push." Needs to fire automatically every time. Make it a Hook. Attach a matcher to the PreToolUse event for Bash; if the command is git push, run lint first. Don't write this in every prompt.
Hook internal mechanics (updatedInput, blocking branches) were covered in Chapter 05.
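The command side of a lint-before-push hook might look like this sketch. Treat the details as assumptions to verify against the current hooks docs: stdin carries the tool-call JSON, and exit code 2 is conventionally a blocking error whose stderr is shown to the model. LINT_CMD is a stand-in for your real lint invocation:

```shell
#!/usr/bin/env sh
# PreToolUse hook body (sketch): stdin carries the tool-call JSON.
LINT_CMD="${LINT_CMD:-npm run lint}"
input=$(cat)
case "$input" in
  *"git push"*)
    # Run lint first; if it fails, block the push and tell the model why.
    $LINT_CMD >&2 || { echo "lint failed; push blocked" >&2; exit 2; }
    ;;
esac
exit 0
```

A crude substring match is enough here; parse the JSON properly (e.g. with jq) if you need precision.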
"I need a summary of this project's current state." A shortcut I run when I want it. Command. Type /status and get the summary.
"I need to pull tasks from Notion." External system connection. MCP Server. Connect an MCP server wrapping the Notion API. Call the MCP server's tools from inside a skill or command.
Connecting MCP Servers
MCP (Model Context Protocol) is the standard for connecting Claude Code to external systems. It lets you use services like Notion, Gmail, Linear, and Slack as tools.
Configure in .claude/mcp.json.
.claude/mcp.json{
"mcpServers": {
"github": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {
"GITHUB_TOKEN": "ghp_..."
}
},
"filesystem": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/dir"]
}
}
}
stdio vs SSE. Spawning a local process uses stdio. Connecting to a remote server uses sse with a URL. Most community servers are stdio.
Note. When an MCP server is registered, its tools are added to ToolSearch as deferred tools. The model searches for and calls them on demand. They don't load into the system prompt on every turn, so the token cost stays low.
When to Delegate
The boundary between what the main does and what subagents do.
Delegation Signals
You need to read 3 or more files. The output is a summary or report, not a file modification. The same pattern of work needs to happen independently multiple times. The result doesn't need to persist in the main context. You don't want to contaminate the current context. Any of these — delegate to a subagent.
Don't Delegate Signals
The opposite: stay in main when this task's result is the premise for the next turn. When you need to adjust things interactively with the user. When interactive decisions are required. When the domain is unclear enough that you can't write a spec for the subagent.
For write-scope partitioning and agent count limits in parallel delegation, see Chapter 04. This chapter focuses on when to decide to delegate.
Delegation Cost vs Explanation Cost
If writing the prompt for a subagent takes longer than doing it yourself, don't delegate. Writing a 10-line prompt to fix 3 lines is a net loss. Delegation pays off only when the exploration scope is wide or there are 2+ independent domains.
Efficiency
10 Habits That Cut Your Bill in Half
Follow the design intent and the costs naturally fall.
Don't put examples in the system prompt
Pasting long "do it like this" examples into CLAUDE.md inflates input tokens on every turn. Put examples in the first prompt once; keep rules only in CLAUDE.md.
Narrow the scope when Read results are long
Use the offset and limit parameters to read only the lines you need. Reading an entire 2,000-line file dumps all of it into context.
Grep > Bash grep
The built-in Grep tool is more token-efficient than running grep or rg through Bash. Dedicated tools return structured results; Bash output is raw text that eats into context.
Run /compact manually
More predictable than auto. Right after finishing a feature is the best time.
TodoWrite vs TaskCreate
Using prompts to manage to-do lists costs tokens on every turn. TaskCreate/TaskUpdate manages state in a separate store, keeping it out of context.
Block unnecessary tool calls with hooks
When the model accidentally tries to read a large file, a hook that blocks it saves tokens.
Don't paste the entire error log
Dumping a 500-line stack trace makes the model miss the signal. Paste only the first error message and the 5–10 relevant lines. For the rest: "full log is at /tmp/error.log" is enough.
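For instance, with a long log at a stand-in path, grep's context flags pull out just the signal:

```shell
# Build a stand-in 401-line failure log.
{ seq 200 | sed 's/^/noise line /'
  echo "Error: connection refused"
  seq 200 | sed 's/^/  at frame /'; } > /tmp/error.log
# Paste only these 6 lines; point at /tmp/error.log for the rest.
grep -m1 -A5 "Error:" /tmp/error.log
```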
Control auto-continuation
Explicitly lower the token budget to reduce incidents where the model just keeps running and burns through the budget.
Kill failing sessions early
If the model goes off the rails twice in one session, /clear. Continuing with a contaminated context just burns money.
Make /cost a routine
Check it a few times a day. You can't find where the leaks are if you don't know how much you're spending.
What to Check When Things Break
Symptom → cause → fix cheat sheet.
| Symptom | Cause | Fix |
|---|---|---|
| Acts like it doesn't know a file it just edited | autocompact restoration limit (5 files/50KB) | Explicitly instruct it to Read that file again |
| Repeating the same mistake | No feedback memory | Explicitly request a save, then verify the file |
| Forgets something from 5 minutes ago | Context near 90% | /compact or /clear |
| Hook has no effect | preventContinuation missing | Set both fields in the hook response |
| Subagent talking nonsense | Insufficient context | Fatten up the delegation prompt |
| Cost spike | Cache miss | Stabilize CLAUDE.md |
| Parallel agents overwriting each other | Write scope not partitioned | Specify scope per agent |
| Plan keeps growing | Scope drift | Re-enter plan mode |
| Session is tangled | Multiple topics mixed | /clear then separate |
| Accidental git push | No guard | Deny in PreToolUse hook |
Files to Open Directly
When something breaks, open these in order. ~/.claude/sessions/<proj>/session-<id>.jsonl has the raw current session. ~/.claude/history.jsonl has global history. ~/.claude/projects/<slug>/memory/MEMORY.md is the index memory; detailed memories are under topics/ below it. Hook, MCP, and LSP logs are under ~/.claude/logs/ (if present).
Your First 30 Minutes in an Unfamiliar Repo
The actual sequence someone who knows the design intent follows when picking up a new project.
0–5 min: Session Setup
Open a new terminal, start Claude Code from the project root. Front-load goal, constraints, and verification criteria into the first prompt. That block becomes the cache prefix.
[goal] Understand this repo's architecture and beef up the README.
[constraints] Don't modify code. Read only.
[verification] Must be able to explain the main entry points, data flow, and dependency structure.
5–15 min: Parallel Explore Agents
Don't ask "figure out the project structure" directly. Instead, send three questions to subagents in parallel:
- "Where are the entry points? Find main, index, and app files."
- "What are the core data models? Find type definition files."
- "What are the external dependencies? Read package.json / go.mod / requirements.txt."
While waiting for results, the main thread reads README.md and CLAUDE.md if they exist. The 3-file rule — main reads only; heavy exploration goes to subs.
15–25 min: Drawing the Map
When subagent results come back, synthesize in the main thread. "Entry points are X, data goes through Y to Z, external dependencies are A, B, C." That one paragraph is the mental model for this repo.
At this point, run /compact once. Exploration results have piled up in context — clean up now so the next work runs efficiently.
25–30 min: First Task
With the mental model in place, start the actual work (writing the README, fixing a bug, adding a feature). From here, delegate to an implementation agent (Sonnet) and let the main thread do judgment only.
The key is spending the first 15 minutes on exploration. Jump straight to editing code and the context gets contaminated with exploration residue. Isolate exploration in subagents and the main stays clean for focused implementation.
The Authors Left 15 Design Intents
Answers to "why did they do it this way" — extracted from source analysis.
AsyncGenerator throughout
"The user must be able to interrupt at any time." LLM responses are long. Long means wanting to change direction. A complete-then-display model doesn't allow interruption.
→ The yield* chain inside queryLoop runs five levels deep. Cut it anywhere — everything up to the previous yield is preserved.
All tool defaults are conservative
"Don't break things quietly." isReadOnly, isConcurrencySafe, isDestructive all default to false. The biggest risk with LLM agents is silent destruction.
→ BashTool explicitly declares isConcurrencySafe: false — shell commands running in parallel create race conditions.
Schema lazy-loading via ToolSearch
"Schemas are tokens too." Loading all 100 tools into the system prompt costs thousands of tokens. Load only what's needed; search for the rest.
→ The deferred tool list is exposed by name only in the system-reminder block; the full JSON Schema joins the context only after a ToolSearch call.
Subagent isolation via AgentTool
"Protect the main mind." Context length degrades quality. Long explorations go in isolated minds; results come back as summaries only.
→ Subagents don't inherit the parent's allowedTools — if you don't specify them at spawn time, the subagent runs with only default tools.
memdir on the filesystem
"Let users look at it." AI memory stored in a database is hard to trust. Trust comes from transparency.
→ Filenames under ~/.claude/memory/ are session-ID-based — you can trace back which session wrote which memory entry.
Compact 4 stages + 3-failure circuit breaker
"No infinite loops." After repeated failure, give up. A silent infinite loop is as scary as silent destruction.
→ After 3 failures, it throws CompactionError and flips the session to read-only — blocking further writes to a broken state.
Hook's updatedInput
"Go beyond what we expected." block/allow alone isn't enough. Let hooks rewrite what the model wrote.
→ updatedInput is only valid in PreToolUse hooks — returning it from PostToolUse is silently ignored.
Plan mode as a separate state
"Exploration and execution use different minds." Force users to physically acknowledge the mode switch.
→ In plan mode, tool calls with isDestructive: true are automatically blocked — preventing accidental file changes at the source.
PKCE OAuth + Keychain storage
"Don't put API keys in text files." A rejection of the culture of hardcoding keys in .env.
→ Token refresh is handled by refreshToken logic; if Keychain write fails, the session persists via in-memory fallback.
VCR fixture system
"LLMs must be testable too." A test infrastructure that caches responses by message hash.
→ Fixture files are stored under __fixtures__/ with SHA-256 hash names; collisions get a sequential suffix.
Ink-based React terminal UI
"The terminal is an app." React instead of ncurses. Terminal programs deserve a modern web dev experience.
→ The useInput hook subscribes to key events, so signals like Ctrl+C are processed inside the React event cycle.
Sessions stored as NDJSON
"Must be recoverable." With plain JSON, a damaged file tail makes the whole file unreadable. With NDJSON, you can salvage by trimming the end.
→ Each line holds a {"type":"message","uuid":"...","timestamp":...} structure — one jq command extracts any specific turn.
Session recovery (trimming the last few lines and resuming) is covered in Chapter 09.
Stop hook blocking/non-blocking split
"Response speed matters more than completeness." Background work like memory extraction is fire-and-forget.
→ Whether a type: "stop" hook blocks is controlled by the blocking field in settings.json; the default is false.
Build-time elimination via feature flags
"Keep external builds lean." Internal experiment features become dead code in production builds.
→ Flags branch on a single IS_INTERNAL_BUILD env var; esbuild's define option tree-shakes the false branch.
Ask when permission is uncertain
"Slow is better than wrong." The cost of an LLM agent making a mistake exceeds the cost of bothering the user one more time.
→ A tool with needsPermission: true requires approval once per session — the same path pattern is auto-allowed after that.
Closing
This book's central claim is one sentence. Using Claude Code well means following its internal structure.
Prompt engineering tips, cheat sheets, magic words — those are surface. Surfaces change when versions update. Design intent doesn't change easily. From the moment the authors decided "stream everything through a single async generator," through today, and going forward — the habits of someone who uses Claude Code well will remain aligned with that decision.
Everything else is application.