Tutorial · 2026-06-26
AI-Augmented Research & Development
Patterns, not a product pitch — a worked tour of how an NSF NCAR researcher actually runs claude.ai and Claude Code day to day.
This deck was researched, outlined, built and reviewed by the exact multi-agent workflow it teaches.
Depth = how technical
The top row is shared and complete. You lose nothing by staying on it — your track is just the floor you stop on.
▤ Cheat sheet → newcomer-skeptic01
Act One
I don't "chat with an AI." I stand up small, role-specialized teams of agents.
Everything else in this talk — MCP, skills, the collab wrapper, model routing — is a variation on this single move.
The actual DAG that built this deck — each node a child Claude with its own context window, model and effort.
Personas are cheap reviewers with named blind spots. Isolation buys focus & security — not correctness. You add that at the seams.
▤ Cheat sheet → orchestrator02
Act Two
| claude.ai (web · scientist home) | Claude Code (CLI · engineer home) | |
|---|---|---|
| Access | Conversational, self-contained | Reads & writes real files in your repo |
| Tools | Artifacts you copy out | Runs commands, tests, MCP, sub-agents |
| Autonomy | One turn at a time | Iterates, schedulable, remote-controllable |
| Verification | Human eyeballs | Agent runs tests / Playwright for real |
| Cost shape | Lower per turn… | …but shares the same weekly bucket |
03
Act Three
MCP lets the model call a typed, authenticated tool against the live system instead of guessing from training data.
My setup: 6 plugin MCP servers, authenticated per-session via /mcp rather than left always-on. Security is the next slide.
Two attack surfaces
Mitigations, in order
First-party servers + pinned versions → scope to a project dir → no creds in the env → prefer read-only → human approval for write/exec → keep an allowlist. Project-scoped .mcp.json is the secure default.
15 installed. Many are retrieval-first — they fetch live docs over training memory, fixing confidently-stale knowledge on fast-moving platforms.
In-house proof
ncar-brand-toolkit@local — SVG waves, logo lookup, brand & accessibility rules.
A plugin bundles skills + agents + commands + hooks + MCP into one installable unit — how a team standardizes practice instead of re-teaching each engineer.
22+ installed: official (code-review, security-guidance, …), the Cloudflare bundle, and my ncar-brand-toolkit@local.
04
Act Four
| Task type | Model | Effort |
|---|---|---|
| Discovery · grep · triage | Haiku→Sonnet | low–medium |
| Codegen · refactor · drafting | Sonnet | medium |
| Boilerplate · retrieval | Sonnet | low |
| Synthesis · architecture · judgment | Opus | high |
| Multi-step physics · legacy ports | Opus | high–xhigh |
Default: Sonnet + medium. Escalate only when the first pass is wrong, or being wrong is costly.
Keep two Opus ratios distinct: ~5× Sonnet per API token (price) ≠ the ~10–12× weekly-hours gap on subscriptions. The four-rung effort ladder is a CLI/API dial; claude.ai is coarser — a model picker plus an extended-thinking toggle.
For dense answers, ask for a self-contained HTML artifact with SVG — a wall of prose becomes spatial and interactive.
claude.ai · Artifacts
A sandboxed, size-limited preview pane. Great for quick, self-contained explainers you copy out.
Claude Code · on disk
Writes a self-contained file to disk — no size sandbox. Portable evidence: open offline, embed in a deck, attach to a PR.
SVG is resolution-independent, diffable as text, and the model authors it directly. Every diagram in this deck is this pattern.
▤ Cheat sheet → research-scientist
Click any element → comment or edit its text → export one JSON change-set → Claude applies all edits in a single batch pass.
It kills the N-prompt deixis problem and the JSON is the audit trail. Honest status: articulated, not yet built — the worktree is near-empty.
▤ Cheat sheet → orchestrator05
Act Five
| Modality | Trade | Relative tokens |
|---|---|---|
| a · Copy/paste chat | Max human effort, can't verify, lowest risk | ▍ low |
b · Inline autocomplete external tool — not Claude Code | Copilot / Cursor Tab / Windsurf; high frequency | ▍▍ moderate |
| c · CLI single-agent | One rich context, many tool-call turns | ▍▍▍ moderate–high |
| d · CLI workflow + sub-agents | Min labor, max capability, hardest to inspect | ▍▍▍▍ highest |
I live in (d) — tiering models inside it (Haiku → Sonnet → Opus) is the biggest lever for keeping it affordable. LSP (pyright/clangd/gopls) is something else again: deterministic, zero-token code intelligence — not a completion modality.
/status check.
| Claude | Codex | Antigravity | |
|---|---|---|---|
| Short window | ~5-hr rolling | 5-hr msg ranges | agent-request |
| Long window | weekly + Opus cap | credit rate card | ~tripled at I/O |
| Metering | tokens × premium | tokens (since 04-02) | requests |
| Training default | consumer opt-out | verify | verify |
| Overage | credit top-ups | buy credits | own API key ⚠️ |
Two buckets, everywhere
"Suddenly less helpful mid-conversation" = you hit a window. Design long jobs to be resumable, not all-or-nothing.
06
Act Six
A horizontal spine everyone follows, with optional vertical descents that get more technical the deeper you press — the structure is the information hierarchy.
07
Act Seven
Transferable (~80%)
Decomposition, role/persona design, context-setting, verify/critic loops, artifact-driven output, batching edits, treating output as reviewable.
Non-transferable plumbing
CLI/sub-agent mechanics, skill & plugin format, .mcp.json wiring, model names, effort knobs.
Current
Improved
Plus: harden unattended autonomy (least-privilege + worktree sandbox + PreToolUse block-hooks), and add validation at the seams — there's no built-in inter-agent correctness guarantee.
The whole spine in one line: decompose → assign altitude → concrete, inspectable hand-offs.
The full handout set
Scientist
track home · artifacts
Engineer
track home · skills
security-governance
H5 · V5.x · V14.2
orchestrator
H3 · H10
prompting-patterns
H15
newcomer-skeptic
H1 · trust slides
pi-manager
full handout set
This deck
built by the method it teaches
The 13 highest-return questions are the backbone of this talk — let's take them.