NSF NCAR Tutorial · 2026-06-26

AI-Augmented Research & Development

How I Use AI for Research & Development

Patterns, not a product pitch — a worked tour of how an NSF NCAR researcher actually runs claude.ai and Claude Code day to day.

↻ Meta

This deck was researched, outlined, built and reviewed by the exact multi-agent workflow it teaches.

Both

Choose your track

  • Scientist: claude.ai · papers, analysis, figures
  • Engineer: Claude Code · refactors, tests, orchestration
  • Shared core: Privacy & Terms of Use · MCP security · Model & effort selection · Token economics

Depth = how technical

L1 · Conceptual: what & why
L2 · Practical: how, recipes, tradeoffs
L3 · Expert: config, schema, code

The top row is shared and complete. You lose nothing by staying on it — your track is just the floor you stop on.

▤ Cheat sheet → newcomer-skeptic

01

Act One

The throughline: small teams of agents

Both

The one move: decompose, assign altitude, make hand-offs concrete


I don't "chat with an AI." I stand up small, role-specialized teams of agents.

1 · Decompose: split the goal into scoped roles (planner · implementer · critic · fetcher).
2 · Assign altitude: each agent gets a persona plus a budget (model + effort).
3 · Concrete hand-off: artifacts carry state (review docs, HTML decks, JSON change-sets).

Everything else in this talk — MCP, skills, the collab wrapper, model routing — is a variation on this single move.
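
The single move can be sketched as data. This is a minimal illustration, not the deck's actual tooling: the class names, the `decompose` helper and the tier labels ("haiku", "sonnet", "opus") are hypothetical stand-ins, not real model IDs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentRole:
    """One scoped role: a persona plus a budget (model + effort)."""
    name: str     # e.g. "planner", "critic"
    persona: str  # short statement of the role's viewpoint
    model: str    # hypothetical tier label, not a real model ID
    effort: str   # "low" | "medium" | "high"


@dataclass
class HandOff:
    """A concrete artifact that carries state from one role to the next."""
    producer: str
    kind: str  # e.g. "review-doc", "html-deck", "json-change-set"
    payload: dict = field(default_factory=dict)


def decompose(goal: str) -> list[AgentRole]:
    """Split a goal into the four scoped roles named on the slide."""
    return [
        AgentRole("planner", f"Plans: {goal}", "opus", "high"),
        AgentRole("implementer", "Writes the change", "sonnet", "medium"),
        AgentRole("critic", "Reviews with a named blind spot", "sonnet", "medium"),
        AgentRole("fetcher", "Looks things up", "haiku", "low"),
    ]


team = decompose("port the regridding script")
plan = HandOff(producer="planner", kind="review-doc",
               payload={"steps": ["read", "port", "test"]})
```

The point of the `HandOff` type is that state lives in an inspectable artifact rather than in any one agent's conversation.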

Both · ↻ Meta

Sub-agents, workflows & personas


The actual DAG that built this deck — each node a child Claude with its own context window, model and effort.

Orchestrator (main loop), four phases: 1 · Discover → 2 · Synthesize → 3 · Build → 4 · Critique
  • 9× discovery agents, 8× persona reviewers, … (parallel fan-out) · Sonnet · medium
  • barrier · data dependency
  • 3× synthesis · Opus · high
  • 8× build agents (outline · slides · sheets) · Sonnet · medium
  • self-critique · Opus · high

Personas are cheap reviewers with named blind spots. Isolation buys focus & security — not correctness. You add that at the seams.
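
A rough sketch of the fan-out-then-barrier shape, using stub agents. The agent counts follow the slide (persona reviewers are left out for brevity); `run_agent`, `fan_out` and `build_deck` are hypothetical names, and a real child agent would call a model in its own context window instead of returning a string.

```python
import asyncio


async def run_agent(name: str, model: str, effort: str, inputs: list[str]) -> str:
    """Stub child agent: reports what it was given instead of calling a model."""
    await asyncio.sleep(0)  # yield control, standing in for real work
    return f"{name}[{model}/{effort}] saw {len(inputs)} input(s)"


async def fan_out(prefix: str, n: int, model: str, effort: str,
                  inputs: list[str]) -> list[str]:
    """Run n agents in parallel; gather() acts as the barrier before the next phase."""
    tasks = [run_agent(f"{prefix}-{i}", model, effort, inputs) for i in range(n)]
    return await asyncio.gather(*tasks)


async def build_deck(topic: str) -> dict[str, list[str]]:
    """Each phase consumes the previous phase's outputs (the data dependency)."""
    discover = await fan_out("discover", 9, "sonnet", "medium", [topic])
    synthesize = await fan_out("synth", 3, "opus", "high", discover)
    build = await fan_out("build", 8, "sonnet", "medium", synthesize)
    critique = await fan_out("critique", 1, "opus", "high", build)
    return {"discover": discover, "synthesize": synthesize,
            "build": build, "critique": critique}


result = asyncio.run(build_deck("AI for R&D"))
```

Nothing in this shape checks that one phase's output is correct before the next consumes it, which is why validation has to be added at the seams.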

▤ Cheat sheet → orchestrator

02

Act Two

The two surfaces

Both

claude.ai vs Claude Code

Dimension | claude.ai (web · scientist home) | Claude Code (CLI · engineer home)
Access | Conversational, self-contained | Reads & writes real files in your repo
Tools | Artifacts you copy out | Runs commands, tests, MCP, sub-agents
Autonomy | One turn at a time | Iterates, schedulable, remote-controllable
Verification | Human eyeballs | Agent runs tests / Playwright for real
Cost shape | Lower per turn… | …but shares the same weekly bucket

Tipping point: copy-pasting file contents back and forth more than twice? → switch to the CLI. Most prompting patterns transfer either way.

03

Act Three

Extending Claude Code

Both

MCP servers: typed tools, not guesswork


MCP lets the model call a typed, authenticated tool against the live system instead of guessing from training data.

  • With MCP: Model (your prompt) → MCP server (typed · authenticated) → Live system (API · DB · files · logs) → ✓ correct, acts on reality
  • Without MCP: Model (your prompt) → training data (stale · plausible) → ✗ guessing

My setup: 6 plugin MCP servers, authenticated per-session via /mcp rather than left always-on. Security is the next slide.
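
To make "typed tool" concrete, here is a small sketch in plain Python. It does not use any MCP SDK: `ToolSpec`, `call_tool` and the in-memory `JOBS` table are hypothetical, and the only idea shown is that arguments are checked against a declared schema before anything touches the live system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolSpec:
    """A tool the model may call: a name, typed parameters, and a handler."""
    name: str
    params: dict[str, type]
    handler: Callable[..., object]


def call_tool(spec: ToolSpec, args: dict[str, object]) -> object:
    """Reject calls whose arguments do not match the declared names and types."""
    if set(args) != set(spec.params):
        raise ValueError(f"{spec.name}: expected {sorted(spec.params)}, got {sorted(args)}")
    for key, expected in spec.params.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{spec.name}: {key} must be {expected.__name__}")
    return spec.handler(**args)


# Hypothetical "live system": a tiny in-memory job table.
JOBS = {101: "running", 102: "failed"}
job_status = ToolSpec("job_status", {"job_id": int},
                      lambda job_id: JOBS.get(job_id, "unknown"))

answer = call_tool(job_status, {"job_id": 102})
```

A malformed call such as `{"job_id": "102"}` fails loudly with a `TypeError` instead of producing a plausible guess.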

Both · ★ Must-have

MCP security: least privilege limits the blast radius

  • Unscoped stdio server: runs as YOU, so the server can reach your whole home dir · secrets · network
  • Scoped, read-only, pinned server: project dir only · human-approved writes · allowlist

Two attack surfaces

  • The model can be tricked into calling a tool (prompt injection).
  • The tool itself can be malicious (supply chain).

Mitigations, in order

First-party servers + pinned versions → scope to a project dir → no creds in the env → prefer read-only → human approval for write/exec → keep an allowlist. Project-scoped .mcp.json is the secure default.
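
Several of these mitigations can be checked mechanically. The sketch below audits one server entry and is only an illustration: it assumes entries shaped like `{"command", "args", "env"}`, treats `pkg@1.2.3` in `args` as "pinned", and uses a crude name-based secret heuristic. The `audit_server` helper and the example servers are hypothetical.

```python
from pathlib import Path

SECRET_HINTS = ("TOKEN", "SECRET", "PASSWORD", "KEY")  # illustrative, not exhaustive


def audit_server(name: str, server: dict, project_dir: str,
                 allowlist: set[str]) -> list[str]:
    """Return least-privilege findings for one server entry (empty list = clean)."""
    findings = []
    if name not in allowlist:
        findings.append("not on the allowlist")
    args = server.get("args", [])
    # Assumption: a pinned package looks like "pkg@1.2.3" somewhere in args.
    if not any("@" in a and a.rsplit("@", 1)[1][:1].isdigit() for a in args):
        findings.append("no pinned version in args")
    env = server.get("env", {})
    if any(hint in key.upper() for key in env for hint in SECRET_HINTS):
        findings.append("credential-like variable in env")
    root = Path(project_dir).resolve()
    for a in args:
        # Absolute paths must stay inside the project directory.
        if a.startswith("/") and not Path(a).resolve().is_relative_to(root):
            findings.append(f"path outside project dir: {a}")
    return findings


servers = {
    "docs": {"command": "npx", "args": ["docs-server@1.4.2", "/srv/proj/docs"], "env": {}},
    "shell": {"command": "npx", "args": ["shell-server", "/"], "env": {"API_TOKEN": "x"}},
}
report = {name: audit_server(name, cfg, "/srv/proj", {"docs"})
          for name, cfg in servers.items()}
```

A check like this catches configuration drift only; it does nothing about prompt injection, which still needs human approval for write/exec.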

▤ Cheat sheet → security-governance
Both

Skills: progressive disclosure

  • In context: just a short description
  • On trigger, the full skill loads: full instructions · scripts & resources · retrieval of live docs

15 installed. Many are retrieval-first — they fetch live docs over training memory, fixing confidently-stale knowledge on fast-moving platforms.
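
Progressive disclosure is essentially lazy loading. A minimal sketch, assuming a skill is a short description plus a loader that runs only on trigger; the `Skill` class and the two example skills are hypothetical and do not reflect the real skill file format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Skill:
    name: str
    description: str              # always in context
    load_body: Callable[[], str]  # called only when the skill triggers
    loaded: bool = False

    def trigger(self) -> str:
        """Load the full instructions on demand."""
        self.loaded = True
        return self.load_body()


def context_preamble(skills: list[Skill]) -> str:
    """What the model sees up front: one line per skill, no bodies."""
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)


skills = [
    Skill("brand-toolkit", "Brand and accessibility rules",
          lambda: "full brand instructions"),
    Skill("docs-lookup", "Fetch live platform docs",
          lambda: "full retrieval instructions"),
]
preamble = context_preamble(skills)
body = skills[0].trigger()
```

Only the preamble costs context on every turn; each body is paid for once, when it is actually needed.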

In-house proof

ncar-brand-toolkit@local — SVG waves, logo lookup, brand & accessibility rules.

Both

Plugins: one versioned container

Plugin · installable · versioned
  • Skills: packaged prompts
  • Agents: named roles & personas
  • Commands: slash-commands
  • Hooks: pre/post gates
  • MCP: .mcp.json servers

A plugin bundles skills + agents + commands + hooks + MCP into one installable unit — how a team standardizes practice instead of re-teaching each engineer.

22+ installed: official (code-review, security-guidance, …), the Cloudflare bundle, and my ncar-brand-toolkit@local.

04

Act Four

Tuning the dials

Both · ★ Must-have

Model & effort: route, don't max

Task type | Model | Effort
Discovery · grep · triage | Haiku → Sonnet | low–medium
Codegen · refactor · drafting | Sonnet | medium
Boilerplate · retrieval | Sonnet | low
Synthesis · architecture · judgment | Opus | high
Multi-step physics · legacy ports | Opus | high–xhigh

Default: Sonnet + medium. Escalate only when the first pass is wrong, or being wrong is costly.
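
The routing rule fits in a few lines. This sketch encodes the table with the lower end of each range and escalates one rung when the first pass was wrong or an error would be costly. The task keys, the `LADDER` ordering and the tier labels are illustrative assumptions, not real model IDs or CLI flags.

```python
ROUTES = {
    "discovery": ("haiku", "low"),
    "codegen": ("sonnet", "medium"),
    "boilerplate": ("sonnet", "low"),
    "synthesis": ("opus", "high"),
    "physics": ("opus", "high"),
}
DEFAULT = ("sonnet", "medium")
# Assumed escalation order, cheapest first.
LADDER = [("haiku", "low"), ("sonnet", "medium"), ("opus", "high"), ("opus", "xhigh")]


def escalate(choice: tuple[str, str]) -> tuple[str, str]:
    """Move one rung up the ladder; stay put at the top."""
    if choice in LADDER:
        return LADDER[min(LADDER.index(choice) + 1, len(LADDER) - 1)]
    return DEFAULT  # off-ladder choices (e.g. sonnet/low) step up to the default


def route(task_type: str, first_pass_wrong: bool = False,
          costly_if_wrong: bool = False) -> tuple[str, str]:
    """Pick (model, effort); escalate only when there is a reason to."""
    choice = ROUTES.get(task_type, DEFAULT)
    if first_pass_wrong or costly_if_wrong:
        choice = escalate(choice)
    return choice
```

For example, `route("codegen")` stays on the default, while `route("codegen", first_pass_wrong=True)` moves up one rung.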

Keep two Opus ratios distinct: ~5× Sonnet per API token (price) ≠ the ~10–12× weekly-hours gap on subscriptions. The four-rung effort ladder is a CLI/API dial; claude.ai is coarser — a model picker plus an extended-thinking toggle.

Both · ↻ Meta

Artifact-driven interaction: HTML + SVG


For dense answers, ask for a self-contained HTML artifact with SVG — a wall of prose becomes spatial and interactive.

claude.ai · Artifacts

A sandboxed, size-limited preview pane. Great for quick, self-contained explainers you copy out.

Claude Code · on disk

Writes a self-contained file to disk — no size sandbox. Portable evidence: open offline, embed in a deck, attach to a PR.

SVG is resolution-independent, diffable as text, and the model authors it directly. Every diagram in this deck is this pattern.
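
As a small illustration of the pattern, this sketch builds a self-contained HTML page with an inline SVG bar chart and no external assets. The function names, colours and sample values are made up for the example, and values are assumed to be positive.

```python
from html import escape


def bar_chart_svg(values: dict[str, float], width: int = 320, bar_h: int = 24) -> str:
    """Render labelled horizontal bars as an inline SVG string."""
    top = max(values.values())  # assumes at least one positive value
    rows = []
    for i, (label, value) in enumerate(values.items()):
        y = i * (bar_h + 8)
        w = round(width * value / top)
        rows.append(
            f'<rect x="0" y="{y}" width="{w}" height="{bar_h}" fill="#0b6fa4"/>'
            f'<text x="4" y="{y + bar_h - 7}" fill="#fff">{escape(label)}</text>'
        )
    height = len(values) * (bar_h + 8)
    body = "".join(rows)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'viewBox="0 0 {width} {height}">{body}</svg>')


def render_html(title: str, svg: str) -> str:
    """Wrap the SVG in a single-file page: no scripts, no external links."""
    safe = escape(title)
    return (f"<!doctype html><html><head><meta charset='utf-8'>"
            f"<title>{safe}</title></head><body><h1>{safe}</h1>{svg}</body></html>")


svg = bar_chart_svg({"low": 1, "high": 2})
page = render_html("Effort & cost", svg)
```

Writing `page` to disk (for instance with `pathlib.Path("explainer.html").write_text(page, encoding="utf-8")`) gives a file that opens offline and diffs as text.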

▤ Cheat sheet → research-scientist
Both · ⭐ Signature

The collab wrapper: point, don't describe


Click any element → comment or edit its text → export one JSON change-set → Claude applies all edits in a single batch pass.

1 · Select: click any element
2 · Comment / edit: inline change
3 · Export JSON: stable data-collab-id
4 · One batch prompt: N edits → ~1 pass
5 · Batched diff: review & commit

It kills the N-prompt deixis problem (describing "that box on the left" in words, one prompt per fix), and the JSON doubles as the audit trail. Honest status: articulated, not yet built; the worktree is near-empty.
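
Since the wrapper is not built yet, the following is a guess at what the batch step could look like, not a description of existing code. The change-set schema (`changes`, `id`, `action`, `text`) and the element IDs are invented for illustration; only the `data-collab-id` idea comes from the slide.

```python
import json


def apply_change_set(elements: dict[str, str],
                     change_set_json: str) -> tuple[dict[str, str], list[str]]:
    """Apply every edit in one pass; return the new text map and a reviewable log."""
    changes = json.loads(change_set_json)["changes"]
    unknown = [c["id"] for c in changes if c["id"] not in elements]
    if unknown:
        raise KeyError(f"unknown data-collab-id(s): {unknown}")
    updated = dict(elements)  # never mutate the caller's copy
    log = []
    for c in changes:
        if c["action"] == "edit":
            log.append(f'{c["id"]}: {updated[c["id"]]!r} -> {c["text"]!r}')
            updated[c["id"]] = c["text"]
        elif c["action"] == "comment":
            log.append(f'{c["id"]}: NOTE {c["text"]}')
        else:
            raise ValueError(f'unsupported action: {c["action"]}')
    return updated, log


elements = {"s3-title": "Tuning dials", "s3-note": "Default: Sonnet"}
change_set = json.dumps({"changes": [
    {"id": "s3-title", "action": "edit", "text": "Tuning the dials"},
    {"id": "s3-note", "action": "comment", "text": "cite the source"},
]})
updated, log = apply_change_set(elements, change_set)
```

Rejecting unknown IDs before applying anything keeps the batch all-or-nothing, which makes the resulting diff easier to review.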

▤ Cheat sheet → orchestrator

05

Act Five

Economics & honest limits

Both

Co-development modalities & their token cost

Modality | Trade | Relative tokens
a · Copy/paste chat | Max human effort, can't verify, lowest risk | ▍ low
b · Inline autocomplete (external tool, not Claude Code) | Copilot / Cursor Tab / Windsurf; high frequency | ▍▍ moderate
c · CLI single-agent | One rich context, many tool-call turns | ▍▍▍ moderate–high
d · CLI workflow + sub-agents | Min labor, max capability, hardest to inspect | ▍▍▍▍ highest

I live in (d) — tiering models inside it (Haiku → Sonnet → Opus) is the biggest lever for keeping it affordable. LSP (pyright/clangd/gopls) is something else again: deterministic, zero-token code intelligence — not a completion modality.

Both · ★ Must-have

The token model — Claude vs Codex vs Antigravity

⚠️ Every figure here is a placeholder (as of 2026-06-26); verify live. Teach the framework, not the number; prefer a live /status check.

Dimension | Claude | Codex | Antigravity
Short window | ~5-hr rolling | 5-hr msg ranges | agent-request
Long window | weekly + Opus cap | credit rate card | ~tripled at I/O
Metering | tokens × premium | tokens (since 04-02) | requests
Training default | consumer opt-out | verify | verify
Overage | credit top-ups | buy credits | own API key ⚠️

Two buckets, everywhere

  • ~5-hr window: burst limit
  • Weekly cap: rolling 7-day

"Suddenly less helpful mid-conversation" = you hit a window. Design long jobs to be resumable, not all-or-nothing.
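
"Resumable, not all-or-nothing" can be as simple as a checkpoint file. A minimal sketch, assuming work items are strings and that `budget` stands in for however many items fit in the current window; `run_resumable` is a hypothetical helper.

```python
import json
from pathlib import Path


def run_resumable(items: list[str], work, checkpoint: Path, budget: int) -> dict:
    """Process items until the budget runs out, persisting progress after each one."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    spent = 0
    for item in items:
        if item in done:
            continue  # finished in an earlier window
        if spent >= budget:
            break     # window exhausted: stop cleanly and resume later
        done[item] = work(item)
        spent += 1
        checkpoint.write_text(json.dumps(done))  # save after every item
    return done
```

Calling it twice with the same checkpoint path finishes the job across two windows without redoing completed items.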

06

Act Six

The deck as the demo

Both · ↻ Meta

The deck explains itself


A horizontal spine everyone follows, with optional vertical descents that get more technical the deeper you press — the structure is the information hierarchy.

Horizontal spine: the core path (~25 min). Under each slide, optional descents run L1 concept → L2 practice → L3 expert. Press ▾ for deeper: scientists stay up top, engineers descend.

07

Act Seven

Honesty & close

Both

The things AI gets confidently wrong

Verify-then-trust loop: Generate → Assert (baseline · cite the formula) → Diff (critic pass, never the only check) → Commit with provenance

  • Verification: Fluently wrong code runs cleanly, so test units, ranges, conservation; check a known-good baseline.
  • Reproducibility: Commit code + a provenance record (model+version, effort, prompt, diff), not the conversation.
  • Air-gap: There is no fully air-gapped Claude. Bedrock/Vertex give residency + no-train for restricted-but-not-air-gapped data.
  • Attribution: AI is not an author. Disclose tool + model+version + what it did. Over-disclose.
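
The assert and provenance steps can be sketched as two small functions. The thresholds, field names and the choice to store a hash of the diff are assumptions made for the example; the source only says what a provenance record should cover.

```python
import hashlib
import math


def check_field(values: list[float], lo: float, hi: float, expected_total: float,
                baseline: list[float], rel_tol: float = 1e-6) -> list[str]:
    """Return the names of failed checks; an empty list means the output passed."""
    failures = []
    if not all(lo <= v <= hi for v in values):
        failures.append("range")
    if not math.isclose(sum(values), expected_total, rel_tol=rel_tol):
        failures.append("conservation")
    if len(values) != len(baseline) or not all(
        math.isclose(v, b, rel_tol=rel_tol) for v, b in zip(values, baseline)
    ):
        failures.append("baseline")
    return failures


def provenance_record(model: str, effort: str, prompt: str, diff: str) -> dict:
    """Record what produced a change; the diff is hashed to keep the record small."""
    return {
        "model": model,    # model + version, as reported by the tool
        "effort": effort,
        "prompt": prompt,
        "diff_sha256": hashlib.sha256(diff.encode("utf-8")).hexdigest(),
    }


ok = check_field([1.0, 2.0, 3.0], 0.0, 5.0, 6.0, [1.0, 2.0, 3.0])
bad = check_field([1.0, 2.0, 30.0], 0.0, 5.0, 6.0, [1.0, 2.0, 3.0])
record = provenance_record("example-model-v1", "medium", "port the solver", "--- a\n+++ b\n")
```

The checks run on the output itself, so code that executes without error but breaks a range or a conserved total still fails.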
Both

What transfers, what's lock-in

Core: patterns (~80% transfers). Adapter rings (Claude · Codex · Antigravity) = replaceable plumbing.

Transferable (~80%)

Decomposition, role/persona design, context-setting, verify/critic loops, artifact-driven output, batching edits, treating output as reviewable.

Non-transferable plumbing

CLI/sub-agent mechanics, skill & plugin format, .mcp.json wiring, model names, effort knobs.
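
One way to picture the split: the pattern is written against a narrow interface and each vendor sits behind an adapter. `AgentBackend`, `EchoBackend` and `critic_loop` are hypothetical names; a real adapter would wrap a vendor CLI or API rather than echo its input.

```python
from typing import Protocol


class AgentBackend(Protocol):
    """The only surface the pattern depends on."""
    def run(self, persona: str, task: str) -> str: ...


class EchoBackend:
    """Stand-in adapter that labels its output instead of calling a model."""
    def __init__(self, label: str) -> None:
        self.label = label

    def run(self, persona: str, task: str) -> str:
        return f"[{self.label}] {persona}: {task}"


def critic_loop(backend: AgentBackend, task: str) -> list[str]:
    """The transferable part: draft, then critique, on any backend."""
    draft = backend.run("implementer", task)
    review = backend.run("critic", f"review: {draft}")
    return [draft, review]


outputs = critic_loop(EchoBackend("A"), "write tests")
```

Swapping vendors means writing a new adapter class; `critic_loop` itself does not change.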

▤ Cheat sheet → prompting-patterns
Both · ↻ Meta

How I'd improve this workflow


Current

Workflow DAG lives in orchestrator prose
xhigh everywhere in the main loop; no Haiku tier
Collab wrapper is an idea, no artifact

Improved

Versioned workflow schema + published plugin agents
Tiered Haiku → Sonnet → Opus + token/hour telemetry
Collab wrapper shipped — stable IDs, validated schema

Plus: harden unattended autonomy (least-privilege + worktree sandbox + PreToolUse block-hooks), and add validation at the seams — there's no built-in inter-agent correctness guarantee.
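
"Validation at the seams" could start as a gate that every hand-off must pass before the next agent reads it. The required fields below are an assumed schema for illustration, not a published one, and `validate_handoff` and `accept` are hypothetical helpers.

```python
REQUIRED = {"producer": str, "kind": str, "payload": dict, "checks_passed": bool}


def validate_handoff(artifact: dict) -> list[str]:
    """Return schema problems for one hand-off artifact (empty list = acceptable)."""
    errors = []
    for key, expected in REQUIRED.items():
        if key not in artifact:
            errors.append(f"missing: {key}")
        elif not isinstance(artifact[key], expected):
            errors.append(f"wrong type: {key}")
    if not errors and not artifact["checks_passed"]:
        errors.append("upstream checks did not pass")
    return errors


def accept(artifact: dict) -> dict:
    """Gate at the seam: downstream agents only ever see validated payloads."""
    errors = validate_handoff(artifact)
    if errors:
        raise ValueError("; ".join(errors))
    return artifact["payload"]


good = {"producer": "build-3", "kind": "slides",
        "payload": {"count": 12}, "checks_passed": True}
payload = accept(good)
problems = validate_handoff({"producer": "build-3", "kind": "slides"})
```

This checks structure only; whether the payload is actually right still needs the critic pass and the domain checks.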

Both · ↻ Meta

Recap & cheat sheets


The whole spine in one line: decompose → assign altitude → concrete, inspectable hand-offs.

The full handout set

  • Scientist: track home · artifacts
  • Engineer: track home · skills
  • security-governance: H5 · V5.x · V14.2
  • orchestrator: H3 · H10
  • prompting-patterns: H15
  • newcomer-skeptic: H1 · trust slides
  • pi-manager: full handout set
  • This deck: built by the method it teaches

The 13 highest-return questions are the backbone of this talk — let's take them.