Tutorial · 2026-06-26

AI-Augmented Research & Development

How I Use AI for
Research & Development

Patterns, not a product pitch — a worked tour of how an NSF NCAR researcher actually runs claude.ai and Claude Code day to day.

↻ Meta

This deck was researched, outlined, built and reviewed by the exact multi-agent workflow it teaches.

Both

Choose your track

Depth = how technical

L1 · Conceptual: what & why

L2 · Practical: how, recipes, tradeoffs

L3 · Expert: config, schema, code

The top row is shared and complete. You lose nothing by staying on it — your track is just the floor you stop on.

▤ Cheat sheet → newcomer-skeptic

Both

The one move: decompose, assign altitude, make hand-offs concrete

I don't "chat with an AI." I stand up small, role-specialized teams of agents.

Everything else in this talk — MCP, skills, the collab wrapper, model routing — is a variation on this single move.

Both · ↻ Meta

Sub-agents, workflows & personas

The actual DAG that built this deck — each node a child Claude with its own context window, model and effort.

Personas are cheap reviewers with named blind spots. Isolation buys focus & security — not correctness. You add that at the seams.
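A minimal sketch of how such a DAG could be written down, using Python's standard `graphlib`. The node names, model tiers and effort levels are hypothetical illustrations, not the actual graph that built this deck.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class AgentNode:
    """One child agent: its own model, effort, and upstream dependencies."""
    name: str
    model: str    # hypothetical tier label, not a real model ID
    effort: str
    deps: tuple[str, ...] = ()


# Hypothetical workflow: research -> outline -> build -> review -> synthesis.
NODES = [
    AgentNode("research", "haiku", "low"),
    AgentNode("outline", "sonnet", "medium", deps=("research",)),
    AgentNode("build", "sonnet", "medium", deps=("outline",)),
    AgentNode("persona_review", "sonnet", "medium", deps=("build",)),
    AgentNode("synthesis", "opus", "high", deps=("build", "persona_review")),
]


def execution_order(nodes: list[AgentNode]) -> list[str]:
    """Return node names so that every node comes after its dependencies."""
    graph = {node.name: set(node.deps) for node in nodes}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    by_name = {node.name: node for node in NODES}
    for name in execution_order(NODES):
        node = by_name[name]
        print(f"{name}: model={node.model} effort={node.effort}")
```

Keeping the graph as data rather than prose makes the hand-offs inspectable: each edge is a seam where a check can be attached.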

▤ Cheat sheet → orchestrator

Both

claude.ai vs Claude Code

	claude.ai (web · scientist home)	Claude Code (CLI · engineer home)
Access	Conversational, self-contained	Reads & writes real files in your repo
Tools	Artifacts you copy out	Runs commands, tests, MCP, sub-agents
Autonomy	One turn at a time	Iterates, schedulable, remote-controllable
Verification	Human eyeballs	Agent runs tests / Playwright for real
Cost shape	Lower per turn…	…but shares the same weekly bucket

Tipping point: Copy-pasting file contents back and forth more than twice? → switch to the CLI. Most prompting patterns transfer either way.

Both

MCP servers: typed tools, not guesswork

MCP lets the model call a typed, authenticated tool against the live system instead of guessing from training data.

My setup: 6 plugin MCP servers, authenticated per-session via /mcp rather than left always-on. Security is the next slide.

Both · ★ Must-have

MCP security: least privilege limits the blast radius

Two attack surfaces

The model can be tricked into calling a tool (prompt injection).
The tool itself can be malicious (supply chain).

Mitigations, in order

First-party servers + pinned versions → scope to a project dir → no creds in the env → prefer read-only → human approval for write/exec → keep an allowlist. Project-scoped .mcp.json is the secure default.
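Two of these mitigations (pinned versions, no creds in the env) can be checked mechanically. The sketch below audits a parsed `.mcp.json`, assuming the common `mcpServers` layout with `command`, `args` and `env` keys; verify that layout against the current Claude Code docs. The server names, package names and heuristics are hypothetical.

```python
import json
import re

# Heuristics only: env var names that look like credentials, and an exact
# semver pin at the end of a package spec (e.g. "pkg@1.4.2").
SECRET_HINT = re.compile(r"(key|token|secret|password)", re.IGNORECASE)
PINNED = re.compile(r"@\d+\.\d+\.\d+$")


def audit_mcp_config(config: dict) -> list[str]:
    """Return human-readable findings for one parsed .mcp.json document."""
    findings = []
    for name, server in config.get("mcpServers", {}).items():
        for var in server.get("env", {}):
            if SECRET_HINT.search(var):
                findings.append(f"{name}: credential-like env var {var}")
        # Assumption: the first non-flag argument is the package spec.
        packages = [a for a in server.get("args", []) if not a.startswith("-")]
        if packages and not PINNED.search(packages[0]):
            findings.append(f"{name}: package {packages[0]} is not pinned")
    return findings


# Hypothetical config: "docs" is pinned and clean, "db" is neither.
EXAMPLE = json.loads("""
{"mcpServers": {
  "docs": {"command": "npx", "args": ["-y", "example-docs-server@1.4.2"]},
  "db": {"command": "npx", "args": ["-y", "example-db-server"],
         "env": {"DB_API_TOKEN": "placeholder"}}
}}
""")

if __name__ == "__main__":
    for finding in audit_mcp_config(EXAMPLE):
        print(finding)
```

A check like this only covers what is visible in the file; read-only scoping and human approval for write/exec still have to be enforced by the client.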

▤ Cheat sheet → security-governance

Both

Skills: progressive disclosure

15 installed. Many are retrieval-first — they fetch live docs over training memory, fixing confidently-stale knowledge on fast-moving platforms.

In-house proof

ncar-brand-toolkit@local — SVG waves, logo lookup, brand & accessibility rules.

Both

Plugins: one versioned container

A plugin bundles skills + agents + commands + hooks + MCP into one installable unit — how a team standardizes practice instead of re-teaching each engineer.

22+ installed: official (code-review, security-guidance, …), the Cloudflare bundle, and my ncar-brand-toolkit@local.

Both · ★ Must-have

Model & effort: route, don't max

Task type	Model	Effort
Discovery · grep · triage	Haiku→Sonnet	low–medium
Codegen · refactor · drafting	Sonnet	medium
Boilerplate · retrieval	Sonnet	low
Synthesis · architecture · judgment	Opus	high
Multi-step physics · legacy ports	Opus	high–xhigh

Default: Sonnet + medium. Escalate only when the first pass is wrong, or being wrong is costly.
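The routing table and default above can be sketched as a lookup plus one escalation step. The task keys and the escalation policy (raise the model tier first, then the effort) are assumptions made for illustration; the model names are tier labels, not real model IDs.

```python
from typing import NamedTuple


class Route(NamedTuple):
    model: str
    effort: str


MODEL_LADDER = ["haiku", "sonnet", "opus"]
EFFORT_LADDER = ["low", "medium", "high", "xhigh"]

# Starting points taken from the table; the keys are hypothetical labels.
ROUTES = {
    "discovery": Route("haiku", "low"),
    "codegen": Route("sonnet", "medium"),
    "boilerplate": Route("sonnet", "low"),
    "synthesis": Route("opus", "high"),
    "multi_step_physics": Route("opus", "high"),
}
DEFAULT = Route("sonnet", "medium")


def escalate(current: Route) -> Route:
    """One step up: next model tier if there is one, otherwise next effort rung."""
    m = MODEL_LADDER.index(current.model)
    if m + 1 < len(MODEL_LADDER):
        return Route(MODEL_LADDER[m + 1], current.effort)
    e = EFFORT_LADDER.index(current.effort)
    if e + 1 < len(EFFORT_LADDER):
        return Route(current.model, EFFORT_LADDER[e + 1])
    return current


def pick(task_type: str, first_pass_failed: bool = False,
         costly_if_wrong: bool = False) -> Route:
    """Route by task type; escalate only on failure or high cost of error."""
    route = ROUTES.get(task_type, DEFAULT)
    if first_pass_failed or costly_if_wrong:
        route = escalate(route)
    return route


if __name__ == "__main__":
    print(pick("codegen"))
    print(pick("discovery", first_pass_failed=True))
```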

Keep two Opus ratios distinct: ~5× Sonnet per API token (price) ≠ the ~10–12× weekly-hours gap on subscriptions. The four-rung effort ladder is a CLI/API dial; claude.ai is coarser — a model picker plus an extended-thinking toggle.

Both · ↻ Meta

Artifact-driven interaction: HTML + SVG

For dense answers, ask for a self-contained HTML artifact with SVG — a wall of prose becomes spatial and interactive.

claude.ai · Artifacts

A sandboxed, size-limited preview pane. Great for quick, self-contained explainers you copy out.

Claude Code · on disk

Writes a self-contained file to disk — no size sandbox. Portable evidence: open offline, embed in a deck, attach to a PR.

SVG is resolution-independent, diffable as text, and the model authors it directly. Every diagram in this deck is this pattern.
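To make "self-contained HTML with SVG" concrete, here is a small sketch that renders a labelled bar chart as one HTML string with the SVG inline and no external assets. The function name, layout constants and sample values are made up for illustration.

```python
from html import escape


def bar_chart_html(title: str, values: dict[str, float]) -> str:
    """Return a self-contained HTML page containing an inline SVG bar chart."""
    bar_h, gap, label_w, max_w = 24, 8, 140, 300
    peak = max(values.values(), default=0) or 1  # avoid division by zero
    rows = []
    for i, (label, value) in enumerate(values.items()):
        y = i * (bar_h + gap)
        width = max_w * value / peak
        rows.append(
            f'<text x="0" y="{y + 17}">{escape(label)}</text>'
            f'<rect x="{label_w}" y="{y}" width="{width:.1f}" '
            f'height="{bar_h}" fill="#1a658f"/>'
        )
    body = "".join(rows)
    height = len(values) * (bar_h + gap)
    svg = (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {label_w + max_w} {height}" role="img" '
        f'aria-label="{escape(title)}">{body}</svg>'
    )
    return (
        '<!doctype html><html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>"
        f"<body><h1>{escape(title)}</h1>{svg}</body></html>"
    )


if __name__ == "__main__":
    # Made-up values, only to show the shape of the output.
    page = bar_chart_html("Example chart", {"first": 1.0, "second": 2.0})
    with open("example_chart.html", "w", encoding="utf-8") as fh:
        fh.write(page)
```

Because the output is a single text file, it diffs cleanly in review and opens offline.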

▤ Cheat sheet → research-scientist

Both · ⭐ Signature

The collab wrapper: point, don't describe

Click any element → comment or edit its text → export one JSON change-set → Claude applies all edits in a single batch pass.

It removes the N-prompt deixis problem (one prompt per element just to say which element you mean), and the exported JSON is the audit trail. Honest status: articulated, not yet built; the worktree is near-empty.
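Since the wrapper is not built, the following is only a guess at what a change-set and its batch application could look like. The schema (`changes`, `id`, `op`, `text`, `note`) and the element IDs are entirely hypothetical.

```python
def apply_change_set(elements: dict[str, str], change_set: dict):
    """Apply every edit in one pass; validate first so the batch is all-or-nothing.

    Returns (updated_elements, comments). The input dict is not mutated.
    """
    changes = change_set["changes"]
    # Validate the whole batch before touching anything.
    for change in changes:
        if change["id"] not in elements:
            raise KeyError(f"unknown element id: {change['id']}")
        if change["op"] not in ("edit", "comment"):
            raise ValueError(f"unknown op: {change['op']}")
    updated = dict(elements)
    comments = []
    for change in changes:
        if change["op"] == "edit":
            updated[change["id"]] = change["text"]
        else:
            comments.append((change["id"], change["note"]))
    return updated, comments


# Hypothetical page elements keyed by stable ID, and one exported change-set.
ELEMENTS = {"s3-title": "Old title", "s3-body": "Body text"}
CHANGE_SET = {
    "changes": [
        {"id": "s3-title", "op": "edit", "text": "New title"},
        {"id": "s3-body", "op": "comment", "note": "Tighten this paragraph"},
    ]
}

if __name__ == "__main__":
    print(apply_change_set(ELEMENTS, CHANGE_SET))
```

Validating before applying means a bad ID rejects the whole batch instead of leaving the page half-edited.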

▤ Cheat sheet → orchestrator

Both

Co-development modalities & their token cost

Modality	Trade	Relative tokens
a · Copy/paste chat	Max human effort, can't verify, lowest risk	▍ low
b · Inline autocomplete (external tool, not Claude Code)	Copilot / Cursor Tab / Windsurf; high frequency	▍▍ moderate
c · CLI single-agent	One rich context, many tool-call turns	▍▍▍ moderate–high
d · CLI workflow + sub-agents	Min labor, max capability, hardest to inspect	▍▍▍▍ highest

I live in (d) — tiering models inside it (Haiku → Sonnet → Opus) is the biggest lever for keeping it affordable. LSP (pyright/clangd/gopls) is something else again: deterministic, zero-token code intelligence — not a completion modality.

Both · ★ Must-have

The token model — Claude vs Codex vs Antigravity

⚠️ Every figure here is a placeholder (as of 2026-06-26); verify live. Teach the framework, not the number; prefer a live /status check.

	Claude	Codex	Antigravity
Short window	~5-hr rolling	5-hr msg ranges	agent-request
Long window	weekly + Opus cap	credit rate card	~tripled at I/O
Metering	tokens × premium	tokens (since 04-02)	requests
Training default	consumer opt-out	verify	verify
Overage	credit top-ups	buy credits	own API key ⚠️

Two buckets, everywhere

"Suddenly less helpful mid-conversation" = you hit a window. Design long jobs to be resumable, not all-or-nothing.

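One way to make a long job resumable rather than all-or-nothing is to checkpoint after every unit of work, so a run that hits a usage window picks up where it stopped. This is a generic sketch; the function name, the checkpoint format and the requirement that items be strings are assumptions.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def run_resumable(items: Iterable[str], process: Callable[[str], object],
                  checkpoint: Path) -> dict:
    """Process items one by one, saving results so a later run can skip them.

    Items must be strings (they become JSON keys) and results must be
    JSON-serializable.
    """
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for item in items:
        if item in done:
            continue  # finished in an earlier run
        done[item] = process(item)
        # Write to a temp file, then swap, so a crash never leaves a
        # half-written checkpoint behind.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps(done))
        tmp.replace(checkpoint)
    return done


if __name__ == "__main__":
    results = run_resumable(["a", "b"], str.upper, Path("job_checkpoint.json"))
    print(results)
```

If `process` raises partway through, rerunning the same call repeats only the items that never completed.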
Both

The things AI gets confidently wrong

Verification: Fluently wrong code runs cleanly. Test units, ranges, and conservation; check against a known-good baseline.

Reproducibility: Commit code plus a provenance record (model+version, effort, prompt, diff), not the conversation.

Air-gap: There is no fully air-gapped Claude. Bedrock/Vertex give residency + no-train for restricted-but-not-air-gapped data.

Attribution: AI is not an author. Disclose tool + model+version + what it did. Over-disclose.
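A provenance record of the kind named under Reproducibility could be as small as the sketch below. The field names and the choice to store the prompt verbatim but only a hash of the diff (the diff itself lives in version control) are assumptions; the model and version strings are placeholders.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class Provenance:
    tool: str
    model: str
    model_version: str
    effort: str
    prompt: str
    diff_sha256: str
    summary: str  # what the tool actually did, in one sentence


def make_record(tool: str, model: str, model_version: str, effort: str,
                prompt: str, diff: str, summary: str) -> dict:
    """Build a JSON-serializable provenance record to commit next to the code."""
    digest = hashlib.sha256(diff.encode("utf-8")).hexdigest()
    return asdict(Provenance(tool, model, model_version, effort,
                             prompt, digest, summary))


if __name__ == "__main__":
    record = make_record(
        tool="example-cli",                   # placeholder
        model="example-model",                # placeholder
        model_version="version-placeholder",  # placeholder
        effort="medium",
        prompt="Refactor the regridding helper and keep the tests green.",
        diff="--- a/regrid.py\n+++ b/regrid.py\n",
        summary="Refactored one helper; a human reviewed and ran the tests.",
    )
    print(json.dumps(record, indent=2))
```

The same record doubles as the disclosure asked for under Attribution: tool, model and version, and what it did.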

Both

What transfers, what's lock-in

Transferable (~80%)

Decomposition, role/persona design, context-setting, verify/critic loops, artifact-driven output, batching edits, treating output as reviewable.

Non-transferable plumbing

CLI/sub-agent mechanics, skill & plugin format, .mcp.json wiring, model names, effort knobs.

▤ Cheat sheet → prompting-patterns

Both · ↻ Meta

How I'd improve this workflow

Current

Workflow DAG lives in orchestrator prose

xhigh everywhere in the main loop; no Haiku tier

Collab wrapper is an idea, no artifact

→

Improved

Versioned workflow schema + published plugin agents

Tiered Haiku → Sonnet → Opus + token/hour telemetry

Collab wrapper shipped — stable IDs, validated schema

Plus: harden unattended autonomy (least-privilege + worktree sandbox + PreToolUse block-hooks), and add validation at the seams — there's no built-in inter-agent correctness guarantee.
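Validation at the seams can start as a plain structural check on each hand-off before the next agent consumes it. The hand-off fields and status values below are hypothetical; a real workflow would define its own schema.

```python
import json

# Hypothetical hand-off contract between two agents.
REQUIRED_FIELDS = {
    "task_id": str,
    "status": str,
    "artifacts": list,
    "open_questions": list,
}
ALLOWED_STATUS = {"done", "blocked"}


def validate_handoff(payload: object) -> list[str]:
    """Return a list of problems; an empty list means the hand-off is acceptable."""
    if not isinstance(payload, dict):
        return ["hand-off must be a JSON object"]
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            errors.append(f"{field} must be of type {expected.__name__}")
    status = payload.get("status")
    if isinstance(status, str) and status not in ALLOWED_STATUS:
        errors.append(f"unknown status: {status}")
    if status == "done" and payload.get("artifacts") == []:
        errors.append("status is done but no artifacts were listed")
    return errors


def accept_handoff(raw: str) -> dict:
    """Parse and validate one hand-off; raise instead of passing bad data on."""
    payload = json.loads(raw)
    errors = validate_handoff(payload)
    if errors:
        raise ValueError("; ".join(errors))
    return payload


if __name__ == "__main__":
    good = '{"task_id": "t1", "status": "done", "artifacts": ["deck.html"], "open_questions": []}'
    print(accept_handoff(good))
```

A structural check like this catches malformed hand-offs, not wrong content; correctness still needs tests or a reviewer on the artifact itself.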

Both · ↻ Meta

Recap & cheat sheets

The whole spine in one line: decompose → assign altitude → concrete, inspectable hand-offs.

The full handout set

Scientist

track home · artifacts

Engineer

track home · skills

security-governance

H5 · V5.x · V14.2

orchestrator

H3 · H10

prompting-patterns

H15

newcomer-skeptic

H1 · trust slides

pi-manager

full handout set

This deck

built by the method it teaches

The 13 highest-return questions are the backbone of this talk — let's take them.

How I Use AI for
Research & Development

Choose your track

The throughline: small teams of agents

The one move: decompose, assign altitude, make hand-offs concrete

Sub-agents, workflows & personas

The two surfaces

claude.ai vs Claude Code

Extending Claude Code

MCP servers: typed tools, not guesswork

MCP security: least privilege limits the blast radius

Skills: progressive disclosure

Plugins: one versioned container

Tuning the dials

Model & effort: route, don't max

Artifact-driven interaction: HTML + SVG

The collab wrapper: point, don't describe

Economics & honest limits

Co-development modalities & their token cost

The token model — Claude vs Codex vs Antigravity

The deck as the demo

The deck explains itself

Honesty & close

The things AI gets confidently wrong

What transfers, what's lock-in

How I'd improve this workflow

Recap & cheat sheets
