AI-Augmented Development & Research Tutorial

Security & Governance Cheat Sheet

Who this is for: people responsible for MCP security, tool permissions, data policy, and token/cost governance when teams adopt Claude and Claude Code.

1. Threat Model in One Card

What changes when an AI agent can read code, call tools, and run commands on a developer's behalf.

  • Confused-deputy risk: the agent acts with the user's full privileges. A malicious doc, README, or web page can carry prompt-injection instructions the agent may follow.
  • Tool reach: an MCP server or shell tool can read secrets, exfiltrate data, or make irreversible changes (delete, push, deploy).
  • Data egress: prompts, file contents, and tool outputs leave the machine and go to the model provider — and possibly to third-party MCP servers.
  • Supply chain: plugins, skills, and community MCP servers are untrusted code until reviewed.
  • Cost as a risk surface: runaway agent loops and large-context sessions burn tokens and money; treat budget like a security control.
Tip: Govern the three flows independently: what comes in (untrusted content), what the agent can do (tools/permissions), and what goes out (data egress + spend).

2. Securing MCP Servers

MCP servers extend Claude with external tools/data. Each one is a new trust boundary.

Vet before enabling

  • Prefer local stdio servers you control over remote HTTP/SSE servers when handling sensitive data.
  • Pin versions and read the source. In production use, avoid letting npx/uvx pull the latest release on every run.
  • Least scope: give each server read-only or narrowly-scoped credentials, never a personal admin token.
  • Isolate secrets: pass tokens via env vars / secret stores, never hard-coded in .mcp.json committed to git.
  • Inventory: keep a reviewed allow-list of approved servers; block unknown ones at the org level.

Inspect a server's config

# See what's wired up and where secrets come from
claude mcp list
cat .mcp.json     # check command, args, env, URLs
Caution: A remote MCP server sees every argument you send it. Tool descriptions returned by a server are injected into the model's context, so a hostile server can attempt prompt injection. Treat third-party MCP output as untrusted input.
Tip: Keep .mcp.json in source control so the config is reviewable, but keep secrets in an untracked .env file or a secret manager and reference them by name. Review MCP changes in PRs like any other dependency.
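
The vetting checks above can be partly automated. The sketch below is one possible CI lint for a parsed .mcp.json, not an official tool. It assumes the file has a top-level mcpServers object whose entries carry command, args, env, or url, and that secrets are referenced as ${VAR}; confirm that shape against the current Claude Code documentation. APPROVED_SERVERS, the package name, and the URL are hypothetical.

```python
import json
import re

# Hypothetical allow-list; replace with your org's reviewed inventory.
APPROVED_SERVERS = {"github", "docs-search"}

# Launchers that download a package at run time.
FETCHING_LAUNCHERS = {"npx", "uvx"}

# A value that is entirely a ${VAR} or ${VAR:-default} reference (assumed syntax).
ENV_REFERENCE = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*(:-[^}]*)?\}$")

# An explicit numeric pin such as pkg@1.2.3, @scope/pkg@1.2.3, or pkg==1.2.3.
VERSION_PIN = re.compile(r".+(@|==)\d+\.\d+\S*$")


def lint_mcp_config(config: dict) -> list[str]:
    """Return human-readable findings for one parsed .mcp.json document."""
    findings = []
    for name, server in config.get("mcpServers", {}).items():
        if name not in APPROVED_SERVERS:
            findings.append(f"{name}: not on the approved-server list")
        if "url" in server:
            findings.append(f"{name}: remote server; confirm the data tier it may see")
        command = server.get("command", "")
        if command in FETCHING_LAUNCHERS:
            # Treat the first non-flag argument as the package spec (a simplification).
            package = next(
                (arg for arg in server.get("args", []) if not arg.startswith("-")),
                None,
            )
            if package is None or not VERSION_PIN.match(package):
                findings.append(f"{name}: {command} package is not version-pinned")
        for key, value in server.get("env", {}).items():
            if not ENV_REFERENCE.match(str(value)):
                findings.append(f"{name}: env {key} is a literal; reference it by name")
    return findings


# Hypothetical config used only to exercise the lint.
EXAMPLE = """
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@example/mcp-server"],
      "env": {"GITHUB_TOKEN": "${GITHUB_TOKEN}"}
    },
    "scratch": {"type": "http", "url": "https://mcp.example.com/mcp"}
  }
}
"""

for finding in lint_mcp_config(json.loads(EXAMPLE)):
    print(finding)
```

The lint is deliberately strict: it flags every literal env value, including harmless ones such as a log level, so expect to tune it. It complements source review and does not replace it.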

3. Permissions & Tool Gating

Default to ask; allow-list the safe, deny-list the dangerous.

  • Allow low-risk reads (e.g. Read, Grep, git status, build/test runs).
  • Ask for writes, network calls, and any MCP tool by default.
  • Deny destructive patterns outright: rm -rf, force-push, curl … | sh, secret-file reads.
  • Avoid blanket bypass (--dangerously-skip-permissions / "YOLO" mode) outside a throwaway sandbox.
  • Sandbox high-autonomy runs in a container/VM with no prod credentials and limited network egress.

Example .claude/settings.json

{
  "permissions": {
    "allow": ["Read", "Grep", "Bash(git status:*)",
              "Bash(npm test:*)"],
    "ask":   ["Edit", "Write", "WebFetch"],
    "deny":  ["Bash(rm -rf:*)", "Bash(git push --force:*)",
              "Read(./.env)", "Read(**/secrets/**)"]
  }
}
Tip: Commit a project-level settings file so the whole team inherits the same guardrails. Use hooks (PreToolUse) to enforce policy the model cannot talk its way around.
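
A PreToolUse hook of the kind mentioned above might look like the sketch below. It rests on assumptions to verify against the current hooks documentation: that the hook receives a JSON event on stdin with tool_name and tool_input fields, that Bash calls carry tool_input.command and file tools carry tool_input.file_path, and that exit code 2 blocks the call. The patterns are illustrative, not a complete policy.

```python
import json
import re
import sys

# Illustrative shell patterns; a real policy needs broader coverage.
BLOCKED_COMMANDS = [
    re.compile(r"\brm\s+-\w*(rf|fr)"),                  # rm -rf / rm -fr
    re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)"),  # also catches --force-with-lease
    re.compile(r"\b(curl|wget)\b.*\|\s*(sh|bash)\b"),   # pipe a download into a shell
]

# .env files (including .env.local) and anything under a secrets/ directory.
BLOCKED_PATHS = re.compile(r"(^|/)\.env(\.|$)|(^|/)secrets/")


def decide(event: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for one tool-call event."""
    tool = event.get("tool_name", "")
    tool_input = event.get("tool_input", {})
    if tool == "Bash":
        command = tool_input.get("command", "")
        for pattern in BLOCKED_COMMANDS:
            if pattern.search(command):
                return False, f"blocked shell pattern: {pattern.pattern}"
    if tool in {"Read", "Edit", "Write"}:
        path = tool_input.get("file_path", "")
        if BLOCKED_PATHS.search(path):
            return False, f"blocked path: {path}"
    return True, "ok"


def run_hook(stdin_text: str) -> int:
    """Return the hook's exit code: 0 to proceed, 2 (assumed) to block."""
    allowed, reason = decide(json.loads(stdin_text))
    if not allowed:
        print(reason, file=sys.stderr)
        return 2
    return 0
```

To use it as a hook script, end the file with sys.exit(run_hook(sys.stdin.read())). Regex checks like these are easy to evade with creative quoting, so treat the hook as one layer alongside deny rules and sandboxing.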

4. Data Policy & Egress

Decide what may leave the building before anyone opens a chat window.

  • Classify first: public, internal, confidential, regulated (PII/PHI/export-controlled). Map each to "may / may not go to a model."
  • Prefer enterprise/zero-retention tiers for sensitive work; consumer tiers may use data differently — confirm against current Terms of Use.
  • Redact secrets and identifiers before pasting. Keys, tokens, real subject data, and unpublished results are easy to leak by accident.
  • Mind indirect egress: file uploads, MCP tool calls to external APIs, and WebFetch all send data outward.
  • Keep an audit trail: log which projects use AI tools and at what data tier; for research, note reproducibility and attribution expectations.
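
The "may / may not go to a model" mapping can be written down as data so tooling and people read the same table. The sketch below is a placeholder policy, not a recommendation: the destination names and every entry in POLICY are hypothetical and should come from your own classification decisions.

```python
from enum import Enum


class Tier(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    REGULATED = "regulated"


# Hypothetical destinations and placeholder decisions; fill in from your own policy.
POLICY = {
    Tier.PUBLIC: {"consumer-chat", "enterprise-api", "third-party-mcp"},
    Tier.INTERNAL: {"enterprise-api"},
    Tier.CONFIDENTIAL: {"enterprise-api"},
    Tier.REGULATED: set(),
}


def may_send(tier: Tier, destination: str) -> bool:
    """Default-deny lookup: anything not listed for the tier is refused."""
    return destination in POLICY.get(tier, set())


print(may_send(Tier.INTERNAL, "enterprise-api"))   # True
print(may_send(Tier.REGULATED, "enterprise-api"))  # False
```

Default-deny matters here: a new destination or a new tier is blocked until someone decides otherwise.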
Caution: "It's just a prompt" still counts as disclosure. Treat the chat box like an email to an outside vendor: assume it leaves your control once sent.
Tip: Publish a one-page "green / yellow / red" data table for your group so scientists and engineers don't have to re-decide every session.
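
For the "redact before pasting" step in this section, a small pre-paste filter can catch the obvious cases. The sketch below is a minimal example with three illustrative patterns (an AWS-style access key ID, name=value style secrets, and email addresses); the rule list and the placeholder format are assumptions. It will miss plenty, so it supports human review and does not replace it.

```python
import re

# (pattern, replacement) pairs; illustrative only, extend for your own data.
REDACTION_RULES = [
    # AWS-style access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED:aws-access-key-id]"),
    # name = value or name: value where the name suggests a secret; keep the name.
    (
        re.compile(r"(?i)\b(api[_-]?key|token|password|secret)(\s*[:=]\s*)\S+"),
        r"\1\2[REDACTED:secret]",
    ),
    # Email addresses.
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[REDACTED:email]"),
]


def redact(text: str) -> tuple[str, int]:
    """Apply every rule in order; return the redacted text and the match count."""
    total = 0
    for pattern, replacement in REDACTION_RULES:
        text, count = pattern.subn(replacement, text)
        total += count
    return text, total


sample = (
    "password = hunter2\n"
    "contact: jane.doe@example.org\n"
    "key AKIAIOSFODNN7EXAMPLE"
)
clean, hits = redact(sample)
print(clean)
print(f"{hits} item(s) redacted")
```

A nonzero count is a useful signal in its own right: if a file needs many redactions, it probably belongs in a higher data tier.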

5. Token & Cost Governance

Limits and premiums are levers — set expectations and watch the burn.

Lever | Governance angle
Per-session & weekly limits | Plan work so heavy jobs don't exhaust a shared cap mid-week; stagger large runs.
Model premiums | Opus costs more per token than Sonnet/Haiku; reserve it for hard reasoning, not bulk edits.
Context size | Long sessions re-send context each turn. Start fresh sessions; avoid dumping whole repos.
Agent loops | Sub-agent orchestration multiplies token use. Cap iterations and require checkpoints.
Terms of Use | Confirm acceptable-use and data-retention terms per plan before approving a tool for the org.
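
The "Context size" row is easier to argue for with arithmetic. The sketch below is a rough model that assumes every turn re-sends the full history as input, that each turn adds a fixed number of tokens, and that there is no prompt caching (which in practice lowers the cost of repeated prefixes). All numbers are made up for illustration.

```python
def cumulative_input_tokens(turns: int, base_tokens: int, tokens_added_per_turn: int) -> int:
    """Total input tokens over a session where turn k sends base + k * added tokens."""
    return sum(base_tokens + k * tokens_added_per_turn for k in range(1, turns + 1))


# Hypothetical workload: 5,000 tokens of standing context, 2,000 tokens added per turn.
one_long_session = cumulative_input_tokens(40, 5_000, 2_000)
four_fresh_sessions = 4 * cumulative_input_tokens(10, 5_000, 2_000)

print(one_long_session)     # 1840000
print(four_fresh_sessions)  # 640000
```

Under these assumptions the same 40 turns cost almost three times as many input tokens in one long session as in four fresh ones, because the re-sent history grows with the square of the turn count.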

Comparative awareness

  • Claude, OpenAI Codex, and Google Antigravity each meter usage differently (session/weekly caps vs. metered API spend vs. bundled quotas). Map each tool's metering model to your billing and policy before standardizing.
Tip: Make cost visible: ask the team to note model + effort level in PRs. "Sonnet / medium" vs "Opus / high" tells you where the budget went.
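
The "cap iterations and require checkpoints" advice from the table above can be sketched as a small wrapper around whatever drives the agent loop. Everything here is hypothetical scaffolding: step stands in for one unit of agent work and reports (done, tokens_used), and approve stands in for a human or policy checkpoint. No real agent API is implied.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunBudget:
    max_iterations: int
    max_tokens: int
    iterations: int = 0
    tokens_spent: int = 0

    def record(self, tokens_used: int) -> None:
        self.iterations += 1
        self.tokens_spent += tokens_used


def run_with_budget(
    step: Callable[[int], tuple[bool, int]],
    budget: RunBudget,
    approve: Callable[[RunBudget], bool],
    checkpoint_every: int = 5,
) -> str:
    """Call step(i) until it reports done, a checkpoint is refused, or a cap is hit."""
    for i in range(budget.max_iterations):
        # Pause for approval before every checkpoint_every-th iteration.
        if i > 0 and i % checkpoint_every == 0 and not approve(budget):
            return "stopped: checkpoint not approved"
        done, tokens_used = step(i)
        budget.record(tokens_used)
        if done:
            return "done"
        if budget.tokens_spent >= budget.max_tokens:
            return "stopped: token budget reached"
    return "stopped: iteration cap reached"


# Fake step that never finishes and spends 1,000 tokens per call.
budget = RunBudget(max_iterations=50, max_tokens=3_000)
print(run_with_budget(lambda i: (False, 1_000), budget, approve=lambda b: True))
print(budget.iterations, budget.tokens_spent)
```

Returning a reason string instead of raising keeps the stop condition visible in logs, which helps when reconciling spend later.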

6. Model & Effort for Governance Work

Match horsepower to the task; document the choice.

Task | Suggested
Threat-model a new MCP server / plugin | Opus, high/xhigh effort
Security review of a diff | Opus or Sonnet, high effort
Draft a data-policy table / checklist | Sonnet, medium
Summarize logs, lint configs, bulk redaction | Haiku / Sonnet, low
Tip: Higher effort = more reasoning tokens = more cost and latency. Use high/xhigh where a missed risk is expensive; drop to low for mechanical passes.

7. Co-Dev Modalities & Their Risk/Cost

Where the agent runs changes both data exposure and spend.

Modality | Exposure | Token use
Copy/paste to chat | You control exactly what leaves | Low
Inline editor autocomplete | Sends surrounding code continuously | Low–med
CLI single agent | Reads files + runs tools it's allowed | Medium
CLI workflow + sub-agents | Broadest reach, hardest to audit | High
Caution: The more autonomous the modality, the more your permissions, sandboxing, and audit logging have to carry the weight, because the human is no longer reviewing each step.

8. Copyable Review Prompts

Paste these into Claude / Claude Code. Adjust paths and policy names to your org.

Audit an MCP server before approving

Review this MCP server for security before we enable it org-wide.
Source: <repo URL or local path>. Tell me:
1) Every external endpoint or filesystem path it can touch.
2) What credentials/scopes it requires and the minimum it needs.
3) Prompt-injection or data-exfiltration risks in its tools and
   tool descriptions.
4) A least-privilege .mcp.json + recommended permission rules.
Flag anything you can't verify rather than guessing.

Generate permission guardrails

Propose a .claude/settings.json for this repo. Allow only safe
read + test commands, ask for writes/network/MCP, and deny
destructive shell, force-push, and reads of .env or secrets/**.
Explain each deny rule in one line.

Security review of a change set

Act as a security reviewer. Review the current git diff for:
secret leakage, injection, unsafe shell, over-broad permissions,
and new data egress. Rate each finding (high/med/low), cite the
file:line, and give the smallest safe fix. No code changes yet.

Data-policy redaction pass

Scan these files for things that must NOT be sent to an external
model: credentials, tokens, PII, unpublished data, internal URLs.
List each with file:line and a redacted replacement suggestion.

Token/cost sanity check

This workflow uses sub-agents. Estimate where token use concentrates,
cap the iteration count, and suggest where a cheaper model (Haiku/
Sonnet) or smaller context would cut cost without losing rigor.
Tip: These prompts are largely model-agnostic, so the same wording transfers to other AI coding tools. Keep them in a shared snippets file so review quality doesn't depend on who's driving.

9. Do / Avoid

Do

  • Maintain a reviewed allow-list of MCP servers, plugins, and skills.
  • Default permissions to ask; allow-list only safe reads.
  • Reference secrets by name; keep them out of git and prompts.
  • Sandbox high-autonomy agents away from prod credentials.
  • Publish a green/yellow/red data-classification table.
  • Record model + effort in PRs for cost visibility.
  • Use hooks to enforce rules the model can't override.

Avoid

  • Enabling community MCP servers without reading the source.
  • Blanket permission bypass outside a throwaway sandbox.
  • Pasting secrets, PII, or unpublished results into any chat.
  • Handing agents personal admin tokens "to save time."
  • Trusting tool output / web content as instructions.
  • Running Opus + high effort for bulk mechanical edits.
  • Letting sub-agent loops run uncapped.

10. How This Could Be Better (Honest Notes)

This is a starting governance posture, not a finished standard. Known gaps to keep improving:

  • Telemetry: per-session permission decisions and token spend are hard to aggregate centrally — better dashboards would help.
  • MCP provenance: signing/attestation for servers and skills is still immature; today we rely on manual source review.
  • Prompt-injection defense: there is no airtight fix; layered controls (least privilege, egress limits, human-in-loop) reduce but don't eliminate risk.
  • Policy drift: Terms of Use, retention, and limits change across Claude, Codex, and Antigravity — re-verify on a schedule rather than once.
Tip: Treat this sheet as a living document. If you find a stronger practice, fold it back in; the goal is to learn better governance, not just record current habits.