Security & Governance — AI-Augmented Development Cheat Sheet

1. Threat Model in One Card

What changes when an AI agent can read code, call tools, and run commands on a developer's behalf.

Confused-deputy risk: the agent acts with the user's full privileges. A malicious doc, README, or web page can carry prompt-injection instructions the agent may follow.
Tool reach: an MCP server or shell tool can read secrets, exfiltrate data, or make irreversible changes (delete, push, deploy).
Data egress: prompts, file contents, and tool outputs leave the machine and go to the model provider — and possibly to third-party MCP servers.
Supply chain: plugins, skills, and community MCP servers are untrusted code until reviewed.
Cost as a risk surface: runaway agent loops and large-context sessions burn tokens and money; treat budget like a security control.

Tip Govern the three flows independently: what comes in (untrusted content), what the agent can do (tools/permissions), and what goes out (data egress + spend).

2. Securing MCP Servers

MCP servers extend Claude with external tools/data. Each one is a new trust boundary.

Vet before enabling

Prefer local stdio servers you control over remote HTTP/SSE servers when handling sensitive data.
Pin versions and read the source. Avoid npx/uvx pulling latest on every run for production use.
Least scope: give each server read-only or narrowly-scoped credentials, never a personal admin token.
Isolate secrets: pass tokens via env vars / secret stores, never hard-coded in .mcp.json committed to git.
Inventory: keep a reviewed allow-list of approved servers; block unknown ones at the org level.

Inspect a server's config

# See what's wired up and where secrets come from
claude mcp list
cat .mcp.json     # check command, args, env, URLs

Caution A remote MCP server sees every argument you send it. Tool descriptions returned by a server are injected into the model's context — a hostile server can attempt prompt injection. Treat third-party MCP output as untrusted input.

Tip Keep .mcp.json in source control (config is reviewable) but secrets in .env/secret manager referenced by name. Review MCP changes in PRs like any other dependency.

3. Permissions & Tool Gating

Default to ask; allow-list the safe, deny-list the dangerous.

Allow low-risk reads (e.g. Read, Grep, git status, build/test runs).
Ask for writes, network calls, and any MCP tool by default.
Deny destructive patterns outright: rm -rf, force-push, curl … | sh, secret-file reads.
Avoid blanket bypass (--dangerously-skip-permissions / "YOLO" mode) outside a throwaway sandbox.
Sandbox high-autonomy runs in a container/VM with no prod credentials and limited network egress.

Example `.claude/settings.json`

{
  "permissions": {
    "allow": ["Read", "Grep", "Bash(git status:*)",
              "Bash(npm test:*)"],
    "ask":   ["Edit", "Write", "WebFetch"],
    "deny":  ["Bash(rm -rf:*)", "Bash(git push --force:*)",
              "Read(./.env)", "Read(**/secrets/**)"]
  }
}

Tip Commit a project-level settings file so the whole team inherits the same guardrails. Use hooks (PreToolUse) to enforce policy the model cannot talk its way around.

4. Data Policy & Egress

Decide what may leave the building before anyone opens a chat window.

Classify first: public, internal, confidential, regulated (PII/PHI/export-controlled). Map each to "may / may not go to a model."
Prefer enterprise/zero-retention tiers for sensitive work; consumer tiers may use data differently — confirm against current Terms of Use.
Redact secrets and identifiers before pasting. Keys, tokens, real subject data, and unpublished results are easy to leak by accident.
Mind indirect egress: file uploads, MCP tool calls to external APIs, and WebFetch all send data outward.
Keep an audit trail: log which projects use AI tools and at what data tier; for research, note reproducibility and attribution expectations.

Caution "It's just a prompt" still counts as disclosure. Treat the chat box like an email to an outside vendor: assume it leaves your control once sent.

Tip Publish a one-page "green / yellow / red" data table for your group so scientists and engineers don't have to re-decide every session.

5. Token & Cost Governance

Limits and premiums are levers — set expectations and watch the burn.

Lever	Governance angle
Per-session & weekly limits	Plan work so heavy jobs don't exhaust a shared cap mid-week; stagger large runs.
Model premiums	Opus costs more per token than Sonnet/Haiku; reserve it for hard reasoning, not bulk edits.
Context size	Long sessions re-send context each turn. Start fresh sessions; avoid dumping whole repos.
Agent loops	Sub-agent orchestration multiplies token use. Cap iterations and require checkpoints.
Terms of Use	Confirm acceptable-use and data-retention terms per plan before approving a tool for the org.

Comparative awareness

Claude, OpenAI Codex, and Google Antigravity each meter usage differently (session/weekly caps vs. metered API spend vs. bundled quotas). Map the model to your billing and policy before standardizing.

Tip Make cost visible: ask the team to note model + effort level in PRs. "Sonnet / medium" vs "Opus / high" tells you where the budget went.

6. Model & Effort for Governance Work

Match horsepower to the task; document the choice.

Task	Suggested
Threat-model a new MCP server / plugin	Opus, high/xhigh effort
Security review of a diff	Opus or Sonnet, high effort
Draft a data-policy table / checklist	Sonnet, medium
Summarize logs, lint configs, bulk redaction	Haiku / Sonnet, low

Tip Higher effort = more reasoning tokens = more cost and latency. Use high/xhigh where a missed risk is expensive; drop to low for mechanical passes.

7. Co-Dev Modalities & Their Risk/Cost

Where the agent runs changes both data exposure and spend.

Modality	Exposure	Token use
Copy/paste to chat	You control exactly what leaves	Low
Inline editor autocomplete	Sends surrounding code continuously	Low–med
CLI single agent	Reads files + runs tools it's allowed	Medium
CLI workflow + sub-agents	Broadest reach, hardest to audit	High

Caution The more autonomous the modality, the more your permissions, sandboxing, and audit logging have to carry the weight — the human is no longer reviewing each step.

8. Copyable Review Prompts

Paste these into Claude / Claude Code. Adjust paths and policy names to your org.

Audit an MCP server before approving

Review this MCP server for security before we enable it org-wide.
Source: <repo URL or local path>. Tell me:
1) Every external endpoint or filesystem path it can touch.
2) What credentials/scopes it requires and the minimum it needs.
3) Prompt-injection or data-exfiltration risks in its tools and
   tool descriptions.
4) A least-privilege .mcp.json + recommended permission rules.
Flag anything you can't verify rather than guessing.

Generate permission guardrails

Propose a .claude/settings.json for this repo. Allow only safe
read + test commands, ask for writes/network/MCP, and deny
destructive shell, force-push, and reads of .env or secrets/**.
Explain each deny rule in one line.

Security review of a change set

Act as a security reviewer. Review the current git diff for:
secret leakage, injection, unsafe shell, over-broad permissions,
and new data egress. Rate each finding (high/med/low), cite the
file:line, and give the smallest safe fix. No code changes yet.

Data-policy redaction pass

Scan these files for things that must NOT be sent to an external
model: credentials, tokens, PII, unpublished data, internal URLs.
List each with file:line and a redacted replacement suggestion.

Token/cost sanity check

This workflow uses sub-agents. Estimate where token use concentrates,
cap the iteration count, and suggest where a cheaper model (Haiku/
Sonnet) or smaller context would cut cost without losing rigor.

Tip These prompts are largely model-agnostic — the same wording transfers to other AI coding tools. Keep them in a shared snippets file so review quality doesn't depend on who's driving.

9. Do / Avoid

Do

Maintain a reviewed allow-list of MCP servers, plugins, and skills.
Default permissions to ask; allow-list only safe reads.
Reference secrets by name; keep them out of git and prompts.
Sandbox high-autonomy agents away from prod credentials.
Publish a green/yellow/red data-classification table.
Record model + effort in PRs for cost visibility.
Use hooks to enforce rules the model can't override.

Avoid

Enabling community MCP servers without reading the source.
Blanket permission bypass outside a throwaway sandbox.
Pasting secrets, PII, or unpublished results into any chat.
Handing agents personal admin tokens "to save time."
Trusting tool output / web content as instructions.
Running Opus + high effort for bulk mechanical edits.
Letting sub-agent loops run uncapped.

10. How This Could Be Better (Honest Notes)

This is a starting governance posture, not a finished standard. Known gaps to keep improving:

Telemetry: per-session permission decisions and token spend are hard to aggregate centrally — better dashboards would help.
MCP provenance: signing/attestation for servers and skills is still immature; today we rely on manual source review.
Prompt-injection defense: there is no airtight fix; layered controls (least privilege, egress limits, human-in-loop) reduce but don't eliminate risk.
Policy drift: Terms of Use, retention, and limits change across Claude, Codex, and Antigravity — re-verify on a schedule rather than once.

Tip Treat this sheet as a living document. If you find a stronger practice, fold it back in — the goal is to learn better governance, not just record current habits.

1. Threat Model in One Card

2. Securing MCP Servers

Vet before enabling

Inspect a server's config

3. Permissions & Tool Gating

Example .claude/settings.json

4. Data Policy & Egress

5. Token & Cost Governance

Comparative awareness

6. Model & Effort for Governance Work

7. Co-Dev Modalities & Their Risk/Cost

8. Copyable Review Prompts

Audit an MCP server before approving

Generate permission guardrails

Security review of a change set

Data-policy redaction pass

Token/cost sanity check

9. Do / Avoid

Do

Avoid

10. How This Could Be Better (Honest Notes)

Example `.claude/settings.json`