1. Threat Model in One Card
What changes when an AI agent can read code, call tools, and run commands on a developer's behalf.
- Confused-deputy risk: the agent acts with the user's full privileges. A malicious doc, README, or web page can carry prompt-injection instructions the agent may follow.
- Tool reach: an MCP server or shell tool can read secrets, exfiltrate data, or make irreversible changes (delete, push, deploy).
- Data egress: prompts, file contents, and tool outputs leave the machine and go to the model provider — and possibly to third-party MCP servers.
- Supply chain: plugins, skills, and community MCP servers are untrusted code until reviewed.
- Cost as a risk surface: runaway agent loops and large-context sessions burn tokens and money; treat budget like a security control.
2. Securing MCP Servers
MCP servers extend Claude with external tools/data. Each one is a new trust boundary.
Vet before enabling
- Prefer local stdio servers you control over remote HTTP/SSE servers when handling sensitive data.
- Pin versions and read the source. For production use, avoid npx/uvx pulling the latest release on every run.
- Least scope: give each server read-only or narrowly scoped credentials, never a personal admin token.
- Isolate secrets: pass tokens via env vars or secret stores, never hard-coded in a .mcp.json committed to git.
- Inventory: keep a reviewed allow-list of approved servers; block unknown ones at the org level.
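The pinning and inventory checks above can run as a CI step. The following is a minimal Python sketch. It assumes a Claude Code-style .mcp.json whose top-level mcpServers object maps server names to entries with either command/args (local) or url (remote). The APPROVED set is a hypothetical allow-list. The version detection is a heuristic, not a full npm or PyPI spec parser.

```python
import json
import os

# Hypothetical org allow-list; replace with your reviewed server inventory.
APPROVED = {"github-readonly", "local-docs"}


def first_positional(args):
    """Return the first non-flag argument (heuristic: the package spec)."""
    for arg in args:
        if not arg.startswith("-"):
            return arg
    return None


def npm_pinned(spec: str) -> bool:
    """'pkg@1.2.3' or '@scope/pkg@1.2.3' count as pinned; a leading '@' is only a scope."""
    _, sep, version = spec[1:].rpartition("@")
    return bool(sep) and version not in ("", "latest")


def pypi_pinned(spec: str) -> bool:
    """'pkg==1.2.3' or 'pkg@1.2.3' count as pinned; 'pkg@latest' does not."""
    if "==" in spec:
        return True
    _, sep, version = spec.partition("@")
    return bool(sep) and version not in ("", "latest")


def audit_servers(config: dict, approved=APPROVED) -> list:
    """Return human-readable findings for every server in an .mcp.json-style dict."""
    findings = []
    for name, spec in config.get("mcpServers", {}).items():
        if name not in approved:
            findings.append(f"{name}: not on the approved allow-list")
        if "url" in spec:
            findings.append(f"{name}: remote server {spec['url']}; confirm it may see sensitive data")
        command = spec.get("command", "")
        package = first_positional(spec.get("args", []))
        if command == "npx" and package and not npm_pinned(package):
            findings.append(f"{name}: npx package '{package}' is not version-pinned")
        if command == "uvx" and package and not pypi_pinned(package):
            findings.append(f"{name}: uvx package '{package}' is not version-pinned")
    return findings


if __name__ == "__main__" and os.path.exists(".mcp.json"):
    with open(".mcp.json") as f:
        for finding in audit_servers(json.load(f)):
            print(finding)
```

You can run it from a repo root, or call audit_servers() on the parsed file in a PR check. An empty list means nothing was flagged; it does not mean the servers are safe.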
Inspect a server's config
```shell
# See what's wired up and where secrets come from
claude mcp list
cat .mcp.json   # check command, args, env, URLs
```
Commit .mcp.json to source control so the config is reviewable, but keep secrets in .env or a secret manager and reference them by name. Review MCP changes in PRs like any other dependency.
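A companion check can catch secrets that slipped into the committed file. This sketch assumes secrets are referenced through ${VAR} placeholders in env, or in headers for remote servers. Claude Code expands these when it loads a project .mcp.json; confirm the syntax your client supports. SECRET_NAME is a key-name heuristic and will miss credentials stored under innocuous names.

```python
import re

# A value is acceptable if it contains a ${VAR} or ${VAR:-default} reference.
ENV_REF = re.compile(r"\$\{[A-Za-z_][A-Za-z0-9_]*(:-[^}]*)?\}")
# Heuristic: key names that usually hold credentials (case-insensitive).
SECRET_NAME = re.compile(r"TOKEN|SECRET|PASSWORD|API_?KEY|AUTH", re.IGNORECASE)


def literal_secrets(config: dict) -> list:
    """Flag env/headers entries with credential-like names but no ${VAR} reference."""
    findings = []
    for name, spec in config.get("mcpServers", {}).items():
        for field in ("env", "headers"):
            for key, value in spec.get(field, {}).items():
                if SECRET_NAME.search(key) and not ENV_REF.search(str(value)):
                    findings.append(f"{name}.{field}.{key}: looks like a hard-coded secret")
    return findings
```

For example, "GITHUB_TOKEN": "${GITHUB_TOKEN}" passes, and so does a header value like "Bearer ${API_TOKEN}". A literal token under GITHUB_TOKEN in a server named gh is reported as gh.env.GITHUB_TOKEN.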
3. Permissions & Tool Gating
Default to ask; allow-list the safe, deny-list the dangerous.
- Allow low-risk reads and checks (e.g. Read, Grep, git status, build/test runs).
- Ask for writes, network calls, and any MCP tool by default.
- Deny destructive patterns outright: rm -rf, force-push, curl … | sh, secret-file reads.
- Avoid blanket bypass (--dangerously-skip-permissions / "YOLO" mode) outside a throwaway sandbox.
- Sandbox high-autonomy runs in a container/VM with no prod credentials and limited network egress.
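The following is a simplified Python model of how allow, ask and deny interact. Its assumptions:

- Deny beats ask, which beats allow.
- Anything unmatched falls back to ask, matching the "default to ask" stance above.
- Tool(prefix:*) is a word-boundary prefix match.
- Any other Tool(spec) is a glob over the argument.

This is not Claude Code's actual matcher; check its permissions documentation for exact semantics.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Mirrors the example .claude/settings.json permissions in this section.
POLICY = {
    "allow": ["Read", "Grep", "Bash(git status:*)", "Bash(npm test:*)"],
    "ask": ["Edit", "Write", "WebFetch"],
    "deny": ["Bash(rm -rf:*)", "Bash(git push --force:*)",
             "Read(./.env)", "Read(**/secrets/**)"],
}


def rule_matches(rule: str, tool: str, arg: Optional[str]) -> bool:
    """'Tool' matches any call to Tool; 'Tool(spec)' must also match the argument."""
    if "(" not in rule:
        return rule == tool
    name, spec = rule[:-1].split("(", 1)
    if name != tool or arg is None:
        return False
    if spec.endswith(":*"):
        # Prefix rule: 'git status:*' matches 'git status' and 'git status -s'.
        prefix = spec[:-2]
        return arg == prefix or arg.startswith(prefix + " ")
    # Anything else is treated as a glob over the argument (e.g. a file path).
    return fnmatchcase(arg, spec)


def decide(policy: dict, tool: str, arg: Optional[str] = None) -> str:
    """Return 'deny', 'ask', or 'allow'; assumes deny > ask > allow, default 'ask'."""
    for verdict in ("deny", "ask", "allow"):
        if any(rule_matches(rule, tool, arg) for rule in policy.get(verdict, [])):
            return verdict
    return "ask"
```

In this model:

- decide(POLICY, "Bash", "git status -s") returns "allow".
- decide(POLICY, "Read", "config/secrets/db.key") returns "deny".
- decide(POLICY, "Bash", "git push origin main --force") returns only "ask", because reordering the arguments sidesteps the prefix rule. That gap is why string rules need a second layer.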
Example .claude/settings.json
```json
{
  "permissions": {
    "allow": ["Read", "Grep", "Bash(git status:*)", "Bash(npm test:*)"],
    "ask": ["Edit", "Write", "WebFetch"],
    "deny": ["Bash(rm -rf:*)", "Bash(git push --force:*)",
             "Read(./.env)", "Read(**/secrets/**)"]
  }
}
```
Pair these rules with hooks (e.g. PreToolUse) to enforce policy the model cannot talk its way around.
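Hooks run outside the model, so injected instructions cannot argue past them. The following is a minimal PreToolUse hook sketch in Python. It assumes Claude Code's documented hook contract: the event arrives as JSON on stdin with tool_name and tool_input, and exit code 2 blocks the call and returns stderr to the model. The patterns, the file path and the --hook flag are illustrative assumptions; check the current hooks reference before relying on it.

```python
import json
import re
import sys
from typing import Optional

# Illustrative deny patterns; tune them to your own policy.
BLOCKED = [
    (re.compile(r"\brm\s+(-\w*r\w*f|-\w*f\w*r)\b"), "recursive force delete"),
    (re.compile(r"\bgit\s+push\b.*\s(--force|-f)(\s|$)"), "force-push"),
    (re.compile(r"\b(curl|wget)\b.*\|\s*(sudo\s+)?(ba|z)?sh\b"), "pipe-to-shell install"),
    (re.compile(r"(^|\s)\S*\.env(\s|$)"), "secret-file read"),
]


def check(event: dict) -> Optional[str]:
    """Return a block reason for a risky Bash call, else None."""
    if event.get("tool_name") != "Bash":
        return None
    command = event.get("tool_input", {}).get("command", "")
    for pattern, reason in BLOCKED:
        if pattern.search(command):
            return reason
    return None


def main(raw: str) -> int:
    reason = check(json.loads(raw))
    if reason:
        # Exit code 2 asks Claude Code to block the call; stderr is fed back to the model.
        print(f"Blocked by policy hook: {reason}", file=sys.stderr)
        return 2
    return 0


# The --hook flag is this sketch's own convention so the module can be imported safely.
if __name__ == "__main__" and "--hook" in sys.argv:
    sys.exit(main(sys.stdin.read()))
```

Register the script in .claude/settings.json under hooks.PreToolUse, with a Bash matcher and a command such as python3 .claude/hooks/guard.py --hook. Unlike the prefix rule, the force-push pattern catches reordered flags. Regexes over shell strings are still bypassable through aliases, variables and scripts, so treat this as one layer alongside deny rules and sandboxing.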
4. Data Policy & Egress
Decide what may leave the building before anyone opens a chat window.
- Classify first: public, internal, confidential, regulated (PII/PHI/export-controlled). Map each to "may / may not go to a model."
- Prefer enterprise/zero-retention tiers for sensitive work; consumer tiers may use data differently — confirm against current Terms of Use.
- Redact secrets and identifiers before pasting. Keys, tokens, real subject data, and unpublished results are easy to leak by accident.
- Mind indirect egress: file uploads, MCP tool calls to external APIs, and WebFetch all send data outward.
- Keep an audit trail: log which projects use AI tools and at what data tier; for research, note reproducibility and attribution expectations.
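A first redaction pass before pasting can be scripted. The patterns below are illustrative assumptions covering AWS access key IDs, GitHub classic personal access tokens, email addresses and PEM private-key headers. They are no substitute for a maintained secret scanner or a PII detector mapped to your classification tiers.

```python
import re

# Illustrative patterns only; extend them to match your classification table.
PATTERNS = {
    "AWS_ACCESS_KEY_ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GITHUB_TOKEN": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(\.[\w-]+)+\b"),
    # Flags only the PEM header line; the key body still has to be removed by hand.
    "PRIVATE_KEY": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def redact(text: str):
    """Replace each match with <REDACTED:LABEL>; return (clean_text, labels_found)."""
    found = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"<REDACTED:{label}>", text)
        if count:
            found.append(label)
    return text, found


def scan_file(path: str):
    """Yield (line_number, label) pairs so findings can be cited as file:line."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            for label in redact(line)[1]:
                yield lineno, label
```

scan_file() reports line numbers, which matches the file:line format the redaction prompt in section 8 asks for.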
5. Token & Cost Governance
Limits and premiums are levers — set expectations and watch the burn.
| Lever | Governance angle |
|---|---|
| Per-session & weekly limits | Plan work so heavy jobs don't exhaust a shared cap mid-week; stagger large runs. |
| Model premiums | Opus costs more per token than Sonnet/Haiku; reserve it for hard reasoning, not bulk edits. |
| Context size | Long sessions re-send context each turn. Start fresh sessions; avoid dumping whole repos. |
| Agent loops | Sub-agent orchestration multiplies token use. Cap iterations and require checkpoints. |
| Terms of Use | Confirm acceptable-use and data-retention terms per plan before approving a tool for the org. |
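The context-size row is worth a quick calculation. Assume every turn re-sends the full history, and ignore prompt caching and compaction, both of which lower real cost. Under those assumptions, billed input tokens grow quadratically with session length:

```python
def cumulative_input_tokens(turns: int, base: int, per_turn: int) -> int:
    """Total input tokens over a session that re-sends its whole history each turn.

    Turn i (1-indexed) sends base + (i - 1) * per_turn tokens, so the sum is
    turns * base + per_turn * turns * (turns - 1) / 2: quadratic in session length.
    Ignores prompt caching and context compaction, which lower real cost.
    """
    # turns * (turns - 1) is always even, so integer division is exact.
    return turns * base + per_turn * turns * (turns - 1) // 2
```

Take hypothetical numbers: a 20,000-token starting context and 2,000 tokens added per turn. One 40-turn session, cumulative_input_tokens(40, 20_000, 2_000), bills 2,360,000 input tokens. Four fresh 10-turn sessions bill 4 * cumulative_input_tokens(10, 20_000, 2_000) = 1,160,000. This assumes each fresh session needs only the base context again.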
Comparative awareness
- Claude, OpenAI Codex, and Google Antigravity each meter usage differently (session/weekly caps vs. metered API spend vs. bundled quotas). Map each tool's metering model to your billing and policy before standardizing.
6. Model & Effort for Governance Work
Match horsepower to the task; document the choice.
| Task | Suggested model and effort |
|---|---|
| Threat-model a new MCP server / plugin | Opus, high/xhigh effort |
| Security review of a diff | Opus or Sonnet, high effort |
| Draft a data-policy table / checklist | Sonnet, medium |
| Summarize logs, lint configs, bulk redaction | Haiku / Sonnet, low |
7. Co-Dev Modalities & Their Risk/Cost
Where the agent runs changes both data exposure and spend.
| Modality | Exposure | Token use |
|---|---|---|
| Copy/paste to chat | You control exactly what leaves | Low |
| Inline editor autocomplete | Sends surrounding code continuously | Low–med |
| CLI single agent | Reads files + runs tools it's allowed | Medium |
| CLI workflow + sub-agents | Broadest reach, hardest to audit | High |
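The high token use of sub-agent workflows is why the cost table says to cap iterations and require checkpoints. The following is a minimal sketch of such a guard in Python. step and approve are hypothetical callables standing in for your orchestrator and a human sign-off. The default limits are placeholders, not vendor recommendations.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Budget:
    max_iterations: int = 10     # hard cap on steps (placeholder value)
    max_tokens: int = 200_000    # hard cap on total tokens (placeholder value)
    checkpoint_every: int = 5    # ask a human to approve every N steps


def run_capped(step: Callable[[int], Tuple[int, bool]],
               approve: Callable[[int, int], bool],
               budget: Budget) -> str:
    """Run step(i) until it reports done, a cap is hit, or a checkpoint is declined.

    step(i) returns (tokens_used_this_step, done); approve(i, spent) is the
    human checkpoint. Returns a status string saying why the loop stopped.
    """
    spent = 0
    for i in range(1, budget.max_iterations + 1):
        tokens, done = step(i)
        spent += tokens
        if done:
            return f"done after {i} steps, {spent} tokens"
        # Checked after the step, so the last step can overshoot max_tokens.
        if spent >= budget.max_tokens:
            return f"stopped: token budget hit at step {i} ({spent} tokens)"
        if i % budget.checkpoint_every == 0 and not approve(i, spent):
            return f"stopped: checkpoint declined at step {i}"
    return f"stopped: iteration cap {budget.max_iterations} reached ({spent} tokens)"
```

For example, run_capped(lambda i: (30_000, False), lambda i, spent: True, Budget(max_tokens=100_000)) stops at step 4 with 120,000 tokens spent. The overshoot past 100,000 is the cost of checking after each step rather than before it.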
8. Copyable Review Prompts
Paste these into Claude / Claude Code. Adjust paths and policy names to your org.
Audit an MCP server before approving
```text
Review this MCP server for security before we enable it org-wide.
Source: <repo URL or local path>. Tell me:
1) Every external endpoint or filesystem path it can touch.
2) What credentials/scopes it requires and the minimum it needs.
3) Prompt-injection or data-exfiltration risks in its tools and
   tool descriptions.
4) A least-privilege .mcp.json + recommended permission rules.
Flag anything you can't verify rather than guessing.
```
Generate permission guardrails
```text
Propose a .claude/settings.json for this repo. Allow only safe
read + test commands, ask for writes/network/MCP, and deny
destructive shell, force-push, and reads of .env or secrets/**.
Explain each deny rule in one line.
```
Security review of a change set
```text
Act as a security reviewer. Review the current git diff for:
secret leakage, injection, unsafe shell, over-broad permissions,
and new data egress. Rate each finding (high/med/low), cite the
file:line, and give the smallest safe fix. No code changes yet.
```
Data-policy redaction pass
```text
Scan these files for things that must NOT be sent to an external
model: credentials, tokens, PII, unpublished data, internal URLs.
List each with file:line and a redacted replacement suggestion.
```
Token/cost sanity check
```text
This workflow uses sub-agents. Estimate where token use concentrates,
cap the iteration count, and suggest where a cheaper model (Haiku/
Sonnet) or smaller context would cut cost without losing rigor.
```
9. Do / Avoid
Do
- Maintain a reviewed allow-list of MCP servers, plugins, and skills.
- Default permissions to ask; allow-list only safe reads.
- Reference secrets by name; keep them out of git and prompts.
- Sandbox high-autonomy agents away from prod credentials.
- Publish a green/yellow/red data-classification table.
- Record model + effort in PRs for cost visibility.
- Use hooks to enforce rules the model can't override.
Avoid
- Enabling community MCP servers without reading the source.
- Blanket permission bypass outside a throwaway sandbox.
- Pasting secrets, PII, or unpublished results into any chat.
- Handing agents personal admin tokens "to save time."
- Trusting tool output / web content as instructions.
- Running Opus + high effort for bulk mechanical edits.
- Letting sub-agent loops run uncapped.
10. How This Could Be Better (Honest Notes)
This is a starting governance posture, not a finished standard. Known gaps to keep improving:
- Telemetry: per-session permission decisions and token spend are hard to aggregate centrally — better dashboards would help.
- MCP provenance: signing/attestation for servers and skills is still immature; today we rely on manual source review.
- Prompt-injection defense: there is no airtight fix; layered controls (least privilege, egress limits, human-in-loop) reduce but don't eliminate risk.
- Policy drift: Terms of Use, retention, and limits change across Claude, Codex, and Antigravity — re-verify on a schedule rather than once.