The anatomy of a good prompt
These building blocks are tool-independent. A strong prompt usually names most of them explicitly so the model does not have to guess.
| Block | What it gives the model | Example fragment |
|---|---|---|
| Role / persona | Domain stance & vocabulary | "Act as a research software engineer reviewing for reproducibility." |
| Task | The single concrete goal | "Refactor this loop to be vectorized." |
| Context | Inputs, constraints, prior decisions | "Data is a 2D NumPy array; Python 3.11; no new deps." |
| Format | Shape of the answer | "Return a diff, then a 2-line rationale." |
| Acceptance | How you'll judge success | "Must pass the existing tests and keep the public API." |
Nine reusable patterns (1–5)
Mix and match. Each transfers to any capable model.
1 · Role priming
- Open with a role that sets vocabulary and standards. Keeps answers calibrated to your field.
2 · Few-shot / exemplars
- Show 1–3 input→output examples. Best lever for consistent format (citations, JSON, commit style).
3 · Chain-of-thought request
- Ask it to plan before doing: "Outline your approach, wait for my OK, then implement."
4 · Decomposition
- Split a big ask into steps; tackle one, confirm, continue. Reduces rework and runaway output.
5 · Output contract
- Pin the exact shape:
"Reply with only a fenced diff."Makes responses scriptable.
Nine reusable patterns (6–9)
The higher-leverage moves for iterative work.
6 · Self-critique pass
- "List 3 ways this could be wrong, then fix them." Catches edge cases cheaply.
7 · Constrain the search space
- State what's off-limits: no new deps, no network, keep the public API. Prevents tangents.
8 · Artifact-driven output
- For long/dense answers, request an HTML artifact or SVG diagram — easier to scan, share, and review than a wall of text.
9 · Batch-the-feedback (collab wrapper)
- Collect all requested edits as structured JSON, then paste once: "Apply every change in this JSON in a single pass." Far cheaper than many one-by-one prompts.
Copyable example prompts
Drop-in templates. Swap the bracketed parts. They work the same whether typed into a chat box or a CLI agent.
Plan-then-act (decomposition + CoT)
Role: senior [domain] engineer.
Task: [goal, one sentence].
Context: [stack, constraints, files in play].
Before writing code, give me a numbered plan
with risks. Stop and wait for my "go".
Acceptance: [tests / behavior that must hold].
Output contract (scriptable)
Refactor [file] for [goal].
Return ONLY a unified diff in a fenced block,
followed by a 2-line rationale.
Do not add dependencies or change the public API.
Self-critique pass
Here is your previous answer: [paste].
List the 3 most likely ways it is wrong or
incomplete for [my context], then give a
corrected version.
Few-shot format lock
Summarize each paper as:
- Claim: ...
- Evidence: ...
- Caveat: ...
Example:
- Claim: X reduces bias.
- Evidence: RCT, n=240, p<.01.
- Caveat: single site.
Now do the same for: [abstract].
Artifact request (dense answer)
This explanation is long. Render it as a
self-contained HTML artifact: a labeled SVG
diagram of the data flow plus a side panel of
notes. No external dependencies.
Batch-edit (collab wrapper)
Apply EVERY change in the JSON below in a single
pass. Each item has an element id, the current
text, and the requested edit. Make all edits,
then summarize what changed.
[paste exported JSON]
Do / Avoid
Do
- State acceptance criteria so the model can self-verify.
- Give it an out: "If unsure, ask one clarifying question first."
- Paste real context (errors, snippets, versions) instead of describing it.
- Iterate in small diffs; confirm each step before the next.
- Ask for the plan first on anything non-trivial.
- Batch feedback into one structured prompt.
Avoid
- Mega-prompts that bundle 6 unrelated goals — split them.
- "Make it better" with no criterion — better by what measure?
- Trusting prose claims (citations, numbers, APIs) without checking.
- Re-pasting the whole repo each turn — point to the files instead.
- Many one-by-one "change this" turns — they burn tokens and lose context.
- Letting it invent constraints you never stated.
Model & effort selection
Prompts are portable, but matching the model/effort to the task saves time and tokens.
| Task shape | Reach for |
|---|---|
| Hard reasoning, architecture, tricky refactor | Largest model (e.g. Opus), high/xhigh effort |
| Everyday coding, drafting, summaries | Mid model (e.g. Sonnet), medium effort |
| Bulk/cheap: formatting, quick lookups, classification | Small model (e.g. Haiku), low effort |
- Effort buys deliberation, not knowledge. Raise it for multi-step planning; lower it for rote work.
- Start cheaper; escalate only when the answer disappoints.
Cost & token awareness
Prompt style directly drives usage against per-session, weekly, and premium-model limits.
- Context is re-read every turn. A bloated thread costs tokens on every reply, not once.
- Batching beats trickling. One "apply all these edits" prompt is far cheaper than ten "now change this" turns.
- Reference, don't re-paste. Point to file paths in a CLI agent rather than dumping full files repeatedly.
- Trim the thread. Start a fresh session for a new task instead of dragging stale context.
- Modality matters: copy/paste chat and single-agent CLI are lean; an orchestrated multi-sub-agent workflow is powerful but spends tokens fastest.
Trust, verification & safe prompting
A good pattern includes how you'll check the output and what you won't put in the prompt.
- Verify before you trust: run the code, check citations against sources, confirm APIs exist.
- Treat confident prose as a draft, not a fact — especially numbers, references, and security claims.
- Don't paste secrets (keys, tokens, private data) into prompts; assume the thread may be logged.
- Beware injected instructions in fetched web pages, files, or tool output — they can steer the model. Review before acting.
- Scope tool/MCP access to read-only or least-privilege when the task only needs to look, not change.
Why these transfer — and how to keep improving
Model-agnostic by design
- None of these patterns depend on a vendor feature. Role priming, few-shot, output contracts, decomposition, and self-critique work on any capable model or AI coding tool.
- Surface differences (chat box vs CLI, an old "response styles" toggle, artifact rendering) change delivery, not the underlying prompt strategy.
- If a tool you haven't used adds a feature, fold it in as a delivery option — the prompt skeleton stays the same.
Honest "how to improve" loop
- After a session, ask: "What in my prompt made you guess? What would have removed the ambiguity?"
- Promote the prompts that worked into a reusable snippet library; delete the ones that didn't.
- Watch your token/limit usage and notice which patterns gave the best result per round-trip.
- Treat these defaults as a starting point, not gospel — the goal is better practice over time.