
Writing SKILL.md: 4 rules for reliable activation

Agent skills are easy to write and hard to write well. Four rules for SKILL.md frontmatter and body that make the difference between a skill that fires and one that gathers dust.

Tags: skills, skill-authoring, claude-code, agents

Skills are the extension mechanism for modern AI coding agents. You drop a SKILL.md file into a directory; the model loads it; it activates when the context matches. Simple system, huge leverage — one well-written skill can change an agent's behavior on thousands of conversations.

But "simple" hides a trap. Most skills people write never fire. The frontmatter is too vague. The body is a wall of text. The keywords don't match what users actually say. Four rules separate skills that work from skills that don't.

Rule 1: The description is activation, not documentation

The YAML description field is what the model reads to decide whether to load your skill. It's not documentation. Write it in three sentences:

  1. What the skill does (action-first)
  2. When to use it (specific trigger conditions)
  3. Activate when — keywords, phrases, error strings users actually type
description: |
  Systematic merge conflict resolution with context analysis.
  Use this skill when the user has merge conflicts, needs help resolving
  conflicting changes, or is stuck on a rebase.
  Activate when: merge conflict, CONFLICT markers, <<<<<<< HEAD, rebase
  conflict, resolve conflicts, auto-merge failed.

The third sentence does the heaviest lifting. Include jargon ("merge conflict"), literal tokens users paste ("CONFLICT markers"), and natural phrasings ("resolve conflicts"). Mix them.

Bad skill descriptions trend toward the abstract. "Helps with Git." "Debugging utilities." "Code quality." These match everything, which means they match nothing well. Specificity wins.
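A crude smoke test for that specificity, before involving a model judge, is token overlap with the keyword line. This is a sketch only: real activation is the model reading the whole description, and the activateWhen list here is a hypothetical one copied from the example above.

```javascript
// Hypothetical keyword list lifted from the "Activate when" line above.
const activateWhen = [
  "merge conflict", "CONFLICT", "<<<<<<< HEAD", "rebase conflict",
  "resolve conflicts", "auto-merge failed",
];

// Does a candidate query share any token with the keyword list? This is NOT
// how the model decides, but zero overlap is a red flag that the description
// is missing the user's vocabulary.
function coversQuery(query, keywords = activateWhen) {
  const q = query.toLowerCase();
  return keywords.some((kw) => q.includes(kw.toLowerCase()));
}
```

A vague description like "Helps with Git" would fail this check for nearly every real conflict query, which is exactly the signal you want before rewriting.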

Rule 2: Disambiguate with "NOT for X"

When two skills could fire on the same query, they compete. The model picks inconsistently. The user sees different behavior on identical prompts. Disambiguate in the description itself:

description: |
  Generate unit tests for React components using Testing Library.
  Use this skill for JSX/TSX component tests.
  NOT for backend code — use js-test-generator for Node/utilities.
  Activate when: React test, component test, Testing Library, render, screen.

The "NOT for" line is the single highest-leverage sentence you can add. It resolves the ambiguity in place, without the model having to enumerate every nearby skill. We see activation accuracy jump 15-25% when teams add explicit negatives to their skill set.

Rule 3: The body uses progressive disclosure

A SKILL.md is consumed by a model with finite attention. Structure from highest-signal to lowest:

  1. One-line bolded summary
  2. "When to Use" — 3-6 concrete bullets
  3. Core concept in ≤ 5 sentences
  4. Minimal example (fewest lines that demonstrate the thing)
  5. Deeper patterns and variations
  6. Anti-patterns (numbered)
  7. Best practices (numbered, imperative)

Target 150-300 lines. Past 500, split into multiple skills — you probably have two or three concepts masquerading as one.

Resist three urges:

  • Writing history ("In 2023 this feature launched...") — rots immediately
  • Explaining obvious things the model already knows — wastes context
  • Enumerating every edge case — pick the important three

The scannability test: can someone skim the skill in 10 seconds and answer "what, when, minimal example"? If yes, you're there. If no, reorder or trim.

Rule 4: Test activation with a query suite

Every skill gets a sibling tests.yaml:

should_activate:
  - "my git merge failed"
  - "how do I resolve a CONFLICT marker?"
  - "stuck on rebase with conflicts"
  - "fix merge conflicts in auth.ts"

should_not_activate:
  - "merge the PR when CI passes"       # different meaning of "merge"
  - "conflict of interest in code review"
  - "how does git work?"                 # too general

Run it on every PR that touches the skill. For each should_activate query, verify the skill loads. For each should_not_activate query, verify it doesn't.

You can run the suite through the model itself as a cheap judge:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function wouldActivate(skill, query) {
  const prompt = `Given skill:
Name: ${skill.name}
Description: ${skill.description}

Should this skill activate for: "${query}"?
Answer only "yes" or "no".`;
  const resp = await client.messages.create({
    model: "claude-haiku-4-5",
    max_tokens: 10,
    messages: [{ role: "user", content: prompt }],
  });
  return resp.content[0].text.trim().toLowerCase().startsWith("y");
}

Target: precision ≥ 0.85, recall ≥ 0.90. Lower precision = over-activation (noise). Lower recall = under-activation (your skill exists but never fires). Both are fixable in the description — rewrite and re-measure.
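Collecting the judge's verdicts into two arrays makes the scoring mechanical. A minimal scorer, assuming you have already run wouldActivate over both query lists:

```javascript
// Score an activation test run. `positives` holds the judge's verdict for each
// should_activate query; `negatives` for each should_not_activate query.
function scoreActivation(positives, negatives) {
  const tp = positives.filter(Boolean).length;  // fired when it should
  const fn = positives.length - tp;             // silent when it should fire
  const fp = negatives.filter(Boolean).length;  // fired when it shouldn't
  const precision = tp + fp === 0 ? 1 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 1 : tp / (tp + fn);
  return { precision, recall, pass: precision >= 0.85 && recall >= 0.9 };
}
```

The pass flag is what your CI gate checks; precision and recall separately tell you which direction to push the description.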

The workflow

When you add a skill to a library, the authoring loop should be:

  1. Draft frontmatter (name, 3-sentence description with keywords)
  2. Draft body (summary → when → concept → example → anti-patterns → best practices)
  3. Write 10 should_activate + 10 should_not_activate queries
  4. Run the activation test. Iterate description until both scores are high
  5. Lint the body: frontmatter valid, ≤ 300 lines, required sections present, no stale model IDs
  6. CI gates merge on all of the above
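Step 5 can start as a few string checks rather than a real parser. A sketch, where STALE_MODELS and REQUIRED_SECTIONS are assumed lists you would maintain for your own library:

```javascript
// Assumed deny-list and section names; substitute your own library's values.
const STALE_MODELS = ["claude-2", "claude-instant-1"];
const REQUIRED_SECTIONS = ["When to Use", "Anti-patterns", "Best practices"];

function lintSkill(source) {
  const errors = [];
  const lines = source.split("\n");
  if (!source.startsWith("---")) errors.push("missing YAML frontmatter");
  if (lines.length > 300) errors.push(`too long: ${lines.length} lines`);
  for (const section of REQUIRED_SECTIONS) {
    if (!source.includes(section)) errors.push(`missing section: ${section}`);
  }
  for (const model of STALE_MODELS) {
    if (source.includes(model)) errors.push(`stale model ID: ${model}`);
  }
  return errors;
}
```

An empty errors array means the skill clears the gate; anything else fails the PR with a concrete fix attached.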

Without this loop, skills rot. A skill that was right six months ago uses a deprecated model ID, overlaps with a skill you added last month, and fires on queries you never intended. With it, skills stay sharp.

Why this matters

The failure mode of a bad skill isn't "the agent does the wrong thing loudly". It's silence. The skill quietly doesn't activate, the agent uses default behavior, the user never knows the skill exists. You won't see the failure in a bug tracker — you'll see it in activation-rate dashboards if you have them, or nowhere if you don't.

So build them. Instrument activation. Gate merges on tests. Your skill library becomes an asset that compounds, instead of a graveyard that costs you maintenance.

For the full set of patterns — frontmatter templates, activation debugging, progressive disclosure layouts, CI configs — see our skills-authoring meta-suite.
