<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://binaryphile.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://binaryphile.github.io/" rel="alternate" type="text/html" /><updated>2026-05-19T15:55:43+00:00</updated><id>https://binaryphile.github.io/feed.xml</id><title type="html">binary.phile</title><subtitle>Musings, mostly on technology.</subtitle><entry><title type="html">Codifying a Bash Style Guide as ShellCheck Plugins</title><link href="https://binaryphile.github.io/bash/shellcheck/haskell/2026/05/19/codifying-a-bash-style-guide-as-shellcheck-plugins.html" rel="alternate" type="text/html" title="Codifying a Bash Style Guide as ShellCheck Plugins" /><published>2026-05-19T14:30:00+00:00</published><updated>2026-05-19T14:30:00+00:00</updated><id>https://binaryphile.github.io/bash/shellcheck/haskell/2026/05/19/codifying-a-bash-style-guide-as-shellcheck-plugins</id><content type="html" xml:base="https://binaryphile.github.io/bash/shellcheck/haskell/2026/05/19/codifying-a-bash-style-guide-as-shellcheck-plugins.html"><![CDATA[<p>A style guide is just text. An enforced check is a tool that catches mistakes.</p>

<p>I have a <a href="/2026/02/27/bash-style-guide.html">bash style guide</a> that I keep in a repo and re-read when I forget which way around the <code class="language-plaintext highlighter-rouge">*List</code> convention goes. I also have a <a href="/bash/shellcheck/haskell/2026/05/19/adding-a-plugin-system-to-shellcheck.html">shellcheck fork with a plugin system</a>. The natural next step is to translate the guide into checks. That’s <a href="https://github.com/binaryphile/shellcheck-convention-plugin">shellcheck-convention-plugin</a>, and it ships nine checks codifying nine rules.</p>

<p>This post is the catalog plus two lessons from building it. The lessons are the value; the catalog is reference.</p>

<h2 id="the-catalog">The catalog</h2>

<table>
  <thead>
    <tr>
      <th>Check</th>
      <th>Rule</th>
      <th>Guide section</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>SC9001</td>
      <td>Taint flows from unquoted parameter expansion to test/cmdsub contexts</td>
      <td>§5 quoting</td>
    </tr>
    <tr>
      <td>SC9002</td>
      <td>Command substitution result is tainted; quote it before using</td>
      <td>§5 quoting</td>
    </tr>
    <tr>
      <td>SC9003</td>
      <td>Quoting an already-quoted-by-context value is noise</td>
      <td>§5 quoting</td>
    </tr>
    <tr>
      <td>SC9004</td>
      <td>A variable cannot end in both <code class="language-plaintext highlighter-rouge">_</code> and <code class="language-plaintext highlighter-rouge">List</code> (the two mutually exclusive suffixes)</td>
      <td>§3 naming</td>
    </tr>
    <tr>
      <td>SC9005</td>
      <td>Numeric variables don’t belong inside <code class="language-plaintext highlighter-rouge">[[ ... ]]</code> — use <code class="language-plaintext highlighter-rouge">(( ... ))</code></td>
      <td>§11 conditionals</td>
    </tr>
    <tr>
      <td>SC9006</td>
      <td>Inclusive language in identifiers <em>and</em> comments</td>
      <td>§3 naming</td>
    </tr>
    <tr>
      <td>SC9007</td>
      <td>Function docstring shape: first body statement is a <code class="language-plaintext highlighter-rouge"># description</code> comment</td>
      <td>§6 functions</td>
    </tr>
    <tr>
      <td>SC9008</td>
      <td><code class="language-plaintext highlighter-rouge">*List</code> is an IFS-newline-serialized string, not an array — disallow array operations on it</td>
      <td>§3 naming + §7 arrays</td>
    </tr>
    <tr>
      <td>SC9009</td>
      <td>A <code class="language-plaintext highlighter-rouge">local</code> declaration without initialization followed by an append (<code class="language-plaintext highlighter-rouge">x+=...</code>, <code class="language-plaintext highlighter-rouge">printf -v x</code>, <code class="language-plaintext highlighter-rouge">read x</code>) reads from outer scope</td>
      <td>§6 functions + §15 FP-style</td>
    </tr>
  </tbody>
</table>

<p>Each check has positive (should fire) and negative (should not fire) test fixtures. The plugin ships as one <code class="language-plaintext highlighter-rouge">.so</code> and reports <code class="language-plaintext highlighter-rouge">Loaded plugin: libconvention-checks.so (9 check(s))</code> at startup. Each check has its own SC code so users can disable individual checks with <code class="language-plaintext highlighter-rouge">--disable=SC9008</code>.</p>
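
<p>To make one catalog row concrete, here is a minimal sketch of the two shapes SC9005 distinguishes, going only by the rule text in the table. The function names are hypothetical and are not taken from the plugin’s fixtures.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>is_full_flagged() {
  # is_full_flagged succeeds when the count in $1 equals 3.
  local count=$1
  [[ $count -eq 3 ]]    # numeric test inside [[ ... ]]: the shape SC9005 describes
}

is_full() {
  # is_full succeeds when the count in $1 equals 3.
  local count=$1
  (( count == 3 ))      # arithmetic context, as the rule prefers
}

if is_full 3; then echo "full"; fi
</code></pre></div></div>

<p>Both functions behave the same at runtime; the rule is about which construct signals “this is a number” to the reader.</p>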

<p>The codes are in the SC9xxx range. Upstream uses SC1xxx (parser), SC2xxx (analysis), SC3xxx (shell-dialect). SC9xxx is a convention I picked for plugins: it doesn’t collide with anything upstream is likely to issue, and a future reader can tell at a glance that an SC9xxx warning is from a plugin, not from shellcheck core.</p>

<h2 id="lesson-1-when-the-task-and-the-guide-disagree-the-guide-wins">Lesson 1: when the task and the guide disagree, the guide wins</h2>

<p>SC9008 shipped backwards.</p>

<p>The task description said “warn on array operations applied to <code class="language-plaintext highlighter-rouge">*List</code> variables.” I read that, wrote the check, shipped it. The fixtures passed. The check fired on <code class="language-plaintext highlighter-rouge">octopiList[0]</code> and didn’t fire on <code class="language-plaintext highlighter-rouge">octopi[0]</code>. Looked correct.</p>

<p>It was inverted.</p>

<p><code class="language-plaintext highlighter-rouge">*List</code> in my style guide means an IFS-serialized <em>string</em> — newline-separated values you read with <code class="language-plaintext highlighter-rouge">while IFS= read -r line</code>. Arrays use <em>plural</em> names: <code class="language-plaintext highlighter-rouge">octopi</code>, <code class="language-plaintext highlighter-rouge">requestedTests</code>, <code class="language-plaintext highlighter-rouge">filenames</code>. The task had been filed months earlier, when the convention was still in flux, and the wording reflected the older form where <code class="language-plaintext highlighter-rouge">*List</code> meant “array.” By the time I implemented it, the convention had inverted. The clarification lived in a separate task I didn’t read. I followed the task wording, not the guide.</p>
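
<p>A minimal sketch of the two naming shapes, under some assumptions: it relies on the <code class="language-plaintext highlighter-rouge">IFS=$'\n'; set -o noglob</code> setup so that an unquoted expansion splits on newlines only, and <code class="language-plaintext highlighter-rouge">count_list_items</code> is a hypothetical helper, not something from the guide. A <code class="language-plaintext highlighter-rouge">while IFS= read -r line</code> loop would serve equally well for walking the string.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>IFS=$'\n'        # split unquoted expansions on newlines only
set -o noglob    # and never glob them

octopiList=$'inky\nblinky\npinky'   # *List: one string, newline-separated
octopi=( inky blinky pinky )        # plural name: a real array

count_list_items() {
  # count_list_items prints how many items the *List string in $1 holds.
  local item count=0
  for item in $1; do
    (( count += 1 ))
  done
  echo "$count"
}

count_list_items "$octopiList"   # prints 3
echo "${#octopi[@]}"             # prints 3
</code></pre></div></div>

<p>Under this convention, array syntax such as <code class="language-plaintext highlighter-rouge">octopiList[0]</code> is the mismatch: the name promises a string, the operation treats it as an array.</p>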

<p>The lesson: when implementing a rule, read the guide section, not the task description. Tasks describe <em>what to do</em>; guides describe <em>what’s true</em>. If they disagree, the guide wins, because the guide is what users will be checked against.</p>

<p>The fix: <code class="language-plaintext highlighter-rouge">git revert</code>, file a corrected task, re-implement against the guide, write a process retro. The retro is the part that mattered — it’s the reason I’ll catch this class of mistake next time.</p>

<h2 id="lesson-2-scope-aware-checks-are-hard-and-theyre-worth-the-trouble">Lesson 2: scope-aware checks are hard, and they’re worth the trouble</h2>

<p>SC9009 is the only check in the catalog that requires reasoning about variable <em>scope</em> and <em>order of operations</em> within a function. Everything else can be decided from the AST node in isolation.</p>

<p>The rule sounds simple:</p>

<blockquote>
  <p>A <code class="language-plaintext highlighter-rouge">local x</code> declaration followed by an append (<code class="language-plaintext highlighter-rouge">x+=...</code>, <code class="language-plaintext highlighter-rouge">printf -v x ...</code>, <code class="language-plaintext highlighter-rouge">read x</code>, <code class="language-plaintext highlighter-rouge">(( x += ... ))</code>) without an intervening initialization is a bug. The append reads from outer scope before assigning, so the function silently captures and mutates a global.</p>
</blockquote>

<p>Implementing it took seven grade/improve cycles after the plan was approved, each finding a new defect class:</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">read -p prompt var</code>: the <code class="language-plaintext highlighter-rouge">-p</code> value got treated as a write target. Fix: extract an <code class="language-plaintext highlighter-rouge">extractReadTargets</code> helper that knows which <code class="language-plaintext highlighter-rouge">read</code> flags take values.</li>
  <li><code class="language-plaintext highlighter-rouge">mapfile -t arr</code> — same flag-value bug for <code class="language-plaintext highlighter-rouge">mapfile</code>. Fix: shared <code class="language-plaintext highlighter-rouge">extractFlagAwareTargets</code> helper.</li>
  <li><code class="language-plaintext highlighter-rouge">declare -p name</code> — the <code class="language-plaintext highlighter-rouge">-p</code> form is a <em>query</em>, not a declaration. Fix: skip <code class="language-plaintext highlighter-rouge">declare</code> when <code class="language-plaintext highlighter-rouge">-p</code>/<code class="language-plaintext highlighter-rouge">-f</code>/<code class="language-plaintext highlighter-rouge">-F</code> is present.</li>
  <li><code class="language-plaintext highlighter-rouge">declare -n alias=...</code> — the <code class="language-plaintext highlighter-rouge">-n</code> form is a nameref, not a value. Fix: skip when <code class="language-plaintext highlighter-rouge">-n</code> is present.</li>
  <li><code class="language-plaintext highlighter-rouge">(( x )) </code> — <code class="language-plaintext highlighter-rouge">TA_Variable</code> LHS of an arithmetic expression was being indexed as a read. Fix: track arith LHS IDs in a separate set, exclude from read positions.</li>
  <li><code class="language-plaintext highlighter-rouge">(( x = y = 1 ))</code> — chained arithmetic only registered the outer write. Fix: recurse into matched <code class="language-plaintext highlighter-rouge">TA_Assignment</code> for chained writes.</li>
  <li><code class="language-plaintext highlighter-rouge">printf -v var fmt</code> — the <code class="language-plaintext highlighter-rouge">-v</code> form <em>is</em> a write, but only when the flag is actually present. Fix: detect the <code class="language-plaintext highlighter-rouge">-v</code> flag explicitly rather than assuming any <code class="language-plaintext highlighter-rouge">printf</code> invocation with a variable arg is a write.</li>
</ol>

<p>Each of these passed the previous round’s fixtures. Each surfaced when I added one more real-world script to the negative-fixture set.</p>
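
<p>The flag-value problem in the first two items is easier to see in code. The plugin’s helpers are Haskell and are not shown here; what follows is a bash sketch of the same idea, with a hypothetical <code class="language-plaintext highlighter-rouge">read_targets</code> function. It assumes each flag is passed as its own word (no combined forms like <code class="language-plaintext highlighter-rouge">-rp</code>), which a real implementation would also need to handle.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>read_targets() {
  # read_targets prints the variable names that `read "$@"` would write to.
  local word skip=0
  for word in "$@"; do
    if (( skip )); then
      skip=0                 # this word is a flag value, not a target
      continue
    fi
    case $word in
      -[dinNptu]) skip=1 ;;  # flag takes a value: ignore the next word
      -*) ;;                 # value-less flag, or -a whose next word is a target
      *) echo "$word" ;;     # anything else is a write target
    esac
  done
}

read_targets -r -p 'Name: ' first last
</code></pre></div></div>

<p>The call above prints <code class="language-plaintext highlighter-rouge">first</code> and <code class="language-plaintext highlighter-rouge">last</code> and skips the prompt string. A version without the flag table would report the prompt as a third write target, which is the defect described in item 1.</p>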

<p>The check is still not CFG-path-sensitive. It’s a <em>lexical</em> heuristic: walk the AST in order, build a per-scope index of <code class="language-plaintext highlighter-rouge">(variable, first-write-kind, first-read-or-write-position)</code>, flag when the first write is an append and there’s no preceding initialization. A real CFG analysis would handle conditional initialization — <code class="language-plaintext highlighter-rouge">if foo; then x=1; fi; x+=more</code> — without flagging it. The lexical version flags it. That’s a known false positive and it’s documented in the check.</p>

<p>I shipped the lexical version because it catches the bug class — uninitialized-then-appended — without the implementation cost of a CFG. If I see real false positives in real scripts, I’ll revisit. So far, the rate is low enough that the lexical heuristic is the right cost/benefit point.</p>
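
<p>As a rough illustration of what the heuristic sees, here are three shapes as described above: the flagged one, the initialized one, and the conditional-initialization false positive. The function names are hypothetical, and the comments restate this post’s description of the check rather than output from the plugin itself.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>backup_name_flagged() {
  # backup_name_flagged prints $1 with a .bak suffix.
  local name              # declared, never initialized
  name+=$1                # first write is an append: the flagged shape
  name+=.bak
  echo "$name"
}

backup_name() {
  # backup_name prints $1 with a .bak suffix.
  local name=$1           # initialized at declaration: not flagged
  name+=.bak
  echo "$name"
}

backup_name_conditional() {
  # backup_name_conditional prints $1 (if any) with a .bak suffix.
  local name
  if [[ -n $1 ]]; then name=$1; fi   # initialization on one path only
  name+=.bak              # the documented false positive
  echo "$name"
}

backup_name report
</code></pre></div></div>

<p>Initializing at the declaration, as in <code class="language-plaintext highlighter-rouge">backup_name</code>, is the form that keeps the check quiet without a suppression comment.</p>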

<h2 id="what-this-experiment-proved">What this experiment proved</h2>

<p>Before this work, my bash style guide was a document. People who read it (mostly me) tried to apply it; mistakes were caught in code review, when caught at all.</p>

<p>After this work, the guide is a <em>tool</em>. The same shellcheck I already run on save now refuses to let me declare <code class="language-plaintext highlighter-rouge">userList=( inky blinky )</code>, refuses to let me write <code class="language-plaintext highlighter-rouge">local count; count+=1</code>, refuses to let me write a function whose first body statement isn’t a docstring comment.</p>

<p>The translation isn’t perfect. SC9009 has known false positives. SC9007 fires on section-header comments that aren’t intended as docstrings. SC9006 can’t tell that <code class="language-plaintext highlighter-rouge">master</code> as a git branch context is allowed where <code class="language-plaintext highlighter-rouge">master</code> as a deployment role isn’t. These are tradeoffs — false positives are cheaper to suppress than false negatives are to find by hand.</p>

<p>The repo: <a href="https://github.com/binaryphile/shellcheck-convention-plugin">binaryphile/shellcheck-convention-plugin</a>. The catalog with full per-check rationale: <code class="language-plaintext highlighter-rouge">docs/design.md</code> in that repo. The host fork: <a href="https://github.com/binaryphile/shellcheck">binaryphile/shellcheck</a>, covered in <a href="/bash/shellcheck/haskell/2026/05/19/adding-a-plugin-system-to-shellcheck.html">the previous post</a>.</p>

<p>If you’ve written a style guide for any language and wish it were enforced, write a plugin for whichever linter your team already runs. The ROI is real. The first check costs a day; the second costs an hour.</p>]]></content><author><name></name></author><category term="bash" /><category term="shellcheck" /><category term="haskell" /><summary type="html"><![CDATA[A style guide is just text. An enforced check is a tool that catches mistakes.]]></summary></entry><entry><title type="html">Adding a Plugin System to ShellCheck</title><link href="https://binaryphile.github.io/bash/shellcheck/haskell/2026/05/19/adding-a-plugin-system-to-shellcheck.html" rel="alternate" type="text/html" title="Adding a Plugin System to ShellCheck" /><published>2026-05-19T14:00:00+00:00</published><updated>2026-05-19T14:00:00+00:00</updated><id>https://binaryphile.github.io/bash/shellcheck/haskell/2026/05/19/adding-a-plugin-system-to-shellcheck</id><content type="html" xml:base="https://binaryphile.github.io/bash/shellcheck/haskell/2026/05/19/adding-a-plugin-system-to-shellcheck.html"><![CDATA[<p>I wanted shellcheck to catch a class of mistakes it wasn’t designed to catch — conventions specific to my bash style. Naming rules. Quoting under <code class="language-plaintext highlighter-rouge">IFS=$'\n'; set -o noglob</code>. Docstring shape. Things upstream would (rightly) never accept as core checks, because they’re house rules, not bash mistakes.</p>

<p>ShellCheck has no plugin system. The options are: fork it, vendor a patch, or stop wanting the thing.</p>

<p>So I forked it. The fork is <a href="https://github.com/binaryphile/shellcheck">binaryphile/shellcheck</a> and it now loads <code class="language-plaintext highlighter-rouge">.so</code> files at startup. This post is about how the plugin loader works and the one parser change I had to make to keep my docstring checks honest.</p>

<h2 id="the-plugin-shape">The plugin shape</h2>

<p>A plugin is a shared library exporting two C entry points:</p>

<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">foreign</span> <span class="n">export</span> <span class="n">ccall</span> <span class="n">plugin_api_version</span> <span class="o">::</span> <span class="kt">IO</span> <span class="kt">CInt</span>
<span class="n">foreign</span> <span class="n">export</span> <span class="n">ccall</span> <span class="n">plugin_init</span>        <span class="o">::</span> <span class="kt">IO</span> <span class="p">(</span><span class="kt">StablePtr</span> <span class="p">[</span><span class="kt">CustomCheck</span><span class="p">])</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">plugin_api_version</code> returns an integer. The host (the shellcheck binary) refuses to load a plugin whose version doesn’t match. <code class="language-plaintext highlighter-rouge">plugin_init</code> returns a list of <code class="language-plaintext highlighter-rouge">CustomCheck</code> values — each is a function <code class="language-plaintext highlighter-rouge">Parameters -&gt; Token -&gt; Writer [TokenComment] ()</code>, the same type as a built-in check.</p>

<p>At startup, shellcheck scans <code class="language-plaintext highlighter-rouge">$XDG_DATA_HOME/shellcheck/plugins/</code> for <code class="language-plaintext highlighter-rouge">*.so</code> files, <code class="language-plaintext highlighter-rouge">dlopen</code>s each one, calls <code class="language-plaintext highlighter-rouge">plugin_api_version</code>, then <code class="language-plaintext highlighter-rouge">plugin_init</code>, then registers the returned checks alongside the built-ins. They run as part of the same analysis pass. The error reporter has no idea they came from a plugin.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ shellcheck script.bash
Loaded plugin: libconvention-checks.so (9 check(s))
script.bash:3:1: warning: SC9001: ...
</code></pre></div></div>
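
<p>Installing a plugin then comes down to putting the <code class="language-plaintext highlighter-rouge">.so</code> in that directory. A small sketch, with two hypothetical helpers; the fallback to <code class="language-plaintext highlighter-rouge">~/.local/share</code> when <code class="language-plaintext highlighter-rouge">XDG_DATA_HOME</code> is unset follows the XDG Base Directory convention and is an assumption about the fork, not something confirmed here.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>plugin_dir() {
  # plugin_dir prints the directory scanned for plugins.
  # Assumption: XDG default of ~/.local/share when XDG_DATA_HOME is unset.
  echo "${XDG_DATA_HOME:-$HOME/.local/share}/shellcheck/plugins"
}

install_plugin() {
  # install_plugin copies the shared library in $1 into the plugin directory.
  local dir
  dir=$(plugin_dir)
  mkdir -p "$dir"
  cp "$1" "$dir/"
}
</code></pre></div></div>

<p>Usage would be <code class="language-plaintext highlighter-rouge">install_plugin ./libconvention-checks.so</code>, after which the next shellcheck run should report the plugin as loaded.</p>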

<p>The plugin can use any of the AST helpers shellcheck exports — <code class="language-plaintext highlighter-rouge">getLiteralString</code>, the sugared pattern aliases like <code class="language-plaintext highlighter-rouge">T_Literal id str</code>, the whole shape-matching kit. From the plugin’s perspective, it’s writing the same code as a built-in check. It just lives in a separate package.</p>

<h2 id="the-catch-same-compiler-careful-linking">The catch: same compiler, careful linking</h2>

<p>The plugin and the host are both Haskell. Haskell linking is not stable across GHC versions, so the plugin and host must be built with the same compiler. The plugin must not link the runtime (the host already has one), and the host must build with <code class="language-plaintext highlighter-rouge">-rdynamic</code> so the plugin can see its symbols.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># host: shellcheck
ghc-options: -threaded -rdynamic

# plugin: convention-checks
ghc-options: -shared -fPIC -dynamic
ld-options:  -Wl,--unresolved-symbols=ignore-all
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">ignore-all</code> setting tells the linker that the plugin’s references to host symbols don’t have to resolve at link time. They resolve at <code class="language-plaintext highlighter-rouge">dlopen</code> time, when the host is already loaded in the same process.</p>

<p>For nix users this is straightforward: both packages pin the same GHC and the lockfile keeps them in sync. For everyone else: build the host and the plugin on the same machine on the same day.</p>

<h2 id="the-wrinkle-shellchecks-parser-drops-comments">The wrinkle: shellcheck’s parser drops comments</h2>

<p>I was building a docstring-shape check — flag a function whose first body statement isn’t a <code class="language-plaintext highlighter-rouge"># description</code> comment. Standard convention check. Trivial to write.</p>

<p>Except shellcheck’s parser drops comments. The lexer matches them, the parser discards them, and the AST has no <code class="language-plaintext highlighter-rouge">T_Comment</code> node. Comments simply do not exist downstream of parsing.</p>

<p>This is fine for shellcheck’s purposes — comments don’t affect shell behavior, so a static analyzer that produces warnings about behavior can ignore them. It’s not fine for a plugin author writing a docstring check.</p>
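
<p>For concreteness, this is the shape such a check looks at, sketched with made-up function names. The exact docstring wording the guide expects is not spelled out here, so the comment text is only illustrative.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>greet() {
  # greet prints a greeting for the name in $1.
  echo "hello, $1"
}

shout() {
  echo "HELLO, $1"   # no leading comment: the shape the check would flag
}

greet world
</code></pre></div></div>

<p>Telling these two apart requires the leading comment in <code class="language-plaintext highlighter-rouge">greet</code> to survive parsing, which is exactly what the stock AST does not provide.</p>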

<p>The fix is a splice: keep comments around, attach them to their nearest following AST node, and expose them through an accessor for plugin authors.</p>

<h2 id="the-splice">The splice</h2>

<p>Three pieces:</p>

<ol>
  <li>A new AST node, <code class="language-plaintext highlighter-rouge">T_Comment id text</code>, with all the standard <code class="language-plaintext highlighter-rouge">Token</code> machinery (positions, IDs).</li>
  <li>A post-parse pass — <code class="language-plaintext highlighter-rouge">attachComments</code> — that walks the comment list and the AST in parallel and slips <code class="language-plaintext highlighter-rouge">T_Comment</code> nodes into the body lists they belong to.</li>
  <li>An accessor — <code class="language-plaintext highlighter-rouge">getDocCommentsBefore :: Token -&gt; [Token]</code> — that returns the comments immediately preceding a given token, with no blank line separating them from the token.</li>
</ol>

<p>The splice is post-parse rather than mid-parse because the parser is Parsec-based and rewiring the existing rules to thread comments around would touch hundreds of productions. A post-pass that walks the AST once is cheap and isolated.</p>

<h2 id="two-bugs-in-the-splice">Two bugs in the splice</h2>

<p>The first version of the splice passed all unit tests but produced reordered output for any function with more than one statement.</p>

<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- buggy: collisions combine new-on-left</span>
<span class="kt">Map</span><span class="o">.</span><span class="n">fromListWith</span> <span class="p">(</span><span class="o">++</span><span class="p">)</span> <span class="p">[(</span><span class="n">parent</span><span class="p">,</span> <span class="p">[</span><span class="n">a</span><span class="p">]),</span> <span class="p">(</span><span class="n">parent</span><span class="p">,</span> <span class="p">[</span><span class="n">b</span><span class="p">])]</span>
<span class="c1">-- result: parent → [b, a]</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">fromListWith f</code> applies <code class="language-plaintext highlighter-rouge">f new old</code> on key collision, so <code class="language-plaintext highlighter-rouge">(++)</code> runs as <code class="language-plaintext highlighter-rouge">[b] ++ [a] = [b, a]</code>. Two siblings inserted in order ended up reversed in the output.</p>

<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- fix: flip the combine so old-on-left</span>
<span class="kt">Map</span><span class="o">.</span><span class="n">fromListWith</span> <span class="p">(</span><span class="n">flip</span> <span class="p">(</span><span class="o">++</span><span class="p">))</span>
</code></pre></div></div>

<p>Order preserved.</p>

<p>The second bug was sneakier. The splice descended through the AST only into nodes whose source range contained the comment, and stopped at the first node whose range didn’t. But some node types report a point range (start == end) even though their children span a larger region; <code class="language-plaintext highlighter-rouge">T_Redirecting</code> is one. The check <code class="language-plaintext highlighter-rouge">posInRange pos node</code> returned false at the point-range node, so descent stopped, and the comment never reached its real target.</p>

<p>The fix was to remove the range filter entirely. Descend unconditionally, attach the comment at the deepest matching child, and let the absence of a matching child be the stop condition.</p>

<p>Both bugs survived the unit tests I wrote first. They surfaced when I ran the splice against real fixtures — a function body with three statements and a comment before the second one. The first time I saw the comment land before the wrong sibling, I knew the data structure was wrong. The second time I saw a comment disappear entirely, I knew the descent was wrong.</p>

<p>It took me longer to root-cause than to fix. That’s the usual ratio for problems in code you wrote yesterday.</p>

<h2 id="where-this-leaves-the-fork">Where this leaves the fork</h2>

<p>ShellCheck-the-fork now has:</p>

<ul>
  <li>A <code class="language-plaintext highlighter-rouge">pluginApiVersion</code> constant the host and plugin agree on (currently 2; bumped from 1 when <code class="language-plaintext highlighter-rouge">getDocCommentsBefore</code> was added).</li>
  <li>Dynamic loading from <code class="language-plaintext highlighter-rouge">$XDG_DATA_HOME/shellcheck/plugins/</code>.</li>
  <li>Docs at <code class="language-plaintext highlighter-rouge">docs/use-cases.md</code>, <code class="language-plaintext highlighter-rouge">docs/design.md</code>, and <code class="language-plaintext highlighter-rouge">docs/plugins.md</code> covering the three personas: plugin author, plugin user, fork maintainer.</li>
  <li>A worked example plugin in a separate repo — <a href="https://github.com/binaryphile/shellcheck-convention-plugin">binaryphile/shellcheck-convention-plugin</a>. That plugin is the subject of <a href="/bash/shellcheck/haskell/2026/05/19/codifying-a-bash-style-guide-as-shellcheck-plugins.html">the next post</a>.</li>
</ul>

<p>I haven’t pitched any of this upstream. ShellCheck’s value to most users is its curated check set, and a plugin ecosystem fragments that — I’d be asking the maintainers to take on a maintenance surface that benefits a minority of users. The fork is fine. It exists so I can write checks for <em>my</em> conventions without convincing anyone else they’re worth maintaining.</p>

<p>If your conventions look like mine, both repos are on GitHub. If they don’t — write your own plugin. The ABI is two functions.</p>]]></content><author><name></name></author><category term="bash" /><category term="shellcheck" /><category term="haskell" /><summary type="html"><![CDATA[I wanted shellcheck to catch a class of mistakes it wasn’t designed to catch — conventions specific to my bash style. Naming rules. Quoting under IFS=$'\n'; set -o noglob. Docstring shape. Things upstream would (rightly) never accept as core checks, because they’re house rules, not bash mistakes.]]></summary></entry><entry><title type="html">Cockburn Use Cases Guide</title><link href="https://binaryphile.github.io/software-engineering/requirements/use-cases/2026/05/10/cockburn-use-cases-guide.html" rel="alternate" type="text/html" title="Cockburn Use Cases Guide" /><published>2026-05-10T17:00:00+00:00</published><updated>2026-05-10T17:00:00+00:00</updated><id>https://binaryphile.github.io/software-engineering/requirements/use-cases/2026/05/10/cockburn-use-cases-guide</id><content type="html" xml:base="https://binaryphile.github.io/software-engineering/requirements/use-cases/2026/05/10/cockburn-use-cases-guide.html"><![CDATA[<p>A practical reference for writing use cases per Alistair Cockburn’s <em>Writing Effective Use Cases</em> (2001). Template, goal levels, and step-writing guidelines distilled for software teams that want to capture behavior without designing the UI.</p>

<p><em>Originally authored as a working guide; published here on 2026-05-10 as part of the binaryphile.com compliance-references set.</em></p>

<hr />

<p>I keep returning to Cockburn’s framework when a team needs to write down what the system actually does, in a form that survives implementation changes. This is the version I reach for when I’m reviewing requirements drafts.</p>

<hr />

<h2 id="template-fully-dressed">Template (Fully Dressed)</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>### UC-N: Active Verb Phrase (Goal)

- **Primary Actor:** Role name (singular, capitalized)
- **Goal:** What the actor wants to achieve
- **Scope:** System under design (the black box)
- **Level:** User goal | Summary | Subfunction
- **Secondary Actors:** External systems the SUD calls upon
- **Trigger:** Event that starts the use case
- **Preconditions:** What must already be true (not tested within the UC)
- **Stakeholders:**
  - Role — what they need from this use case (drives MSS, extensions, guarantees)
- **Main Success Scenario:**
  1. Triggering event / first interaction
  2. Actor does X; System responds Y
  ...
  N. Goal is achieved
- **Extensions:**
  - 3a. Condition detected as fact:
    1. Recovery step
    2. Resume step N / Fail / Separate success
- **Technology &amp; Data Variations:** Sub-variations in how a step may be executed
- **Minimal Guarantee:** Promise to all stakeholders even on failure
- **Success Guarantee:** What must be true on completion
</code></pre></div></div>

<h2 id="goal-levels">Goal Levels</h2>

<table>
  <thead>
    <tr>
      <th>Level</th>
      <th>Test</th>
      <th>Size</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Summary</strong></td>
      <td>“That’s not just one thing” — encompasses multiple user goals</td>
      <td>Hours</td>
    </tr>
    <tr>
      <td><strong>User Goal</strong></td>
      <td>Boss test: “Would your boss accept you did this all day?” EBP test: one person, one place, one time, measurable value</td>
      <td>3-9 steps, minutes</td>
    </tr>
    <tr>
      <td><strong>Subfunction</strong></td>
      <td>Needed to support a user-goal UC; not independently valuable</td>
      <td>Seconds</td>
    </tr>
  </tbody>
</table>

<h2 id="the-three-kinds-of-action-steps">The Three Kinds of Action Steps</h2>

<p>Every step must be one of:</p>
<ol>
  <li><strong>Interaction</strong> between two actors</li>
  <li><strong>Validation</strong> protecting a stakeholder’s interest</li>
  <li><strong>Internal state change</strong> satisfying a stakeholder</li>
</ol>

<h2 id="twelve-step-writing-guidelines">Twelve Step-Writing Guidelines</h2>

<ol>
  <li><strong>Simple grammar.</strong> Subject-verb-object.</li>
  <li><strong>Who has the ball.</strong> Name the actor explicitly in every step.</li>
  <li><strong>Bird’s-eye view.</strong> Describe from above, not inside any actor’s head.</li>
  <li><strong>Process moves forward.</strong> Each step advances toward the goal. No step leaves the scenario unchanged.</li>
  <li><strong>Intent, not movements.</strong> “Customer provides address” not “Customer clicks field and types.”</li>
  <li><strong>Reasonable transaction size.</strong> Actor sends request+data, system validates, system updates state, system responds. One step or decomposed — use judgment.</li>
  <li><strong>“Validate,” don’t “check whether.”</strong> “System validates credentials” moves forward; “System checks whether credentials are valid” requires an if/else branch. Validation failures go in extensions.</li>
  <li><strong>Mention timing when it matters.</strong> “System responds within 3 seconds.”</li>
  <li><strong>“Actor has System A kick System B.”</strong> When the primary actor causes inter-system communication.</li>
  <li><strong>“Do steps x-y until condition.”</strong> For loops.</li>
  <li><strong>Condition says what was detected.</strong> Extensions state facts, not questions. “Invalid card number:” not “Is the card valid?”</li>
  <li><strong>Indent condition handling.</strong> Extension handling indented under the condition.</li>
</ol>

<h2 id="extension-rules">Extension Rules</h2>

<ul>
  <li>Keyed to MSS step numbers: <code class="language-plaintext highlighter-rouge">3a</code>, <code class="language-plaintext highlighter-rouge">3b</code>, <code class="language-plaintext highlighter-rouge">*a</code> (any step)</li>
  <li>State conditions as <strong>detected facts</strong>, not questions</li>
  <li>Each extension ends one of three ways:
    <ol>
      <li>Rejoins MSS at a specific step</li>
      <li>Reaches a separate success exit</li>
      <li>Ends in failure</li>
    </ol>
  </li>
  <li>Brainstorm exhaustively — completeness comes from extensions, not the MSS</li>
  <li>Complex extensions can be extracted into sub-use cases</li>
</ul>

<h2 id="stakeholder-interests">Stakeholder Interests</h2>

<ul>
  <li>Ask: “Who cares, and what do they want?”</li>
  <li>The system responds to the actor while <strong>protecting the interests of all stakeholders</strong></li>
  <li>Every interest must be addressed somewhere in the MSS, extensions, or guarantees</li>
  <li>This section is the key mechanism for <strong>preventing missing requirements</strong></li>
  <li>Stakeholder interests drive MSS steps, guarantees, and extensions</li>
</ul>

<h2 id="preconditions-and-guarantees">Preconditions and Guarantees</h2>

<ul>
  <li><strong>Preconditions:</strong> Assumed true, not tested. Only state what’s worth telling the reader.</li>
  <li><strong>Minimal Guarantee:</strong> Fewest promises even on failure (e.g., “audit trail preserved”)</li>
  <li><strong>Success Guarantee:</strong> What must be true on completion, meeting all stakeholder interests</li>
</ul>

<h2 id="quality-tests">Quality Tests</h2>

<ul>
  <li><strong>Boss Test:</strong> Would your boss accept you doing this all day? (user goal level)</li>
  <li><strong>EBP Test:</strong> One person, one place, one time, measurable value, consistent state?</li>
  <li><strong>Size Test:</strong> MSS has 3-9 steps. 20+ means decompose.</li>
  <li><strong>Purpose-content alignment:</strong> Does the goal match what the steps accomplish?</li>
</ul>

<h2 id="common-mistakes">Common Mistakes</h2>

<ol>
  <li><strong>Designing the UI</strong> — intent, not widgets</li>
  <li><strong>Wrong goal level</strong> — apply Boss/EBP/Size tests</li>
  <li><strong>No primary actor</strong> — every UC needs one</li>
  <li><strong>Missing stakeholder interests</strong> — leads to gaps</li>
  <li><strong>CRUD explosion</strong> — use “Manage X” and only extract complex operations</li>
  <li><strong>Excessive precision</strong> — rigor beyond what’s needed wastes time</li>
  <li><strong>Goal-content mismatch</strong> — stated goal doesn’t match steps</li>
</ol>

<h2 id="process">Process</h2>

<ol>
  <li>Find system boundary (scope)</li>
  <li>Find actors — characterize each (technical skill, constraints, behavior patterns)</li>
  <li>Find goals — exhaustive brainstorm per actor; produce <strong>actor-goal list</strong> table</li>
  <li>Write stakeholder interests — the key mechanism for preventing missing requirements</li>
  <li>Write preconditions and guarantees (minimal + success)</li>
  <li>Write MSS (3-9 steps meeting all interests)</li>
  <li>Brainstorm extension conditions exhaustively — completeness comes from here</li>
  <li>Write extension handling — each ends in rejoin, separate success, or failure</li>
  <li>Extract/merge sub-use cases as needed</li>
  <li>Readjust the set</li>
</ol>]]></content><author><name></name></author><category term="software-engineering" /><category term="requirements" /><category term="use-cases" /><summary type="html"><![CDATA[A practical reference for writing use cases per Alistair Cockburn’s Writing Effective Use Cases (2001). Template, goal levels, and step-writing guidelines distilled for software teams that want to capture behavior without designing the UI.]]></summary></entry><entry><title type="html">Shostack Threat Modeling Guide</title><link href="https://binaryphile.github.io/security/software-engineering/threat-modeling/2026/05/10/shostack-threat-modeling-guide.html" rel="alternate" type="text/html" title="Shostack Threat Modeling Guide" /><published>2026-05-10T17:00:00+00:00</published><updated>2026-05-10T17:00:00+00:00</updated><id>https://binaryphile.github.io/security/software-engineering/threat-modeling/2026/05/10/shostack-threat-modeling-guide</id><content type="html" xml:base="https://binaryphile.github.io/security/software-engineering/threat-modeling/2026/05/10/shostack-threat-modeling-guide.html"><![CDATA[<p>A practical guide to threat modeling principles, extracted from Adam Shostack’s <em>Threat Modeling: Designing for Security</em> (2014).</p>

<p><em>Originally authored as a working guide; published here on 2026-05-10 as part of the binaryphile.com compliance-references set.</em></p>

<p>Threat modeling replaces reactive security (“whack-a-mole”) with systematic, focused defense. This guide distills Shostack’s comprehensive framework into actionable patterns for software teams.</p>

<p><strong>What this guide covers:</strong></p>
<ul>
  <li>The four-question framework for all threat models</li>
  <li>STRIDE mnemonic for systematic threat discovery</li>
  <li>Data flow diagrams for visualizing systems</li>
  <li>Mitigations mapped to each threat category</li>
  <li>Practical worked examples and checklists</li>
</ul>

<p><strong>What it doesn’t cover:</strong></p>
<ul>
  <li>Extended case studies (Acme-DB)</li>
  <li>Full appendices and attack trees</li>
  <li>STRIDE variants in detail (STRIDE-per-interaction, DESIST)</li>
  <li>Extended privacy framework coverage</li>
  <li>Historical context</li>
</ul>

<hr />

<h2 id="1-the-goal-focused-defense-over-whack-a-mole">1. The Goal: Focused Defense Over Whack-a-Mole</h2>

<p>Security without structure is firefighting. You patch one vulnerability, another appears. You chase the latest exploit, missing the architectural flaw. Threat modeling breaks this cycle.</p>

<blockquote>
  <p>“Threat modeling is the key to a focused defense. Without threat models, you can never stop playing whack-a-mole.”</p>
</blockquote>

<blockquote>
  <p>“In short, threat modeling is the use of abstractions to aid in thinking about risks.”</p>
</blockquote>

<p><strong>What threat modeling accomplishes:</strong></p>

<table>
  <thead>
    <tr>
      <th>Outcome</th>
      <th>How It Helps</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Find bugs early</td>
      <td>Design issues found before code is written</td>
    </tr>
    <tr>
      <td>Clarify requirements</td>
      <td>“Is that really a requirement?” becomes answerable</td>
    </tr>
    <tr>
      <td>Better products</td>
      <td>Fewer redesigns, predictable schedules</td>
    </tr>
    <tr>
      <td>Unique discoveries</td>
      <td>Finds issues other tools miss (omissions, novel threats)</td>
    </tr>
  </tbody>
</table>

<blockquote>
  <p>“If you think about building a house, decisions you make early will have dramatic effects on security. Wooden walls and lots of ground-level windows expose you to more risks than brick construction. Once you’ve chosen, changes will be expensive.”</p>
</blockquote>

<p><strong>Who it’s for:</strong> Software developers, architects, operations, security professionals. You don’t need to be a security expert to benefit.</p>

<p><strong>The real value:</strong> Threat modeling finds issues other techniques won’t find—errors of omission like forgetting to authenticate a connection. Code analysis tools can’t find these. Your unique design may have unique threats that only systematic analysis will reveal.</p>

<hr />

<h2 id="2-the-four-questions">2. The Four Questions</h2>

<p>Every threat model answers four questions:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────────────────────────────────────┐
│ 1. What are you building?               │
│    → Draw diagrams, identify components │
├─────────────────────────────────────────┤
│ 2. What can go wrong?                   │
│    → Use STRIDE, attack trees, etc.     │
├─────────────────────────────────────────┤
│ 3. What should you do about it?         │
│    → Mitigate, accept, transfer         │
├─────────────────────────────────────────┤
│ 4. Did you do a decent job?             │
│    → Validate completeness              │
└─────────────────────────────────────────┘
</code></pre></div></div>

<p>You start and end with familiar tasks: drawing on a whiteboard and managing bugs. Everything in between is structured analysis.</p>

<p><strong>Why these four questions work:</strong></p>
<ul>
  <li>Question 1 (what are you building?) forces shared understanding</li>
  <li>Question 2 (what can go wrong?) finds threats systematically</li>
  <li>Question 3 (what to do?) produces actionable bugs</li>
  <li>Question 4 (did we do a good job?) validates completeness</li>
</ul>

<p>The framework is recursive: you can apply it to a whole system, a component, a feature, or even a single function.</p>

<hr />

<h2 id="3-drawing-your-system-data-flow-diagrams">3. Drawing Your System (Data Flow Diagrams)</h2>

<blockquote>
  <p>“All models are wrong. Some models are useful.”</p>
</blockquote>

<p>Data flow diagrams (DFDs) are the foundation. They show:</p>

<table>
  <thead>
    <tr>
      <th>Element</th>
      <th>Symbol</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>External Entity</td>
      <td>Rectangle</td>
      <td>People, systems outside your control</td>
    </tr>
    <tr>
      <td>Process</td>
      <td>Circle/Rounded</td>
      <td>Code that transforms data</td>
    </tr>
    <tr>
      <td>Data Store</td>
      <td>Parallel lines</td>
      <td>Databases, files, caches</td>
    </tr>
    <tr>
      <td>Data Flow</td>
      <td>Arrow</td>
      <td>Movement of data</td>
    </tr>
    <tr>
      <td>Trust Boundary</td>
      <td>Dashed line</td>
      <td>Where privilege changes</td>
    </tr>
  </tbody>
</table>

<p><strong>Trust boundaries</strong> are critical—they show where threats concentrate. A trust boundary exists wherever:</p>
<ul>
  <li>Privilege levels change</li>
  <li>Different principals interact</li>
  <li>Data crosses network/machine/process limits</li>
</ul>

<blockquote>
  <p>Trust boundaries and attack surfaces are very similar views of the same thing. An attack surface is a trust boundary plus a direction from which an attacker could launch an attack.</p>
</blockquote>
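
<p>One way to make trust boundaries concrete is to record which trust zone each diagram element belongs to and list the data flows whose endpoints sit in different zones. The sketch below does that for a made-up three-tier system; the element names, zone names, and the <code class="language-plaintext highlighter-rouge">boundary_crossings</code> helper are assumptions for illustration, not notation from the book.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    number: int        # diagram rule: number each data flow
    source: str
    destination: str

# Hypothetical example system: each element is assigned to a trust zone.
ZONES = {
    "Browser": "internet",
    "Web App": "dmz",
    "Cache": "dmz",
    "Orders DB": "internal",
}

FLOWS = [
    Flow(1, "Browser", "Web App"),
    Flow(2, "Web App", "Orders DB"),
    Flow(3, "Web App", "Cache"),
]

def boundary_crossings(flows, zones):
    """Return the flows whose endpoints sit in different trust zones."""
    return [f for f in flows if zones[f.source] != zones[f.destination]]

for flow in boundary_crossings(FLOWS, ZONES):
    print(f"flow {flow.number}: {flow.source} to {flow.destination} crosses a trust boundary")
</code></pre></div></div>

<p>The flows this returns are the natural places to start asking what can go wrong.</p>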

<p><strong>Diagram rules:</strong></p>
<ul>
  <li>Number each process, data flow, and data store</li>
  <li>Data can’t move itself—show the process that moves it</li>
  <li>If a component has a trust boundary, it’s a candidate for its own diagram</li>
  <li>Don’t draw an eye chart—break complex systems into sub-diagrams</li>
  <li>The diagram should tell a story and support you telling stories while pointing at it</li>
</ul>

<p><strong>Updating diagrams (validation questions):</strong></p>
<ol>
  <li>Can we tell a story without changing the diagram?</li>
  <li>Can we tell that story without using “sometimes” or “also”?</li>
  <li>Can we see exactly where the software makes security decisions?</li>
  <li>Does the diagram show all trust boundaries (UIDs, roles, network interfaces)?</li>
  <li>Does it reflect current or planned reality?</li>
  <li>Can we see where all data goes and who uses it?</li>
</ol>

<hr />

<h2 id="4-where-to-start-three-approaches">4. Where to Start: Three Approaches</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>What drives your analysis?
  │
  ├─ ASSETS → "What are we protecting?"
  │           Best when: Clear valuable targets
  │           Risk: May miss stepping-stone assets
  │
  ├─ ATTACKERS → "Who's attacking us?"
  │              Best when: Known threat actors
  │              Risk: Attackers not on list still attack
  │
  └─ SOFTWARE → "What are we building?"
                Best when: Development teams
                Risk: May miss operational context
</code></pre></div></div>

<p><strong>Recommendation:</strong> Start with software (what you’re building), use STRIDE to find threats, then validate against known attacker motivations. This combines the benefits of all three.</p>

<h3 id="the-cautionary-tale-of-zero-knowledge-systems">The Cautionary Tale of Zero-Knowledge Systems</h3>

<blockquote>
  <p>“Zero-Knowledge Systems didn’t have a clear answer to ‘what’s your threat model?’ Because there was no clear answer, there wasn’t consistency in what security features were built.”</p>
</blockquote>

<p>Without a clear threat model, the company invested heavily in preventing governments from spying—a fun technical challenge but one that had significant performance impacts. The emotional appeal of fighting government surveillance made it hard to make practical business decisions. Eventually, a clearer threat model let them invest in mitigations that all addressed the same subset of threats.</p>

<p><strong>The lesson:</strong> Without answering “what’s your threat model?”, you may build elaborate defenses against unlikely attacks while ignoring common ones.</p>

<h3 id="standard-answers-to-whats-your-threat-model">Standard Answers to “What’s Your Threat Model?”</h3>

<table>
  <thead>
    <tr>
      <th>Answer</th>
      <th>Meaning</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>“A thief who could steal your money”</td>
      <td>Financial motivation, external</td>
    </tr>
    <tr>
      <td>“Untrusted network”</td>
      <td>Assume network traffic can be read/modified</td>
    </tr>
    <tr>
      <td>“Malicious insiders”</td>
      <td>Employees, contractors with access</td>
    </tr>
    <tr>
      <td>“An attacker who could steal your cookie”</td>
      <td>Session hijacking, web app threats</td>
    </tr>
    <tr>
      <td>“Script kiddie”</td>
      <td>Low-skill attacker using automated tools</td>
    </tr>
    <tr>
      <td>“Nation-state actor”</td>
      <td>High-skill, well-resourced attacker</td>
    </tr>
  </tbody>
</table>

<p>Having a clear answer focuses your defense investments.</p>

<hr />

<h2 id="5-stride-the-six-threat-categories">5. STRIDE: The Six Threat Categories</h2>

<p>STRIDE is a mnemonic for finding threats. It was developed at Microsoft and has been refined over more than a decade of use. Each letter represents a threat that violates a security property:</p>

<table>
  <thead>
    <tr>
      <th>Threat</th>
      <th>Property Violated</th>
      <th>Definition</th>
      <th>Typical Victims</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>S</strong>poofing</td>
      <td>Authentication</td>
      <td>Pretending to be something/someone else</td>
      <td>Processes, external entities, people</td>
    </tr>
    <tr>
      <td><strong>T</strong>ampering</td>
      <td>Integrity</td>
      <td>Modifying data (disk, network, memory)</td>
      <td>Data stores, data flows, processes</td>
    </tr>
    <tr>
      <td><strong>R</strong>epudiation</td>
      <td>Non-repudiation</td>
      <td>Claiming you didn’t do something</td>
      <td>Processes</td>
    </tr>
    <tr>
      <td><strong>I</strong>nfo Disclosure</td>
      <td>Confidentiality</td>
      <td>Exposing data to unauthorized parties</td>
      <td>Processes, data stores, data flows</td>
    </tr>
    <tr>
      <td><strong>D</strong>enial of Service</td>
      <td>Availability</td>
      <td>Absorbing resources needed for service</td>
      <td>Processes, data stores, data flows</td>
    </tr>
    <tr>
      <td><strong>E</strong>levation of Privilege</td>
      <td>Authorization</td>
      <td>Doing things you’re not authorized to do</td>
      <td>Processes</td>
    </tr>
  </tbody>
</table>

<blockquote>
  <p>“STRIDE is a tool to guide you to threats, not to ask you to categorize what you’ve found; it makes a lousy taxonomy, anyway.”</p>
</blockquote>

<p><strong>Usage:</strong> Walk through each element in your diagram and ask “How could an attacker achieve S? T? R? I? D? E?” Don’t worry about categorization—if you find a threat, record it.</p>

<h3 id="detailed-threat-examples">Detailed Threat Examples</h3>

<p><strong>Spoofing:</strong></p>
<ul>
  <li>Spoofing a process on the same machine (creating a file before the real process)</li>
  <li>Spoofing a file (creating in local directory, changing links)</li>
  <li>Spoofing a machine (ARP, IP, DNS spoofing)</li>
  <li>Spoofing a person (phishing, account takeover)</li>
  <li>Spoofing a role (declaring themselves to be that role)</li>
</ul>

<p><strong>Tampering:</strong></p>
<ul>
  <li>Tampering with a file (modify files on disk, servers, or remote includes)</li>
  <li>Tampering with memory (modify running code or API data by reference)</li>
  <li>Tampering with a network (redirect traffic, modify packets, especially wireless)</li>
</ul>

<p><strong>Repudiation:</strong></p>
<ul>
  <li>Claiming to have not clicked/received/ordered</li>
  <li>Claiming to be a fraud victim</li>
  <li>Attacking the logs (no logs, filling logs, injecting attacks into logs)</li>
</ul>

<p><strong>Information Disclosure:</strong></p>
<ul>
  <li>Extracting secrets from error messages</li>
  <li>Reading files with inappropriate ACLs</li>
  <li>Finding crypto keys on disk or in memory</li>
  <li>Reading network traffic (sniffing)</li>
  <li>Analyzing traffic metadata (DNS, social network connections)</li>
</ul>

<p><strong>Denial of Service:</strong></p>
<ul>
  <li>Absorbing memory (RAM or disk)</li>
  <li>Absorbing CPU</li>
  <li>Using process as an amplifier</li>
  <li>Filling data stores</li>
  <li>Consuming network resources</li>
</ul>

<p><strong>Elevation of Privilege:</strong></p>
<ul>
  <li>Sending inputs the code doesn’t handle properly (buffer overflow, injection)</li>
  <li>Gaining inappropriate memory access</li>
  <li>Bypassing authorization checks</li>
  <li>Data/code confusion (treating data as executable code)</li>
</ul>

<h3 id="focus-on-feasible-threats">Focus on Feasible Threats</h3>

<blockquote>
  <p>“Along the way, you might come up with threats like ‘someone might insert a back door at the chip factory.’ These are real possibilities but not very likely compared to using an exploit to attack a vulnerability for which you haven’t applied the patch.”</p>
</blockquote>

<p>Good threat modeling focuses on threats you can actually address. If you can’t do anything about motherboard backdoors, acknowledge them and move on.</p>

<hr />

<h2 id="6-stride-per-element">6. STRIDE-per-Element</h2>

<p>Not all threats apply to all elements. This matrix focuses your analysis:</p>

<table>
  <thead>
    <tr>
      <th>Element</th>
      <th>S</th>
      <th>T</th>
      <th>R</th>
      <th>I</th>
      <th>D</th>
      <th>E</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>External Entity</td>
      <td>✓</td>
      <td> </td>
      <td>✓</td>
      <td> </td>
      <td> </td>
      <td> </td>
    </tr>
    <tr>
      <td>Process</td>
      <td>✓</td>
      <td>✓</td>
      <td>✓</td>
      <td>✓</td>
      <td>✓</td>
      <td>✓</td>
    </tr>
    <tr>
      <td>Data Flow</td>
      <td> </td>
      <td>✓</td>
      <td> </td>
      <td>✓</td>
      <td>✓</td>
      <td> </td>
    </tr>
    <tr>
      <td>Data Store</td>
      <td> </td>
      <td>✓</td>
      <td>?</td>
      <td>✓</td>
      <td>✓</td>
      <td> </td>
    </tr>
  </tbody>
</table>

<p><em>(? = Logs are data stores involved in addressing repudiation)</em></p>

<p><strong>Exit criteria:</strong> You have at least one threat per checked cell in your diagram.</p>
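
<p>The matrix and its exit criterion can be turned into a simple checklist generator. This is a minimal sketch: it treats the “?” cell for data stores as applicable (an assumption that the store holds logs), and the diagram contents and function names are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Matrix from the table above; the "?" for data stores is treated as
# applicable here (an assumption: the store is a log).
STRIDE_PER_ELEMENT = {
    "external entity": "SR",
    "process": "STRIDE",
    "data flow": "TID",
    "data store": "TRID",
}

def prompts(diagram):
    """Yield (element name, threat letter) for every applicable cell."""
    for name, kind in diagram.items():
        for letter in STRIDE_PER_ELEMENT[kind]:
            yield name, letter

def uncovered(diagram, found):
    """Cells with no recorded threat yet; empty means the exit criterion is met."""
    return sorted(set(prompts(diagram)) - set(found))

# Hypothetical diagram and the threats recorded so far.
diagram = {"Browser": "external entity", "Web App": "process"}
found = [("Browser", "S"), ("Web App", "T")]
print(uncovered(diagram, found))
</code></pre></div></div>

<p>If the matrix is customized for your context, only the dictionary needs to change.</p>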

<p><strong>Customization:</strong> This matrix is somewhat Microsoft-specific. Adapt it to your context. For example, if privacy matters, add “Information Disclosure by External Entity.”</p>

<p><strong>STRIDE-per-element weaknesses:</strong></p>
<ol>
  <li>Similar issues crop up repeatedly in a given threat model</li>
  <li>The chart may not represent your specific issues</li>
</ol>

<blockquote>
  <p>“If you want to be comprehensive, this is helpful; if you want to focus on the most likely issues, it may be a distraction.”</p>
</blockquote>

<p><strong>Variants:</strong></p>
<ul>
  <li><strong>STRIDE-per-interaction:</strong> Consider (origin, destination, interaction) tuples. Same number of threats but may be easier to understand.</li>
  <li><strong>DESIST:</strong> Dispute, Elevation, Spoofing, Information disclosure, Service denial, Tampering. Same concepts, different acronym.</li>
</ul>

<hr />

<h2 id="7-attack-trees">7. Attack Trees</h2>

<p>Attack trees decompose a goal into sub-goals:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Goal: Steal credentials
├─ [OR] Phish user
│   ├─ [AND] Create fake login page
│   └─ [AND] Send convincing email
├─ [OR] Compromise database
│   ├─ [OR] SQL injection
│   └─ [OR] Stolen backup
└─ [OR] Intercept network traffic
    └─ [AND] Man-in-the-middle attack
</code></pre></div></div>

<p><strong>OR nodes:</strong> Any child achieves the goal<br />
<strong>AND nodes:</strong> All children required</p>

<p><strong>When to use:</strong></p>
<ul>
  <li>Organizing threats found with STRIDE</li>
  <li>Deep-diving a specific attack scenario</li>
  <li>Communicating threats to stakeholders</li>
</ul>

<p>Trees can be created per-project or reused across similar systems.</p>

<p><strong>Creating an attack tree:</strong></p>
<ol>
  <li>Decide on a representation (AND or OR tree, most are OR)</li>
  <li>Create a root node (the attacker’s goal)</li>
  <li>Create subnodes (ways to achieve that goal)</li>
  <li>Consider completeness (are there other paths?)</li>
  <li>Prune the tree (remove irrelevant branches)</li>
  <li>Check the presentation (is it understandable?)</li>
</ol>

<p><strong>Exit criteria:</strong> When you have threats for each leaf node that applies to your system.</p>
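
<p>The OR/AND semantics in this section can be sketched as a small recursive evaluation. In this sketch each node’s <code class="language-plaintext highlighter-rouge">kind</code> says how its own children combine, and <code class="language-plaintext highlighter-rouge">feasible</code> is an assumed set of leaf attacks you believe an attacker could carry out; the <code class="language-plaintext highlighter-rouge">Node</code> class and <code class="language-plaintext highlighter-rouge">achievable</code> function are illustrative names, not from the book.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    kind: str = "LEAF"                     # "OR", "AND", or "LEAF"
    children: list = field(default_factory=list)

def achievable(node, feasible):
    """True if the attacker can reach this node given the feasible leaf labels."""
    if node.kind == "LEAF":
        return node.label in feasible
    results = [achievable(child, feasible) for child in node.children]
    return any(results) if node.kind == "OR" else all(results)

tree = Node("Steal credentials", "OR", [
    Node("Phish user", "AND", [
        Node("Create fake login page"),
        Node("Send convincing email"),
    ]),
    Node("Compromise database", "OR", [
        Node("SQL injection"),
        Node("Stolen backup"),
    ]),
])

# Phishing needs both steps; a single stolen backup is enough on its own.
print(achievable(tree, {"Create fake login page"}))
print(achievable(tree, {"Stolen backup"}))
</code></pre></div></div>

<p>Pruning a branch corresponds to removing its leaves from the feasible set and checking whether the root is still reachable.</p>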

<hr />

<h2 id="8-attack-libraries-capec-owasp">8. Attack Libraries (CAPEC, OWASP)</h2>

<p>Attack libraries provide pre-built threat catalogs:</p>

<table>
  <thead>
    <tr>
      <th>Library</th>
      <th>Scope</th>
      <th>Best For</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>CAPEC</td>
      <td>475+ attack patterns</td>
      <td>Comprehensive coverage, training</td>
    </tr>
    <tr>
      <td>OWASP Top Ten</td>
      <td>Web application risks</td>
      <td>Web projects, quick reference</td>
    </tr>
  </tbody>
</table>

<p><strong>CAPEC trade-off:</strong> Comprehensive but time-intensive (40+ hours for full review). Consider category-level review instead of entry-by-entry.</p>

<p><strong>CAPEC exit criteria:</strong> At least one issue in each of categories 1-11:</p>
<ol>
  <li>Data Leakage</li>
  <li>Resource Depletion</li>
  <li>Injection</li>
  <li>Spoofing</li>
  <li>Time and State</li>
  <li>Abuse of Functionality</li>
  <li>Probabilistic Techniques</li>
  <li>Exploitation of Authentication</li>
  <li>Exploitation of Privilege/Trust</li>
  <li>Data Structure Attacks</li>
  <li>Resource Manipulation</li>
</ol>

<p>Categories 12-15 (Network Reconnaissance, Social Engineering, Physical Security, Supply Chain) may be relevant depending on your system.</p>

<p><strong>OWASP Top Ten (2013 example):</strong></p>
<ol>
  <li>Injection</li>
  <li>Broken Authentication/Session Management</li>
  <li>Cross-Site Scripting</li>
  <li>Insecure Direct Object References</li>
  <li>Security Misconfiguration</li>
  <li>Sensitive Data Exposure</li>
  <li>Missing Function-Level Access Control</li>
  <li>Cross-Site Request Forgery</li>
  <li>Components with Known Vulnerabilities</li>
  <li>Unvalidated Redirects and Forwards</li>
</ol>

<blockquote>
  <p>“CAPEC is a classification of common attacks, whereas STRIDE is a set of security properties. CAPEC may have more promise than STRIDE for many populations of threat modelers.”</p>
</blockquote>

<p><strong>Using OWASP for threat modeling:</strong></p>

<p>The OWASP Top Ten works well as an adjunct to STRIDE for web projects. To turn it into a methodology:</p>
<ul>
  <li>Create a “Top Ten per Element” approach (like STRIDE-per-element)</li>
  <li>Look for risks at each point where data crosses a trust boundary</li>
</ul>

<p><strong>Trade-off:</strong> Cross-site scripting and CSRF may be overly specific for threat modeling; they work better as input to test planning. The Top Ten is revised periodically based on volunteer input, so its value as a threat-modeling aid varies from one edition to the next.</p>

<h3 id="when-to-use-which">When to Use Which</h3>

<table>
  <thead>
    <tr>
      <th>Situation</th>
      <th>Approach</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>New system design</td>
      <td>STRIDE (comprehensive, principle-based)</td>
    </tr>
    <tr>
      <td>Web application</td>
      <td>OWASP Top Ten + STRIDE</td>
    </tr>
    <tr>
      <td>Deep-dive on specific attack</td>
      <td>Attack trees</td>
    </tr>
    <tr>
      <td>Unknown domain</td>
      <td>CAPEC categories (structured exploration)</td>
    </tr>
    <tr>
      <td>Privacy-sensitive</td>
      <td>LINDDUN or Solove taxonomy</td>
    </tr>
    <tr>
      <td>Quick review</td>
      <td>STRIDE-per-element on key components</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="9-privacy-threats-brief-overview">9. Privacy Threats (Brief Overview)</h2>

<p>Privacy threat modeling is an emergent field. Key frameworks:</p>

<p><strong>LINDDUN</strong> (mirror of STRIDE for privacy):</p>
<ul>
  <li>Linkability, Identifiability, Non-repudiation, Detectability, Disclosure of information, Unawareness, Non-compliance</li>
</ul>

<p><strong>Solove’s Taxonomy:</strong></p>
<ul>
  <li>Information collection (surveillance, interrogation)</li>
  <li>Information processing (aggregation, identification, secondary use)</li>
  <li>Information dissemination (disclosure, breach)</li>
  <li>Invasion (intrusion, decisional interference)</li>
</ul>

<p><strong>Practical approach:</strong> Treat privacy as complementary to security threat modeling. Focus on data flows involving personal information.</p>

<p><strong>The nymity slider (Ian Goldberg):</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Less Privacy ←────────────────────────────→ More Privacy
Verinymity    Persistent    Linkable    Unlinkable
(Gov't ID,    Pseudonym     Anonymity   Anonymity
Credit Card)  (Pen name)    (Prepaid    (Tor, mixnets)
                            phone)
</code></pre></div></div>

<p>Key insight: Moving toward more nymity (more identifying) is easy; moving toward less is extremely difficult. Design for privacy from the start.</p>

<p><strong>Where to look for privacy threats:</strong></p>

<table>
  <thead>
    <tr>
      <th>Solove Category</th>
      <th>Where to Focus</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Identifier creation</td>
      <td>Wherever your system creates or assigns IDs</td>
    </tr>
    <tr>
      <td>Surveillance</td>
      <td>Data collection points, especially broad collection</td>
    </tr>
    <tr>
      <td>Interrogation</td>
      <td>“Required” fields on forms</td>
    </tr>
    <tr>
      <td>Aggregation</td>
      <td>Inbound data flows from external entities</td>
    </tr>
    <tr>
      <td>Identification</td>
      <td>Where data is matched to real people</td>
    </tr>
    <tr>
      <td>Exclusion</td>
      <td>Decision points, especially fraud management</td>
    </tr>
    <tr>
      <td>Information dissemination</td>
      <td>Outbound data flows crossing trust boundaries</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="10-from-threats-to-bugs">10. From Threats to Bugs</h2>

<p>Every threat needs action. Track them as bugs in your existing system. The key question: “Did I do something with each unique threat I found?”</p>

<blockquote>
  <p>“You really don’t want to drop stuff on the floor. This is ‘turning the crank’ sort of work. It’s rarely glamorous or exciting until you find the thing you overlooked.”</p>
</blockquote>

<p><strong>Bug template:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Title: [STRIDE category] [Element] - [Threat description]
Description: [How the attack works]
Mitigation: [Proposed defense]
Priority: [Based on impact and likelihood]
</code></pre></div></div>
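
<p>If threats are recorded in a script or spreadsheet export, filling in the template can be automated. The following is a minimal sketch; the <code class="language-plaintext highlighter-rouge">format_bug</code> function and the sample threat are invented for illustration, and a real tracker would have its own fields and API.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def format_bug(category, element, threat, attack, mitigation, priority):
    """Render one threat as text following the bug template above."""
    return "\n".join([
        f"Title: {category} {element} - {threat}",
        f"Description: {attack}",
        f"Mitigation: {mitigation}",
        f"Priority: {priority}",
    ])

# Hypothetical threat found while walking the diagram.
print(format_bug(
    category="Spoofing",
    element="Login API",
    threat="Attacker replays a stolen session cookie",
    attack="A cookie captured on an untrusted network is reused to impersonate the user.",
    mitigation="Serve the site only over HTTPS and mark the cookie Secure.",
    priority="High",
))
</code></pre></div></div>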

<p><strong>Prioritization approaches:</strong></p>

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th>Complexity</th>
      <th>Best For</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Simple triage</td>
      <td>Low</td>
      <td>Most teams</td>
    </tr>
    <tr>
      <td>DREAD scoring</td>
      <td>Medium</td>
      <td>Quantitative comparison</td>
    </tr>
    <tr>
      <td>Bug bars</td>
      <td>Medium</td>
      <td>Consistent thresholds</td>
    </tr>
    <tr>
      <td>Risk matrices</td>
      <td>High</td>
      <td>Compliance requirements</td>
    </tr>
  </tbody>
</table>

<p>Shostack recommends simple approaches. Elaborate risk scoring often provides false precision.</p>

<p><strong>Validation checklist:</strong></p>
<ol>
  <li>Have we written down or filed a bug for each threat?</li>
  <li>Is there a proposed/planned/implemented way to address each threat?</li>
  <li>Do we have a test case per threat?</li>
  <li>Has the software passed the test?</li>
</ol>

<hr />

<h2 id="11-the-three-responses">11. The Three Responses</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>How do you respond to a threat?
  │
  ├─ MITIGATE → Make attack harder
  │             Your go-to approach
  │             Example: Add authentication
  │
  ├─ ACCEPT → Acknowledge the risk
  │           When: Low probability OR low impact
  │           Warning: Can't accept on behalf of users
  │
  └─ TRANSFER → Let someone else handle it
                To: OS, framework, customer, insurer
                Warning: Transferred risk still exists
</code></pre></div></div>

<p><strong>Anti-pattern: IGNORE</strong></p>
<blockquote>
  <p>“A traditional approach to risk in information security is to ignore it… This approach is becoming less effective as contracts, lawsuits, and laws increase the risk of ignoring risks.”</p>
</blockquote>

<p><strong>Decision guidance:</strong></p>
<ul>
  <li>If there’s an easy fix, just fix it (skip strategizing)</li>
  <li>Mitigation is generally easiest and best for customers</li>
  <li>Document accepted risks explicitly</li>
</ul>

<p><strong>The “ignoring risks” trap:</strong></p>

<p>If you create a list of security problems you decide not to address, be aware:</p>
<ul>
  <li>Breach disclosure laws may require action</li>
  <li>Whistleblowers may expose the list</li>
  <li>Legal discovery in lawsuits may reveal it</li>
  <li>Regulatory requirements continue to increase</li>
</ul>

<blockquote>
  <p>“If you are threat modeling and create a list of security problems that you decide not to address, please send a copy of the list to the author, care of the publisher. There will be quarterly auctions to sell them to plaintiff’s attorneys.”</p>
</blockquote>

<hr />

<h2 id="12-mitigations-mapped-to-stride">12. Mitigations Mapped to STRIDE</h2>

<table>
  <thead>
    <tr>
      <th>Threat</th>
      <th>Mitigation Strategy</th>
      <th>Techniques</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Spoofing</strong></td>
      <td>Authentication</td>
      <td>Passwords, tokens, biometrics, digital signatures, HTTPS/SSL</td>
    </tr>
    <tr>
      <td><strong>Tampering</strong></td>
      <td>Integrity protection</td>
      <td>ACLs, digital signatures, MACs, HTTPS/SSL</td>
    </tr>
    <tr>
      <td><strong>Repudiation</strong></td>
      <td>Logging/Auditing</td>
      <td>Comprehensive logs, protected log storage, log over TCP/SSL</td>
    </tr>
    <tr>
      <td><strong>Info Disclosure</strong></td>
      <td>Confidentiality</td>
      <td>Encryption (SSL, IPsec), ACLs, careful API design</td>
    </tr>
    <tr>
      <td><strong>Denial of Service</strong></td>
      <td>Availability</td>
      <td>Elastic resources, rate limiting, quotas</td>
    </tr>
    <tr>
      <td><strong>Elevation</strong></td>
      <td>Authorization</td>
      <td>Type-safe languages, sandboxing, input validation, prepared statements</td>
    </tr>
  </tbody>
</table>

<h3 id="detailed-mitigation-techniques">Detailed Mitigation Techniques</h3>

<p><strong>Addressing Spoofing:</strong></p>
<ul>
  <li>Spoofing a person → Unique usernames + authentication (passwords, tokens, biometrics)</li>
  <li>Spoofing a file → Use full paths (not <code class="language-plaintext highlighter-rouge">./file</code>), check ACLs after opening</li>
  <li>Spoofing a network address → DNSSEC, SSL, IPsec</li>
  <li>Spoofing a program → Leverage OS application identifiers</li>
</ul>

<p><strong>Addressing Tampering:</strong></p>
<ul>
  <li>Tampering with a file → ACLs, digital signatures, keyed MACs</li>
  <li>Racing to create a file → Protected directories, private directory structures</li>
  <li>Tampering with network packets → HTTPS/SSL, IPsec</li>
  <li>Anti-pattern: Network isolation doesn’t work long-term
    <ul>
      <li>“The isolated United States SIPRNet was thoroughly infested with malware, and the operation to clean it up took 14 months.”</li>
    </ul>
  </li>
</ul>

<p><strong>Addressing Repudiation:</strong></p>
<ul>
  <li>No logs → Log all security-relevant information</li>
  <li>Logs under attack → Send over network (TCP/SSL, not UDP), use ACLs</li>
  <li>Logs as attack channel → Tightly specify log format early in development</li>
</ul>
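
<p>One way to pin down a log format early is structured logging. The sketch below is an illustration, not something taken from the book: it assumes Go 1.21 or later for the standard <code class="language-plaintext highlighter-rouge">log/slog</code> package, and <code class="language-plaintext highlighter-rouge">renderLoginFailure</code> and <code class="language-plaintext highlighter-rouge">countRecords</code> are hypothetical helper names. The JSON handler escapes control characters inside field values, so a newline smuggled into the username should stay inside one record instead of forging a second one.</p>

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// renderLoginFailure logs one failed login as a JSON record and returns the
// raw text written. The username is attacker-controlled, so it is passed as
// a field value and never concatenated into the message text.
func renderLoginFailure(username string) string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	logger.Warn("login failed", "username", username)
	return buf.String()
}

// countRecords returns how many newline-terminated records the output holds.
func countRecords(output string) int {
	return strings.Count(output, "\n")
}

func main() {
	// A forged second entry smuggled into the username field.
	out := renderLoginFailure("alice\n{\"level\":\"INFO\",\"msg\":\"login ok\"}")
	fmt.Print(out)
	fmt.Println("records written:", countRecords(out))
}
```

<p>The design choice is that untrusted text only ever travels as a field value, so the log format itself stays under the program’s control.</p>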

<p><strong>Addressing Information Disclosure:</strong></p>
<ul>
  <li>Network monitoring → Encryption (HTTPS/SSL, IPsec)</li>
  <li>Sensitive filenames → Create innocuous parent directory with ACLs</li>
  <li>File contents → ACLs or file/disk encryption</li>
  <li>APIs revealing info → Be selective about what you return</li>
</ul>

<p><strong>Addressing Denial of Service:</strong></p>
<ul>
  <li>Network flooding → Elastic resources, ensure attacker effort ≥ yours, network ACLs</li>
  <li>Program resources → Careful design, proof of work, require work before expensive operations</li>
  <li>System resources → Use OS quotas and limits</li>
</ul>
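
<p>Rate limiting is one common way to make an attacker spend at least as much effort as the defender. The token bucket below is a minimal sketch under stated assumptions: <code class="language-plaintext highlighter-rouge">tokenBucket</code>, its fields, and the chosen numbers are all hypothetical, and the type is not safe for concurrent use without a mutex.</p>

```go
package main

import (
	"fmt"
	"time"
)

// tokenBucket is a hypothetical per-client limiter. It holds at most
// capacity tokens and regains refillPerSec tokens each second.
type tokenBucket struct {
	capacity     float64
	refillPerSec float64
	tokens       float64
	last         time.Duration // time of the previous call, measured from start-up
}

func newTokenBucket(capacity, refillPerSec float64) *tokenBucket {
	return &tokenBucket{capacity: capacity, refillPerSec: refillPerSec, tokens: capacity}
}

// allow credits tokens for the time elapsed since the previous call, then
// spends one token if the bucket has one. now is measured from start-up.
func (b *tokenBucket) allow(now time.Duration) bool {
	b.tokens += (now - b.last).Seconds() * b.refillPerSec
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	start := time.Now()
	bucket := newTokenBucket(2, 1) // burst of 2, then 1 request per second
	for i := 1; i <= 4; i++ {
		fmt.Printf("request %d allowed: %v\n", i, bucket.allow(time.Since(start)))
	}
}
```

<p>A burst up to the capacity is served immediately; after that, requests are admitted only as fast as the refill rate, which bounds the work one client can demand.</p>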

<p><strong>Addressing Elevation of Privilege:</strong></p>
<ul>
  <li>Data/code confusion → Prepared statements, clear separators, late validation</li>
  <li>Memory corruption → Type-safe languages, ASLR, sandboxes (AppArmor, AppContainer)</li>
  <li>Command injection → Validate input size and form; don’t sanitize—log and discard weird input</li>
</ul>

<p><strong>Key principles:</strong></p>

<blockquote>
  <p>“Validate, don’t sanitize. Know what you expect to see, how much you expect to see, and validate that that’s what you’re receiving. If you get something else, throw it away.”</p>
</blockquote>

<blockquote>
  <p>“Trust the operating system. The OS provides security features so you can focus on your unique value proposition.”</p>
</blockquote>
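
<p>“Validate, don’t sanitize” can be sketched in a few lines. The allow-list pattern and the name <code class="language-plaintext highlighter-rouge">validateUsername</code> are assumptions chosen for illustration; the point is only that unexpected input is rejected whole and never repaired.</p>

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// usernamePattern is an assumed allow-list: 3 to 32 lowercase letters,
// digits, underscores, or hyphens, and nothing else.
var usernamePattern = regexp.MustCompile(`^[a-z0-9_-]{3,32}$`)

var errInvalidUsername = errors.New("invalid username")

// validateUsername returns the input unchanged when it matches the expected
// form and rejects everything else. It never tries to repair bad input.
func validateUsername(raw string) (string, error) {
	if !usernamePattern.MatchString(raw) {
		return "", errInvalidUsername
	}
	return raw, nil
}

func main() {
	for _, raw := range []string{"alice_01", "alice'; DROP TABLE users;--", "bob\n"} {
		if _, err := validateUsername(raw); err != nil {
			// Log and discard: %q keeps control characters visible and inert.
			fmt.Printf("rejected %q: %v\n", raw, err)
			continue
		}
		fmt.Printf("accepted %q\n", raw)
	}
}
```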

<hr />

<h2 id="13-️-taking-it-too-far">13. ⚠️ Taking It Too Far</h2>

<h3 id="over-modeling">Over-modeling</h3>
<p>Threat modeling every component of a well-understood framework wastes effort. Focus on your unique code and architecture, not commodity components.</p>

<h3 id="paralysis-by-analysis">Paralysis by Analysis</h3>
<p>Don’t wait for the “complete” threat model. Start with what you know, iterate as you learn. An 80% threat model today beats a 100% model never delivered.</p>

<h3 id="category-obsession">Category Obsession</h3>
<blockquote>
  <p>“If you’ve already come up with the attack, why bother putting it in a category? The goal of STRIDE is to help you find attacks. Categorizing them might help you figure out the right defenses, or it may be a waste of effort.”</p>
</blockquote>

<p>If you find yourself debating whether “unauthorized database access” is spoofing or information disclosure, stop. Record the threat and move on. STRIDE is a finding tool, not a taxonomy.</p>

<h3 id="security-that-creates-insecurity">Security That Creates Insecurity</h3>

<p>Shostack dedicates an entire chapter (Chapter 15) to human factors because cumbersome security creates its own vulnerabilities.</p>

<blockquote>
  <p>“People are not, as is often claimed, the weakest link, or beyond help. The weakest link is almost always a vulnerability in Internet-facing code.”</p>
</blockquote>

<p><strong>The compliance budget:</strong> Angela Sasse’s research found that workers allocate a limited “budget” to security tasks. They spend time and energy until exhausted, then move on. Exceed the budget, and compliance drops.</p>

<blockquote>
  <p>“People do listen. They don’t act on security advice because it’s often bizarre, time consuming, and sometimes followed by, ‘Of course, you’ll still be at risk.’ You need to craft advice that works for the people who are listening to you.”</p>
</blockquote>

<p><strong>Warning fatigue:</strong></p>
<blockquote>
  <p>“Given a choice between ignoring a warning that they’ve clicked through a thousand times before without apparent ill effects and without being entertained, people will bypass a warning every time.”</p>
</blockquote>

<p><strong>The fix:</strong> Minimize what you ask of people. They should only be involved when they have information the system can’t determine (e.g., “Is this a home or coffee shop network?”).</p>

<blockquote>
  <p>“You can also transfer risk to customers, for example, by asking them to click through lots of hard-to-understand dialogs before they can do the work they need to do. That’s obviously not a great solution.”</p>
</blockquote>

<h3 id="ignoring-easy-fixes">Ignoring Easy Fixes</h3>
<blockquote>
  <p>“When there is an easy way to address a problem, you should skip strategizing and just address it.”</p>
</blockquote>

<blockquote>
  <p>“The diagram is intended to help ensure that you understand and can discuss the system. Don’t ask ‘Is this the right way to do it?’ Ask ‘Does this help me think about what might go wrong?’”</p>
</blockquote>

<h3 id="letting-perfect-be-the-enemy-of-good">Letting Perfect Be the Enemy of Good</h3>
<p>Start practicing now. You’re not going to get good at threat modeling by reading—you have to do it.</p>

<blockquote>
  <p>“You’re not going to get to Carnegie Hall if you don’t practice, practice, practice.”</p>
</blockquote>

<p>Pick a system you’re working on and threat model it:</p>
<ol>
  <li>Draw a diagram</li>
  <li>Use STRIDE to find threats</li>
  <li>Address each threat in some way</li>
  <li>Check your work with checklists</li>
  <li>Celebrate and share your work</li>
</ol>

<p><strong>What to threat model next:</strong></p>
<ul>
  <li>What you’re working on now (if it has trust boundaries)</li>
  <li>Something not too simple (trivial systems won’t be satisfying)</li>
  <li>Something not too complex (don’t bite off more than you can chew)</li>
  <li>Something you can collaborate on with trusted colleagues</li>
</ul>

<p><strong>Starting small:</strong> If you’re working on a large team or across organizational boundaries, start with a component you own. Build your skills before tackling complex cross-team systems.</p>

<hr />

<h2 id="14-worked-example-login-flow">14. Worked Example: Login Flow</h2>

<p><strong>Context:</strong> Web application login endpoint</p>

<p><strong>Step 1: Draw the diagram</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[Browser] --(credentials)--&gt; [Login Process] --(query)--&gt; [User DB]
                                    |
                                    v
                             [Session Store]

Trust Boundary: -------- Internet --------
</code></pre></div></div>

<p><strong>Step 2: Apply STRIDE to Login Process</strong></p>

<table>
  <thead>
    <tr>
      <th>Threat</th>
      <th>Question</th>
      <th>Finding</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>S</td>
      <td>Can someone pretend to be a legitimate user?</td>
      <td>Yes—stolen credentials, session hijacking</td>
    </tr>
    <tr>
      <td>T</td>
      <td>Can data be modified?</td>
      <td>Yes—MITM attack on credentials</td>
    </tr>
    <tr>
      <td>R</td>
      <td>Can user deny actions?</td>
      <td>Yes—if no session logging</td>
    </tr>
    <tr>
      <td>I</td>
      <td>Can credentials leak?</td>
      <td>Yes—error messages, timing attacks</td>
    </tr>
    <tr>
      <td>D</td>
      <td>Can login be blocked?</td>
      <td>Yes—flood attacks, account lockout abuse</td>
    </tr>
    <tr>
      <td>E</td>
      <td>Can attacker gain admin?</td>
      <td>Yes—SQL injection in query</td>
    </tr>
  </tbody>
</table>

<p><strong>Step 3: Prioritize and mitigate</strong></p>

<table>
  <thead>
    <tr>
      <th>Threat</th>
      <th>Priority</th>
      <th>Mitigation</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Credential theft</td>
      <td>High</td>
      <td>HTTPS, MFA, session timeouts</td>
    </tr>
    <tr>
      <td>SQL injection</td>
      <td>High</td>
      <td>Prepared statements</td>
    </tr>
    <tr>
      <td>Session hijacking</td>
      <td>High</td>
      <td>Secure cookies, session binding</td>
    </tr>
    <tr>
      <td>Account lockout abuse</td>
      <td>Medium</td>
      <td>CAPTCHA, IP rate limiting</td>
    </tr>
    <tr>
      <td>Credential timing</td>
      <td>Low</td>
      <td>Constant-time comparison</td>
    </tr>
  </tbody>
</table>
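
<p>The constant-time comparison in the last row might look like the sketch below, which uses Go’s standard <code class="language-plaintext highlighter-rouge">crypto/subtle</code> package. Hashing both inputs first, so the comparison always runs over equal-length digests, is an added assumption of this example, and <code class="language-plaintext highlighter-rouge">equalSecrets</code> is a hypothetical name.</p>

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// equalSecrets compares two secrets without returning early at the first
// differing byte. Both inputs are hashed to fixed-length digests first, so
// the comparison always covers 32 bytes regardless of input length.
func equalSecrets(a, b string) bool {
	da := sha256.Sum256([]byte(a))
	db := sha256.Sum256([]byte(b))
	return subtle.ConstantTimeCompare(da[:], db[:]) == 1
}

func main() {
	fmt.Println(equalSecrets("s3cr3t-token", "s3cr3t-token"))
	fmt.Println(equalSecrets("s3cr3t-token", "guess"))
}
```

<p>This suits values such as session tokens or API keys. Stored passwords should still go through a dedicated password-hashing function before any comparison.</p>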

<p><strong>Step 4: Validate</strong></p>
<ul>
  <li>Did we address every STRIDE threat for every element?</li>
  <li>Do we have tests for each mitigation?</li>
  <li>Is anything still concerning?</li>
</ul>

<p><strong>Why this worked:</strong></p>
<ul>
  <li>The diagram made the system concrete and discussable</li>
  <li>STRIDE provided systematic coverage (no guessing what to look for)</li>
  <li>Each threat got a specific mitigation (not “improve security generally”)</li>
  <li>Tests will verify mitigations work</li>
</ul>

<p><strong>What could go wrong with this threat model:</strong></p>
<ul>
  <li>Missing trust boundaries (are there admin roles we didn’t show?)</li>
  <li>Missing data flows (are there logs, metrics, or debugging interfaces?)</li>
  <li>Assumptions about network security (is HTTPS really used everywhere?)</li>
</ul>

<hr />

<h2 id="15-quick-reference">15. Quick Reference</h2>

<h3 id="the-four-questions">The Four Questions</h3>
<ol>
  <li>What are you building?</li>
  <li>What can go wrong?</li>
  <li>What should you do about it?</li>
  <li>Did you do a decent job?</li>
</ol>

<h3 id="stride-threats">STRIDE Threats</h3>

<table>
  <thead>
    <tr>
      <th>Letter</th>
      <th>Threat</th>
      <th>Property</th>
      <th>Defense</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>S</td>
      <td>Spoofing</td>
      <td>Authentication</td>
      <td>Auth tokens, signatures</td>
    </tr>
    <tr>
      <td>T</td>
      <td>Tampering</td>
      <td>Integrity</td>
      <td>MACs, ACLs</td>
    </tr>
    <tr>
      <td>R</td>
      <td>Repudiation</td>
      <td>Non-repudiation</td>
      <td>Logging</td>
    </tr>
    <tr>
      <td>I</td>
      <td>Info Disclosure</td>
      <td>Confidentiality</td>
      <td>Encryption, ACLs</td>
    </tr>
    <tr>
      <td>D</td>
      <td>Denial of Service</td>
      <td>Availability</td>
      <td>Rate limits, quotas</td>
    </tr>
    <tr>
      <td>E</td>
      <td>Elevation</td>
      <td>Authorization</td>
      <td>Sandboxing, validation</td>
    </tr>
  </tbody>
</table>

<h3 id="stride-per-element-quick-check">STRIDE-per-Element Quick Check</h3>

<table>
  <thead>
    <tr>
      <th>Element</th>
      <th>Check For</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>External Entity</td>
      <td>S, R</td>
    </tr>
    <tr>
      <td>Process</td>
      <td>All (S, T, R, I, D, E)</td>
    </tr>
    <tr>
      <td>Data Flow</td>
      <td>T, I, D</td>
    </tr>
    <tr>
      <td>Data Store</td>
      <td>T, I, D (R for logs)</td>
    </tr>
  </tbody>
</table>
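
<p>The quick-check table is small enough to encode directly, which can help when generating a review checklist from a diagram. This is a sketch only; the type and function names are hypothetical, and the letters are copied from the table above.</p>

```go
package main

import "fmt"

type elementKind string

const (
	externalEntity elementKind = "External Entity"
	process        elementKind = "Process"
	dataFlow       elementKind = "Data Flow"
	dataStore      elementKind = "Data Store"
)

// threatsFor encodes the quick-check table: the STRIDE letters to review
// for each kind of diagram element.
var threatsFor = map[elementKind]string{
	externalEntity: "SR",
	process:        "STRIDE",
	dataFlow:       "TID",
	dataStore:      "TID",
}

// checklist returns the STRIDE letters to review for one element. Data
// stores that hold logs also get R (repudiation), as the table notes.
func checklist(kind elementKind, holdsLogs bool) string {
	if kind == dataStore {
		if holdsLogs {
			return "TRID"
		}
	}
	return threatsFor[kind]
}

func main() {
	fmt.Println("Login Process:", checklist(process, false))
	fmt.Println("User DB:", checklist(dataStore, false))
	fmt.Println("Audit log store:", checklist(dataStore, true))
}
```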

<h3 id="threat-response-checklist">Threat Response Checklist</h3>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Can we eliminate the feature?</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Can we mitigate with standard patterns?</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Is the risk acceptable? (Document why)</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Can we transfer to a trusted component?</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Is our mitigation testable?</li>
</ul>

<h3 id="dfd-validation">DFD Validation</h3>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />All trust boundaries marked</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />All processes numbered</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />No data moving without a process</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />External entities identified</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Data stores labeled</li>
</ul>

<h3 id="validation-checklist">Validation Checklist</h3>

<ul class="task-list">
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Diagram tells a story without “sometimes” or “also”</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />All trust boundaries, data flows, and stores visible</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />STRIDE checked for each element</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Bug filed for each threat</li>
  <li class="task-list-item"><input type="checkbox" class="task-list-item-checkbox" disabled="disabled" />Test case per threat</li>
</ul>

<hr />

<h2 id="16-connection-to-go-development-guide">16. Connection to Go Development Guide</h2>

<table>
  <thead>
    <tr>
      <th>Shostack (Threat Modeling)</th>
      <th>Go Development Guide</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Tampering with memory</td>
      <td>Value semantics prevent unexpected mutation</td>
    </tr>
    <tr>
      <td>Data/code confusion (EoP)</td>
      <td>Type safety, prepared statements</td>
    </tr>
    <tr>
      <td>Input validation</td>
      <td>“Validate, don’t sanitize”</td>
    </tr>
    <tr>
      <td>Trust the OS</td>
      <td>Use Go’s standard library security features</td>
    </tr>
    <tr>
      <td>Information disclosure</td>
      <td>Careful API design, minimal return values</td>
    </tr>
    <tr>
      <td>Denial of service</td>
      <td>Bounded resources, context timeouts</td>
    </tr>
  </tbody>
</table>

<p><strong>Shared insight:</strong> Both emphasize leveraging existing, trusted infrastructure rather than custom solutions.</p>

<p><strong>Why trust the OS:</strong></p>
<ul>
  <li>The OS provides security features so you can focus on your unique value proposition</li>
  <li>The OS runs with privileges not available to your program or attacker</li>
  <li>If the attacker controls the OS, you’re in a world of hurt anyway</li>
</ul>

<p>STRIDE maps directly to defensive coding:</p>
<ul>
  <li><strong>S → Authentication</strong> handled by OS/framework, not custom code</li>
  <li><strong>T → Integrity</strong> through immutability (value semantics)</li>
  <li><strong>I → Confidentiality</strong> through minimal exposure (return only needed data)</li>
  <li><strong>E → Authorization</strong> through type safety and sandboxing</li>
</ul>

<p><strong>Example (context timeouts and DoS):</strong></p>

<p>Go’s <code class="language-plaintext highlighter-rouge">context.Context</code> with deadlines directly addresses denial-of-service threats:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Without timeout: vulnerable to slow clients</span>
<span class="k">func</span> <span class="n">handleRequest</span><span class="p">(</span><span class="n">r</span> <span class="o">*</span><span class="n">Request</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">result</span> <span class="o">:=</span> <span class="n">expensiveOperation</span><span class="p">(</span><span class="n">r</span><span class="o">.</span><span class="n">Data</span><span class="p">)</span>
    <span class="c">// ...</span>
<span class="p">}</span>

<span class="c">// With timeout: bounded resource consumption</span>
<span class="k">func</span> <span class="n">handleRequestWithTimeout</span><span class="p">(</span><span class="n">ctx</span> <span class="n">context</span><span class="o">.</span><span class="n">Context</span><span class="p">,</span> <span class="n">r</span> <span class="o">*</span><span class="n">Request</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
    <span class="n">ctx</span><span class="p">,</span> <span class="n">cancel</span> <span class="o">:=</span> <span class="n">context</span><span class="o">.</span><span class="n">WithTimeout</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="m">30</span><span class="o">*</span><span class="n">time</span><span class="o">.</span><span class="n">Second</span><span class="p">)</span>
    <span class="k">defer</span> <span class="n">cancel</span><span class="p">()</span>

    <span class="n">result</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">expensiveOperationWithContext</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">r</span><span class="o">.</span><span class="n">Data</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">err</span> <span class="c">// context deadline exceeded = DoS mitigated</span>
    <span class="p">}</span>
    <span class="c">// ...</span>
<span class="p">}</span>
</code></pre></div></div>
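
<p>A timeout only helps if the expensive work actually watches the context. The self-contained sketch below shows one way that might look; <code class="language-plaintext highlighter-rouge">slowOperation</code> and <code class="language-plaintext highlighter-rouge">timedOut</code> are hypothetical stand-ins, and the durations are arbitrary.</p>

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowOperation is a hypothetical stand-in for expensive work. It finishes
// after workTime unless the context is cancelled or expires first.
func slowOperation(ctx context.Context, workTime time.Duration) error {
	select {
	case <-time.After(workTime):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// timedOut runs slowOperation under a deadline and reports whether the
// deadline cut the work short.
func timedOut(workTime, limit time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()
	err := slowOperation(ctx, workTime)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("slow work timed out:", timedOut(200*time.Millisecond, 20*time.Millisecond))
	fmt.Println("fast work timed out:", timedOut(time.Millisecond, time.Second))
}
```

<p>Work that ignores <code class="language-plaintext highlighter-rouge">ctx.Done()</code> keeps consuming resources after the caller has given up, so the deadline has to be threaded through every blocking call.</p>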

<hr />

<h2 id="17-glossary">17. Glossary</h2>

<table>
  <thead>
    <tr>
      <th>Term</th>
      <th>Definition</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Attack surface</strong></td>
      <td>Trust boundary + direction of potential attack</td>
    </tr>
    <tr>
      <td><strong>Attack tree</strong></td>
      <td>Hierarchical decomposition of attack goals</td>
    </tr>
    <tr>
      <td><strong>DFD</strong></td>
      <td>Data Flow Diagram—visual model showing data movement</td>
    </tr>
    <tr>
      <td><strong>STRIDE</strong></td>
      <td>Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege</td>
    </tr>
    <tr>
      <td><strong>Trust boundary</strong></td>
      <td>Where more than one principal interacts</td>
    </tr>
    <tr>
      <td><strong>Principal</strong></td>
      <td>Entity that can take action (user, process, system)</td>
    </tr>
    <tr>
      <td><strong>Mitigation</strong></td>
      <td>Action that makes an attack harder</td>
    </tr>
    <tr>
      <td><strong>Threat</strong></td>
      <td>Potential violation of a security property</td>
    </tr>
    <tr>
      <td><strong>Vulnerability</strong></td>
      <td>Specific weakness that enables a threat</td>
    </tr>
    <tr>
      <td><strong>CAPEC</strong></td>
      <td>Common Attack Pattern Enumeration and Classification</td>
    </tr>
    <tr>
      <td><strong>LINDDUN</strong></td>
      <td>Privacy threat framework (STRIDE mirror for privacy)</td>
    </tr>
    <tr>
      <td><strong>Elevation of Privilege</strong></td>
      <td>Both a STRIDE threat and a card game for threat modeling</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="18-key-quotes">18. Key Quotes</h2>

<blockquote>
  <p>“Threat modeling is the key to a focused defense. Without threat models, you can never stop playing whack-a-mole.”</p>
</blockquote>

<blockquote>
  <p>“In short, threat modeling is the use of abstractions to aid in thinking about risks.”</p>
</blockquote>

<blockquote>
  <p>“Your instincts are insufficient, and you’d need tools to help tackle the questions.”</p>
</blockquote>

<blockquote>
  <p>“If you think about building a house, decisions you make early will have dramatic effects on security.”</p>
</blockquote>

<blockquote>
  <p>“STRIDE is a tool to guide you to threats, not to ask you to categorize what you’ve found.”</p>
</blockquote>

<blockquote>
  <p>“Validate, don’t sanitize. Know what you expect to see… If you get something else, throw it away.”</p>
</blockquote>

<blockquote>
  <p>“Trust the operating system. The OS provides security features so you can focus on your unique value proposition.”</p>
</blockquote>

<blockquote>
  <p>“When there is an easy way to address a problem, you should skip strategizing and just address it.”</p>
</blockquote>

<blockquote>
  <p>“Any technical professional can learn to threat model. Threat modeling involves the intersection of two models: a model of what can go wrong (threats), applied to a model of the software you’re building.”</p>
</blockquote>

<blockquote>
  <p>“With a whiteboard diagram and a copy of Elevation of Privilege, developers can threat model software that they’re building, systems administrators can threat model software they’re deploying, and security professionals can introduce threat modeling to those with skillsets outside of security.”</p>
</blockquote>

<blockquote>
  <p>“The question ‘what’s your threat model?’ is a great one because in just four words, it can slice through many conundrums to determine what you are worried about.”</p>
</blockquote>]]></content><author><name></name></author><category term="security" /><category term="software-engineering" /><category term="threat-modeling" /><summary type="html"><![CDATA[A practical guide to threat modeling principles, extracted from Adam Shostack’s Threat Modeling: Designing for Security (2014).]]></summary></entry><entry><title type="html">It’s Been Eight Years Since NIST Said to Stop Rotating Passwords</title><link href="https://binaryphile.github.io/security/2026/04/07/its-been-eight-years-since-nist-said-to-stop-rotating-passwords.html" rel="alternate" type="text/html" title="It’s Been Eight Years Since NIST Said to Stop Rotating Passwords" /><published>2026-04-07T00:00:00+00:00</published><updated>2026-04-07T00:00:00+00:00</updated><id>https://binaryphile.github.io/security/2026/04/07/its-been-eight-years-since-nist-said-to-stop-rotating-passwords</id><content type="html" xml:base="https://binaryphile.github.io/security/2026/04/07/its-been-eight-years-since-nist-said-to-stop-rotating-passwords.html"><![CDATA[<p>In June 2017, NIST published <a href="https://pages.nist.gov/800-63-3/sp800-63b.html">SP 800-63B Rev 3</a> and told the world to
stop requiring periodic password changes. Eight years later, most
organizations still do it. In August 2025, NIST published <a href="https://pages.nist.gov/800-63-4/sp800-63b.html">Rev 4</a> and
upgraded that guidance from “you should stop” to “you must stop.”</p>

<p>This is the story of what changed, what it means for systems you build, and
what the actual requirements look like when you play them out as scenarios.</p>

<h2 id="the-old-world">The old world</h2>

<p>Before 2017, password policy was a checklist everyone knew by heart:</p>

<ul>
  <li>Change your password every 90 days</li>
  <li>Must contain uppercase, lowercase, digit, and special character</li>
  <li>Minimum 8 characters</li>
  <li>Can’t reuse any of your last 12 passwords</li>
</ul>

<p>Security teams enforced it. Auditors checked for it. Users hated it. And it
made passwords worse, not better.</p>

<h2 id="why-it-made-passwords-worse">Why it made passwords worse</h2>

<p>Every one of those rules has a specific failure mode. Here’s what actually
happens when you enforce them.</p>

<h3 id="forced-rotation-breeds-predictable-mutations">Forced rotation breeds predictable mutations</h3>

<p>A company requires 90-day password changes. Sarah, an account manager, has
been through this twelve times. Her current password is <code class="language-plaintext highlighter-rouge">Summer2024!</code>. In
October, the system forces a change. She types <code class="language-plaintext highlighter-rouge">Fall2024!</code>. In January,
<code class="language-plaintext highlighter-rouge">Winter2025!</code>.</p>

<p>An attacker obtains <code class="language-plaintext highlighter-rouge">Summer2024!</code> from a breach. They don’t try it directly —
they try the obvious seasonal mutations. <code class="language-plaintext highlighter-rouge">Fall2024!</code>, <code class="language-plaintext highlighter-rouge">Winter2024!</code>,
<code class="language-plaintext highlighter-rouge">Summer2025!</code>. They’re in within a handful of guesses.</p>

<p>But the damage starts before the breach. Sarah chose <code class="language-plaintext highlighter-rouge">Summer2024!</code> in
the first place <em>because</em> she knew it would expire. Why invest in memorizing
something strong when it’s gone in 90 days? Rotation discourages the upfront
investment in password quality that NIST is now explicitly trying to protect.</p>

<p>There’s a subtler cost too. Each rotation produces a “retired” password the
subscriber considers spent. At scale, retired passwords get recycled on
personal accounts, shared with colleagues, or written on sticky notes that
outlive the rotation window. This sounds like an edge case — and for any one
user it is. But this is security, where edge cases become certainties across
ten thousand accounts. Every rotation cycle produces a fresh crop of
unmanaged credentials floating in the wild. That exposure exists solely
because of the rotation policy.</p>

<p>NIST’s response: SHALL NOT require periodic password changes. Change only on
evidence of compromise.</p>

<p><em>(NIST uses <a href="https://www.rfc-editor.org/rfc/rfc2119">RFC 2119</a> requirement keywords: SHALL, SHALL NOT,
SHOULD, SHOULD NOT, MAY. Uppercase indicates a formal requirement level, not
emphasis.)</em></p>

<h3 id="composition-rules-produce-a-monoculture">Composition rules produce a monoculture</h3>

<p>A site requires uppercase, lowercase, digit, and special character. The
minimum is 8 characters. What does the average user type?</p>

<p><code class="language-plaintext highlighter-rouge">Password1!</code></p>

<p>Or <code class="language-plaintext highlighter-rouge">Welcome1!</code>. Or <code class="language-plaintext highlighter-rouge">Company1!</code>. Composition rules don’t increase entropy — the randomness that makes
a password hard to guess — they constrain the search space into a predictable shape. Attackers know the
shape. They try <code class="language-plaintext highlighter-rouge">[Word][Digit][Special]</code> patterns first.</p>

<p>NIST’s response: SHALL NOT impose composition rules.</p>

<h3 id="short-minimums-invite-brute-force">Short minimums invite brute force</h3>

<p>An 8-character password chosen at random from the full ASCII printable set has
about 52 bits of entropy, and human-chosen passwords have far less. That sounds like a
lot until you consider that a modern GPU cluster can test billions of password guesses
per second against a stolen password database protected only by a fast hash. Eight characters fall in hours.</p>

<p>NIST’s response: SHALL require minimum 15 characters for single-factor
authentication. 8 characters only if a second factor is also required.</p>
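
<p>The entropy arithmetic behind those numbers is short enough to check. This sketch assumes a password drawn uniformly at random from the 95 printable ASCII characters, which is the best case; <code class="language-plaintext highlighter-rouge">entropyBits</code> is a hypothetical helper name.</p>

```go
package main

import (
	"fmt"
	"math"
)

// entropyBits returns the entropy of a uniformly random password:
// length * log2(alphabetSize).
func entropyBits(length, alphabetSize int) float64 {
	return float64(length) * math.Log2(float64(alphabetSize))
}

func main() {
	fmt.Printf("8 characters:  %.1f bits\n", entropyBits(8, 95))
	fmt.Printf("15 characters: %.1f bits\n", entropyBits(15, 95))
}
```

<p>Each extra character adds about 6.6 bits, so going from 8 to 15 characters multiplies the search space by 95 seven times over. That is why length does more work than character variety.</p>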

<h3 id="blocking-paste-punishes-the-right-behavior">Blocking paste punishes the right behavior</h3>

<p>A site disables paste in the password field “for security.” The subscriber
who was about to paste a 40-character random string from their password
manager now has to type something they can remember. The security outcome
gets worse, not better.</p>

<p>NIST’s response: SHALL allow password managers and autofill. SHOULD
permit paste.</p>

<h3 id="no-blocklist-means-the-attackers-job-is-easy">No blocklist means the attacker’s job is easy</h3>

<p>A subscriber picks <code class="language-plaintext highlighter-rouge">123456</code> or <code class="language-plaintext highlighter-rouge">password</code> or <code class="language-plaintext highlighter-rouge">qwerty</code>. The system accepts it
because it meets the 8-character minimum (well, <code class="language-plaintext highlighter-rouge">password</code> does) and the
composition rules (it doesn’t, but many systems don’t actually enforce them
consistently).</p>

<p>Meanwhile, an attacker with a collection of 500 million passwords leaked from
previous breaches tries
the top 10,000. Most systems have at least a few accounts using them.</p>

<p>NIST’s response: SHALL compare prospective passwords against a blocklist
of breached passwords, dictionary words, sequential characters, and
context-specific terms.</p>

<h2 id="rev-3-vs-rev-4-from-recommendation-to-mandate">Rev 3 vs Rev 4: from recommendation to mandate</h2>

<p>For rotation and composition rules, Rev 3 (June 2017) said “SHOULD NOT”: a
recommendation you could set aside with a documented reason. Rev 4 (August 2025)
says “SHALL NOT”: prohibited, no exceptions.</p>

<table>
  <thead>
    <tr>
      <th>Requirement</th>
      <th>Rev 3 (2017)</th>
      <th>Rev 4 (2025)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Periodic rotation</td>
      <td>SHOULD NOT</td>
      <td>SHALL NOT</td>
    </tr>
    <tr>
      <td>Composition rules</td>
      <td>SHOULD NOT</td>
      <td>SHALL NOT</td>
    </tr>
    <tr>
      <td>Minimum length (single-factor)</td>
      <td>8 characters</td>
      <td><strong>15 characters</strong></td>
    </tr>
    <tr>
      <td>Password managers</td>
      <td>SHOULD permit paste</td>
      <td>SHALL allow managers + autofill</td>
    </tr>
    <tr>
      <td>Blocklist checking</td>
      <td>SHALL</td>
      <td>SHALL</td>
    </tr>
    <tr>
      <td>Strength guidance</td>
      <td>SHOULD offer</td>
      <td>SHALL offer</td>
    </tr>
  </tbody>
</table>

<p>The progression: “you should stop doing harmful things” became “you must stop
doing harmful things.”</p>

<h2 id="what-the-requirements-look-like-as-scenarios">What the requirements look like as scenarios</h2>

<p>I turned the Rev 4 guidance into use cases to see what a team actually needs
to build. Not a checklist of SHALLs — a set of scenarios showing what
happens when things go right and wrong, driven by how real subscribers and
real attackers behave.</p>

<p>NIST defines three Authentication Assurance Levels. AAL1 is password-only.
AAL2 requires two factors — a password plus something like a time-based one-time-password (TOTP)
app or a hardware security key. AAL3 requires two factors where one is a hardware cryptographic
device that resists phishing.</p>

<h3 id="setting-a-password">Setting a password</h3>

<p><strong>The happy path:</strong> A subscriber opens the password field and pastes a
64-character random string from their password manager. The system accepts it,
hashes it, stores the hash. Done.</p>

<p><strong>The attacker’s path:</strong> A different subscriber types <code class="language-plaintext highlighter-rouge">Company2025!</code> — a
predictable pattern that satisfies every legacy composition rule. The system
checks it against a blocklist of breached passwords. Found. Rejected. The
system explains why and suggests trying a passphrase. The subscriber tries
<code class="language-plaintext highlighter-rouge">correct horse battery staple</code> (28 characters counting spaces, no special characters, no
uppercase). The system accepts it — length and unpredictability matter more
than character variety.</p>

<p><strong>The edge case:</strong> A subscriber tries to set a 6-character password. Rejected
— below the 15-character minimum for single-factor, or 8-character minimum
with MFA. They try <code class="language-plaintext highlighter-rouge">aaaaaaaaaaaaaaa</code>, which is 15 characters but a single repeated character.
Rejected. They try their username with digits appended. Rejected —
context-specific.</p>

<p><strong>The infrastructure failure:</strong> The blocklist service is down. The system
cannot check the password against breach corpora. Rather than fail open and accept a
potentially compromised password, the system fails closed: it refuses the change
and asks the subscriber to try again later.</p>
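
<p>The four scenarios above fit into one acceptance function. What follows is a minimal sketch, not a complete Rev 4 verifier: the <code class="language-plaintext highlighter-rouge">blocklist</code> type, the demo lookups, and the error values are hypothetical, and a real repetition or sequence check would cover more patterns than a single repeated character.</p>

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

var (
	errTooShort      = errors.New("password is too short")
	errRepetitive    = errors.New("password is a single repeated character")
	errContextual    = errors.New("password contains the username")
	errBlocklisted   = errors.New("password appears in a breach blocklist")
	errBlocklistDown = errors.New("cannot verify password right now, try again later")
)

// blocklist is a hypothetical lookup against a breach corpus. It returns an
// error when the service cannot be reached.
type blocklist func(password string) (found bool, err error)

// checkPassword applies the scenario's checks in order and returns nil only
// when every check passes. Note the absence of composition rules.
func checkPassword(password, username string, mfaRequired bool, lookup blocklist) error {
	minLength := 15 // single-factor minimum
	if mfaRequired {
		minLength = 8
	}
	if utf8.RuneCountInString(password) < minLength {
		return errTooShort
	}
	first, _ := utf8.DecodeRuneInString(password)
	if strings.Trim(password, string(first)) == "" {
		return errRepetitive
	}
	if username != "" && strings.Contains(strings.ToLower(password), strings.ToLower(username)) {
		return errContextual
	}
	found, err := lookup(password)
	if err != nil {
		return errBlocklistDown // fail closed: never accept an unverified password
	}
	if found {
		return errBlocklisted
	}
	return nil
}

// demoBlocklist stands in for a real breach-corpus service.
func demoBlocklist(password string) (bool, error) {
	breached := map[string]bool{"Company2025!": true, "Password1!": true}
	return breached[password], nil
}

// downBlocklist simulates the service being unreachable.
func downBlocklist(string) (bool, error) {
	return false, errors.New("connection refused")
}

func main() {
	for _, candidate := range []string{"Company2025!", "correct horse battery staple", "aaaaaaaaaaaaaaa"} {
		fmt.Printf("%q: %v\n", candidate, checkPassword(candidate, "sarah", false, demoBlocklist))
	}
}
```

<p>Length is counted in Unicode code points, not bytes, so a passphrase in any script is measured fairly.</p>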

<h3 id="authentication">Authentication</h3>

<p><strong>The happy path:</strong> Subscriber submits username and password. The system
runs the submitted password through the same one-way hashing process
used when the password was stored, and compares the results. Match.
Session created.</p>

<p><strong>The attacker’s path — credential stuffing:</strong> An attacker has a list of
username/password pairs from a breach at another service. They try each one.
After 100 consecutive failures on a single account, the system requires
additional verification — a CAPTCHA, a temporary lockout with recovery, or
escalating delays. The account is never permanently locked, because permanent
lockout is a denial-of-service weapon the attacker can use against legitimate
users.</p>

<p><strong>The attacker’s path — user enumeration:</strong> The attacker tries a username
that doesn’t exist. The system performs a dummy hash computation so the
response time is identical to a real account. The error message is generic —
“invalid username or password.” The attacker learns nothing about whether the
account exists.</p>
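
<p>A minimal sketch of the dummy-hash idea, assuming an in-memory <code class="language-plaintext highlighter-rouge">USERS</code> dictionary and scrypt from Python’s standard library. Both are my choices for illustration; the scrypt parameters are placeholders, not tuning advice, and a real system would need to measure whether the two paths actually take the same time.</p>

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    """Salted scrypt hash; cost parameters are illustrative only."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt, n=2**14, r=8, p=1,
        maxmem=64 * 1024 * 1024, dklen=32,
    )


# A throwaway record hashed against when the username is unknown,
# so the unknown-user path does the same hashing work as the real one.
DUMMY_SALT = os.urandom(16)
DUMMY_HASH = hash_password("not-a-real-password", DUMMY_SALT)

USERS = {}  # username -> (salt, hash); hypothetical in-memory store


def register(username: str, password: str) -> None:
    salt = os.urandom(16)
    USERS[username] = (salt, hash_password(password, salt))


def login(username: str, password: str) -> str:
    salt, stored = USERS.get(username, (DUMMY_SALT, DUMMY_HASH))
    candidate = hash_password(password, salt)
    matches = hmac.compare_digest(candidate, stored)  # constant-time compare
    if username in USERS and matches:
        return "ok"
    # Same message whether the username or the password was wrong.
    return "invalid username or password"
```

<p>The <code class="language-plaintext highlighter-rouge">username in USERS</code> check matters: without it, someone who guessed the dummy password could “log in” as an account that doesn’t exist.</p>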

<p><strong>The MFA path:</strong> Account is AAL2. Password validates. The system prompts for
a second factor. The subscriber provides a TOTP code from their authenticator
app. Valid. Session created. If the subscriber’s device is lost, they use a
recovery code or alternative factor — the system doesn’t fall back to
password-only.</p>

<h3 id="sessions">Sessions</h3>

<p><strong>The happy path:</strong> After authentication, the system generates a session
token — a random identifier that proves “this browser is logged in” —
with enough randomness to be unguessable. It’s delivered over an encrypted connection, never
embedded in URLs. The subscriber works. When done, they log out. The system
invalidates the session server-side — not just deleting the cookie.</p>

<p><strong>The absent subscriber:</strong> The subscriber walks away. After 30 minutes of
inactivity, the session expires. After 12 hours regardless of activity, the
session expires. Both timeouts are adjustable by assurance level — higher-risk
systems use shorter windows.</p>
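
<p>Here is a rough sketch of the two timeouts plus server-side logout. The <code class="language-plaintext highlighter-rouge">SessionStore</code> class and its injectable clock are hypothetical, and the in-memory dictionary stands in for whatever store a real verifier uses.</p>

```python
import secrets
import time

IDLE_TIMEOUT = 30 * 60            # seconds of inactivity
ABSOLUTE_TIMEOUT = 12 * 60 * 60   # seconds since login, regardless of activity


class SessionStore:
    def __init__(self, clock=time.time):
        self._clock = clock
        self._sessions = {}  # token -> [created_at, last_seen]

    def create(self) -> str:
        token = secrets.token_urlsafe(32)  # 32 bytes from the OS random source
        now = self._clock()
        self._sessions[token] = [now, now]
        return token

    def is_valid(self, token: str) -> bool:
        entry = self._sessions.get(token)
        if entry is None:
            return False
        created_at, last_seen = entry
        now = self._clock()
        if now - last_seen > IDLE_TIMEOUT or now - created_at > ABSOLUTE_TIMEOUT:
            del self._sessions[token]  # expired: forget it server-side
            return False
        entry[1] = now  # activity resets only the idle timer
        return True

    def logout(self, token: str) -> None:
        # Server-side invalidation: the token stops working even if a copy
        # of the cookie survives somewhere.
        self._sessions.pop(token, None)
```

<p>Passing the clock in as a parameter is only there to make the timeouts testable without waiting twelve hours.</p>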

<p><strong>The attacker’s path — session hijacking:</strong> An attacker obtains a session
token (perhaps through a compromised network or XSS vulnerability).
They replay it from a different IP and user-agent. The system flags
the anomaly and may invalidate the session or require reauthentication.</p>

<h3 id="compromise-response">Compromise response</h3>

<p><strong>The detection path:</strong> A breach monitoring service flags a subscriber’s
password as appearing in a newly published breach corpus. The system marks the
account for mandatory password change.</p>

<p><strong>The subscriber’s path:</strong> Next login, the subscriber authenticates (the
compromised password works this one last time), then is forced to choose a
new password before getting a session. They cannot reuse the compromised
password. The system does not just suggest a change — it requires one.</p>

<p><strong>The absent subscriber:</strong> The subscriber doesn’t log in for weeks. The
account stays flagged. Whenever they return, the forced change applies. The
system doesn’t age out the flag.</p>

<p><strong>The worst case:</strong> The attacker already used the compromised password to
change it. The subscriber can’t log in. Account recovery kicks in — and
recovery must not bypass the account’s assurance level. An AAL2 account
requires two-factor recovery, not just an email link.</p>

<h3 id="why-rotation-doesnt-appear-here">Why rotation doesn’t appear here</h3>

<p>Notice what’s absent from every scenario: periodic expiration. No 90-day
timer. No “your password is about to expire” banner. The only forced change
is on evidence of compromise — a specific, concrete signal that the current
password is no longer secret.</p>

<p>Rotation is absent because it makes every other scenario worse. It makes
subscribers choose weaker passwords. It makes their passwords more
predictable. It trains them to make minimal changes. And it provides zero
protection against the actual threat — an attacker who already has the
password.</p>

<h2 id="whats-still-missing-from-most-organizations">What’s still missing from most organizations</h2>

<p>Eight years after Rev 3, here’s what I still see:</p>

<ul>
  <li>90-day rotation policies</li>
  <li>Composition rules (uppercase + digit + special)</li>
  <li>Paste disabled in password fields</li>
  <li>8-character minimums with no blocklist checking</li>
  <li>“Security questions” as account recovery</li>
</ul>

<p>Every one of these is now explicitly prohibited or deprecated by the current
NIST standard. Not “not recommended.” Prohibited.</p>

<p>If your organization follows NIST — and if you’re a federal agency or
contractor, you must — Rev 4 leaves no room for interpretation. If you don’t
follow NIST but use it as a reference, Rev 4 is still the strongest signal
available that these practices are counterproductive.</p>

<p>The standard is <a href="https://pages.nist.gov/800-63-4/sp800-63b.html">free and online</a>. The <a href="https://pages.nist.gov/800-63-4/sp800-63b.html#passwordver">password verifier
section</a> is the part that matters most. Read it. Then go check
what your systems actually enforce.</p>

<h2 id="references">References</h2>

<ul>
  <li><a href="https://pages.nist.gov/800-63-4/sp800-63b.html">NIST SP 800-63B Rev 4</a> (August 2025) — the current standard</li>
  <li><a href="https://pages.nist.gov/800-63-3/sp800-63b.html">NIST SP 800-63B Rev 3</a> (June 2017) — the paradigm shift</li>
  <li><a href="https://pages.nist.gov/800-63-4/sp800-63b.html#passwordver">Password Verifiers section</a> — the specific requirements</li>
</ul>

<hr />

<h2 id="appendix-formal-use-cases">Appendix: formal use cases</h2>

<p>The scenarios above, formalized as Cockburn-style use cases. These are
designed to be cut and pasted as a standalone requirements document. Each
NIST requirement appears as the scenario that motivated it — an attacker
exploiting a weakness, a subscriber hitting a wall, or a system failing
to protect its users.</p>

<p>Derived from <a href="https://pages.nist.gov/800-63-4/sp800-63b.html">NIST SP 800-63B Rev 4</a> (August 2025).</p>

<h3 id="system-scope">System Scope</h3>

<p><strong>System:</strong> Verifier — the authentication subsystem that validates subscriber credentials, manages sessions, and enforces credential policy.</p>

<h3 id="actors">Actors</h3>

<p><strong>Subscriber:</strong> End user who authenticates. May memorize passwords or use a password manager.</p>

<p><strong>Verifier:</strong> The system under design. Validates credentials, manages sessions.</p>

<p><strong>Attacker:</strong> Adversary with breach corpuses, password lists, and knowledge of common user behavior. Methods: credential stuffing, brute force, mutation guessing, phishing, session hijacking, social engineering of recovery flows.</p>

<hr />

<h3 id="uc-1-set-an-appropriate-secret">UC-1: Set an Appropriate Secret</h3>

<ul>
  <li><strong>Primary Actor:</strong> Subscriber</li>
  <li><strong>Goal:</strong> Set a password the subscriber can use to authenticate</li>
  <li><strong>Scope:</strong> Verifier</li>
  <li><strong>Level:</strong> User goal</li>
  <li><strong>Trigger:</strong> Subscriber creates an account or changes their password</li>
  <li><strong>Preconditions:</strong> Identity proofed (enrollment) or authenticated session (change)</li>
  <li><strong>Stakeholders:</strong>
    <ul>
      <li>Subscriber — wants a password they can use to get in</li>
      <li>Verifier — wants a password that resists guessing even if the hash database is stolen</li>
      <li>Attacker — wants subscribers to choose predictable passwords or reuse breached ones</li>
    </ul>
  </li>
  <li><strong>Main Success Scenario:</strong>
    <ol>
      <li>Subscriber enters a password</li>
      <li>Verifier validates the password length (15+ for single-factor, 8+ with MFA)</li>
      <li>Verifier validates the password against the blocklist (UC-2)</li>
      <li>Verifier hashes and stores the password (UC-3)</li>
      <li>Verifier confirms the password is set</li>
    </ol>
  </li>
  <li><strong>Extensions:</strong>
    <ul>
      <li>1a. <em>Subscriber pastes from a password manager:</em>
Verifier accepts paste and autofill. The password is random and non-memorizable — the manager stores it. Continue step 2.</li>
      <li>2a. <em>Password is too short:</em>
Verifier rejects and provides guidance. Resume step 1.</li>
      <li>2b. <em>Verifier imposes composition rules (uppercase, digit, special):</em>
This forces predictable patterns — <code class="language-plaintext highlighter-rouge">Password1!</code>, <code class="language-plaintext highlighter-rouge">Company2025!</code>. Attacker exploits the pattern with mutation lists. Composition rules are prohibited. Verifier accepts any character mix.</li>
      <li>3a. <em>Password found in a breach corpus:</em>
Attacker already has this password. Verifier rejects and explains why. Resume step 1.</li>
      <li>3b. <em>Password is a dictionary word, sequential, or contains the username:</em>
Attacker tries these first. Verifier rejects. Resume step 1.</li>
      <li>3c. <em>Blocklist service unavailable:</em>
Accepting the password would leave the account vulnerable to credential stuffing. Verifier refuses the change and asks subscriber to retry later. Fail.</li>
      <li>4a. <em>Storage fails:</em>
No password stored. Resume step 1.</li>
      <li>*a. <em>System requires periodic rotation (90-day policy):</em>
Subscriber mutates <code class="language-plaintext highlighter-rouge">Summer2024!</code> to <code class="language-plaintext highlighter-rouge">Fall2024!</code>. Attacker who has the old password guesses the new one in a handful of tries. Forced rotation is prohibited — change only on evidence of compromise.</li>
    </ul>
  </li>
  <li><strong>Technology &amp; Data Variations:</strong>
    <ul>
      <li>Password manager: subscriber generates a random, non-memorizable password. The secret is persisted, not memorized. Failure mode is lost manager, not forgotten password.</li>
      <li>Unicode normalization: NFKC or NFKD before hashing</li>
    </ul>
  </li>
  <li><strong>Minimal Guarantee:</strong> No password is stored unless it passes all validation.</li>
  <li><strong>Success Guarantee:</strong> Password is stored as a salted hash; subscriber can authenticate with it.</li>
</ul>
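
<p>The Unicode normalization variation is easy to get wrong, so here is a small sketch using Python’s standard library. The function name is hypothetical; NFKC is one of the two forms listed above.</p>

```python
import unicodedata


def normalize_password(password: str) -> bytes:
    """Apply NFKC, then encode, so equivalent spellings hash identically."""
    return unicodedata.normalize("NFKC", password).encode("utf-8")


# "é" typed as one code point versus "e" plus a combining accent.
composed = "caf\u00e9"
decomposed = "cafe\u0301"
print(composed == decomposed)                                          # False
print(normalize_password(composed) == normalize_password(decomposed))  # True
```

<p>Without this step, a subscriber whose phone and laptop encode the same accented character differently would be locked out on one of them.</p>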

<hr />

<h3 id="uc-2-validate-password-against-blocklist">UC-2: Validate Password Against Blocklist</h3>

<ul>
  <li><strong>Primary Actor:</strong> Verifier (automated)</li>
  <li><strong>Goal:</strong> Reject passwords an attacker already knows</li>
  <li><strong>Scope:</strong> Verifier</li>
  <li><strong>Level:</strong> Subfunction (called by UC-1)</li>
  <li><strong>Trigger:</strong> Subscriber submits a new password</li>
  <li><strong>Preconditions:</strong> Blocklist sources loaded</li>
  <li><strong>Stakeholders:</strong>
    <ul>
      <li>Subscriber — wants clear feedback if rejected</li>
      <li>Attacker — has breach corpuses with hundreds of millions of passwords; tries the top candidates first</li>
    </ul>
  </li>
  <li><strong>Main Success Scenario:</strong>
    <ol>
      <li>Verifier normalizes the password for comparison</li>
      <li>Verifier checks against breach corpuses, dictionary words, sequential/repetitive strings, and context-specific terms (service name, username)</li>
      <li>Password not found; verifier accepts it</li>
    </ol>
  </li>
  <li><strong>Extensions:</strong>
    <ul>
      <li>2a. <em>Password found in breach corpus:</em>
This password is in the attacker’s list. Verifier rejects and explains why. UC-1 resumes at step 1.</li>
      <li>2b. <em>Password is a common dictionary word:</em>
Attacker tries dictionary words early. Verifier rejects. UC-1 resumes at step 1.</li>
      <li>2c. <em>Password is sequential or repetitive (<code class="language-plaintext highlighter-rouge">123456</code>, <code class="language-plaintext highlighter-rouge">aaaaaa</code>):</em>
Trivially guessable. Verifier rejects. UC-1 resumes at step 1.</li>
      <li>2d. <em>Password contains the username or service name:</em>
Attacker targets context-specific passwords. Verifier rejects. UC-1 resumes at step 1.</li>
      <li>2e. <em>Blocklist service unavailable, no cache:</em>
Verifier cannot ensure the password isn’t compromised. Rejects and asks subscriber to retry. Fail.</li>
    </ul>
  </li>
  <li><strong>Minimal Guarantee:</strong> No password an attacker already has is accepted.</li>
  <li><strong>Success Guarantee:</strong> Only passwords absent from all blocklist sources proceed to storage.</li>
</ul>

<hr />

<h3 id="uc-3-store-a-password">UC-3: Store a Password</h3>

<ul>
  <li><strong>Primary Actor:</strong> Verifier (automated)</li>
  <li><strong>Goal:</strong> Store the password so it resists offline cracking if the database is stolen</li>
  <li><strong>Scope:</strong> Verifier</li>
  <li><strong>Level:</strong> Subfunction (called by UC-1)</li>
  <li><strong>Trigger:</strong> Password passed validation</li>
  <li><strong>Preconditions:</strong> Password in memory, not yet persisted</li>
  <li><strong>Stakeholders:</strong>
    <ul>
      <li>Subscriber — wants their credential safe even if the database is breached</li>
      <li>Attacker — has stolen the hash database and will attempt offline cracking with GPU clusters</li>
    </ul>
  </li>
  <li><strong>Main Success Scenario:</strong>
    <ol>
      <li>Verifier generates a random salt</li>
      <li>Verifier hashes the password using an approved hashing scheme with a high cost factor</li>
      <li>Verifier stores the hash and salt</li>
    </ol>
  </li>
  <li><strong>Extensions:</strong>
    <ul>
      <li>2a. <em>Attacker steals the hash database:</em>
With a fast or weakly configured hash (MD5, SHA-1, low-iteration PBKDF2), the attacker cracks most passwords in hours. With a memory-hard scheme and high cost factor, each guess is expensive. The cost factor should be as high as practical without degrading login performance.</li>
      <li>2b. <em>Pepper available:</em>
Verifier applies an additional keyed hash with a secret stored separately. Even if the database is stolen, the attacker also needs the pepper. Continue step 3.</li>
      <li>3a. <em>Database write fails:</em>
Password not stored. Subscriber informed. UC-1 may retry.</li>
    </ul>
  </li>
  <li><strong>Technology &amp; Data Variations:</strong>
    <ul>
      <li>Approved hashing schemes per NIST SP 800-132</li>
      <li>Salt: at least 32 bits from approved random source</li>
      <li>Pepper: optional, stored in HSM or separate key store</li>
    </ul>
  </li>
  <li><strong>Minimal Guarantee:</strong> Plaintext password is never persisted.</li>
  <li><strong>Success Guarantee:</strong> Password stored as salted hash that resists offline cracking.</li>
</ul>
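
<p>A sketch of the main scenario plus the pepper extension. I use scrypt here because it is memory-hard and ships with Python, not because this use case names it; check the approved list before choosing a scheme. The cost parameters, the record layout, and the function names are all assumptions.</p>

```python
import hashlib
import hmac
import os


def _derive(password: str, salt: bytes, pepper: bytes) -> bytes:
    digest = hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt, n=2**14, r=8, p=1,
        maxmem=64 * 1024 * 1024, dklen=32,
    )
    if pepper:
        # Extension 2b: an extra keyed hash with a secret kept outside the database.
        digest = hmac.new(pepper, digest, hashlib.sha256).digest()
    return digest


def make_record(password: str, pepper: bytes = b"") -> dict:
    salt = os.urandom(16)  # 128 bits, well above the 32-bit floor
    return {"salt": salt, "hash": _derive(password, salt, pepper)}


def verify(password: str, record: dict, pepper: bytes = b"") -> bool:
    candidate = _derive(password, record["salt"], pepper)
    return hmac.compare_digest(candidate, record["hash"])
```

<p>Only the salt and the derived hash land in the record. The pepper is passed in from elsewhere, which is the whole point: a stolen database alone is not enough to start guessing.</p>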

<hr />

<h3 id="uc-4-authenticate-with-password">UC-4: Authenticate with Password</h3>

<ul>
  <li><strong>Primary Actor:</strong> Subscriber</li>
  <li><strong>Goal:</strong> Prove identity to the verifier</li>
  <li><strong>Scope:</strong> Verifier</li>
  <li><strong>Level:</strong> User goal</li>
  <li><strong>Trigger:</strong> Subscriber initiates login</li>
  <li><strong>Preconditions:</strong> Subscriber has a registered password; connection is encrypted</li>
  <li><strong>Stakeholders:</strong>
    <ul>
      <li>Subscriber — wants to log in quickly</li>
      <li>Verifier — wants to confirm identity without leaking information to attackers</li>
      <li>Attacker — has breached credential lists; wants to stuff, guess, or enumerate</li>
    </ul>
  </li>
  <li><strong>Main Success Scenario:</strong>
    <ol>
      <li>Subscriber submits username and password</li>
      <li>Verifier retrieves stored hash and salt</li>
      <li>Verifier validates the submitted password against the stored hash</li>
      <li>Verifier establishes an authenticated session (UC-7)</li>
    </ol>
  </li>
  <li><strong>Extensions:</strong>
    <ul>
      <li>2a. <em>Account does not exist:</em>
Attacker is enumerating usernames. Verifier performs a dummy hash computation so response time is identical to a real account. Returns generic error. UC-5 applies. Resume step 1.</li>
      <li>3a. <em>Password does not match:</em>
Generic error — does not reveal whether the username or password was wrong. UC-5 rate limiting applies. Resume step 1.</li>
      <li>3b. <em>Account requires MFA (AAL2+):</em>
Password alone isn’t enough. Verifier prompts for second factor (UC-6). Session created after UC-6 succeeds.</li>
      <li>3c. <em>Account is temporarily locked (UC-5):</em>
Attacker triggered the lockout with repeated guesses. Verifier informs subscriber of recovery options. Fail.</li>
      <li>3d. <em>Attacker uses credential stuffing (username/password pairs from another breach):</em>
Rate limiting (UC-5) caps attempts per account. Attacker cannot scale beyond the threshold without triggering lockout or CAPTCHA.</li>
    </ul>
  </li>
  <li><strong>Minimal Guarantee:</strong> Failed attempts are logged and rate-limited. No information leaked about account existence or which factor failed.</li>
  <li><strong>Success Guarantee:</strong> Subscriber is authenticated; session established at the required AAL.</li>
</ul>

<hr />

<h3 id="uc-5-rate-limit-authentication-attempts">UC-5: Rate-Limit Authentication Attempts</h3>

<ul>
  <li><strong>Primary Actor:</strong> Verifier (automated)</li>
  <li><strong>Goal:</strong> Make online guessing impractical without permanently locking out legitimate subscribers</li>
  <li><strong>Scope:</strong> Verifier</li>
  <li><strong>Level:</strong> Subfunction (called by UC-4)</li>
  <li><strong>Trigger:</strong> Failed authentication attempt</li>
  <li><strong>Preconditions:</strong> Per-account failure counter maintained</li>
  <li><strong>Stakeholders:</strong>
    <ul>
      <li>Subscriber — does not want to be permanently locked out of their own account</li>
      <li>Attacker — wants unlimited guessing attempts; also wants to weaponize lockout as denial-of-service</li>
    </ul>
  </li>
  <li><strong>Main Success Scenario:</strong>
    <ol>
      <li>Verifier increments the per-account failure counter</li>
      <li>Verifier evaluates the counter against the threshold; the counter is below it, so the next attempt proceeds unthrottled</li>
      <li>Subscriber eventually authenticates; counter resets</li>
    </ol>
  </li>
  <li><strong>Extensions:</strong>
    <ul>
      <li>2a. <em>Threshold reached (100 consecutive failures):</em>
Verifier applies throttling — escalating delays, CAPTCHA, or temporary lockout. Resume step 2 after throttle clears.</li>
      <li>2b. <em>Attacker uses lockout as denial-of-service:</em>
Permanent lockout would let the attacker lock out any account by failing 100 times. Account is never permanently locked. Recovery mechanism always available.</li>
    </ul>
  </li>
  <li><strong>Minimal Guarantee:</strong> Account is never permanently locked.</li>
  <li><strong>Success Guarantee:</strong> Online guessing is impractical within the rate limits.</li>
</ul>
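
<p>One way to read “throttle, but never lock permanently” is an escalating delay with a hard cap. The sketch below assumes that reading; the class name, the one-second base delay, and the one-hour cap are invented for illustration, and only the threshold of 100 comes from the use case.</p>

```python
class AttemptLimiter:
    def __init__(self, threshold=100, base_delay=1.0, max_delay=3600.0):
        self.threshold = threshold
        self.base_delay = base_delay  # seconds; doubles per failure past the threshold
        self.max_delay = max_delay    # cap, so the wait is always finite
        self._failures = {}           # account -> consecutive failure count

    def record_failure(self, account: str) -> None:
        self._failures[account] = self._failures.get(account, 0) + 1

    def record_success(self, account: str) -> None:
        self._failures.pop(account, None)  # counter resets on success

    def required_delay(self, account: str) -> float:
        """Seconds the next attempt must wait; 0.0 below the threshold."""
        over = self._failures.get(account, 0) - self.threshold
        if over < 0:
            return 0.0
        # Exponent is clamped so huge counters cannot overflow the float.
        return min(self.base_delay * 2 ** min(over, 30), self.max_delay)
```

<p>Because the delay is capped, an attacker who fails ten thousand times costs the real subscriber an hour at worst, not the account.</p>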

<hr />

<h3 id="uc-6-authenticate-with-second-factor">UC-6: Authenticate with Second Factor</h3>

<ul>
  <li><strong>Primary Actor:</strong> Subscriber</li>
  <li><strong>Goal:</strong> Provide a second authentication factor for AAL2+ access</li>
  <li><strong>Scope:</strong> Verifier</li>
  <li><strong>Level:</strong> User goal</li>
  <li><strong>Trigger:</strong> Verifier requires MFA after password verification</li>
  <li><strong>Preconditions:</strong> First factor verified; second factor registered</li>
  <li><strong>Stakeholders:</strong>
    <ul>
      <li>Subscriber — wants convenient but secure second factor</li>
      <li>Attacker — wants to bypass the second factor via phishing, SIM swap, or device theft</li>
    </ul>
  </li>
  <li><strong>Main Success Scenario:</strong>
    <ol>
      <li>Verifier prompts for second factor</li>
      <li>Subscriber provides a cryptographic assertion, OTP code, or push approval</li>
      <li>Verifier validates the second factor</li>
      <li>Verifier confirms authentication intent — subscriber consciously approved</li>
      <li>Authentication succeeds; session established (UC-7)</li>
    </ol>
  </li>
  <li><strong>Extensions:</strong>
    <ul>
      <li>2a. <em>Subscriber’s device is lost or broken:</em>
Subscriber uses an alternative registered factor or initiates recovery (UC-9). Fail for this UC.</li>
      <li>3a. <em>OTP code reused (replay):</em>
Attacker intercepted a valid code and replays it. Each code is single-use. Verifier rejects. Resume step 1.</li>
      <li>3b. <em>Attacker phishes the second factor:</em>
At AAL2, phishing may succeed with OTP codes. At AAL3, hardware cryptographic authenticators with verifier impersonation resistance make phishing structurally impossible.</li>
      <li>3c. <em>Attacker SIM-swaps to intercept SMS OTP:</em>
SMS OTP is permitted at AAL2 but restricted — should not be the sole option where alternatives exist. Prohibited at AAL3.</li>
      <li>4a. <em>No authentication intent:</em>
Subscriber must consciously approve, not just possess the device. Verifier rejects without intent. Resume step 1.</li>
    </ul>
  </li>
  <li><strong>Technology &amp; Data Variations:</strong>
    <ul>
      <li>AAL2: password + any second factor (TOTP, hardware key, push)</li>
      <li>AAL3: password + hardware cryptographic authenticator providing verifier impersonation resistance</li>
      <li>SMS OTP: permitted at AAL2 (restricted), prohibited at AAL3</li>
    </ul>
  </li>
  <li><strong>Minimal Guarantee:</strong> Authentication does not succeed without a valid second factor at AAL2+.</li>
  <li><strong>Success Guarantee:</strong> Two distinct factors verified; authentication intent confirmed.</li>
</ul>
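
<p>Extension 3a (replay) is worth a sketch because the fix is small. Below is a TOTP generator following RFC 6238 with HMAC-SHA-1, plus a verifier that refuses any code from a time step it has already accepted. The <code class="language-plaintext highlighter-rouge">TotpVerifier</code> class is hypothetical, and it skips the clock-skew window a real verifier would allow.</p>

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password per RFC 6238 (HMAC-SHA-1 variant)."""
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation from RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


class TotpVerifier:
    def __init__(self, secret: bytes):
        self._secret = secret
        self._last_counter = -1  # highest time step already accepted

    def verify(self, code: str, unix_time: int, step: int = 30) -> bool:
        counter = unix_time // step
        if counter <= self._last_counter:
            return False  # this time step was already used: replay
        expected = totp(self._secret, unix_time, step)
        if not hmac.compare_digest(code.encode(), expected.encode()):
            return False
        self._last_counter = counter
        return True
```

<p>Remembering one integer per subscriber is enough to make each code single-use. It does nothing against real-time phishing, which is why the AAL3 row above asks for more than a code.</p>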

<hr />

<h3 id="uc-7-use-an-authenticated-session">UC-7: Use an Authenticated Session</h3>

<ul>
  <li><strong>Primary Actor:</strong> Subscriber</li>
  <li><strong>Goal:</strong> Maintain authenticated access for the duration of a work session</li>
  <li><strong>Scope:</strong> Verifier</li>
  <li><strong>Level:</strong> User goal</li>
  <li><strong>Trigger:</strong> Successful authentication</li>
  <li><strong>Preconditions:</strong> Authentication completed at the required AAL</li>
  <li><strong>Stakeholders:</strong>
    <ul>
      <li>Subscriber — wants persistent access; wants to log out when done</li>
      <li>Attacker — wants to steal, replay, or fixate session tokens</li>
    </ul>
  </li>
  <li><strong>Main Success Scenario:</strong>
    <ol>
      <li>Verifier generates a session token with enough randomness to be unguessable</li>
      <li>Verifier delivers the token over an encrypted connection</li>
      <li>Subscriber makes authenticated requests</li>
      <li>Subscriber logs out</li>
      <li>Verifier invalidates the session server-side</li>
    </ol>
  </li>
  <li><strong>Extensions:</strong>
    <ul>
      <li>3a. <em>Subscriber walks away (inactivity timeout):</em>
Session expires. Subscriber must reauthenticate (UC-4). Resume step 1.</li>
      <li>3b. <em>Absolute timeout reached (e.g., 12 hours):</em>
Session expires regardless of activity. Prevents stolen tokens from being useful indefinitely. Resume step 1.</li>
      <li>3c. <em>Attacker steals the session token:</em>
Token was embedded in a URL and leaked via referrer header, or extracted via XSS. Token must never be in URLs. Session tokens must be delivered only over encrypted connections.</li>
      <li>3d. <em>Attacker replays token from different context:</em>
Verifier flags anomalous IP or user-agent. May invalidate session or require reauthentication.</li>
      <li>5a. <em>Subscriber only deletes the cookie client-side:</em>
Session remains valid server-side. Attacker who obtained the token can still use it. Logout must invalidate server-side.</li>
    </ul>
  </li>
  <li><strong>Minimal Guarantee:</strong> Session is always invalidated server-side on logout or timeout.</li>
  <li><strong>Success Guarantee:</strong> Session is maintained while active, terminated cleanly on logout or timeout.</li>
</ul>

<hr />

<h3 id="uc-8-restore-account-security-after-compromise">UC-8: Restore Account Security After Compromise</h3>

<ul>
  <li><strong>Primary Actor:</strong> Subscriber</li>
  <li><strong>Goal:</strong> Replace a compromised password and restore the account to a secure state</li>
  <li><strong>Scope:</strong> Verifier</li>
  <li><strong>Level:</strong> User goal</li>
  <li><strong>Trigger:</strong> Subscriber is informed their password must be changed</li>
  <li><strong>Preconditions:</strong> Verifier has flagged the password as compromised</li>
  <li><strong>Stakeholders:</strong>
    <ul>
      <li>Subscriber — wants to regain security without losing access</li>
      <li>Attacker — wants to use the compromised credential before it’s changed; may have already changed it</li>
    </ul>
  </li>
  <li><strong>Main Success Scenario:</strong>
    <ol>
      <li>Subscriber attempts to log in</li>
      <li>Verifier authenticates the subscriber</li>
      <li>Verifier forces password change before granting session</li>
      <li>Subscriber chooses a new password (UC-1)</li>
      <li>Verifier invalidates the compromised password and prevents its reuse</li>
      <li>Verifier grants session with new password</li>
    </ol>
  </li>
  <li><strong>Extensions:</strong>
    <ul>
      <li>1a. <em>Attacker already changed the password:</em>
Subscriber is locked out. Account recovery (UC-9) required. Fail for this UC.</li>
      <li>1b. <em>Subscriber doesn’t log in for weeks:</em>
Flag persists. Forced change applies whenever they return.</li>
      <li>4a. <em>Subscriber tries to reuse the compromised password:</em>
The attacker already holds the old password, so keeping it would leave the account exposed. Reuse is prohibited. Resume step 4.</li>
      <li>*a. <em>System triggers this change on a 90-day timer instead of breach evidence:</em>
This is forced rotation — it produces the mutation problem described in UC-1 ext *a. Change is forced only on evidence of compromise, never on a calendar.</li>
    </ul>
  </li>
  <li><strong>Minimal Guarantee:</strong> Compromised password cannot be used after the forced-change login.</li>
  <li><strong>Success Guarantee:</strong> New password set; compromised credential permanently invalidated.</li>
</ul>

<hr />

<h3 id="uc-9-recover-account">UC-9: Recover Account</h3>

<ul>
  <li><strong>Primary Actor:</strong> Subscriber</li>
  <li><strong>Goal:</strong> Regain access when the primary authenticator is lost or forgotten</li>
  <li><strong>Scope:</strong> Verifier</li>
  <li><strong>Level:</strong> User goal</li>
  <li><strong>Trigger:</strong> Subscriber cannot authenticate</li>
  <li><strong>Preconditions:</strong> Recovery mechanism registered</li>
  <li><strong>Stakeholders:</strong>
    <ul>
      <li>Subscriber — wants to regain access without excessive friction</li>
      <li>Attacker — wants to hijack the account by social-engineering the recovery flow</li>
    </ul>
  </li>
  <li><strong>Main Success Scenario:</strong>
    <ol>
      <li>Subscriber initiates recovery</li>
      <li>Verifier presents recovery challenge appropriate to the account’s AAL</li>
      <li>Subscriber provides recovery codes or alternative second factor</li>
      <li>Verifier validates and grants limited access (password change only)</li>
      <li>Subscriber sets new password (UC-1) and registers new authenticators if needed</li>
      <li>Verifier notifies subscriber that authenticators were changed</li>
    </ol>
  </li>
  <li><strong>Extensions:</strong>
    <ul>
      <li>2a. <em>AAL2+ account, attacker tries email-only recovery:</em>
Email alone would bypass the second factor. Recovery must match the account’s assurance level. AAL2 requires recovery codes or alternative MFA. Fail for email-only at AAL2+.</li>
      <li>3a. <em>Recovery code already used:</em>
Codes are single-use. Attacker who obtained one code cannot reuse it. Resume step 3 with another code.</li>
      <li>3b. <em>All recovery codes exhausted:</em>
Subscriber contacts support. Re-enrollment at original identity proofing level. Fail for automated recovery.</li>
      <li>3c. <em>Attacker attempts social-engineering:</em>
Recovery requires a registered mechanism, not human judgment. Automated flow rejects. Fail.</li>
      <li>6a. <em>Subscriber did not initiate the change:</em>
Notification alerts subscriber to potential takeover. Subscriber can lock account.</li>
    </ul>
  </li>
  <li><strong>Technology &amp; Data Variations:</strong>
    <ul>
      <li>AAL1: email-based recovery acceptable</li>
      <li>AAL2+: recovery codes or alternative MFA required</li>
    </ul>
  </li>
  <li><strong>Minimal Guarantee:</strong> Recovery never downgrades the account’s assurance level.</li>
  <li><strong>Success Guarantee:</strong> Subscriber regains access with fresh credentials at the original AAL.</li>
</ul>]]></content><author><name></name></author><category term="security" /><summary type="html"><![CDATA[In June 2017, NIST published SP 800-63B Rev 3 and told the world to stop requiring periodic password changes. Eight years later, most organizations still do it. In August 2025, NIST published Rev 4 and upgraded that guidance from “you should stop” to “you must stop.”]]></summary></entry><entry><title type="html">Why 95% Utilization Feels Broken: A Queue Demo, Three Review Rounds, and a Better Model</title><link href="https://binaryphile.github.io/development/2026/03/28/why-95-percent-utilization-feels-broken.html" rel="alternate" type="text/html" title="Why 95% Utilization Feels Broken: A Queue Demo, Three Review Rounds, and a Better Model" /><published>2026-03-28T00:00:00+00:00</published><updated>2026-03-28T00:00:00+00:00</updated><id>https://binaryphile.github.io/development/2026/03/28/why-95-percent-utilization-feels-broken</id><content type="html" xml:base="https://binaryphile.github.io/development/2026/03/28/why-95-percent-utilization-feels-broken.html"><![CDATA[<p>A queue at 95% target load is mathematically stable. A dashboard says fine.
Watch it run and your gut says broken. That gap is where queuing intuition
fails.</p>

<p>I built a terminal demo with Claude to show this. I designed the teaching
progression and the analogies. Claude wrote the implementation. The demo looked
right after the first draft. Three rounds of adversarial external review proved
it was teaching wrong lessons confidently.</p>

<h2 id="what-the-demo-teaches">What the demo teaches</h2>

<p>Target load is the ratio of arrival rate to service rate, written ρ (rho) in
queuing theory.</p>

<p>Three metrics tell you how a queue behaves. <strong>Throughput</strong> is how many
customers walk out the door per hour. <strong>Flow time</strong> is how long you’re on premises — from the moment you get in
line to the moment you leave with your order.
<strong>WIP</strong> (work in process) is everyone currently in the building — waiting in
line plus being served. Little’s Law ties them together: flow time = WIP /
throughput. When one gets worse, the others move with it.</p>
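
<p>A quick worked example of Little’s Law, with made-up numbers rather than anything from the demo: if four people are in the shop on average and twenty leave per hour, each spends about twelve minutes inside.</p>

```python
def flow_time_minutes(avg_wip: float, throughput_per_hour: float) -> float:
    """Little's Law: flow time = WIP / throughput, converted to minutes."""
    return avg_wip * 60 / throughput_per_hour


print(flow_time_minutes(4, 20))  # 12.0
print(flow_time_minutes(8, 20))  # 24.0: same throughput, double the crowd
```

<p>The second line is the part intuition tends to miss: throughput can look unchanged on a dashboard while flow time doubles.</p>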

<p>The sparklines below show WIP over time. The number at the end is average flow
time. Those are the metrics to watch as we add complexity.</p>

<p>Each step removes one simplification: the gate, perfect regularity, randomness
on one side, both sides, the remaining headroom.</p>

<p><strong>Start with no randomness.</strong> A sushi boat. The chef places a plate, it
circles to you, you grab it, the empty spot comes back. Nobody arrives until
there’s room. No queue is possible because arrivals are gated by departures.
That’s lockstep — a gated handoff, not a standard open queue.</p>

<p>Now remove the gate. A merry-go-round: kids show up every 3.3 minutes whether
or not a horse is free, but each ride takes exactly 3. Arrivals are independent
of departures for the first time. A queue could form — arrivals no longer
wait for an opening. It doesn’t, because the timing is still perfectly regular.
Queuing theory calls this D/D/1 — deterministic arrivals, deterministic
service, one server. This system stays stable as long as arrivals come slower
than service completes. That condition — arrival rate below service rate, or
ρ &lt; 1 — is what makes any queuing model stable. When it holds, the queue
doesn’t grow without bound. When it doesn’t, no amount of buffering saves you.</p>
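<p>One way to see the stability condition in the deterministic case is Lindley's recursion, a standard textbook technique for single-server waits. This is a minimal sketch, not the demo's code; <code class="language-plaintext highlighter-rouge">queueWaits</code> and the 2.5-minute overload gap are assumptions chosen for illustration.</p>

```go
package main

import "fmt"

// queueWaits returns each customer's wait in line for a single FIFO server,
// using Lindley's recursion: the next wait is the previous wait plus the
// previous service time, minus the gap until the next arrival, floored at
// zero. It assumes a fixed interarrival gap and a fixed service time (D/D/1).
func queueWaits(n int, interarrival, service float64) []float64 {
	waits := make([]float64, n)
	for i := 1; i < n; i++ {
		w := waits[i-1] + service - interarrival
		if w < 0 {
			w = 0
		}
		waits[i] = w
	}
	return waits
}

func main() {
	stable := queueWaits(5, 3.3, 3.0)   // arrivals slower than service
	overload := queueWaits(5, 2.5, 3.0) // arrivals faster than service
	fmt.Println(stable)
	fmt.Println(overload)
}
```

<p>With the merry-go-round numbers every wait stays at zero. With arrivals faster than service, each customer waits half a minute longer than the one before, and nothing brings the backlog back down.</p>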

<p>In the sparklines below, the low bar (▁) is the baseline — zero WIP. Taller
blocks mean more customers in the system.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                         WIP over time                                TP      avg WIP  avg flow
Lockstep:               ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁  20/hr   0.0      —
Fixed Schedule (D/D/1): ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁  16.5/hr 0.0      0.0min
</code></pre></div></div>

<p>Flat lines. No waiting. Simple and predictable, but nothing in production
looks like this.</p>

<p><strong>Add randomness to one side.</strong> A coffee shop. Every drink takes exactly 3
minutes. But customers arrive unpredictably — two walk in together, then
nobody for ten minutes. The server can’t absorb the bursts instantly, so a queue
forms and drains. That’s variable arrivals, fixed service (M/D/1).</p>

<p>Flip it. A dentist with appointments every 30 minutes. Most visits take 25.
Some run to 40. The patient who arrives on time for the next slot waits because
the previous one ran over. That’s fixed arrivals, variable service (D/M/1).
Either source of variability alone creates queues, even when the server is fast
enough on average.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                          WIP over time                                TP      avg WIP  avg flow
Random Arrivals (M/D/1): ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▃▂▁▁  16.1/hr 0.6      2.1min
Random Service (D/M/1):  ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▁▁  17.0/hr 0.6      2.0min
</code></pre></div></div>

<p>Average demand is 10% below capacity. Occasional queuing is nevertheless
visible.</p>

<p><strong>Add randomness to both sides.</strong> A food truck. Customers show up whenever.
Some order a taco, some a custom burrito. Neither side is predictable.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                            WIP over time                                TP      avg WIP  avg flow
Random Everything (M/M/1): ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▂▃▂▁▁▁▁▁▁▁▁▁▁▁▂▂▂▄▃▃▁▁  15.7/hr 0.8      3.2min
</code></pre></div></div>

<p>That’s M/M/1. Same target load. Average flow time jumped from ~2 min to 3.2.</p>

<p><strong>Push the load.</strong> Same model, target load raised from 0.90 to 0.95. Then past
capacity to 1.5 — demand exceeds service and the backlog grows.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                              WIP over time                                TP      avg WIP  avg flow
Near Full (M/M/1, ρ=0.95):  ▁▁▁▁▁▁▁▁▁▁▂▃▂▁▁▁▁▁▁▁▃▃▄▂▁▂▃▄▃▂▅▁▃▃▁▁▁▂▁▁  16.2/hr 1.6      5.8min
Overloaded (M/M/1, ρ=1.5):  ▁▂▂▂▃▃▃▂▁▂▂▂▁▁▁▂▂▂▃▅▅▅▃▃▃▃▃▃▃▂▄▅▆▇▇▇▅▅▅▇  21.5/hr 4.0      7.4min*
</code></pre></div></div>

<p>* The overloaded average flow time counts only completed customers. Those still
queued at the time horizon are excluded, so the figure understates congestion.</p>

<p>Five percentage points of load. Nearly 2x the flow time. “95% utilized” sounds like
5% less headroom than 90%. It is half the headroom: 5% left instead of 10%.</p>

<p>The overloaded sparkline climbs and doesn’t come back.</p>

<p>In steady state, near-full is far worse than this demo shows. M/M/1 theory
predicts about 57 minutes of average waiting in line at ρ=0.95 with 3-minute
mean service, or 60 minutes of flow time once service is included. The demo’s
5.8 minutes reflects a short cold-start run that never reaches that regime. The
nonlinear pain is real. The demo understates it.</p>
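<p>The 57-minute figure matches the standard M/M/1 steady-state formula for time spent waiting in line, ρ·s/(1−ρ), where s is the mean service time. Adding the service itself gives s/(1−ρ). A small sketch of both, with <code class="language-plaintext highlighter-rouge">mm1Times</code> as a hypothetical helper name:</p>

```go
package main

import "fmt"

// mm1Times returns the steady-state M/M/1 averages for a load rho
// (0 <= rho < 1) and a mean service time: the time waiting in line and the
// total time in system. Standard results: Wq = rho*s/(1-rho), W = s/(1-rho).
func mm1Times(rho, meanService float64) (wait, total float64) {
	wait = rho * meanService / (1 - rho)
	total = meanService / (1 - rho)
	return wait, total
}

func main() {
	for _, rho := range []float64{0.90, 0.95} {
		wait, total := mm1Times(rho, 3.0)
		fmt.Printf("rho=%.2f  wait=%.0f min  total=%.0f min\n", rho, wait, total)
	}
}
```

<p>At ρ=0.90 the formula gives 27 minutes in line, and at ρ=0.95 it gives 57. Five points of load roughly double the steady-state wait, which is the same ratio the short demo run shows at a much smaller scale.</p>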

<p>Stable scenarios run all customers to completion before measuring. Overloaded
runs for a fixed time horizon. The full comparison:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Scenario                        │ target ρ │ peak WIP │ avg WIP │ avg flow
─────────────────────────────────────────────────────────────────────
Lockstep                        │      —   │      0 │   0.0 │        —
Fixed Schedule (D/D/1)          │    0.90  │      0 │   0.0 │   0.0min
Random Arrivals (M/D/1)         │    0.90  │      4 │   0.6 │   2.1min
Random Service (D/M/1)          │    0.90  │      4 │   0.6 │   2.0min
Random Everything (M/M/1)       │    0.90  │      5 │   0.8 │   3.2min
Near Full (M/M/1)               │    0.95  │      6 │   1.6 │   5.8min
Overloaded (M/M/1)              │    1.50  │     10 │   4.0 │   7.4min*
</code></pre></div></div>

<p>These lessons are only as trustworthy as the simulation behind them. The first
version looked plausible and was subtly wrong.</p>

<h2 id="three-review-rounds-that-made-it-trustworthy">Three review rounds that made it trustworthy</h2>

<p>Each round: I sent the current plan to an external AI reviewer for adversarial
grading, evaluated the feedback, decided what to change, and had Claude
implement the fix.</p>

<h3 id="round-1-target-load-10-has-no-steady-state">Round 1: target load 1.0 has no steady state</h3>

<p>I’d chosen target load 1.0 as baseline. Capacity equals demand. Natural
starting point.</p>

<p>M/M/1 at load 1.0 has no stationary distribution. The expected queue length
grows without bound. In a 50-customer run, the specific random path dominates the results,
not the underlying process. We were demonstrating seed sensitivity, not queuing
theory.</p>

<p>I changed it to target load 0.9 for stochastic scenarios. Added the near-full
scenario at 0.95. Overloaded at 1.5, where the demo doesn’t claim steady
state.</p>

<p><strong>Principle:</strong> The obvious parameter made validation impossible.</p>

<h3 id="round-2-you-cant-verify-what-you-assumed">Round 2: you can’t verify what you assumed</h3>

<p>Two catches.</p>

<p><strong>Circular Little’s Law.</strong> The implementation computed flow time from
WIP / throughput, then “verified” that WIP = throughput * flow time. That’s
algebra, not verification.</p>

<p>The fix: timestamp each customer independently. Compute flow time from
timestamps. Compute average WIP from event-time integration. Check whether
WIP = throughput * flow time. The ratio is 1.00 (within rounding) for every
stable scenario:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Little's Law consistency check (WIP ≈ TP × FT):

Random Arrivals (M/D/1)          WIP=0.55  TP×FT=0.55  ratio=1.00
Random Service (D/M/1)           WIP=0.58  TP×FT=0.58  ratio=1.00
Random Everything (M/M/1)        WIP=0.84  TP×FT=0.84  ratio=1.00
Near Full (M/M/1, ρ=0.95)        WIP=1.57  TP×FT=1.57  ratio=1.00
</code></pre></div></div>

<p>This is a consistency check, not external validation. But while one side was
derived from the other, even this check could not fail, so it proved nothing.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Flow time -- filter completed, map to duration, average.</span>
<span class="n">completed</span> <span class="o">:=</span> <span class="n">slice</span><span class="o">.</span><span class="n">From</span><span class="p">(</span><span class="n">r</span><span class="o">.</span><span class="n">customers</span><span class="p">)</span><span class="o">.</span><span class="n">KeepIf</span><span class="p">(</span><span class="n">customer</span><span class="o">.</span><span class="n">IsCompleted</span><span class="p">)</span>
<span class="n">flowTimes</span> <span class="o">:=</span> <span class="n">completed</span><span class="o">.</span><span class="n">ToFloat64</span><span class="p">(</span><span class="n">customer</span><span class="o">.</span><span class="n">FlowTime</span><span class="p">)</span>
<span class="n">m</span><span class="o">.</span><span class="n">avgFlow</span> <span class="o">=</span> <span class="n">flowTimes</span><span class="o">.</span><span class="n">Sum</span><span class="p">()</span> <span class="o">/</span> <span class="kt">float64</span><span class="p">(</span><span class="n">completed</span><span class="o">.</span><span class="n">Len</span><span class="p">())</span>

<span class="c">// integrateWIP accumulates area under the WIP curve.</span>
<span class="k">type</span> <span class="n">wipState</span> <span class="k">struct</span><span class="p">{</span> <span class="n">area</span><span class="p">,</span> <span class="n">prevTime</span> <span class="kt">float64</span><span class="p">;</span> <span class="n">prevWIP</span> <span class="kt">int</span> <span class="p">}</span>
<span class="n">integrateWIP</span> <span class="o">:=</span> <span class="k">func</span><span class="p">(</span><span class="n">s</span> <span class="n">wipState</span><span class="p">,</span> <span class="n">e</span> <span class="n">logEntry</span><span class="p">)</span> <span class="n">wipState</span> <span class="p">{</span>
    <span class="n">dt</span> <span class="o">:=</span> <span class="n">e</span><span class="o">.</span><span class="n">time</span> <span class="o">-</span> <span class="n">s</span><span class="o">.</span><span class="n">prevTime</span>
    <span class="k">return</span> <span class="n">wipState</span><span class="p">{</span><span class="n">s</span><span class="o">.</span><span class="n">area</span> <span class="o">+</span> <span class="kt">float64</span><span class="p">(</span><span class="n">s</span><span class="o">.</span><span class="n">prevWIP</span><span class="p">)</span><span class="o">*</span><span class="n">dt</span><span class="p">,</span> <span class="n">e</span><span class="o">.</span><span class="n">time</span><span class="p">,</span> <span class="n">e</span><span class="o">.</span><span class="n">systemSize</span><span class="p">}</span>
<span class="p">}</span>

<span class="c">// WIP -- fold over event log, then divide by total time.</span>
<span class="n">final</span> <span class="o">:=</span> <span class="n">slice</span><span class="o">.</span><span class="n">Fold</span><span class="p">(</span><span class="n">r</span><span class="o">.</span><span class="n">log</span><span class="p">,</span> <span class="n">wipState</span><span class="p">{},</span> <span class="n">integrateWIP</span><span class="p">)</span>
<span class="n">m</span><span class="o">.</span><span class="n">avgWIP</span> <span class="o">=</span> <span class="n">final</span><span class="o">.</span><span class="n">area</span> <span class="o">/</span> <span class="n">r</span><span class="o">.</span><span class="n">endTime</span>
</code></pre></div></div>

<p>Flow time from timestamps. WIP from integration. Neither derived from the
other.</p>

<p><strong>“Common seeds” aren’t matched traces.</strong> Different scenarios consume random
numbers differently. The fixed-schedule scenario uses none. The
random-arrivals scenario draws only from the arrival sequence. Sharing a seed
doesn’t mean scenarios see the same arrivals. Fix: pre-generate one
interarrival sequence and one service sequence. Each scenario slices what it
needs.</p>
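<p>The pre-generation step might look roughly like this. It is a sketch under stated assumptions: the <code class="language-plaintext highlighter-rouge">traces</code> type, the two separate generators, and the seed value are illustrative choices, not the demo's actual code.</p>

```go
package main

import (
	"fmt"
	"math/rand"
)

// traces holds one shared interarrival sequence and one shared service
// sequence, generated once so every scenario sees the same random draws.
type traces struct {
	interarrivals []float64
	services      []float64
}

// newTraces draws n exponential samples for each sequence. It uses two
// generators (an assumption of this sketch) so that drawing from one
// sequence never shifts the values of the other.
func newTraces(seed int64, n int, meanInterarrival, meanService float64) traces {
	arrRNG := rand.New(rand.NewSource(seed))
	svcRNG := rand.New(rand.NewSource(seed + 1))
	t := traces{
		interarrivals: make([]float64, n),
		services:      make([]float64, n),
	}
	for i := 0; i < n; i++ {
		t.interarrivals[i] = arrRNG.ExpFloat64() * meanInterarrival
		t.services[i] = svcRNG.ExpFloat64() * meanService
	}
	return t
}

// constant returns n copies of v, for the deterministic side of a scenario.
func constant(n int, v float64) []float64 {
	out := make([]float64, n)
	for i := range out {
		out[i] = v
	}
	return out
}

func main() {
	const n = 50
	t := newTraces(42, n, 3.3, 3.0)

	// M/D/1: shared random arrivals, fixed service.
	mdArr, mdSvc := t.interarrivals, constant(n, 3.0)
	// M/M/1: the same arrivals, plus the shared random service times.
	mmArr, mmSvc := t.interarrivals, t.services

	fmt.Println(mdArr[0] == mmArr[0], len(mdSvc), len(mmSvc))
}
```

<p>Because both scenarios read the same arrival slice, any difference in their results comes from the service side, not from the random stream drifting.</p>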

<p><strong>Principle:</strong> Verification that travels the same code path as computation
isn’t verification.</p>

<h3 id="round-3-simulation-is-not-animation">Round 3: simulation is not animation</h3>

<p>The first implementation used real-time sleeps with 500ms terminal ticks. The
refresh rate was the simulation clock.</p>

<p>Two customers arriving 0.3 simulated minutes apart land in the same tick. We
weren’t simulating random arrivals. We were simulating whatever the tick
granularity permits.</p>

<p>I decided on discrete-event simulation in virtual time. Run instantly. Record
everything. Animate playback separately.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">runSim</span><span class="p">(</span><span class="n">cfg</span> <span class="n">simConfig</span><span class="p">)</span> <span class="n">simResult</span> <span class="p">{</span>
    <span class="k">var</span> <span class="p">(</span>
        <span class="n">customers</span> <span class="p">[]</span><span class="n">customer</span>
        <span class="n">log</span>       <span class="p">[]</span><span class="n">logEntry</span>
        <span class="n">eq</span>        <span class="n">eventQueue</span>
        <span class="n">queue</span>     <span class="p">[]</span><span class="kt">int</span> <span class="c">// FIFO</span>
        <span class="n">busy</span>      <span class="kt">bool</span>
    <span class="p">)</span>
    <span class="n">heap</span><span class="o">.</span><span class="n">Init</span><span class="p">(</span><span class="o">&amp;</span><span class="n">eq</span><span class="p">)</span>

    <span class="n">record</span> <span class="o">:=</span> <span class="k">func</span><span class="p">(</span><span class="n">t</span> <span class="kt">float64</span><span class="p">,</span> <span class="n">typ</span> <span class="n">eventType</span><span class="p">,</span> <span class="n">custIdx</span><span class="p">,</span> <span class="n">qDepth</span> <span class="kt">int</span><span class="p">,</span> <span class="n">serverBusy</span> <span class="kt">bool</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">log</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">log</span><span class="p">,</span> <span class="n">logEntry</span><span class="p">{</span>
            <span class="n">time</span><span class="o">:</span> <span class="n">t</span><span class="p">,</span> <span class="n">typ</span><span class="o">:</span> <span class="n">typ</span><span class="p">,</span> <span class="n">custIdx</span><span class="o">:</span> <span class="n">custIdx</span><span class="p">,</span>
            <span class="n">queueDepth</span><span class="o">:</span> <span class="n">qDepth</span><span class="p">,</span> <span class="n">serverBusy</span><span class="o">:</span> <span class="n">serverBusy</span><span class="p">,</span>
        <span class="p">})</span>
    <span class="p">}</span>
    <span class="c">// ... process events in simulated time, record everything</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Playback at 360x. All metrics in simulated units — “Avg wait: 5.8 min”
means simulated minutes, not wall-clock.</p>
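<p>For readers who want something runnable, here is a stripped-down virtual-time sketch. It is not the demo's implementation: it drops the event heap and assumes a single FIFO server fed by pre-generated slices, and every name in it is hypothetical. It measures flow time from per-customer timestamps and WIP by integrating over the event log, then compares the two through Little's Law.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// event is one change in system size at a point in simulated time.
type event struct {
	time  float64
	delta int // +1 for an arrival, -1 for a departure
}

// simulate runs a single FIFO server in virtual time, with no sleeps.
// It returns average flow time (from timestamps), average WIP (from
// integrating system size over the event log), and throughput.
// It assumes at least one customer and slices of equal length.
func simulate(interarrivals, services []float64) (avgFlow, avgWIP, throughput float64) {
	var clock, serverFree, flowSum float64
	events := make([]event, 0, 2*len(interarrivals))
	for i, gap := range interarrivals {
		clock += gap // arrival timestamp
		start := clock
		if serverFree > start {
			start = serverFree // wait for the previous customer to finish
		}
		serverFree = start + services[i] // departure timestamp
		flowSum += serverFree - clock
		events = append(events, event{clock, +1}, event{serverFree, -1})
	}
	sort.SliceStable(events, func(a, b int) bool { return events[a].time < events[b].time })

	// Accumulate area under the system-size curve between events.
	var area, prevTime float64
	size := 0
	for _, e := range events {
		area += float64(size) * (e.time - prevTime)
		prevTime = e.time
		size += e.delta
	}
	endTime := prevTime
	n := float64(len(interarrivals))
	return flowSum / n, area / endTime, n / endTime
}

func main() {
	// Hypothetical fixed inputs in minutes; a real run would use random draws.
	gaps := []float64{1, 1, 1, 4}
	svcs := []float64{2, 2, 2, 2}
	ft, wip, tp := simulate(gaps, svcs)
	fmt.Printf("flow=%.2f wip=%.2f ratio=%.2f\n", ft, wip, wip/(tp*ft))
}
```

<p>The whole run finishes instantly. Anything that animates it can replay the event slice at whatever speed it likes without affecting the numbers.</p>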

<p><strong>Principle:</strong> Coupling simulation to rendering makes both unreliable.</p>

<hr />

<p>Three questions from these reviews. Is your baseline valid? Is your
verification independent of your computation? Is your clock decoupled from your
display? Believable output is not the same as a trustworthy model.</p>

<p><a href="https://github.com/binaryphile/toc/tree/master/examples/queue-demo">Source code</a></p>]]></content><author><name></name></author><category term="development" /><summary type="html"><![CDATA[A queue at 95% target load is mathematically stable. A dashboard says fine. Watch it run and your gut says broken. That gap is where queuing intuition fails.]]></summary></entry><entry><title type="html">Two Rules for Readable Density</title><link href="https://binaryphile.github.io/development/2026/03/26/two-rules-for-readable-density.html" rel="alternate" type="text/html" title="Two Rules for Readable Density" /><published>2026-03-26T00:00:00+00:00</published><updated>2026-03-26T00:00:00+00:00</updated><id>https://binaryphile.github.io/development/2026/03/26/two-rules-for-readable-density</id><content type="html" xml:base="https://binaryphile.github.io/development/2026/03/26/two-rules-for-readable-density.html"><![CDATA[<p>Most readability advice resists mechanical checking. “Use good names.” “Keep
functions short.” You need the whole function, maybe the whole module, to
evaluate those. These two rules you can check by reading a single line. The
examples are in Go, but the rules apply to any language with nested expressions.</p>

<h2 id="the-uniform-comma-rule">The uniform comma rule</h2>

<p>Every comma in an expression should belong to the same argument list.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">result</span> <span class="o">:=</span> <span class="nb">append</span><span class="p">(</span><span class="nb">append</span><span class="p">(</span><span class="n">items</span><span class="p">,</span> <span class="n">extra</span><span class="p">),</span> <span class="n">overflow</span><span class="o">...</span><span class="p">)</span>
</code></pre></div></div>

<p>Two commas, but they belong to different calls. <code class="language-plaintext highlighter-rouge">items, extra</code> feed the inner
<code class="language-plaintext highlighter-rouge">append</code>. <code class="language-plaintext highlighter-rouge">append(items, extra)</code> and <code class="language-plaintext highlighter-rouge">overflow...</code> feed the outer. Your eye has
to match each comma to its call to parse this.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">combined</span> <span class="o">:=</span> <span class="nb">append</span><span class="p">(</span><span class="n">items</span><span class="p">,</span> <span class="n">extra</span><span class="p">)</span>
<span class="n">result</span> <span class="o">:=</span> <span class="nb">append</span><span class="p">(</span><span class="n">combined</span><span class="p">,</span> <span class="n">overflow</span><span class="o">...</span><span class="p">)</span>
</code></pre></div></div>

<p>Every comma on each line belongs to one call.</p>
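<p>A rough mechanical version of this check, as a sketch only: it scans characters instead of parsing Go, skips string and rune literals, ignores comments, and treats commas outside any delimiter as one more list. The name <code class="language-plaintext highlighter-rouge">commaLists</code> is hypothetical.</p>

```go
package main

import "fmt"

// commaLists counts the distinct delimiter pairs on a line that directly
// contain a comma. A result above 1 means the commas are not uniform.
// Commas outside any delimiter count as one more list (ID 0).
func commaLists(line string) int {
	stack := []int{0} // IDs of currently open delimiters; 0 is the top level
	nextID := 1
	withComma := map[int]bool{}
	var quote rune // active literal delimiter, or 0 outside a literal
	escaped := false
	for _, r := range line {
		switch {
		case quote != 0:
			// Inside a literal: only look for its end.
			if escaped {
				escaped = false
			} else if r == '\\' && quote != '`' {
				escaped = true
			} else if r == quote {
				quote = 0
			}
		case r == '"' || r == '\'' || r == '`':
			quote = r
		case r == '(' || r == '[' || r == '{':
			stack = append(stack, nextID)
			nextID++
		case r == ')' || r == ']' || r == '}':
			if len(stack) > 1 {
				stack = stack[:len(stack)-1]
			}
		case r == ',':
			withComma[stack[len(stack)-1]] = true
		}
	}
	return len(withComma)
}

func main() {
	fmt.Println(commaLists(`result := append(append(items, extra), overflow...)`)) // 2
	fmt.Println(commaLists(`combined := append(items, extra)`))                    // 1
}
```

<p>Tracking an ID per open delimiter, not just the depth, matters for lines like <code class="language-plaintext highlighter-rouge">f(a, b) + g(c, d)</code>, where the commas sit at the same depth but in different calls.</p>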

<h2 id="the-shallow-nesting-rule">The shallow nesting rule</h2>

<p>No more than two opening delimiters (parentheses, brackets, or braces) before
the first close.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">name</span> <span class="o">:=</span> <span class="n">strings</span><span class="o">.</span><span class="n">ToLower</span><span class="p">(</span><span class="n">strings</span><span class="o">.</span><span class="n">TrimSpace</span><span class="p">(</span><span class="n">strings</span><span class="o">.</span><span class="n">ReplaceAll</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span> <span class="s">"_"</span><span class="p">,</span> <span class="s">" "</span><span class="p">)))</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">strings.ToLower(</code> is one open. <code class="language-plaintext highlighter-rouge">strings.TrimSpace(</code> is two.
<code class="language-plaintext highlighter-rouge">strings.ReplaceAll(</code> is three. Three levels deep before anything resolves, all
to clean up a string.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">spaced</span> <span class="o">:=</span> <span class="n">strings</span><span class="o">.</span><span class="n">ReplaceAll</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span> <span class="s">"_"</span><span class="p">,</span> <span class="s">" "</span><span class="p">)</span>
<span class="n">name</span> <span class="o">:=</span> <span class="n">strings</span><span class="o">.</span><span class="n">ToLower</span><span class="p">(</span><span class="n">strings</span><span class="o">.</span><span class="n">TrimSpace</span><span class="p">(</span><span class="n">spaced</span><span class="p">))</span>
</code></pre></div></div>

<p>Neither line nests past two.</p>

<p>Brackets count. Map lookups are delimiter pairs:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">name</span> <span class="o">:=</span> <span class="n">users</span><span class="p">[</span><span class="n">groups</span><span class="p">[</span><span class="n">ids</span><span class="p">[</span><span class="n">index</span><span class="p">]]]</span>
</code></pre></div></div>

<p>Three opens.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">id</span> <span class="o">:=</span> <span class="n">groups</span><span class="p">[</span><span class="n">ids</span><span class="p">[</span><span class="n">index</span><span class="p">]]</span>
<span class="n">name</span> <span class="o">:=</span> <span class="n">users</span><span class="p">[</span><span class="n">id</span><span class="p">]</span>
</code></pre></div></div>
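<p>Counting opens can be done with a simple character scan. The sketch below reads the rule as a cap on nesting depth, which is my interpretation and not a definition from any existing tool. It skips string and rune literals and does not handle comments. The name <code class="language-plaintext highlighter-rouge">maxOpenDepth</code> is hypothetical.</p>

```go
package main

import "fmt"

// maxOpenDepth returns the deepest delimiter nesting on a line, counting
// (, [ and {. Text inside string, rune, and raw-string literals is skipped.
// This is a character scan, not a Go parser.
func maxOpenDepth(line string) int {
	depth, deepest := 0, 0
	var quote rune // active literal delimiter, or 0 outside a literal
	escaped := false
	for _, r := range line {
		switch {
		case quote != 0:
			// Inside a literal: only look for its end.
			if escaped {
				escaped = false
			} else if r == '\\' && quote != '`' {
				escaped = true
			} else if r == quote {
				quote = 0
			}
		case r == '"' || r == '\'' || r == '`':
			quote = r
		case r == '(' || r == '[' || r == '{':
			depth++
			if depth > deepest {
				deepest = depth
			}
		case r == ')' || r == ']' || r == '}':
			depth--
		}
	}
	return deepest
}

func main() {
	fmt.Println(maxOpenDepth(`name := users[groups[ids[index]]]`)) // 3
	fmt.Println(maxOpenDepth(`name := users[id]`))                 // 1
}
```

<p>A line passes the rule when the result is 2 or less.</p>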

<h2 id="why-two-rules">Why two rules</h2>

<p>They catch different things.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">result</span> <span class="o">:=</span> <span class="n">process</span><span class="p">(</span><span class="n">transform</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">),</span> <span class="n">z</span><span class="p">)</span>
</code></pre></div></div>

<p>Two opens — nesting is fine. But <code class="language-plaintext highlighter-rouge">x, y</code> belongs to <code class="language-plaintext highlighter-rouge">transform</code> while
<code class="language-plaintext highlighter-rouge">transform(x, y), z</code> belongs to <code class="language-plaintext highlighter-rouge">process</code>. Commas at two levels. Only the
uniform comma rule flags this.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">value</span> <span class="o">:=</span> <span class="n">outer</span><span class="p">(</span><span class="n">middle</span><span class="p">(</span><span class="n">inner</span><span class="p">()))</span>
</code></pre></div></div>

<p>No commas. Three opens before the first close. Only the shallow nesting rule
flags this.</p>

<p>Some real offenders trip both:</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">parts</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">parts</span><span class="p">,</span> <span class="n">strconv</span><span class="o">.</span><span class="n">FormatFloat</span><span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">Abs</span><span class="p">(</span><span class="n">val</span><span class="p">),</span> <span class="sc">'f'</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span> <span class="m">64</span><span class="p">))</span>
</code></pre></div></div>

<p>Three opens and commas at two levels.</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">formatted</span> <span class="o">:=</span> <span class="n">strconv</span><span class="o">.</span><span class="n">FormatFloat</span><span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">Abs</span><span class="p">(</span><span class="n">val</span><span class="p">),</span> <span class="sc">'f'</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span> <span class="m">64</span><span class="p">)</span>
<span class="n">parts</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">parts</span><span class="p">,</span> <span class="n">formatted</span><span class="p">)</span>
</code></pre></div></div>

<p>One extraction and both rules are satisfied. The remaining lines are still
dense — but neither nests past two, and every comma belongs to one call. Judge
their legibility for yourself.</p>

<p>The fix is always the same: extract to a named variable. Naming the variable
documents what the expression computes. The outer expression reads in terms of a
word instead of a computation.</p>

<p>Both rules work at the smallest scale: one line, one expression. You can check
them in review without understanding what the program does. As far as I can
tell, no existing linter enforces either rule. Tools like <code class="language-plaintext highlighter-rouge">nestif</code>, <code class="language-plaintext highlighter-rouge">gocognit</code>,
and ESLint’s <code class="language-plaintext highlighter-rouge">max-depth</code> check control-flow nesting — <code class="language-plaintext highlighter-rouge">if</code> inside <code class="language-plaintext highlighter-rouge">if</code> inside
<code class="language-plaintext highlighter-rouge">if</code>. None check expression-level delimiter depth or mixed comma membership.</p>

<p>The rules came from an itch. Certain lines have always struck me as harder to read
than they should be, given how little they do. These rules are the closest I’ve
come to saying why.</p>]]></content><author><name></name></author><category term="development" /><summary type="html"><![CDATA[Most readability advice resists mechanical checking. “Use good names.” “Keep functions short.” You need the whole function, maybe the whole module, to evaluate those. These two rules you can check by reading a single line. The examples are in Go, but the rules apply to any language with nested expressions.]]></summary></entry><entry><title type="html">Bash Style Guide</title><link href="https://binaryphile.github.io/2026/02/27/bash-style-guide.html" rel="alternate" type="text/html" title="Bash Style Guide" /><published>2026-02-27T00:00:00+00:00</published><updated>2026-02-27T00:00:00+00:00</updated><id>https://binaryphile.github.io/2026/02/27/bash-style-guide</id><content type="html" xml:base="https://binaryphile.github.io/2026/02/27/bash-style-guide.html"><![CDATA[<h1 id="bash-style-guide">Bash Style Guide</h1>

<p>Prescriptive conventions for bash code under <code class="language-plaintext highlighter-rouge">IFS=$'\n'; set -o noglob</code>. Techniques are general; examples use standalone script style unless demonstrating library conventions.</p>

<h2 id="1-shebang-and-version">1. Shebang and Version</h2>

<p><code class="language-plaintext highlighter-rouge">#!/usr/bin/env bash</code>. Bash 4.4+ minimum (for <code class="language-plaintext highlighter-rouge">${var@Q}</code>).</p>

<p>File extensions: <code class="language-plaintext highlighter-rouge">.bash</code> for libraries, no extension for executables.</p>

<h2 id="2-safety-preamble">2. Safety Preamble</h2>

<p>Two tiers: libraries and scripts.</p>

<p><strong>Libraries</strong>: expect <code class="language-plaintext highlighter-rouge">IFS=$'\n'</code> and noglob from their callers, no <code class="language-plaintext highlighter-rouge">set -e</code> — callers own error policy. The library files themselves don’t set these; consumers do after sourcing (see boilerplate below). Some libraries handle IFS internally per-function with <code class="language-plaintext highlighter-rouge">IFS='' read -r</code>.</p>

<p>Consumers set this after sourcing:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">IFS</span><span class="o">=</span><span class="s1">$'</span><span class="se">\n</span><span class="s1">'</span>
<span class="nb">set</span> <span class="nt">-o</span> noglob
</code></pre></div></div>

<p><strong>Scripts</strong>: defer strict mode until after option parsing. Option parsing uses <code class="language-plaintext highlighter-rouge">$*</code> unquoted and tests <code class="language-plaintext highlighter-rouge">${1:-}</code>, which interact poorly with <code class="language-plaintext highlighter-rouge">set -eu</code> before args are validated.</p>

<p>Standard for new scripts: <code class="language-plaintext highlighter-rouge">set -euo pipefail</code>. Add the <code class="language-plaintext highlighter-rouge">f</code> flag if noglob is not already set (<code class="language-plaintext highlighter-rouge">set -f</code> is equivalent to <code class="language-plaintext highlighter-rouge">set -o noglob</code>). Libraries should not force strict mode on their consumers.</p>

<p>The <code class="language-plaintext highlighter-rouge">return 2&gt;/dev/null</code> line before strict mode enables interactive debugging by sourcing the script without executing main.</p>

<p><strong>Library consumer boilerplate</strong>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">source</span> ~/.local/lib/mylib.bash 2&gt;/dev/null <span class="o">||</span> <span class="o">{</span> <span class="nb">echo</span> <span class="s1">'fatal: mylib.bash not found'</span> <span class="o">&gt;</span>&amp;2<span class="p">;</span> <span class="nb">exit </span>1<span class="p">;</span> <span class="o">}</span>

<span class="c"># enable safe expansion</span>
<span class="nv">IFS</span><span class="o">=</span><span class="s1">$'</span><span class="se">\n</span><span class="s1">'</span>
<span class="nb">set</span> <span class="nt">-o</span> noglob

<span class="k">return </span>2&gt;/dev/null    <span class="c"># stop if sourced, for interactive debugging</span>
main <span class="nv">$*</span>               <span class="c"># entry point — library consumers may strip parsed options first</span>
</code></pre></div></div>

<p><strong>Script bottom</strong>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># strict mode</span>
<span class="k">return </span>2&gt;/dev/null
<span class="nb">set</span> <span class="nt">-euo</span> pipefail
<span class="nb">set</span> <span class="nt">-o</span> noglob

main <span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span>
</code></pre></div></div>

<h2 id="3-naming">3. Naming</h2>

<p>Every file has a Naming Policy header comment (see template below). The rules:</p>

<ul>
  <li><strong>Functions</strong> (libraries): <code class="language-plaintext highlighter-rouge">namespace.PascalCase</code> (public), <code class="language-plaintext highlighter-rouge">namespace.camelCase</code> (private). Namespace is the project name lowercase (e.g., <code class="language-plaintext highlighter-rouge">lib.</code>). Libraries are sourced by others and need namespace collision protection; standalone scripts use plain <code class="language-plaintext highlighter-rouge">PascalCase</code>/<code class="language-plaintext highlighter-rouge">camelCase</code> (see Standalone scripts below).</li>
  <li><strong>Locals</strong>: <code class="language-plaintext highlighter-rouge">camelCase</code> — begin with lowercase. Compound words that are single semantic concepts stay lowercase: <code class="language-plaintext highlighter-rouge">filename</code>, <code class="language-plaintext highlighter-rouge">testname</code>, <code class="language-plaintext highlighter-rouge">fieldname</code> (not <code class="language-plaintext highlighter-rouge">fileName</code>, <code class="language-plaintext highlighter-rouge">testName</code>, <code class="language-plaintext highlighter-rouge">fieldName</code>). Arrays use plural names (<code class="language-plaintext highlighter-rouge">testnames</code>, <code class="language-plaintext highlighter-rouge">filenames</code>, <code class="language-plaintext highlighter-rouge">requestedTests</code>); scalars use singular. Unpack positional parameters on one <code class="language-plaintext highlighter-rouge">local</code> line: <code class="language-plaintext highlighter-rouge">local got=$1 want=$2</code>, <code class="language-plaintext highlighter-rouge">local msg=$1 rc=${2:-$?}</code>.</li>
  <li><strong>Globals</strong>: <code class="language-plaintext highlighter-rouge">PascalCase</code> — begin with uppercase. Libraries append a randomly-chosen project-specific suffix letter (e.g., <code class="language-plaintext highlighter-rouge">DebugQ</code>, <code class="language-plaintext highlighter-rouge">ShowProgressQ</code>, <code class="language-plaintext highlighter-rouge">TimeFuncQ</code>) to prevent namespace collisions. Globals are not public — create accessor functions if consumers need them. Standalone scripts omit the suffix. <strong>Associative and indexed arrays</strong> that are global must use <code class="language-plaintext highlighter-rouge">declare -gA</code> or <code class="language-plaintext highlighter-rouge">declare -ga</code>, not <code class="language-plaintext highlighter-rouge">declare -A</code> or <code class="language-plaintext highlighter-rouge">declare -a</code>. Without <code class="language-plaintext highlighter-rouge">-g</code>, <code class="language-plaintext highlighter-rouge">declare</code> inside a function creates a local variable regardless of naming convention. This matters when a library is sourced inside a function (e.g., a convergence wrapper) — the arrays go out of scope when the sourcing function returns.</li>
  <li><strong>Namerefs</strong>: <code class="language-plaintext highlighter-rouge">local -n UPPERCASE=$1</code> — borrows the environment variable namespace (all-caps). Namerefs point to the caller’s variable, so they need names that won’t collide with any local. UPPERCASE is safe because locals are always camelCase.</li>
  <li><strong>“List” in names</strong>: functions that serialize arrays into newline-separated strings use “List” – <code class="language-plaintext highlighter-rouge">ListOf()</code>, <code class="language-plaintext highlighter-rouge">StreamList()</code>. Variables holding serialized lists use the <code class="language-plaintext highlighter-rouge">*List</code> suffix (e.g., <code class="language-plaintext highlighter-rouge">groupList</code>, <code class="language-plaintext highlighter-rouge">commandList</code>). The <code class="language-plaintext highlighter-rouge">*List</code> suffix signals multi-value content (implies IFS characters), so no trailing <code class="language-plaintext highlighter-rouge">_</code> is needed – the two conventions are mutually exclusive.</li>
  <li><strong>Standard globals</strong> (suffix exceptions): <code class="language-plaintext highlighter-rouge">NL=$'\n'</code> for string interpolation in double quotes. <code class="language-plaintext highlighter-rouge">Prog=$(basename "$0")</code> is standard in scripts that report their own name. These are conventional exceptions to the suffix rule.</li>
  <li><strong>Standalone scripts</strong>: no namespace prefix on functions, no suffix letter on globals — not sourced by others, so no collision risk.</li>
</ul>
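<p>The <code class="language-plaintext highlighter-rouge">declare -g</code> rule for global arrays can be seen in a minimal sketch. The function and variable names here (<code class="language-plaintext highlighter-rouge">initTables</code>, <code class="language-plaintext highlighter-rouge">ColorsQ</code>, <code class="language-plaintext highlighter-rouge">ScratchQ</code>) are hypothetical, chosen only for illustration:</p>

```bash
# Hypothetical library initializer; all names are illustrative.
initTables() {
  declare -gA ColorsQ=([ok]=green)   # -g: still defined after the function returns
  declare -A ScratchQ=([tmp]=1)      # no -g: local to initTables despite the PascalCase name
}

initTables
echo "${ColorsQ[ok]}"                                           # green
declare -p ScratchQ >/dev/null 2>&1 || echo "ScratchQ is gone"  # ScratchQ is gone
```

<p>The same thing happens when the declarations sit at the top level of a file that is sourced from inside a function, which is the case the rule guards against.</p>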

<p>Example header (library):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Naming Policy:</span>
<span class="c">#</span>
<span class="c"># All function and variable names are camelCased.</span>
<span class="c">#</span>
<span class="c"># Private function names begin with lowercase letters.</span>
<span class="c"># Public function names begin with uppercase letters.</span>
<span class="c"># Function names are prefixed with "lib." (always lowercase) so they are namespaced.</span>
<span class="c">#</span>
<span class="c"># Local variable names begin with lowercase letters, e.g. localVariable.</span>
<span class="c">#</span>
<span class="c"># Global variable names begin with uppercase letters, e.g. GlobalVariable.</span>
<span class="c"># Since this is a library, global variable names are also namespaced by suffixing them with</span>
<span class="c"># the randomly-generated letter Q, e.g. GlobalVariableQ.</span>
<span class="c"># Global variables are not public.  Library consumers should not be aware of them.</span>
<span class="c"># If users need to interact with them, create accessor functions for the purpose.</span>
<span class="c">#</span>
<span class="c"># Variable declarations that are name references borrow the environment namespace, e.g.</span>
<span class="c"># "local -n ARRAY=$1".</span>
</code></pre></div></div>

<h2 id="4-namespace-suffix">4. Namespace Suffix</h2>

<p>A single letter per library is appended to all globals and dependency-injection (DI) variables. This prevents collisions when libraries are sourced together. Choose a random letter per library; headers describe it as “randomly-generated.”</p>

<p>Standalone scripts omit the suffix — not sourced by others, so no collision risk.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">TimeFuncQ</span><span class="o">=</span>UnixMilli   <span class="c"># DI variable</span>
<span class="nv">ShowProgressQ</span><span class="o">=</span>1       <span class="c"># global</span>
<span class="nv">DebugQ</span><span class="o">=</span>0              <span class="c"># global</span>
</code></pre></div></div>

<h2 id="5-quoting">5. Quoting</h2>

<p><code class="language-plaintext highlighter-rouge">_</code> suffix on variables means “must quote on expansion.” Two reasons qualify a variable for the suffix:</p>

<ol>
  <li><strong>Contains IFS characters</strong> (newlines under <code class="language-plaintext highlighter-rouge">IFS=$'\n'</code>) — unquoted expansion splits into multiple words.</li>
  <li><strong>Can be empty</strong> — unquoted expansion disappears entirely, breaking positional argument pairing.</li>
</ol>

<p>In practice: <code class="language-plaintext highlighter-rouge">commands_</code> (trap output), <code class="language-plaintext highlighter-rouge">content_</code> (user input, may contain newlines), <code class="language-plaintext highlighter-rouge">usage_</code> (multiline heredoc), <code class="language-plaintext highlighter-rouge">tags_</code> (optional flag, empty when not provided).</p>

<p>The <code class="language-plaintext highlighter-rouge">*List</code> suffix is an alternative convention for multi-value variables: <code class="language-plaintext highlighter-rouge">groupList</code>, <code class="language-plaintext highlighter-rouge">commandList</code>. The suffix signals IFS content (implies must-quote). <code class="language-plaintext highlighter-rouge">_</code> and <code class="language-plaintext highlighter-rouge">*List</code> are mutually exclusive on the same variable.</p>

<p>Variables without <code class="language-plaintext highlighter-rouge">_</code> or <code class="language-plaintext highlighter-rouge">*List</code> are safe unquoted under <code class="language-plaintext highlighter-rouge">IFS=$'\n'; set -o noglob</code>.</p>
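<p>A small sketch of what the suffix buys under these settings. The helper <code class="language-plaintext highlighter-rouge">countArgs</code> and the sample values are assumptions made up for the demonstration:</p>

```bash
IFS=$'\n'; set -o noglob

# countArgs is a hypothetical helper that reports how many words it received.
countArgs() { echo $#; }

path='My Documents/*.txt'        # spaces and a glob character, but no newline
content_=$'line one\nline two'   # trailing _ marks a value that must be quoted

countArgs $path          # 1: no split on spaces, no pathname expansion
countArgs $content_      # 2: the newline splits the value
countArgs "$content_"    # 1: quoting keeps it whole
```

<p>The unsuffixed name is a promise that the value is newline-free; the suffixed name tells the reader where the quotes are doing real work.</p>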

<p>Nameref collision avoidance uses a separate strategy: UPPERCASE names (see Naming).</p>

<p><strong><code class="language-plaintext highlighter-rouge">printf %q</code></strong> escapes a value for shell re-evaluation (eval-safe):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">printf</span> <span class="nt">-v</span> output <span class="s1">'%q '</span> <span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span>    <span class="c"># output is safe to eval</span>
</code></pre></div></div>
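<p>To see that the escaped string really does survive re-evaluation, here is a round-trip sketch. <code class="language-plaintext highlighter-rouge">roundTrip</code> is a hypothetical helper, not part of any library:</p>

```bash
# roundTrip escapes its arguments with %q, re-parses the result with eval,
# and prints the recovered argument count followed by each argument.
roundTrip() {
  local output
  printf -v output '%q ' "$@"
  eval "set -- $output"
  printf '%s\n' "$#" "$@"
}

roundTrip 'two words' "it's" '$HOME'
# 3
# two words
# it's
# $HOME
```

<p>Spaces, quotes and the dollar sign all come back unchanged, and nothing is expanded during the <code class="language-plaintext highlighter-rouge">eval</code>.</p>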

<p><strong><code class="language-plaintext highlighter-rouge">${var@Q}</code></strong> renders a human-readable quoted literal. Used for debug output and test copy-paste lines:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">CMD</span><span class="o">=</span><span class="s2">"sudo -u </span><span class="k">${</span><span class="nv">RunAsUser</span><span class="p">@Q</span><span class="k">}</span><span class="s2"> bash -c </span><span class="k">${</span><span class="nv">CMD</span><span class="p">@Q</span><span class="k">}</span><span class="s2">"</span>    <span class="c"># readable in logs</span>
<span class="nb">echo</span> <span class="s2">"want=</span><span class="k">${</span><span class="nv">got</span><span class="p">@Q</span><span class="k">}</span><span class="s2">"</span>                                <span class="c"># tests — paste to update expected value</span>
</code></pre></div></div>

<p><strong><code class="language-plaintext highlighter-rouge">read -r</code> discipline</strong>: always use <code class="language-plaintext highlighter-rouge">read -r</code> to avoid backslash interpretation. Prefer <code class="language-plaintext highlighter-rouge">IFS='' read -r</code> when consuming raw lines (see FP Pipeline Helpers for the canonical pattern).</p>

<p><strong>Avoid braces in expansion.</strong> <code class="language-plaintext highlighter-rouge">$var</code>, not <code class="language-plaintext highlighter-rouge">${var}</code> — braces add noise for no benefit when the variable name is unambiguous. For disambiguation when text follows the name, prefer quotes over braces: <code class="language-plaintext highlighter-rouge">"$var"Suffix</code> concatenates the quoted expansion with the literal. Use braces when the variable is embedded mid-string and quotes can’t delimit it: <code class="language-plaintext highlighter-rouge">"prefix${var}suffix"</code>.</p>

<p><strong>Array/positional expansion</strong>: <code class="language-plaintext highlighter-rouge">"${array[@]}"</code> and <code class="language-plaintext highlighter-rouge">"$@"</code> preserve element boundaries: each element stays a separate word. <code class="language-plaintext highlighter-rouge">"$*"</code> joins elements with the first character of IFS (useful for serialization). Unquoted, both <code class="language-plaintext highlighter-rouge">${array[@]}</code> and <code class="language-plaintext highlighter-rouge">$@</code> undergo word splitting on IFS, so elements containing newlines get broken apart. Under <code class="language-plaintext highlighter-rouge">set -u</code>, bash versions before 4.4 treat the expansion of an empty array as an unbound-variable error, so an empty array needs <code class="language-plaintext highlighter-rouge">${args[@]:-}</code> as fallback there.</p>
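<p>The three behaviours side by side, as a sketch. <code class="language-plaintext highlighter-rouge">countArgs</code> and <code class="language-plaintext highlighter-rouge">joinLines</code> are hypothetical helpers introduced for the example:</p>

```bash
IFS=$'\n'

countArgs() { echo $#; }     # hypothetical: reports the number of words received
joinLines() { echo "$*"; }   # hypothetical: joins arguments with the first IFS character

items=('first item' $'second\nitem')

countArgs "${items[@]}"      # 2: element boundaries preserved
countArgs ${items[@]}        # 3: the embedded newline splits the second element
joinLines a b c              # prints a, b and c on separate lines
```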

<p><strong>Quoting decision tree.</strong> Walk this algorithm for any expansion you’re unsure about:</p>

<ol>
  <li><strong>No-split context?</strong> Assignment RHS, <code class="language-plaintext highlighter-rouge">[[ ]]</code> (except RHS of <code class="language-plaintext highlighter-rouge">==</code> and <code class="language-plaintext highlighter-rouge">=~</code>), <code class="language-plaintext highlighter-rouge">(( ))</code>, <code class="language-plaintext highlighter-rouge">case</code>, array subscripts, <code class="language-plaintext highlighter-rouge">${...}</code> operators, here-strings: quoting is unnecessary. These contexts never split or glob regardless of IFS/noglob settings. Redirection targets are a near miss: bash does expand them, but a result of more than one word fails with “ambiguous redirect” instead of being split silently, so they are safe unquoted under <code class="language-plaintext highlighter-rouge">IFS=$'\n'; set -o noglob</code>.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">_</code>-suffixed or <code class="language-plaintext highlighter-rouge">*List</code> variable?</strong> Contains IFS characters (newlines) or, for <code class="language-plaintext highlighter-rouge">_</code>, may be empty. Must quote in non-assignment contexts: <code class="language-plaintext highlighter-rouge">echo "$usage_"</code>, <code class="language-plaintext highlighter-rouge">hasGroup "$groupList"</code>.</li>
  <li><strong>Required-quoting context?</strong> Array expansion (<code class="language-plaintext highlighter-rouge">"${arr[@]}"</code>), RHS of <code class="language-plaintext highlighter-rouge">==</code> in <code class="language-plaintext highlighter-rouge">[[</code> (for literal match), <code class="language-plaintext highlighter-rouge">eval</code> arguments, <code class="language-plaintext highlighter-rouge">trap</code> strings, external command arguments, process substitution with multi-line content — must quote. See the full list below.</li>
  <li><strong>Otherwise</strong> — safe unquoted under <code class="language-plaintext highlighter-rouge">IFS=$'\n'; set -o noglob</code>. The variable has no <code class="language-plaintext highlighter-rouge">_</code> suffix (newline-free by convention), and the context is a shell builtin or function call with scalar arguments.</li>
</ol>

<p><strong>Why not quote everything?</strong> Under IFS+noglob, selective quoting signals intent. Quotes mean “this value needs protection” — either it contains IFS characters, or the context demands exact word boundaries. Quoting every expansion adds noise without adding safety, and obscures which values actually require care. When a reviewer sees quotes, they should be able to trust that those quotes are there for a reason.</p>

<p><strong>When to quote.</strong> Under <code class="language-plaintext highlighter-rouge">IFS=$'\n'; set -o noglob</code>, most scalar expansions are safe unquoted. Quotes are required in these contexts:</p>

<ul>
  <li><strong>Trust boundaries and the <code class="language-plaintext highlighter-rouge">_</code> suffix</strong> — assigning a parameter to a non-<code class="language-plaintext highlighter-rouge">_</code> variable documents that it won’t contain IFS characters: <code class="language-plaintext highlighter-rouge">local command=$1</code> means “I expect single-line input.” If a parameter may contain newlines, assign to a <code class="language-plaintext highlighter-rouge">_</code>-suffixed variable and quote from there.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">"${array[@]}"</code> / <code class="language-plaintext highlighter-rouge">"$@"</code> / <code class="language-plaintext highlighter-rouge">"$*"</code></strong> — quote to preserve element boundaries (see above). Unquote only when IFS splitting is intentional (e.g., populating arrays from command output: <code class="language-plaintext highlighter-rouge">local arr=( $(command) )</code>).</li>
  <li><strong>RHS of <code class="language-plaintext highlighter-rouge">==</code> in <code class="language-plaintext highlighter-rouge">[[</code></strong> — <code class="language-plaintext highlighter-rouge">[[ $x == "$y" ]]</code> for literal match. Unquoted RHS is a glob pattern: <code class="language-plaintext highlighter-rouge">*</code>, <code class="language-plaintext highlighter-rouge">?</code>, <code class="language-plaintext highlighter-rouge">[</code> become wildcards. Leave unquoted for intentional pattern matching: <code class="language-plaintext highlighter-rouge">[[ $OSTYPE == darwin* ]]</code>.</li>
  <li><strong>RHS of <code class="language-plaintext highlighter-rouge">=~</code> in <code class="language-plaintext highlighter-rouge">[[</code></strong> — quoting disables regex metacharacter interpretation in bash 3.2+ (<code class="language-plaintext highlighter-rouge">.</code> becomes literal dot, <code class="language-plaintext highlighter-rouge">*</code> loses repetition meaning), though the regex engine is still in use. Leave unquoted for regex matching (the common case): <code class="language-plaintext highlighter-rouge">[[ $x =~ ^[0-9]+$ ]]</code>. For complex patterns, store in a variable: <code class="language-plaintext highlighter-rouge">local pattern='^[0-9]+$'; [[ $x =~ $pattern ]]</code>.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">_</code>-suffixed variables</strong> in non-assignment contexts — contain IFS characters (newlines), must quote: <code class="language-plaintext highlighter-rouge">eval "$testSource_"</code>, <code class="language-plaintext highlighter-rouge">echo "$Usage_"</code>.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">eval</code> arguments</strong> — <code class="language-plaintext highlighter-rouge">eval "$CMD"</code>. Without quotes, newlines become argument separators; <code class="language-plaintext highlighter-rouge">eval</code> joins arguments with spaces, changing multi-line code semantics.</li>
  <li><strong>Command substitution as argument</strong> — a judgment call. <code class="language-plaintext highlighter-rouge">func "$(command)"</code> when the result should be a single word. Unquoted <code class="language-plaintext highlighter-rouge">$(command)</code> splits on newlines, which is sometimes desired: <code class="language-plaintext highlighter-rouge">local arr=( $(listItems) )</code>.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">trap</code> command strings</strong> — <code class="language-plaintext highlighter-rouge">trap "$command$NL$(existing)" EXIT</code>. The string is stored for later eval; must be a single coherent argument.</li>
  <li><strong>Process substitution with multi-line content</strong> — <code class="language-plaintext highlighter-rouge">diff &lt;(echo "$got") &lt;(echo "$want")</code>. Unquoted <code class="language-plaintext highlighter-rouge">echo $var</code> splits on newlines into separate arguments; echo outputs them space-separated, destroying line structure.</li>
  <li><strong>External command arguments</strong> — <code class="language-plaintext highlighter-rouge">mkdir -p "$dir"</code>, <code class="language-plaintext highlighter-rouge">install -m "$mode"</code>, <code class="language-plaintext highlighter-rouge">ssh-keygen -f "$file"</code>. Without noglob, unquoted values undergo pathname expansion before the command sees them. Scripts that use <code class="language-plaintext highlighter-rouge">set -euo pipefail</code> without the <code class="language-plaintext highlighter-rouge">-f</code> (noglob) flag need this; code following these conventions quotes external command args consistently regardless.</li>
  <li><strong>Positional pairing arguments</strong> — APIs that consume arguments in key-value pairs (<code class="language-plaintext highlighter-rouge">jq --arg name value</code>, custom <code class="language-plaintext highlighter-rouge">key value key value</code> functions) break when an empty variable expands to nothing, shifting all subsequent pairs. Quote empty-possible values: <code class="language-plaintext highlighter-rouge">--arg t "$type_"</code>. Better: design pair-consuming APIs to accept <code class="language-plaintext highlighter-rouge">key=value</code> as single arguments so empty values produce <code class="language-plaintext highlighter-rouge">key=</code> (one word) rather than disappearing.</li>
</ul>
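<p>The <code class="language-plaintext highlighter-rouge">==</code> rule from the list above is the easiest one to get wrong silently, so here is a short sketch with made-up values:</p>

```bash
pattern='*.txt'
name='notes.txt'

[[ $name == $pattern ]]      && echo "unquoted RHS: matched as a glob"      # printed
[[ $name == "$pattern" ]]    || echo "quoted RHS: no literal match"         # printed
[[ $pattern == "$pattern" ]] && echo "quoted RHS: literal match"            # printed
```

<p>The only difference between the first two tests is the pair of quotes, which is why the quotes have to carry intent.</p>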

<p><strong>When quoting is unnecessary.</strong> These contexts never split or glob — quoting is harmless but adds no safety:</p>

<ul>
  <li><strong>Assignment RHS</strong> — <code class="language-plaintext highlighter-rouge">local var=$value</code>, <code class="language-plaintext highlighter-rouge">var=$(command)</code>, <code class="language-plaintext highlighter-rouge">var=${1:-default}</code>. Bash assigns the full expansion without splitting.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">[[ ]]</code> operands</strong> (except RHS of <code class="language-plaintext highlighter-rouge">==</code> and <code class="language-plaintext highlighter-rouge">=~</code>) — <code class="language-plaintext highlighter-rouge">[[ -e $file ]]</code>, <code class="language-plaintext highlighter-rouge">[[ $var == pattern ]]</code> (LHS). The conditional command suppresses splitting.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">(( ))</code> arithmetic</strong> — <code class="language-plaintext highlighter-rouge">(( rc == 0 ))</code>, <code class="language-plaintext highlighter-rouge">(( ${#array[@]} ))</code>. Arithmetic context, not string context.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">case</code> word</strong> — <code class="language-plaintext highlighter-rouge">case $var in</code>. No splitting.</li>
  <li><strong>Array subscripts</strong> — <code class="language-plaintext highlighter-rouge">${map[$key]}</code>, <code class="language-plaintext highlighter-rouge">array[$idx]=val</code>. Inside brackets, no splitting.</li>
  <li><strong>Inside <code class="language-plaintext highlighter-rouge">${...}</code> operators</strong> — <code class="language-plaintext highlighter-rouge">${1:-$default}</code>, <code class="language-plaintext highlighter-rouge">${var#$prefix}</code>. Nested expansions are protected.</li>
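  <li><strong>Redirection targets</strong> — <code class="language-plaintext highlighter-rouge">&gt;$file</code>, <code class="language-plaintext highlighter-rouge">&lt;$file</code>, <code class="language-plaintext highlighter-rouge">&lt;&lt;&lt;$var</code>. Here-strings are never split. Bash does expand a redirection target, but a result of more than one word is an “ambiguous redirect” error rather than a silent split, so newline-free filenames are safe unquoted under <code class="language-plaintext highlighter-rouge">IFS=$'\n'; set -o noglob</code>.</li>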
  <li><strong>Scalar command arguments</strong> — <code class="language-plaintext highlighter-rouge">func $simplevar</code>, <code class="language-plaintext highlighter-rouge">printf $fmt $val</code>. No word-splitting surprises for newline-free values under <code class="language-plaintext highlighter-rouge">IFS=$'\n'; set -o noglob</code>. This is the default assumption for variables without the <code class="language-plaintext highlighter-rouge">_</code> suffix. Note: commands still interpret values (printf parses its format string) — quoting controls splitting, not command semantics.</li>
</ul>

<h2 id="6-variable-scoping">6. Variable Scoping</h2>

<p>Bash has dynamic scoping: a function can read and modify variables in its caller’s scope, even <code class="language-plaintext highlighter-rouge">local</code> variables. This is the opposite of lexical scoping (C, Python, Go), where a function sees only the names visible at the point where it is defined and never its caller’s locals.</p>

<p><strong>Mechanism.</strong> When bash resolves a variable name, it walks up the call stack. A callee’s <code class="language-plaintext highlighter-rouge">local x</code> shadows the caller’s <code class="language-plaintext highlighter-rouge">x</code>, but without <code class="language-plaintext highlighter-rouge">local</code>, the callee accesses the caller’s variable directly. This applies to both reads and writes.</p>

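<p><strong>Deliberate use — callback counting.</strong> A test runner can exploit dynamic scoping intentionally. The callback modifies <code class="language-plaintext highlighter-rouge">passCount</code> and <code class="language-plaintext highlighter-rouge">failCount</code>, which are locals in the calling function. The counters need the integer attribute (<code class="language-plaintext highlighter-rouge">local -i</code>) in the caller; on a plain variable, <code class="language-plaintext highlighter-rouge">+=</code> appends the string <code class="language-plaintext highlighter-rouge">1</code> instead of adding:</p>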

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>passCount+<span class="o">=</span>1   <span class="c"># in caller's scope</span>
</code></pre></div></div>

<p>The comment <code class="language-plaintext highlighter-rouge"># in caller's scope</code> documents the intentional cross-scope access. Without this pattern, the runner would need to pass counters through return values or globals.</p>
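<p>A self-contained sketch of the pattern follows. The runner and callback names (<code class="language-plaintext highlighter-rouge">runTests</code>, <code class="language-plaintext highlighter-rouge">report</code>, <code class="language-plaintext highlighter-rouge">passing</code>, <code class="language-plaintext highlighter-rouge">failing</code>) are hypothetical and much simpler than a real test framework:</p>

```bash
# runTests owns the counters; report updates them through dynamic scoping.
runTests() {
  local -i passCount=0 failCount=0   # -i so that += adds instead of appending
  local testname
  for testname in "$@"; do
    report "$testname"
  done
  echo "$passCount passed, $failCount failed"
}

# report declares no local counters, so the names resolve to runTests's locals.
report() {
  if "$1"; then
    passCount+=1    # in caller's scope
  else
    failCount+=1    # in caller's scope
  fi
}

passing() { return 0; }
failing() { return 1; }

runTests passing failing passing   # 2 passed, 1 failed
```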

<p><strong>Accidental shadowing — the collision risk.</strong> If a callee declares <code class="language-plaintext highlighter-rouge">local x</code> and the caller also has <code class="language-plaintext highlighter-rouge">local x</code>, the callee gets its own copy. But if the callee <em>doesn’t</em> declare <code class="language-plaintext highlighter-rouge">local</code> and uses <code class="language-plaintext highlighter-rouge">x</code>, it silently modifies the caller’s <code class="language-plaintext highlighter-rouge">x</code>. This is especially dangerous with namerefs: <code class="language-plaintext highlighter-rouge">local -n REF=$1</code> — if <code class="language-plaintext highlighter-rouge">$1</code> is <code class="language-plaintext highlighter-rouge">REF</code>, the nameref points to itself (circular reference).</p>
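<p>For the nameref case, a sketch of a name that cannot collide. <code class="language-plaintext highlighter-rouge">appendTo</code> and <code class="language-plaintext highlighter-rouge">collect</code> are hypothetical functions; the point is only the casing of the nameref:</p>

```bash
# appendTo adds values to the array whose name is given as the first argument.
appendTo() {
  local -n ARRAY=$1; shift   # UPPERCASE: cannot clash with a camelCase local
  ARRAY+=("$@")
}

collect() {
  local array=()             # a nameref also called "array" would point to itself here
  appendTo array one two
  appendTo array three
  echo "${#array[@]}: ${array[*]}"
}

collect   # 3: one two three
```

<p>The output shown assumes the default IFS, so <code class="language-plaintext highlighter-rouge">${array[*]}</code> joins with spaces.</p>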

<p><strong>Defenses:</strong></p>

<ul>
  <li><strong>Naming conventions</strong> are the primary protection. <code class="language-plaintext highlighter-rouge">camelCase</code> locals and <code class="language-plaintext highlighter-rouge">PascalCase + suffix</code> globals occupy separate namespaces. Two callees in the same chain are unlikely to collide if they follow conventions.</li>
  <li><strong>UPPERCASE namerefs</strong> (<code class="language-plaintext highlighter-rouge">local -n ARRAY=$1</code>) borrow the environment variable namespace, which never collides with <code class="language-plaintext highlighter-rouge">camelCase</code> locals in the caller.</li>
  <li><strong>Subshell <code class="language-plaintext highlighter-rouge">()</code> function bodies</strong> provide hard isolation when dynamic scoping is unwanted. Changes to variables, working directory, and shell options are discarded when the subshell exits:</li>
</ul>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>createCloneRepo<span class="o">()</span> <span class="o">(</span>     <span class="c"># () not {} — subshell isolates side effects</span>
  git init clone
  <span class="nb">cd </span>clone              <span class="c"># doesn't affect caller's pwd</span>
  <span class="nb">echo </span>hello <span class="o">&gt;</span>hello.txt
  git add hello.txt <span class="o">&amp;&amp;</span> git commit <span class="nt">-m</span> init
<span class="o">)</span> <span class="o">&gt;</span>/dev/null
</code></pre></div></div>

<p>Use <code class="language-plaintext highlighter-rouge">()</code> when a helper needs to <code class="language-plaintext highlighter-rouge">cd</code> or modify shell state; use <code class="language-plaintext highlighter-rouge">{}</code> (the default) when the caller needs to see the function’s side effects.</p>

<h2 id="7-conditionals">7. Conditionals</h2>

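<p>Use <code class="language-plaintext highlighter-rouge">[[</code> exclusively, never <code class="language-plaintext highlighter-rouge">[</code> or <code class="language-plaintext highlighter-rouge">test</code>. <code class="language-plaintext highlighter-rouge">[[</code> is bash’s conditional compound command: it supports pattern matching, performs no word splitting on its operands, and allows <code class="language-plaintext highlighter-rouge">&amp;&amp;</code>/<code class="language-plaintext highlighter-rouge">||</code> inside.</p>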

<p><strong><code class="language-plaintext highlighter-rouge">(( ))</code> for arithmetic and booleans.</strong> Boolean flags are 0/1 integers tested bare: <code class="language-plaintext highlighter-rouge">(( failed )) &amp;&amp; return 1</code>, <code class="language-plaintext highlighter-rouge">(( hasSubtests )) &amp;&amp; echo ...</code>. Numeric variables use explicit comparison: <code class="language-plaintext highlighter-rouge">(( rc == 0 ))</code>, <code class="language-plaintext highlighter-rouge">(( pid != 0 ))</code>. Arithmetic expansion: <code class="language-plaintext highlighter-rouge">$(( endTime - startTime ))</code>.</p>
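<p>A brief sketch of the three forms together. <code class="language-plaintext highlighter-rouge">checkAll</code> and the timing values are invented for the example:</p>

```bash
# checkAll takes a list of return codes and fails if any of them is non-zero.
checkAll() {
  local -i failed=0 rc
  for rc in "$@"; do
    (( rc == 0 )) || failed=1     # numeric variable: explicit comparison
  done
  (( failed )) && return 1        # boolean flag: tested bare
  return 0
}

checkAll 0 0 0 && echo "all passed"      # all passed
checkAll 0 2 0 || echo "something failed" # something failed

startTime=100 endTime=350
echo $(( endTime - startTime ))          # 250
```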

<h2 id="8-error-handling">8. Error Handling</h2>

<p>Two patterns coexist.</p>

<p><strong><code class="language-plaintext highlighter-rouge">fatal()</code> with message + optional exit code.</strong> Default rc is <code class="language-plaintext highlighter-rouge">$?</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fatal<span class="o">()</span> <span class="o">{</span>
  <span class="nb">local </span><span class="nv">msg</span><span class="o">=</span><span class="nv">$1</span> <span class="nv">rc</span><span class="o">=</span><span class="k">${</span><span class="nv">2</span><span class="k">:-</span><span class="nv">$?</span><span class="k">}</span>
  <span class="nb">echo</span> <span class="s2">"fatal: </span><span class="nv">$msg</span><span class="s2">"</span>
  <span class="nb">exit</span> <span class="nv">$rc</span>
<span class="o">}</span>
</code></pre></div></div>

<p>Libraries namespace this (e.g., <code class="language-plaintext highlighter-rouge">lib.Fatal</code>) and typically print to stderr.</p>

<p><strong>Return code 128 as fatal signal.</strong> A test framework can detect 128 and report “fatal” distinct from regular failure:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">case</span> <span class="nv">$rc</span> <span class="k">in
  </span>0   <span class="p">)</span> <span class="nb">printf</span> <span class="nv">$columns</span> <span class="nv">$Pass</span> <span class="nv">$duration</span> <span class="nv">$testname</span><span class="p">;</span> passCount+<span class="o">=</span>1<span class="p">;;</span>
  128 <span class="p">)</span> <span class="nb">printf</span> <span class="nv">$columns</span> <span class="nv">$Fatal</span> <span class="nv">$duration</span> <span class="nv">$Yellow$testname$Reset</span><span class="p">;;</span>
  <span class="k">*</span>   <span class="p">)</span> <span class="nb">printf</span> <span class="nv">$columns</span> <span class="nv">$Fail</span> <span class="nv">$duration</span> <span class="nv">$Yellow$testname$Reset</span><span class="p">;;</span>
<span class="k">esac</span>
</code></pre></div></div>

<p><strong>RC capture</strong>: <code class="language-plaintext highlighter-rouge">cmd &amp;&amp; rc=$? || rc=$?</code> records the exit code of a command that <code class="language-plaintext highlighter-rouge">set -e</code> would otherwise abort on before the code could be inspected. It is safe under <code class="language-plaintext highlighter-rouge">set -e</code> because a command on the left of <code class="language-plaintext highlighter-rouge">&amp;&amp;</code> or <code class="language-plaintext highlighter-rouge">||</code> is exempt from errexit, and the assignment that runs last always succeeds.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">output</span><span class="o">=</span><span class="si">$(</span><span class="nb">eval</span> <span class="s2">"</span><span class="nv">$cmd</span><span class="s2">"</span> 2&gt;&amp;1<span class="si">)</span> <span class="o">&amp;&amp;</span> <span class="nv">rc</span><span class="o">=</span><span class="nv">$?</span> <span class="o">||</span> <span class="nv">rc</span><span class="o">=</span><span class="nv">$?</span>
</code></pre></div></div>
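<p>The idiom can be checked in isolation. In this sketch <code class="language-plaintext highlighter-rouge">flaky</code> and <code class="language-plaintext highlighter-rouge">captureDemo</code> are hypothetical, and the subshell body keeps <code class="language-plaintext highlighter-rouge">set -e</code> from leaking into the caller:</p>

```bash
flaky() { return 3; }   # hypothetical command that always fails with code 3

# Subshell body: errexit is enabled only inside the demo.
captureDemo() (
  set -e
  flaky && rc=$? || rc=$?          # captured instead of aborting the script
  echo "still running, rc=$rc"
)

captureDemo   # still running, rc=3
```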

<p><strong>Trailing <code class="language-plaintext highlighter-rouge">&amp;&amp;</code> at end of function</strong>: a function whose last command is <code class="language-plaintext highlighter-rouge">[[ test ]] &amp;&amp; cmd</code> returns the test’s exit code when the test is false. Under <code class="language-plaintext highlighter-rouge">set -e</code> at the call site, that propagates as a non-zero return and terminates the caller — even when the function did exactly what it was meant to do (skip the conditional action).</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Bug: when $stashRef is empty, the [[ -n ]] test fails (rc 1), the</span>
<span class="c"># function returns 1, and a caller running under `set -e` aborts.</span>
gitUpdate<span class="o">()</span> <span class="o">{</span>
  <span class="nb">local </span>stashRef
  <span class="nv">stashRef</span><span class="o">=</span><span class="si">$(</span>git stash list <span class="nt">--format</span><span class="o">=</span>%gd | <span class="nb">head</span> <span class="nt">-n</span> 1<span class="si">)</span>
  <span class="o">[[</span> <span class="nt">-n</span> <span class="nv">$stashRef</span> <span class="o">]]</span> <span class="o">&amp;&amp;</span> git stash drop <span class="nv">$stashRef</span>
<span class="o">}</span>

<span class="c"># Fix 1 (preferred): invert the test so the no-op branch returns success.</span>
<span class="o">[[</span> <span class="nt">-z</span> <span class="nv">$stashRef</span> <span class="o">]]</span> <span class="o">||</span> git stash drop <span class="nv">$stashRef</span>

<span class="c"># Fix 2: explicit conditional.</span>
<span class="k">if</span> <span class="o">[[</span> <span class="nt">-n</span> <span class="nv">$stashRef</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then </span>git stash drop <span class="nv">$stashRef</span><span class="p">;</span> <span class="k">fi</span>

<span class="c"># Fix 3: catch-all trailing return.</span>
<span class="o">[[</span> <span class="nt">-n</span> <span class="nv">$stashRef</span> <span class="o">]]</span> <span class="o">&amp;&amp;</span> git stash drop <span class="nv">$stashRef</span>
<span class="k">return </span>0
</code></pre></div></div>

<p>The gotcha generalizes: any compound where the failure branch is “do nothing” needs the function to still return zero. Inverting the test with <code class="language-plaintext highlighter-rouge">||</code> is usually the cleanest form — the conditional reads as “skip unless” rather than “do if.”</p>
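<p>The difference in return codes is easy to confirm without git. The two functions below are hypothetical stand-ins that echo instead of dropping a stash:</p>

```bash
dropIfSet() {            # trailing && form: returns 1 when $1 is empty
  [[ -n $1 ]] && echo "dropping $1"
}

dropUnlessEmpty() {      # inverted form: the no-op branch returns 0
  [[ -z $1 ]] || echo "dropping $1"
}

dropIfSet ''       ; echo "dropIfSet rc=$?"         # dropIfSet rc=1
dropUnlessEmpty '' ; echo "dropUnlessEmpty rc=$?"   # dropUnlessEmpty rc=0
dropUnlessEmpty x                                   # dropping x
```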

<p><strong><code class="language-plaintext highlighter-rouge">pipefail</code></strong>: standard for new scripts. <code class="language-plaintext highlighter-rouge">set -euo pipefail</code>.</p>

<p><strong>Strict mode escape</strong>: <code class="language-plaintext highlighter-rouge">loosely()</code> for sourcing optional configs that may not exist or may fail benignly:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>loosely<span class="o">()</span> <span class="o">{</span>
  <span class="nb">set</span> +euo pipefail
  <span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span>
  <span class="nb">set</span> <span class="nt">-euo</span> pipefail
<span class="o">}</span>
loosely <span class="nb">source</span> /etc/profile.d/optional-tool.sh
</code></pre></div></div>

<h2 id="9-dependency-injection">9. Dependency Injection</h2>

<p>Assign function names to <code class="language-plaintext highlighter-rouge">PascalCase + suffix</code> variables. Override in tests:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># production default</span>
<span class="nv">TimeFuncQ</span><span class="o">=</span>UnixMilli

<span class="c"># in test</span>
<span class="nv">TimeFuncQ</span><span class="o">=</span>mockUnixMilli
</code></pre></div></div>
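<p>A minimal sketch of how a caller might consume such a variable. The <code class="language-plaintext highlighter-rouge">Stamp</code> function and the bodies of <code class="language-plaintext highlighter-rouge">UnixMilli</code> and <code class="language-plaintext highlighter-rouge">mockUnixMilli</code> are hypothetical stand-ins for illustration, not the guide's actual implementations:</p>

```bash
# UnixMilli is a stand-in clock: seconds since the epoch padded to milliseconds.
UnixMilli() { echo "$(date +%s)000"; }

# production default
TimeFuncQ=UnixMilli

# Stamp prefixes its argument with whatever clock TimeFuncQ names.
Stamp() {
  echo "$($TimeFuncQ) $1"
}

# in a test: swap the clock for a deterministic stand-in
mockUnixMilli() { echo 1700000000000; }
TimeFuncQ=mockUnixMilli

Stamp started   # prints "1700000000000 started"
```

<p>Because the function name is resolved at call time, the test only has to reassign the variable; <code class="language-plaintext highlighter-rouge">Stamp</code> itself is untouched.</p>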

<h2 id="10-code-organization">10. Code Organization</h2>

<p><strong>Cuddling</strong>: group related lines together, separate concepts with blank lines. One concept per group — similar to golangci-lint’s wsl rules.</p>

<p><strong>Scripts</strong>: option parsing near bottom, <code class="language-plaintext highlighter-rouge">return 2&gt;/dev/null</code> (debug hook), strict mode, then main call as last line.</p>

<p><strong>Libraries</strong>: function definitions only, no main call. Consumer scripts call the entry point.</p>

<p>Library consumers follow boilerplate: source → IFS → noglob → return → entry point.</p>

<p>Standard flags: <code class="language-plaintext highlighter-rouge">-h</code>/<code class="language-plaintext highlighter-rouge">--help</code>, <code class="language-plaintext highlighter-rouge">-v</code>/<code class="language-plaintext highlighter-rouge">--version</code>, <code class="language-plaintext highlighter-rouge">-x</code>/<code class="language-plaintext highlighter-rouge">--trace</code> (<code class="language-plaintext highlighter-rouge">set -x</code> for debugging). Libraries typically provide an option handler for these.</p>

<h2 id="11-comments">11. Comments</h2>

<p>Three placements.</p>

<p><strong>Function docs</strong> go directly above the definition, no blank line between. Start with the function name:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># lib.Main runs any test functions in the files given as arguments.</span>
<span class="c"># It outputs success or failure.</span>
lib.Main<span class="o">()</span> <span class="o">{</span>
</code></pre></div></div>

<p><strong>Inline comments</strong> explain non-obvious flags, return codes, or surprising behavior:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">local </span><span class="nv">tmpname</span><span class="o">=</span><span class="si">$(</span><span class="nb">mktemp</span> <span class="nt">-u</span><span class="si">)</span>   <span class="c"># -u doesn't create a file, just a name</span>
<span class="o">((</span> <span class="nv">$?</span> <span class="o">==</span> 128 <span class="o">))</span> <span class="o">&amp;&amp;</span> <span class="k">return </span>128 <span class="c"># fatal</span>
<span class="nb">local </span><span class="nv">NL</span><span class="o">=</span><span class="s1">$'</span><span class="se">\n</span><span class="s1">'</span> <span class="c"># as a separator, newline is legal after a backgrounded command (&amp;); a semicolon is not</span>
</code></pre></div></div>

<p><strong>Section markers</strong> use a hierarchical style like inverted markdown headers: <code class="language-plaintext highlighter-rouge">#</code> is the lowest level, <code class="language-plaintext highlighter-rouge">##</code> is a level up. Rarely more than <code class="language-plaintext highlighter-rouge">##</code> in practice. Preceded by a blank line:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># strict mode          ← low-level annotation</span>

<span class="c">## library functions   ← major section</span>

<span class="c">## logging             ← major section</span>
</code></pre></div></div>

<h2 id="12-testing">12. Testing</h2>

<p>Test framework conventions.</p>

<p><strong>Associative array cases</strong> define test data:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">local</span> <span class="nt">-A</span> <span class="nv">case1</span><span class="o">=(</span>
  <span class="o">[</span>name]<span class="o">=</span><span class="s1">'not run when ok'</span>
  <span class="o">[</span><span class="nb">command</span><span class="o">]=</span><span class="s2">"cmd 'echo hello'"</span>
  <span class="o">[</span>ok]<span class="o">=</span><span class="nb">true</span>
  <span class="o">[</span>wants]<span class="o">=</span><span class="s2">"(ok 'not run when ok')"</span>
<span class="o">)</span>
</code></pre></div></div>

<p><strong>Unpack with <code class="language-plaintext highlighter-rouge">Inherit</code></strong>. Unset optional fields first so missing keys don’t carry over:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">unset</span> <span class="nt">-v</span> ok shortrun prog unchg want wanterr
<span class="nb">eval</span> <span class="s2">"</span><span class="si">$(</span>Inherit <span class="s2">"</span><span class="nv">$casename</span><span class="s2">"</span><span class="si">)</span><span class="s2">"</span>
</code></pre></div></div>

<p><strong>Run with <code class="language-plaintext highlighter-rouge">RunCases ${!case@}</code></strong> — pass all case variables at once:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>RunCases <span class="k">${</span><span class="p">!case@</span><span class="k">}</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">RunCases</code> iterates its arguments internally and returns 1 if any case failed, 128 on fatal. For per-case error handling (e.g., early return on fatal), use a loop:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">local </span><span class="nv">failed</span><span class="o">=</span>0 casename
<span class="k">for </span>casename <span class="k">in</span> <span class="k">${</span><span class="p">!case@</span><span class="k">}</span><span class="p">;</span> <span class="k">do
  </span>RunCases <span class="nv">$casename</span> <span class="o">||</span> <span class="o">{</span>
    <span class="o">((</span> <span class="nv">$?</span> <span class="o">==</span> 128 <span class="o">))</span> <span class="o">&amp;&amp;</span> <span class="k">return </span>128   <span class="c"># fatal</span>
    <span class="nv">failed</span><span class="o">=</span>1
  <span class="o">}</span>
<span class="k">done
return</span> <span class="nv">$failed</span>
</code></pre></div></div>

<p><strong>Assertion failure output</strong> shows a diff and a copy-paste line for easy test updates:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">[[</span> <span class="nv">$got</span> <span class="o">==</span> <span class="s2">"</span><span class="nv">$want</span><span class="s2">"</span> <span class="o">]]</span> <span class="o">||</span> <span class="o">{</span>
  <span class="nb">echo</span> <span class="s2">"</span><span class="k">${</span><span class="nv">NL</span><span class="k">}</span><span class="s2">cmd: got doesn't match want:</span><span class="nv">$NL</span><span class="si">$(</span>Diff <span class="s2">"</span><span class="nv">$got</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$want</span><span class="s2">"</span><span class="si">)</span><span class="nv">$NL</span><span class="s2">"</span>
  <span class="nb">echo</span> <span class="s2">"use this line to update want to match this output:</span><span class="k">${</span><span class="nv">NL</span><span class="k">}</span><span class="s2">want=</span><span class="k">${</span><span class="nv">got</span><span class="p">@Q</span><span class="k">}</span><span class="s2">"</span>
  <span class="k">return </span>1
<span class="o">}</span>
</code></pre></div></div>

<p><strong>Assertion helpers</strong> — the preferred pattern (replaces the manual version above):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>AssertGot <span class="s2">"</span><span class="nv">$got</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$want</span><span class="s2">"</span>
AssertRC <span class="nv">$rc</span> 0
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">AssertGot</code> compares strings, shows a diff and copy-paste update line on mismatch. <code class="language-plaintext highlighter-rouge">AssertRC</code> compares return codes. Both return 1 on failure.</p>

<p><strong>Subshell <code class="language-plaintext highlighter-rouge">()</code></strong> for directory isolation in setup helpers — changes to working directory don’t leak:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>createCloneRepo<span class="o">()</span> <span class="o">(</span>
  git init clone
  <span class="nb">cd </span>clone
  <span class="nb">echo </span>hello <span class="o">&gt;</span>hello.txt
  git add hello.txt
  git commit <span class="nt">-m</span> init
<span class="o">)</span> <span class="o">&gt;</span>/dev/null
</code></pre></div></div>

<p><strong><code class="language-plaintext highlighter-rouge">MktempDir</code></strong> with deferred cleanup (cleanup is registered automatically via <code class="language-plaintext highlighter-rouge">Defer</code>; see Section 14 for the implementation):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MktempDir <span class="nb">dir</span> <span class="o">||</span> <span class="k">return </span>128
</code></pre></div></div>

<p><strong>AAA structure</strong>: <code class="language-plaintext highlighter-rouge">## arrange</code>, <code class="language-plaintext highlighter-rouge">## act</code>, <code class="language-plaintext highlighter-rouge">## assert</code> comment sections in each subtest.</p>

<h2 id="13-fp-pipeline-helpers">13. FP Pipeline Helpers</h2>

<p>Stdin-based composition: command name as first arg, applied to each line via <code class="language-plaintext highlighter-rouge">eval</code>. Core trio: <code class="language-plaintext highlighter-rouge">Each</code> (side effects), <code class="language-plaintext highlighter-rouge">Map</code> (transform), <code class="language-plaintext highlighter-rouge">KeepIf</code>/<code class="language-plaintext highlighter-rouge">RemoveIf</code> (filter). The <code class="language-plaintext highlighter-rouge">eval "$command $arg"</code> pattern assumes trusted input — callers are responsible for escaping with <code class="language-plaintext highlighter-rouge">printf %q</code> if values originate from untrusted sources.</p>

<p>The pattern:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>each<span class="o">()</span> <span class="o">{</span>
  <span class="nb">local command</span><span class="o">=</span><span class="nv">$1</span> arg
  <span class="k">while </span><span class="nv">IFS</span><span class="o">=</span><span class="s1">''</span> <span class="nb">read</span> <span class="nt">-r</span> arg<span class="p">;</span> <span class="k">do
    </span><span class="nb">eval</span> <span class="s2">"</span><span class="nv">$command</span><span class="s2"> </span><span class="nv">$arg</span><span class="s2">"</span>
  <span class="k">done</span>
<span class="o">}</span>

keepIf<span class="o">()</span> <span class="o">{</span>
  <span class="nb">local command</span><span class="o">=</span><span class="nv">$1</span> arg
  <span class="k">while </span><span class="nv">IFS</span><span class="o">=</span><span class="s1">''</span> <span class="nb">read</span> <span class="nt">-r</span> arg<span class="p">;</span> <span class="k">do
    </span><span class="nb">eval</span> <span class="s2">"</span><span class="nv">$command</span><span class="s2"> </span><span class="nv">$arg</span><span class="s2">"</span> <span class="o">&amp;&amp;</span> <span class="nb">echo</span> <span class="s2">"</span><span class="nv">$arg</span><span class="s2">"</span>
  <span class="k">done
  return </span>0
<span class="o">}</span>

map<span class="o">()</span> <span class="o">{</span>
  <span class="nb">local </span><span class="nv">VARNAME</span><span class="o">=</span><span class="nv">$1</span> <span class="nv">EXPRESSION</span><span class="o">=</span><span class="nv">$2</span>
  <span class="nb">local</span> <span class="s2">"</span><span class="nv">$VARNAME</span><span class="s2">"</span>
  <span class="k">while </span><span class="nv">IFS</span><span class="o">=</span><span class="s1">''</span> <span class="nb">read</span> <span class="nt">-r</span> <span class="s2">"</span><span class="nv">$VARNAME</span><span class="s2">"</span><span class="p">;</span> <span class="k">do
    </span><span class="nb">eval</span> <span class="s2">"echo </span><span class="se">\"</span><span class="nv">$EXPRESSION</span><span class="se">\"</span><span class="s2">"</span>
  <span class="k">done</span>
<span class="o">}</span>
</code></pre></div></div>

<p>Call site:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>each Ln &lt;&lt;<span class="s1">'  END'</span>
  .config         ~/config
  .local          ~/local
  .ssh            ~/ssh
  secrets/netrc   ~/.netrc
  END
</code></pre></div></div>

<p>Inline versions are common in standalone scripts; a shared library consolidates them with <code class="language-plaintext highlighter-rouge">return 0</code> guards to prevent error propagation from the last iteration.</p>
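<p>The helpers compose in an ordinary pipeline. The sketch below repeats <code class="language-plaintext highlighter-rouge">keepIf</code> and <code class="language-plaintext highlighter-rouge">map</code> from the pattern above so it runs on its own; the <code class="language-plaintext highlighter-rouge">isDotfile</code> predicate and the <code class="language-plaintext highlighter-rouge">backup/</code> prefix are made up for illustration:</p>

```bash
keepIf() {
  local command=$1 arg
  while IFS='' read -r arg; do
    eval "$command $arg" && echo "$arg"
  done
  return 0
}

map() {
  local VARNAME=$1 EXPRESSION=$2
  local "$VARNAME"
  while IFS='' read -r "$VARNAME"; do
    eval "echo \"$EXPRESSION\""
  done
}

# isDotfile succeeds when its argument starts with a dot (hypothetical predicate).
isDotfile() { [[ $1 == .* ]]; }

# Keep only the dotfiles, then rewrite each name into a target path.
printf '%s\n' .config notes.txt .ssh | keepIf isDotfile | map name 'backup/$name'
# prints:
#   backup/.config
#   backup/.ssh
```

<p>The single quotes around <code class="language-plaintext highlighter-rouge">'backup/$name'</code> matter: the expression has to reach <code class="language-plaintext highlighter-rouge">map</code> unexpanded so that <code class="language-plaintext highlighter-rouge">eval</code> can substitute each line in turn.</p>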

<h2 id="14-trap-handling">14. Trap Handling</h2>

<p>EXIT traps only — ERR, DEBUG, RETURN, and signal handlers are not used.</p>

<p><strong>Two patterns coexist</strong>: single assignment (scripts) and stacked (libraries).</p>

<p><strong>Single assignment</strong> — scripts and test functions that control their own trap:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">dir</span><span class="o">=</span><span class="si">$(</span><span class="nb">mktemp</span> <span class="nt">-d</span><span class="si">)</span>
<span class="nb">trap</span> <span class="s2">"rm -rf </span><span class="nv">$dir</span><span class="s2">"</span> EXIT
</code></pre></div></div>

<p>Direct <code class="language-plaintext highlighter-rouge">trap "..." EXIT</code> overwrites any previous handler. Safe when the function or script owns its entire trap lifecycle.</p>

<p><strong>Stacked/deferred</strong> — libraries that must not overwrite the caller’s trap:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Defer<span class="o">()</span> <span class="o">{</span>
  <span class="nb">local command</span><span class="o">=</span><span class="nv">$1</span>
  <span class="nb">local </span><span class="nv">NL</span><span class="o">=</span><span class="s1">$'</span><span class="se">\n</span><span class="s1">'</span>
  <span class="nb">trap</span> <span class="s2">"</span><span class="nv">$command$NL</span><span class="si">$(</span>existingDeferlist<span class="si">)</span><span class="s2">"</span> EXIT
<span class="o">}</span>
</code></pre></div></div>

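<p>New handlers prepend to the existing chain. <code class="language-plaintext highlighter-rouge">existingDeferlist</code> extracts the current handler via <code class="language-plaintext highlighter-rouge">trap -p EXIT</code> and strips the wrapper syntax. Because each new command is placed ahead of the existing ones, commands execute in LIFO order: the most recently deferred runs first. Use newlines (not semicolons) as separators: a command that ends in <code class="language-plaintext highlighter-rouge">&amp;</code> cannot be followed by a semicolon (<code class="language-plaintext highlighter-rouge">cmd &amp;;</code> is a syntax error), whereas a newline is always legal.</p>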

<p><strong>Temp directory cleanup</strong> — the canonical pattern:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MktempDir<span class="o">()</span> <span class="o">{</span>
  <span class="nb">local</span> <span class="nt">-n</span> <span class="nv">DIR</span><span class="o">=</span><span class="nv">$1</span>
  <span class="nv">DIR</span><span class="o">=</span><span class="si">$(</span><span class="nb">mktemp</span> <span class="nt">-d</span> /tmp/bash.XXXXXX<span class="si">)</span> <span class="o">||</span> <span class="o">{</span> <span class="nb">echo</span> <span class="s1">'could not create temporary directory'</span><span class="p">;</span> <span class="k">return </span>1<span class="p">;</span> <span class="o">}</span>
  <span class="o">[[</span> <span class="nv">$DIR</span> <span class="o">==</span> /<span class="k">*</span>/<span class="k">*</span> <span class="o">]]</span> <span class="o">||</span> <span class="o">{</span> <span class="nb">echo</span> <span class="s1">'temporary directory does not comply with naming requirements'</span><span class="p">;</span> <span class="k">return </span>1<span class="p">;</span> <span class="o">}</span>
  <span class="o">[[</span> <span class="nt">-d</span> <span class="nv">$DIR</span> <span class="o">]]</span> <span class="o">||</span> <span class="o">{</span> <span class="nb">echo</span> <span class="s1">'temporary directory was made but does not exist now'</span><span class="p">;</span> <span class="k">return </span>1<span class="p">;</span> <span class="o">}</span>
  Defer <span class="s2">"rm -rf </span><span class="nv">$DIR</span><span class="s2">"</span>
<span class="o">}</span>
</code></pre></div></div>

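<p>Validates the path before registering cleanup. The <code class="language-plaintext highlighter-rouge">/*/*</code> guard requires an absolute path containing at least two slashes, so a bare <code class="language-plaintext highlighter-rouge">/</code> or a top-level directory such as <code class="language-plaintext highlighter-rouge">/tmp</code> is rejected. That prevents <code class="language-plaintext highlighter-rouge">rm -rf /</code> if <code class="language-plaintext highlighter-rouge">mktemp</code> returns something unexpected.</p>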

<h2 id="15-risks-and-limitations">15. Risks and Limitations</h2>

<p><code class="language-plaintext highlighter-rouge">IFS=$'\n'</code> + noglob + naming conventions eliminate most bash footguns, but not all. Each risk below describes the bash mechanism, how it bites, and the mitigation.</p>

<p><strong>1. Dynamic scoping collision.</strong> A callee that omits <code class="language-plaintext highlighter-rouge">local</code> silently modifies the caller’s variable. A nameref whose name matches its target creates a circular reference:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>outer<span class="o">()</span> <span class="o">{</span> <span class="nb">local </span><span class="nv">x</span><span class="o">=</span>before<span class="p">;</span> inner<span class="p">;</span> <span class="nb">echo</span> <span class="nv">$x</span><span class="p">;</span> <span class="o">}</span>   <span class="c"># prints "after" — inner modified outer's x</span>
inner<span class="o">()</span> <span class="o">{</span> <span class="nv">x</span><span class="o">=</span>after<span class="p">;</span> <span class="o">}</span>                           <span class="c"># no local — writes to caller's scope</span>

wrapper<span class="o">()</span> <span class="o">{</span> <span class="nb">local</span> <span class="nt">-n</span> <span class="nv">REF</span><span class="o">=</span><span class="nv">$1</span><span class="p">;</span> <span class="nv">REF</span><span class="o">=</span>value<span class="p">;</span> <span class="o">}</span>
wrapper REF   <span class="c"># circular reference — bash emits "circular name reference" error</span>
</code></pre></div></div>

<p><strong>Mitigation:</strong> follow naming conventions (Section 3) — <code class="language-plaintext highlighter-rouge">camelCase</code> locals, <code class="language-plaintext highlighter-rouge">UPPERCASE</code> namerefs. Document intentional cross-scope access with <code class="language-plaintext highlighter-rouge"># in caller's scope</code>. See Section 6 for the full explanation.</p>

<p><strong>2. Eval injection.</strong> The FP helpers execute <code class="language-plaintext highlighter-rouge">eval "$command $arg"</code> where <code class="language-plaintext highlighter-rouge">$arg</code> is a line from stdin. If <code class="language-plaintext highlighter-rouge">arg</code> contains shell metacharacters, they execute as code:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">echo</span> <span class="s1">'; rm -rf /tmp/important'</span> | each processLine   <span class="c"># eval runs: processLine ; rm -rf /tmp/important</span>
</code></pre></div></div>

<p><strong>Mitigation:</strong> only pass trusted input through FP pipelines. For untrusted values, escape with <code class="language-plaintext highlighter-rouge">printf -v safe '%q' "$untrusted"</code> before piping. The trust boundary is the <code class="language-plaintext highlighter-rouge">eval</code> call — everything reaching it must be safe to execute as shell words.</p>
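<p>A small sketch of the escaping step, assuming the <code class="language-plaintext highlighter-rouge">each</code> helper from Section 13. The <code class="language-plaintext highlighter-rouge">show</code> function and the sample payload are hypothetical:</p>

```bash
each() {
  local command=$1 arg
  while IFS='' read -r arg; do
    eval "$command $arg"
  done
}

# show reports the single argument it received (hypothetical consumer).
show() { echo "got: ${1-}"; }

untrusted='; echo INJECTED'

# Unescaped: eval sees "show ; echo INJECTED" and runs echo as a second command.
printf '%s\n' "$untrusted" | each show

# Escaped: printf %q turns the value into one shell word, so eval passes it
# to show as data instead of executing it.
printf -v safe '%q' "$untrusted"
printf '%s\n' "$safe" | each show   # prints "got: ; echo INJECTED"
```

<p>The escaped form works here because <code class="language-plaintext highlighter-rouge">read -r</code> keeps the backslashes that <code class="language-plaintext highlighter-rouge">%q</code> adds, and <code class="language-plaintext highlighter-rouge">eval</code> then removes them again when it parses the line.</p>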

<p><strong>3. <code class="language-plaintext highlighter-rouge">[[</code> RHS pattern matching.</strong> In <code class="language-plaintext highlighter-rouge">[[ $x == $y ]]</code>, the unquoted RHS is a glob pattern — <code class="language-plaintext highlighter-rouge">*</code>, <code class="language-plaintext highlighter-rouge">?</code>, and <code class="language-plaintext highlighter-rouge">[</code> are wildcards. This is independent of <code class="language-plaintext highlighter-rouge">set -o noglob</code>, which only affects pathname expansion in command arguments. <code class="language-plaintext highlighter-rouge">[[</code> has its own pattern-matching rules:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">want</span><span class="o">=</span><span class="s1">'file[1]'</span>
<span class="o">[[</span> <span class="s1">'file[1]'</span> <span class="o">==</span> <span class="nv">$want</span> <span class="o">]]</span>    <span class="c"># false — [1] is a character class matching the single character 1</span>
<span class="o">[[</span> <span class="s1">'file[1]'</span> <span class="o">==</span> <span class="s2">"</span><span class="nv">$want</span><span class="s2">"</span> <span class="o">]]</span>  <span class="c"># true — literal comparison</span>
</code></pre></div></div>

<p><strong>Mitigation:</strong> quote the RHS for literal comparison: <code class="language-plaintext highlighter-rouge">[[ $x == "$y" ]]</code>. Leave unquoted only for intentional pattern matching: <code class="language-plaintext highlighter-rouge">[[ $OSTYPE == darwin* ]]</code>.</p>

<p><strong>4. Trailing newline stripping.</strong> Command substitution <code class="language-plaintext highlighter-rouge">$(command)</code> always strips trailing newlines from the output. This is a POSIX requirement, not a bash quirk:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">output</span><span class="o">=</span><span class="si">$(</span><span class="nb">printf</span> <span class="s1">'hello\n\n'</span><span class="si">)</span>   <span class="c"># output is "hello" — both trailing newlines stripped</span>
<span class="nv">content</span><span class="o">=</span><span class="si">$(</span><span class="nb">cat</span> <span class="s2">"</span><span class="nv">$file</span><span class="s2">"</span><span class="si">)</span>          <span class="c"># file's trailing newline(s) silently lost</span>
</code></pre></div></div>

<p><strong>Mitigation:</strong> if trailing newlines matter, append a sentinel and strip it: <code class="language-plaintext highlighter-rouge">output=$(command; echo x); output=${output%x}</code>. In practice, this rarely matters — most values are single-line identifiers or paths.</p>
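<p>The sentinel technique can be checked by comparing string lengths. This is only an illustration; the variable names are arbitrary:</p>

```bash
# Without a sentinel, both trailing newlines are stripped.
plain=$(printf 'hello\n\n')

# With a sentinel, the newlines survive inside the substitution;
# the trailing x is then removed by parameter expansion.
kept=$(printf 'hello\n\n'; echo x)
kept=${kept%x}

printf '%s\n' "${#plain}" "${#kept}"   # prints 5, then 7
```

<p>One trade-off: the exit status of the substitution is now that of <code class="language-plaintext highlighter-rouge">echo x</code>, not of the original command, so capture the status separately if it matters.</p>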

<p><strong>5. <code class="language-plaintext highlighter-rouge">set -e</code> propagation.</strong> By default, <code class="language-plaintext highlighter-rouge">set -e</code> does not propagate into command substitutions <code class="language-plaintext highlighter-rouge">$(...)</code>, so failures inside are silently swallowed. Bash 4.4 introduced <code class="language-plaintext highlighter-rouge">shopt -s inherit_errexit</code> to fix this, but it is <strong>off by default</strong>, so you must enable it explicitly; earlier versions have no equivalent. Even with <code class="language-plaintext highlighter-rouge">inherit_errexit</code>, compound commands inside <code class="language-plaintext highlighter-rouge">$(...)</code> can still behave unexpectedly. The exit status of a process substitution <code class="language-plaintext highlighter-rouge">&lt;(...)</code> is never checked by the parent shell, so <code class="language-plaintext highlighter-rouge">set -e</code> does not catch a failure there in any version:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">set</span> <span class="nt">-e</span>
<span class="nv">result</span><span class="o">=</span><span class="si">$(</span><span class="nb">false</span><span class="p">;</span> <span class="nb">echo</span> <span class="s2">"still runs"</span><span class="si">)</span>    <span class="c"># "still runs" executes — errexit not inherited without inherit_errexit</span>
<span class="k">while </span><span class="nb">read</span> <span class="nt">-r</span> line<span class="p">;</span> <span class="k">do
  </span>process <span class="s2">"</span><span class="nv">$line</span><span class="s2">"</span>
<span class="k">done</span> &lt; &lt;<span class="o">(</span>failing_command<span class="o">)</span>              <span class="c"># failure undetected — process substitution ignores set -e</span>
</code></pre></div></div>

<p><strong>Mitigation:</strong> don’t rely on <code class="language-plaintext highlighter-rouge">set -e</code> inside command substitutions. Use explicit RC capture: <code class="language-plaintext highlighter-rouge">result=$(command) &amp;&amp; rc=$? || rc=$?</code>. For critical operations, check <code class="language-plaintext highlighter-rouge">$?</code> after every command substitution. Alternatively, add <code class="language-plaintext highlighter-rouge">shopt -s inherit_errexit</code> to the preamble (bash 4.4+) to propagate <code class="language-plaintext highlighter-rouge">set -e</code> into command substitutions — but process substitutions remain unaffected.</p>
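<p>A sketch of the explicit capture under <code class="language-plaintext highlighter-rouge">set -e</code>. The <code class="language-plaintext highlighter-rouge">fetchConfig</code> and <code class="language-plaintext highlighter-rouge">demo</code> functions are hypothetical; the subshell body only keeps <code class="language-plaintext highlighter-rouge">set -e</code> from leaking into the caller:</p>

```bash
# fetchConfig is a hypothetical command that prints partial output, then fails.
fetchConfig() {
  echo partial
  return 3
}

# demo runs in a subshell so its set -e does not affect the calling shell.
demo() (
  set -e
  local result rc
  # Inside an && / || list a failing command does not trigger errexit,
  # so rc receives the substitution's exit status on either branch.
  result=$(fetchConfig) && rc=$? || rc=$?
  echo "rc=$rc result=$result"
)

demo   # prints "rc=3 result=partial"
```

<p>The script keeps running after the failure, and the caller decides what a non-zero <code class="language-plaintext highlighter-rouge">rc</code> means.</p>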

<p><strong>6. Pipeline subshell variable loss.</strong> Each stage of a pipeline runs in a subshell. Variables modified inside a pipeline stage are lost when it exits:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">count</span><span class="o">=</span>0
<span class="nb">command</span> | <span class="k">while </span><span class="nb">read</span> <span class="nt">-r</span> line<span class="p">;</span> <span class="k">do</span> <span class="o">((</span> count +<span class="o">=</span> 1 <span class="o">))</span><span class="p">;</span> <span class="k">done
</span><span class="nb">echo</span> <span class="nv">$count</span>   <span class="c"># still 0 — the while loop ran in a subshell</span>
</code></pre></div></div>

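<p><strong>Mitigation:</strong> use process substitution instead: <code class="language-plaintext highlighter-rouge">while read -r line; do (( count += 1 )); done &lt; &lt;(command)</code>. This runs the loop in the current shell while the command runs in a subshell. The arithmetic form matters: on a plain variable, <code class="language-plaintext highlighter-rouge">count+=1</code> appends the character <code class="language-plaintext highlighter-rouge">1</code> instead of adding to the number. Code following these conventions avoids piping into loops.</p>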
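<p>The two forms can be compared side by side. This sketch uses <code class="language-plaintext highlighter-rouge">printf</code> as a stand-in producer and assumes bash's default settings (<code class="language-plaintext highlighter-rouge">lastpipe</code> off):</p>

```bash
# Piped: the loop runs in a subshell, so the increments are lost.
piped=0
printf '%s\n' alpha beta gamma | while IFS='' read -r line; do
  (( piped += 1 ))
done
echo "$piped"   # prints 0

# Process substitution: the loop runs in the current shell.
count=0
while IFS='' read -r line; do
  (( count += 1 ))
done < <(printf '%s\n' alpha beta gamma)
echo "$count"   # prints 3
```

<p>Both loops use arithmetic <code class="language-plaintext highlighter-rouge">(( ... += 1 ))</code> so that the variable holds a number and not a concatenated string.</p>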

<p><strong>7. <code class="language-plaintext highlighter-rouge">loosely()</code> hardcoded restore.</strong> The <code class="language-plaintext highlighter-rouge">loosely()</code> wrapper does <code class="language-plaintext highlighter-rouge">set +euo pipefail</code> then <code class="language-plaintext highlighter-rouge">set -euo pipefail</code> after the command. It doesn’t capture the previous shell options — it assumes the caller always uses <code class="language-plaintext highlighter-rouge">-euo pipefail</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">set</span> <span class="nt">-eu</span>              <span class="c"># no pipefail yet</span>
loosely <span class="nb">source </span>lib   <span class="c"># sets +euo pipefail, then -euo pipefail</span>
<span class="c"># now pipefail is ON even though caller never set it</span>
</code></pre></div></div>

<p><strong>Mitigation:</strong> <code class="language-plaintext highlighter-rouge">loosely()</code> is safe only after <code class="language-plaintext highlighter-rouge">set -euo pipefail</code> is set. For library code that needs to temporarily relax options, save and restore with <code class="language-plaintext highlighter-rouge">set +o</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">local </span>prevOpts
<span class="nv">prevOpts</span><span class="o">=</span><span class="si">$(</span><span class="nb">set</span> +o<span class="si">)</span>        <span class="c"># captures restore commands for all options</span>
<span class="nb">set</span> +eu<span class="p">;</span> <span class="nb">set</span> +o pipefail
<span class="nb">command
eval</span> <span class="s2">"</span><span class="nv">$prevOpts</span><span class="s2">"</span>           <span class="c"># restores exact previous state</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">set +o</code> outputs <code class="language-plaintext highlighter-rouge">set -o</code>/<code class="language-plaintext highlighter-rouge">set +o</code> commands that reproduce the current option state. This handles all options including <code class="language-plaintext highlighter-rouge">pipefail</code> without fragile string matching.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Bash Style Guide]]></summary></entry><entry><title type="html">Breadcrumbs for Humans and AI: How Pattern Docs Guide Developers to Correct Code</title><link href="https://binaryphile.github.io/development/2026/02/02/breadcrumbs-for-humans-and-ai-how-pattern-docs-guide-developers-to-correct-code.html" rel="alternate" type="text/html" title="Breadcrumbs for Humans and AI: How Pattern Docs Guide Developers to Correct Code" /><published>2026-02-02T00:00:00+00:00</published><updated>2026-02-02T00:00:00+00:00</updated><id>https://binaryphile.github.io/development/2026/02/02/breadcrumbs-for-humans-and-ai-how-pattern-docs-guide-developers-to-correct-code</id><content type="html" xml:base="https://binaryphile.github.io/development/2026/02/02/breadcrumbs-for-humans-and-ai-how-pattern-docs-guide-developers-to-correct-code.html"><![CDATA[<p>A backend returns 200 OK with a JSON error body when downloads fail. This may seem unexpected at first. 200 indicates success. Arguably this is a protocol adherence issue, but it remains. Every new developer that works on downloads must learn this—one way or another. Every code review catches someone checking response.ok. The knowledge exists—in some developers’ heads.</p>

<p>This is tribal knowledge. It doesn’t scale. People leave, context-switch, or just forget. Code review becomes an oral tradition.</p>

<p>Pattern docs fix this. They externalize institutional knowledge into structured documentation that lives alongside the code. And because they’re structured, AI assistants benefit too—but that’s a bonus, not the point.</p>

<h2 id="the-problem-knowledge-that-doesnt-scale">The Problem: Knowledge That Doesn’t Scale</h2>

<p>Every codebase has conventions that aren’t obvious from the code:</p>

<ul>
  <li>Why we check Content-Type instead of response.ok</li>
  <li>When to use the cache freshness indicator (and when not to)</li>
  <li>Which ESLint rules we wrote ourselves and why</li>
</ul>

<p>This knowledge lives in people’s heads. It transfers through:</p>

<ul>
  <li>Code review comments (repeated endlessly)</li>
  <li>Slack threads (unsearchable after a month)</li>
  <li>Onboarding conversations (different every time)</li>
  <li>Trial and error (expensive)</li>
</ul>

<p>The result: inconsistent code, repeated mistakes, slow onboarding, and knowledge that walks out the door when people leave.</p>

<h2 id="the-solution-pattern-documentation">The Solution: Pattern Documentation</h2>

<p>Pattern docs capture the “why” behind conventions. They live in <code class="language-plaintext highlighter-rouge">docs/patterns/</code> alongside the codebase.</p>

<p>Each pattern doc answers:</p>

<ul>
  <li><strong>What’s the problem?</strong> Code example of what fails</li>
  <li><strong>What’s the solution?</strong> Working code with comments</li>
  <li><strong>When do I use this?</strong> Decision criteria</li>
  <li><strong>How do I find existing usages?</strong> Grep command</li>
</ul>

<h3 id="example-defensive-file-download">Example: Defensive File Download</h3>

<p><strong>Problem:</strong></p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// PROBLEMATIC - Don't use</span>
<span class="kd">const</span> <span class="nx">response</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">fetch</span><span class="p">(</span><span class="nx">downloadPath</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">response</span><span class="p">.</span><span class="nx">ok</span><span class="p">)</span> <span class="k">throw</span> <span class="k">new</span> <span class="nb">Error</span><span class="p">(</span><span class="dl">'</span><span class="s1">Download failed</span><span class="dl">'</span><span class="p">);</span>
<span class="c1">// This misses errors! The backend returns 200 OK with JSON error body</span>
</code></pre></div></div>

<p><strong>Solution:</strong></p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Check Content-Type, not status code</span>
<span class="kd">const</span> <span class="nx">response</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">fetch</span><span class="p">(</span><span class="nx">downloadPath</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">contentType</span> <span class="o">=</span> <span class="nx">response</span><span class="p">.</span><span class="nx">headers</span><span class="p">.</span><span class="kd">get</span><span class="p">(</span><span class="dl">'</span><span class="s1">Content-Type</span><span class="dl">'</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">contentType</span><span class="p">?.</span><span class="nx">includes</span><span class="p">(</span><span class="dl">'</span><span class="s1">application/json</span><span class="dl">'</span><span class="p">))</span> <span class="p">{</span>
    <span class="kd">const</span> <span class="nx">errorData</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">response</span><span class="p">.</span><span class="nx">json</span><span class="p">();</span>
    <span class="k">throw</span> <span class="k">new</span> <span class="nb">Error</span><span class="p">(</span><span class="nx">errorData</span><span class="p">.</span><span class="nx">error</span> <span class="o">||</span> <span class="dl">'</span><span class="s1">Failed to download file</span><span class="dl">'</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>When to use:</strong> User-initiated downloads needing error feedback</p>

<p><strong>When NOT to use:</strong> Static CDN files, streaming large files (&gt;100 MB)</p>
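
<p>As a rough sketch, the solution above could be wrapped in a single helper so call sites cannot skip the check. The name <code class="language-plaintext highlighter-rouge">downloadFile</code>, the injectable <code class="language-plaintext highlighter-rouge">fetchImpl</code> parameter, and the extra status check are assumptions for illustration, not part of the original pattern:</p>

```javascript
// Sketch only: `downloadFile` and `fetchImpl` are hypothetical names.
// `fetchImpl` defaults to the global fetch and can be replaced in tests.
async function downloadFile(downloadPath, fetchImpl = fetch) {
  const response = await fetchImpl(downloadPath);
  const contentType = response.headers.get('Content-Type');

  // The backend reports failures as a JSON body, even with 200 OK.
  if (contentType?.includes('application/json')) {
    const errorData = await response.json();
    throw new Error(errorData.error || 'Failed to download file');
  }

  // Assumption: also reject non-2xx responses that carry no JSON body.
  if (!response.ok) {
    throw new Error(`Download failed with status ${response.status}`);
  }

  return response.blob();
}
```

<p>Passing the fetch function as a parameter keeps the helper testable without a network. Whether to keep the additional <code class="language-plaintext highlighter-rouge">response.ok</code> check depends on how your backend behaves.</p>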

<h2 id="human-benefits">Human Benefits</h2>

<p><strong>Onboarding and knowledge preservation:</strong> New developers read the pattern doc instead of discovering conventions through trial and error. When someone leaves, the knowledge stays. “Why do we do it this way?” has a documented answer that doesn’t depend on who’s in the room.</p>

<p><strong>Code review:</strong> Instead of explaining the same convention repeatedly, link to the pattern doc. Review comments become “See docs/patterns/defensive-file-download.md” instead of a paragraph of explanation.</p>

<p><strong>Consistency:</strong> When the pattern is documented, people follow it. When it’s tribal knowledge, they reinvent it—differently each time.</p>

<p><strong>Discoverability:</strong> Comments in code point to pattern docs:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// See: docs/patterns/defensive-file-download.md</span>
<span class="kd">const</span> <span class="nx">response</span> <span class="o">=</span> <span class="k">await</span> <span class="nx">fetch</span><span class="p">(</span><span class="nx">downloadPath</span><span class="p">);</span>
</code></pre></div></div>

<p>Developers see the comment, follow the link, understand the context. The breadcrumb is right where they need it.</p>

<h2 id="ai-benefits-the-bonus">AI Benefits (The Bonus)</h2>

<p>If you document patterns for humans, AI assistants benefit automatically.</p>

<p>When an AI coding assistant reads code with a <code class="language-plaintext highlighter-rouge">// See: docs/patterns/...</code> comment, it can follow the path. LLM-based assistants gather context before suggesting changes, and a file path is an unambiguous signal of where to look.</p>

<p>The pattern doc answers what the AI implicitly asks: “Why is this code written this way? What constraints apply?”</p>

<p><strong>Before pattern docs:</strong> AI suggests <code class="language-plaintext highlighter-rouge">if (!response.ok)</code>—correct generically, wrong for this codebase. Developer corrects it manually.</p>

<p><strong>After pattern docs:</strong> AI reads the pattern doc, suggests the Content-Type check. No correction needed.</p>

<p>Same docs, two audiences. Write once, benefit twice.</p>

<h2 id="ai-assists-the-accelerator">AI Assists (The Accelerator)</h2>

<p>AI assistants don’t just consume pattern docs—they help create them.</p>

<p><strong>The grade/improve loop:</strong></p>

<ol>
  <li>Describe the problem to the AI, show examples, let it draft</li>
  <li>Ask the AI: “Grade this pattern doc—is it clear? Complete? Are the examples concrete?”</li>
  <li>Prompt: “Improve” → the AI addresses its own critique</li>
  <li>Repeat until satisfied</li>
  <li>Apply your codebase knowledge, deploy, refine when reality reveals gaps</li>
</ol>

<p>The AI handles the structure; you provide the institutional knowledge. Documentation that used to get postponed indefinitely now gets written.</p>

<h2 id="patterns-evolve">Patterns Evolve</h2>

<p>Pattern docs aren’t static. They evolve as real-world use reveals gaps.</p>

<p><strong>Example:</strong> A custom ESLint rules pattern evolved over a few days:</p>

<ul>
  <li>Initial version flagged a specific accessor option</li>
  <li>Refined to “all accessors should be suspect”—the initial scope was too narrow</li>
</ul>

<p><strong>The update workflow:</strong></p>

<ol>
  <li>Discovery: Real-world use reveals the pattern is incomplete</li>
  <li>Update the doc (source of truth)</li>
  <li>Run Find References: <code class="language-plaintext highlighter-rouge">grep -rn "docs/patterns/your-pattern" src/</code></li>
  <li>Update code comments if needed</li>
</ol>

<p>Bidirectional traceability—code points to docs, docs find code—makes updates systematic rather than “hope everyone got the memo.”</p>
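
<p>The code-to-docs direction can also be checked mechanically. The following is a minimal sketch, assuming breadcrumbs use the exact <code class="language-plaintext highlighter-rouge">// See: docs/patterns/name.md</code> form shown earlier. The function names are hypothetical, and the list of existing docs is passed in rather than read from disk:</p>

```javascript
// Hypothetical helper: extracts pattern-doc breadcrumbs from source text.
function findBreadcrumbs(sourceText) {
  const breadcrumb = /\/\/\s*See:\s*(docs\/patterns\/[\w.\/-]+\.md)/g;
  return [...sourceText.matchAll(breadcrumb)].map((match) => match[1]);
}

// Returns breadcrumbs that point at docs missing from `existingDocs`,
// for example after a pattern doc was renamed.
function findStaleBreadcrumbs(sourceText, existingDocs) {
  const known = new Set(existingDocs);
  return findBreadcrumbs(sourceText).filter((docPath) => !known.has(docPath));
}

// Usage with inline sample data:
const sample = [
  '// See: docs/patterns/defensive-file-download.md',
  'const response = await fetch(downloadPath);',
  '// See: docs/patterns/old-name.md',
].join('\n');

console.log(findStaleBreadcrumbs(sample, ['docs/patterns/defensive-file-download.md']));
// prints [ 'docs/patterns/old-name.md' ]
```

<p>A check like this could run in CI so that a renamed doc fails the build instead of leaving dead breadcrumbs behind.</p>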

<h2 id="when-this-doesnt-work">When This Doesn’t Work</h2>

<p><strong>Patterns requiring judgment:</strong> “Choose appropriate log level” doesn’t help anyone—human or AI. You need: “Use ERROR for user-facing failures, WARN for recoverable issues, DEBUG for everything else.”</p>
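
<p>One informal test of whether a rule is mechanical enough: can it be written as a function? The log-level rule above can. This is a sketch in which the field names <code class="language-plaintext highlighter-rouge">userFacing</code> and <code class="language-plaintext highlighter-rouge">recoverable</code> are hypothetical, and the precedence (user-facing wins over recoverable) is an assumption:</p>

```javascript
// Hypothetical helper; not tied to any real logging library.
function chooseLogLevel({ userFacing = false, recoverable = false } = {}) {
  if (userFacing) return 'ERROR'; // user-facing failures
  if (recoverable) return 'WARN'; // recoverable issues
  return 'DEBUG'; // everything else
}

console.log(chooseLogLevel({ userFacing: true })); // prints ERROR
console.log(chooseLogLevel({ recoverable: true })); // prints WARN
console.log(chooseLogLevel()); // prints DEBUG
```

<p>“Choose appropriate log level” has no such translation, which is a sign that it needs sharpening before it goes into a pattern doc.</p>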

<p><strong>Unstable conventions:</strong> Patterns that change weekly create maintenance churn. Start with stable, mechanical conventions.</p>

<p><strong>Overhead:</strong> Renaming a doc means updating every code comment that references it. That cost is worth paying for stable patterns; weigh it before reorganizing docs frequently.</p>

<h2 id="getting-started">Getting Started</h2>

<p><strong>Start with work you just finished:</strong> You just fixed a bug or implemented a feature. Was there something non-obvious? A gotcha you discovered? Document it now while the context is fresh. That’s your first pattern doc.</p>

<p><strong>Template:</strong></p>

<ul>
  <li><strong>Problem Statement</strong> - code example of what fails (and why)</li>
  <li><strong>Solution</strong> - working code with comments</li>
  <li><strong>When to Use / When NOT to Use</strong> - decision criteria</li>
  <li><strong>Find References</strong> - grep command to locate usages</li>
</ul>

<p><strong>Add the breadcrumb:</strong> Put <code class="language-plaintext highlighter-rouge">// See: docs/patterns/your-pattern.md</code> in the relevant code. Now it’s discoverable.</p>

<p><strong>Use AI to draft:</strong> Describe the problem, let AI draft, grade/improve until satisfied.</p>

<h2 id="the-payoff">The Payoff</h2>

<p>Document conventions for humans. AI assistants benefit automatically. AI assistants help you write the docs faster.</p>

<p>The knowledge that used to exist only in people’s heads—now it scales.</p>]]></content><author><name></name></author><category term="development" /><summary type="html"><![CDATA[A backend returns 200 OK with a JSON error body when downloads fail. This may seem unexpected at first. 200 indicates success. Arguably this is a protocol adherence issue, but it remains. Every new developer that works on downloads must learn this—one way or another. Every code review catches someone checking response.ok. The knowledge exists—in some developers’ heads.]]></summary></entry><entry><title type="html">The G/I Cycle: How Specific Deductions Beat ‘Try Harder’</title><link href="https://binaryphile.github.io/development/2026/02/02/the-g-i-cycle-iterative-refinement-with-ai-assistants.html" rel="alternate" type="text/html" title="The G/I Cycle: How Specific Deductions Beat ‘Try Harder’" /><published>2026-02-02T00:00:00+00:00</published><updated>2026-02-02T00:00:00+00:00</updated><id>https://binaryphile.github.io/development/2026/02/02/the-g-i-cycle-iterative-refinement-with-ai-assistants</id><content type="html" xml:base="https://binaryphile.github.io/development/2026/02/02/the-g-i-cycle-iterative-refinement-with-ai-assistants.html"><![CDATA[<p>You write something with AI. It’s 70% right. Now what?</p>

<p>Most people accept it. That leaves quality on the table: wins that take little effort to tease out now but typically cost much more once they are deferred to implementation.</p>

<p>The G/I cycle fixes this.</p>

<h2 id="the-gi-cycle">The G/I Cycle</h2>

<p>G/I stands for Grade/Improve. The cycle is simple:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Work → Grade → Improve → Re-grade → Repeat until stuck
</code></pre></div></div>

<p><strong>Grade</strong> means assigning a letter grade with specific point deductions. Not “this is pretty good”; that tells you nothing. Instead: “B+ (88/100). Deductions: -5 for not checking X, -4 for missing baseline, -3 for unverified assumption.”</p>

<p><strong>Improve</strong> means addressing those deductions. Each “-5 for X” becomes a task. Do the task, then grade again.</p>

<p><strong>Repeat</strong> until you can’t identify concrete improvements, or remaining deductions total less than 5 points.</p>

<p><strong>The test:</strong> “If asked to improve right now, what would I do?” If you have an answer, you’re not done.</p>
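
<p>The 5-point stopping rule is simple arithmetic, sketched below. The helper names are hypothetical, and the parser assumes deductions are written as <code class="language-plaintext highlighter-rouge">-N for reason</code> with an ASCII hyphen, as in the example grade above:</p>

```javascript
// Hypothetical helpers for reading a grade written in the format above.
function parseDeductions(gradeText) {
  // Matches fragments such as "-5 for not checking X", up to the next
  // comma or period.
  const pattern = /-(\d+)\s+for\s+([^,.]+)/g;
  return [...gradeText.matchAll(pattern)].map(([, points, reason]) => ({
    points: Number(points),
    reason: reason.trim(),
  }));
}

// True when the remaining deductions total less than the threshold.
function isDone(gradeText, threshold = 5) {
  const total = parseDeductions(gradeText)
    .reduce((sum, deduction) => sum + deduction.points, 0);
  return total < threshold;
}

const grade =
  'Deductions: -5 for not checking X, -4 for missing baseline, -3 for unverified assumption.';

console.log(parseDeductions(grade).map((d) => d.points)); // prints [ 5, 4, 3 ]
console.log(isDone(grade)); // prints false (12 points remain)
```

<p>Each parsed deduction doubles as a task for the next Improve step. Note that text with no parseable deductions counts as done here, so a real check would want to confirm that the model actually produced a grade.</p>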

<h2 id="why-it-works">Why It Works</h2>

<p>Three mechanisms:</p>

<p><strong>1. Provides attention bandwidth.</strong> Each iteration lets the model focus on concerns it couldn’t address earlier. It genuinely improves itself across passes. These are free wins — you just say “improve” and the LLM follows its own judgment based on its grade. Most G/I cycles are just this: low-effort extraction of quality the model already knows how to deliver.</p>

<p><strong>2. Exposes thinking for course correction.</strong> Grading externalizes the model’s assessment. You can see what it thinks is wrong. Most of the time, you let it run. But occasionally you notice something off — a wrong assumption, a misguided priority. That’s when you redirect. A single course correction can prevent entire avenues of wasted inquiry.</p>

<p><strong>3. Surfaces unknown unknowns.</strong> Grading forces the model to ask “what didn’t I check?” — questions it wouldn’t ask if just told to “improve.” For deeper blind spots, use “grade your analysis” to grade at a meta level: the thinking process, not just the output.</p>

<p><strong>A note on self-grading:</strong> LLMs grade themselves leniently. If you find gaps after an A, the A was wrong. B is not “acceptable” — B is incomplete work. Push past it.</p>

<h2 id="the-economics">The Economics</h2>

<p><strong>Stand on the LLM’s shoulders, not vice versa.</strong></p>

<p>Your attention is expensive. The LLM’s iterations are cheap. Let it do its best work first — then invest your attention in evaluating the result.</p>

<p>Wrong: You guide every step → LLM executes → you fix gaps<br />
Right: LLM iterates to its best → you evaluate final output → you build on that foundation</p>

<p><strong>When to step in:</strong> Remaining deductions under 5 points, grade stabilizes across iterations, or gaps require information you have and it doesn’t. Don’t stop just because you “improved once” or it “feels complete.” Use the point threshold.</p>

<h2 id="one-caveat">One Caveat</h2>

<p>Self-run G/I cycles inside a single response aren’t worthwhile on their own; their only benefit is that they expose the model’s thinking for course correction. The value is in the separate prompts: you see the thinking, you redirect if needed, then you say “improve.” Ignore the grade itself and focus on the deductions. If any remaining deduction is actionable and you find it valuable, the work isn’t done, even if the model gave itself an A+. It wanted to be done, but it shouldn’t be. For deeper blind spots, say “grade your analysis” to surface unknown unknowns.</p>

<h2 id="when-gi-works">When G/I Works</h2>

<p>Structured content, documentation, analysis, code review prep.</p>

<p>Why: These domains have verifiable criteria. You can objectively assess completeness, accuracy, and coverage. The grade has meaning.</p>

<h2 id="when-gi-doesnt-work">When G/I Doesn’t Work</h2>

<ul>
  <li><strong>Creative work</strong> — no objective grading standard</li>
  <li><strong>Unstable requirements</strong> — criteria change faster than iterations</li>
  <li><strong>Time pressure under 5 minutes</strong> — overhead exceeds benefit</li>
</ul>

<h2 id="getting-started">Getting Started</h2>

<p>Try it on your next draft:</p>

<ol>
  <li>Ask the AI: “grade the plan” when planning, or “grade your work” after implementation</li>
  <li>Glance at the deductions — redirect only if something looks off</li>
  <li>Tell it to “improve” (nothing more specific)</li>
  <li>Repeat until deductions total less than 5 points</li>
  <li>Now invest your attention in the result</li>
</ol>

<p>Most cycles, step 2 is just a glance — you barely have to look. The AI follows its own judgment, and that’s usually fine. Just say “improve” (or configure a shortcut like <code class="language-plaintext highlighter-rouge">/i</code>). The value is in the accumulated improvement across iterations, plus the occasional checkpoint where you catch something before it goes sideways.</p>
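
<p>For illustration only, the steps above can be sketched as a loop over separate prompts. Here <code class="language-plaintext highlighter-rouge">ask</code> is a hypothetical async function that sends one prompt to your assistant and resolves to its reply text; it is not a real API. Treating an unchanged deduction total as “stabilized” is also an assumption, and the sketch leaves out the human glance in step 2:</p>

```javascript
// Sums deductions written as "-N for reason" (ASCII hyphen assumed).
function sumDeductions(gradeText) {
  const matches = gradeText.matchAll(/-(\d+)\s+for\b/g);
  return [...matches].reduce((sum, match) => sum + Number(match[1]), 0);
}

// Sketch: `ask` is a hypothetical (prompt) => Promise of reply text.
async function runGradeImproveCycle(ask, { threshold = 5, maxRounds = 10 } = {}) {
  const history = [];
  for (let round = 1; round <= maxRounds; round += 1) {
    const grade = await ask('grade your work');
    const total = sumDeductions(grade);
    const previous = history[history.length - 1];
    history.push({ round, total });

    // Stop below the point threshold, or when the total did not change
    // since the previous round (no new gaps surfacing).
    const stabilized = previous !== undefined && previous.total === total;
    if (total < threshold || stabilized) break;

    await ask('improve');
  }
  return history;
}
```

<p>The returned history shows how the deduction total moved from round to round, which is a reasonable place to add the checkpoint where you read the deductions yourself.</p>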

<h2 id="example-catching-a-fabrication">Example: Catching a Fabrication</h2>

<p>A coaching report claimed “Research supports iteration for exploration and idea generation” — citing “Zhang et al. (2024).”</p>

<p>Grading would have caught:</p>
<ul>
  <li><strong>-10:</strong> Citation mismatch. The actual source discusses TDD remediation for local errors, not “exploration.”</li>
  <li><strong>-5:</strong> Phantom citation. “Zhang et al. (2024)” doesn’t exist.</li>
</ul>

<p>Without G/I, the claim survived to the final report as unsourced “common wisdom.” With G/I, it would have been flagged and fixed in iteration 1.</p>

<h2 id="the-payoff">The Payoff</h2>

<p>The G/I cycle lets you extract the LLM’s best work before investing your attention. You stand on its shoulders rather than having it stand on yours.</p>

<p>The resulting plan stands alone: the synthesis has already baked in the dependencies you evaluated along the way. That’s how you free attention for implementation, because you’re not carrying unresolved planning concerns forward.</p>

<h2 id="the-reference">The Reference</h2>

<p>Copy this into your LLM’s system prompt or project instructions:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># G/I Cycle Reference</span>

<span class="gu">## The Cycle</span>

Work → Grade → Improve → Re-grade → Repeat until stuck

<span class="gs">**Grade:**</span> Assign a letter grade with specific point deductions.
<span class="gs">**Improve:**</span> Address the deductions (or just say "improve" and let the LLM follow its judgment).
<span class="gs">**Repeat:**</span> Until remaining deductions &lt;5 points or you hit a wall.

<span class="gu">## Why It Works (Practical)</span>

<span class="gu">### 1. Attention Bandwidth (Primary Benefit)</span>

Each iteration lets the model focus on concerns it couldn't address earlier. Most G/I cycles are just this: low-effort wins you'd otherwise defer to implementation.

<span class="gu">### 2. Course Correction (Occasional)</span>

Grading externalizes the model's thinking. Most of the time, you let it run. Occasionally you notice something off and redirect. A single course correction can prevent entire avenues of wasted inquiry.

<span class="gu">### 3. Surfaces Unknown Unknowns</span>

Grading forces the model to ask "what didn't I check?" — questions it wouldn't ask if just told to "improve." For deeper blind spots, use "grade your analysis" to grade at a meta level.

<span class="gu">## Why Complexity Requires G/I (Theory)</span>

One theory that aligns with observed results: LLMs have limited coherent attention for evaluating plans. Single-shot has enough budget for trivial changes but not complex ones. G/I works around this limit through:
<span class="p">
1.</span> <span class="gs">**Output extends thinking**</span> — writing the grade surfaces concerns that wouldn't fit in the attention window otherwise
<span class="p">2.</span> <span class="gs">**Synthesis reduces dependencies**</span> — evaluation collapses conceptual complexity (like substituting y for f(x) — the evaluation happens once, not repeatedly)
<span class="p">3.</span> <span class="gs">**Addressed concerns free capacity**</span> — each iteration doesn't re-attend to what's already fixed
<span class="p">4.</span> <span class="gs">**Surfaces what the LLM doesn't know it doesn't know**</span> — LLMs have blind spots they can't see. Grading at a meta level (grading the thinking process, not just the output) can knock these loose

<span class="gs">**The phasing effect:**</span> G/I shifts planning work to the planning phase, where it belongs. Without G/I, unresolved planning concerns bleed into implementation, competing for attention and context needed for implementation details.

<span class="gs">**Self-contained plans:**</span> Planning evaluation produces a plan that stands alone — it no longer requires the context of the dependencies you evaluated to create it. The synthesis baked them in.

This reframes the economics: it's not just that fixing things later costs more effort. Unresolved planning work <span class="ge">*actively degrades*</span> implementation by consuming resources needed for implementation details.

<span class="gu">## Grading Format</span>

<span class="gs">**Weak:**</span> "I did a good job but could have done better."

<span class="gs">**Strong:**</span> "B+ (88/100). Deductions: -5 for not checking X, -4 for no baseline, -3 for unverified assumption."

<span class="gu">## Watch for Inflated Grades</span>

LLMs grade themselves leniently. If you find gaps after an A, the A was wrong. B is not "acceptable" — B is incomplete work. Push past it.

If you're getting As but the deductions feel real, they are real. Address them.

<span class="gu">## The Test</span>
<span class="gt">
&gt; "If asked to improve right now, what would I do?"</span>

If you have an answer, you're not done.

<span class="gu">## When to Stop (Valid)</span>

| Condition | Action |
|-----------|--------|
| Remaining deductions &lt;5 points | Stop — diminishing returns |
| Gaps require unavailable data | Stop — document as limitation |
| Next iteration would repeat searches | Stop — exhausted the approach |
| Grade stabilizes across 2 iterations | Stop — no new gaps surfacing |

<span class="gu">## When NOT to Stop (Invalid)</span>
<span class="p">
-</span> "I improved once already" — one iteration is minimum, not maximum
<span class="p">-</span> "Feels complete" — subjective; use point threshold
<span class="p">-</span> "This is taking too long" — time estimates unreliable
<span class="p">-</span> "User hasn't complained" — user doesn't know what you didn't check

<span class="gu">## Economics</span>

<span class="gs">**Stand on the LLM's shoulders, not vice versa.**</span>

LLM iterations are cheap. Your attention is expensive. Let the LLM do its best work first — then invest your attention.

<span class="gs">**When to step in:**</span> Remaining deductions &lt;5 points, grade stabilizes, or gaps require data you have and it doesn't.

<span class="gu">## Observed Limitation</span>

Self-run G/I cycles in a single response aren't worthwhile — except that they expose thinking for course correction. The value is in the separate prompts: you see the thinking, you can redirect if needed, then you say "improve." Ignore the grade — focus on the deductions. If there are actionable deductions you find valuable, it's not done, even with an A+. It wanted to be done, but shouldn't be. For deeper blind spots, "grade your analysis" can surface unknown unknowns.

<span class="gu">## When G/I Works</span>
<span class="p">
-</span> Structured content
<span class="p">-</span> Documentation
<span class="p">-</span> Analysis
<span class="p">-</span> Code review prep

Why: Verifiable criteria exist. You can objectively assess completeness, accuracy, coverage.

<span class="gu">## When G/I Doesn't Work</span>
<span class="p">
-</span> <span class="gs">**Creative work**</span> — no objective grading standard
<span class="p">-</span> <span class="gs">**Unstable requirements**</span> — criteria change faster than iterations
<span class="p">-</span> <span class="gs">**Time pressure &lt;5 minutes**</span> — overhead exceeds benefit

<span class="gu">## Quick Start</span>
<span class="p">
1.</span> "grade the plan" (when planning) or "grade your work" (after implementation)
<span class="p">2.</span> Glance at deductions — redirect only if something looks off
<span class="p">3.</span> "improve" (nothing specific)
<span class="p">4.</span> Repeat until &lt;5 points remaining
<span class="p">5.</span> Invest your attention in the final result
</code></pre></div></div>]]></content><author><name></name></author><category term="development" /><summary type="html"><![CDATA[You write something with AI. It’s 70% right. Now what?]]></summary></entry></feed>