binary.phile

Codifying a Bash Style Guide as ShellCheck Plugins

2026-05-19T14:30:00+00:00

A style guide is just text. An enforced check is a tool that catches mistakes.

I have a bash style guide that I keep in a repo and re-read when I forget which way around the *List convention goes. I also have a shellcheck fork with a plugin system. The natural next step is to translate the guide into checks. That’s shellcheck-convention-plugin, and it ships nine checks codifying nine rules.

This post is the catalog plus two lessons from building it. The lessons are the value; the catalog is reference.

The catalog

Check	Rule	Guide section
SC9001	Taint flows from unquoted parameter expansion to test/cmdsub contexts	§5 quoting
SC9002	Command substitution result is tainted; quote it before using	§5 quoting
SC9003	Quoting an already-quoted-by-context value is noise	§5 quoting
SC9004	A variable cannot end in both `_` and `List` (the two mutually exclusive suffixes)	§3 naming
SC9005	Numeric variables don’t belong inside `[[ ... ]]` — use `(( ... ))`	§11 conditionals
SC9006	Inclusive language in identifiers and comments	§3 naming
SC9007	Function docstring shape: first body statement is a `# description` comment	§6 functions
SC9008	`*List` is an IFS-newline-serialized string, not an array — disallow array operations on it	§3 naming + §7 arrays
SC9009	A `local` declaration without initialization followed by an append (`x+=...`, `printf -v x`, `read x`) reads from outer scope	§6 functions + §15 FP-style

Each check has positive (should fire) and negative (should not fire) test fixtures. The plugin ships as one .so and reports Loaded plugin: libconvention-checks.so (9 check(s)) at startup. Each check has its own SC code, so users can turn off individual checks with --exclude=SC9008 on the command line or an inline # shellcheck disable=SC9008 directive.
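To make the fixture idea concrete, here is a sketch of what a positive and negative pair for SC9005 could look like. The function names are made up for illustration; the real fixtures live in the plugin repo and may be shaped differently.

```shell
#!/usr/bin/env bash
# Hypothetical fixture pair for SC9005 (numeric comparison style).
# Names are illustrative, not taken from the plugin repo.

# Positive fixture: numeric test inside [[ ]], so the check should fire.
isFullString() {
  # Report whether count ($1) has reached limit ($2) using [[ ]].
  local count=$1 limit=$2
  [[ $count -ge $limit ]]
}

# Negative fixture: arithmetic context, so the check should stay quiet.
isFullArith() {
  # Report whether count ($1) has reached limit ($2) using (( )).
  local count=$1 limit=$2
  (( count >= limit ))
}
```

Both functions return the same exit status for the same inputs; the only difference is the form the rule cares about.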

The codes are in the SC9xxx range. Upstream uses SC1xxx (parser), SC2xxx (analytics), SC3xxx (shell-dialect). SC9xxx is a convention I picked for plugins — it doesn’t collide with anything upstream is likely to issue, and a future reader can tell at a glance that an SC9xxx warning is from a plugin, not from shellcheck core.

Lesson 1: when the task and the guide disagree, the guide wins

SC9008 shipped backwards.

The task description said “warn on array operations applied to *List variables.” I read that, wrote the check, shipped it. The fixtures passed. The check fired on octopiList[0] and didn’t fire on octopi[0]. Looked correct.

It was inverted.

*List in my style guide means an IFS-serialized string — newline-separated values you read with while IFS= read -r line. Arrays use plural names: octopi, requestedTests, filenames. The task had been filed months earlier, when the convention was still in flux, and the wording reflected the older form where *List meant “array.” By the time I implemented it, the convention had inverted. The clarification lived in a separate task I didn’t read. I followed the task wording, not the guide.
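As a rough illustration of the two conventions (the values here are invented), a *List variable holds a single newline-separated string that gets read line by line, while a plural name holds a real array:

```shell
#!/usr/bin/env bash
# Sketch of the two naming conventions; the values are illustrative.

# *List: one newline-separated string, consumed with read.
octopiList=$'inky\nblinky\nclyde'
listCount=0
while IFS= read -r line; do
  (( listCount += 1 ))
done <<<"$octopiList"

# Plural name: a real array, so array operations are fine.
octopi=( inky blinky clyde )
arrayCount=${#octopi[@]}

printf '%s %s\n' "$listCount" "$arrayCount"   # prints "3 3"
```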

The lesson: when implementing a rule, read the guide section, not the task description. Tasks describe what to do; guides describe what’s true. If they disagree, the guide wins, because the guide is what users will be checked against.

The fix: git revert, file a corrected task, re-implement against the guide, write a process retro. The retro is the part that mattered — it’s the reason I’ll catch this class of mistake next time.

Lesson 2: scope-aware checks are hard, and they’re worth the trouble

SC9009 is the only check in the catalog that requires reasoning about variable scope and order of operations within a function. Everything else can be decided from the AST node in isolation.

The rule sounds simple:

A local x declaration followed by an append (x+=..., printf -v x ..., read x, (( x += ... ))) without an intervening initialization is a bug. The append reads from outer scope before assigning, so the function silently captures and mutates a global.

Implementing it took 7 grade/improve cycles past the plan’s approval, each finding a new defect class:

read -p prompt var: the -p value got treated as a write target. Fix: extract an extractReadTargets helper that knows which read flags take values.
mapfile -t arr: same flag-value bug for mapfile. Fix: shared extractFlagAwareTargets helper.
declare -p name: the -p form is a query, not a declaration. Fix: skip declare when -p/-f/-F is present.
declare -n alias=...: the -n form is a nameref, not a value. Fix: skip when -n is present.
(( x )): TA_Variable LHS of an arithmetic expression was being indexed as a read. Fix: track arith LHS IDs in a separate set, exclude from read positions.
(( x = y = 1 )): chained arithmetic only registered the outer write. Fix: recurse into matched TA_Assignment for chained writes.
printf -v var fmt: the -v form is a write, but only when the flag is actually present. Fix: detect the -v flag explicitly rather than assuming any printf invocation with a variable arg is a write.

Each of these passed the previous round’s fixtures. Each surfaced when I added one more real-world script to the negative-fixture set.

The check is still not CFG-path-sensitive. It’s a lexical heuristic: walk the AST in order, build a per-scope index of (variable, first-write-kind, first-read-or-write-position), flag when the first write is an append and there’s no preceding initialization. A real CFG analysis would handle conditional initialization — if foo; then x=1; fi; x+=more — without flagging it. The lexical version flags it. That’s a known false positive and it’s documented in the check.
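Inside a real function, the conditional-initialization shape might look like the following. This is a hypothetical fixture, not one from the repo; it mirrors the if foo; then x=1; fi; x+=more pattern that the lexical version is described as flagging. The trailing comments show what bash prints on each path.

```shell
#!/usr/bin/env bash
# Hypothetical fixture: local declaration, conditional init, then append.

buildGreeting() {
  # Print a greeting for the name in $2, prefixed when $1 is "formal".
  local greeting
  if [[ $1 == formal ]]; then
    greeting='Dear '
  fi
  greeting+=$2
  printf '%s\n' "$greeting"
}

buildGreeting formal Ada   # prints "Dear Ada"
buildGreeting casual Ada   # prints "Ada"
```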

I shipped the lexical version because it catches the bug class — uninitialized-then-appended — without the implementation cost of a CFG. If I see real false positives in real scripts, I’ll revisit. So far, the rate is low enough that the lexical heuristic is the right cost/benefit point.

What this experiment proved

Before this work, my bash style guide was a document. People who read it (mostly me) tried to apply it; mistakes were caught in code review, when caught at all.

After this work, the guide is a tool. The same shellcheck I already run on save now refuses to let me declare userList=( inky blinky ), refuses to let me write local count; count+=1, refuses to let me write a function whose first body statement isn’t a docstring comment.

The translation isn’t perfect. SC9009 has known false positives. SC9007 fires on section-header comments that aren’t intended as docstrings. SC9006 can’t tell that master as a git branch context is allowed where master as a deployment role isn’t. These are tradeoffs — false positives are cheaper to suppress than false negatives are to find by hand.

The repo: binaryphile/shellcheck-convention-plugin. The catalog with full per-check rationale: docs/design.md in that repo. The host fork: binaryphile/shellcheck, covered in the previous post.

If you’ve written a style guide for any language and wish it were enforced, write a plugin for whichever linter your team already runs. The ROI is real. The first check costs a day; the second costs an hour.

Adding a Plugin System to ShellCheck

2026-05-19T14:00:00+00:00

I wanted shellcheck to catch a class of mistakes it wasn’t designed to catch — conventions specific to my bash style. Naming rules. Quoting under IFS=$'\n'; set -o noglob. Docstring shape. Things upstream would (rightly) never accept as core checks, because they’re house rules, not bash mistakes.

ShellCheck has no plugin system. The options are: fork it, vendor a patch, or stop wanting the thing.

So I forked it. The fork is binaryphile/shellcheck and it now loads .so files at startup. This post is about how the plugin loader works and the one parser change I had to make to keep my docstring checks honest.

The plugin shape

A plugin is a shared library exporting two C entry points:

foreign export ccall plugin_api_version :: IO CInt
foreign export ccall plugin_init        :: IO (StablePtr [CustomCheck])

plugin_api_version returns an integer. The host (the shellcheck binary) refuses to load a plugin whose version doesn’t match. plugin_init returns a list of CustomCheck values — each is a function Parameters -> Token -> Writer [TokenComment] (), the same type as a built-in check.

At startup, shellcheck scans $XDG_DATA_HOME/shellcheck/plugins/ for *.so files, dlopens each one, calls plugin_api_version, then plugin_init, then registers the returned checks alongside the built-ins. They run as part of the same analysis pass. The error reporter has no idea they came from a plugin.

$ shellcheck script.bash
Loaded plugin: libconvention-checks.so (9 check(s))
script.bash:3:1: warning: SC9001: ...
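Installing a plugin amounts to copying the .so into the plugins directory described above. A minimal sketch, assuming the fork follows the XDG Base Directory default of ~/.local/share when XDG_DATA_HOME is unset (the post does not say either way); pluginDir and installPlugin are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Sketch: install a built plugin where the fork is described as looking.
# Assumption: fall back to ~/.local/share when XDG_DATA_HOME is unset.

pluginDir() {
  # Print the plugin directory path.
  printf '%s\n' "${XDG_DATA_HOME:-$HOME/.local/share}/shellcheck/plugins"
}

installPlugin() {
  # Copy the shared library given as $1 into the plugin directory.
  local dir
  dir=$(pluginDir)
  mkdir -p "$dir" && cp -- "$1" "$dir/"
}

# Usage: installPlugin path/to/libconvention-checks.so
```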

The plugin can use any of the AST helpers shellcheck exports — getLiteralString, the sugared pattern aliases like T_Literal id str, the whole shape-matching kit. From the plugin’s perspective, it’s writing the same code as a built-in check. It just lives in a separate package.

The catch: same compiler, careful linking

The plugin and the host are both Haskell. Haskell linking is not stable across GHC versions, so the plugin and host must be built with the same compiler. The plugin must not link the runtime (the host already has one), and the host must build with -rdynamic so the plugin can see its symbols.

# host: shellcheck
ghc-options: -threaded -rdynamic

# plugin: convention-checks
ghc-options: -shared -fPIC -dynamic
ld-options:  -Wl,--unresolved-symbols=ignore-all

The ignore-all says the plugin’s references to host symbols don’t have to resolve at link time — they’ll resolve at dlopen time, when the host is loaded in the same process.

For nix users this is straightforward: both packages pin the same GHC and the lockfile keeps them in sync. For everyone else: build the host and the plugin on the same machine on the same day.

The wrinkle: shellcheck’s parser drops comments

I was building a docstring-shape check — flag a function whose first body statement isn’t a # description comment. Standard convention check. Trivial to write.

Except shellcheck’s parser drops comments. The lexer matches them, the parser discards them, and the AST has no T_Comment node. Comments simply do not exist downstream of parsing.

This is fine for shellcheck’s purposes — comments don’t affect shell behavior, so a static analyzer that produces warnings about behavior can ignore them. It’s not fine for a plugin author writing a docstring check.
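For reference, the shape the docstring check is after looks roughly like this. Both functions are invented examples: the first has a description comment as its first body statement, the second does not and is the kind of function the check is meant to flag.

```shell
#!/usr/bin/env bash
# Illustrative shapes only; the function names are hypothetical.

greet() {
  # Print a greeting for the name in $1.
  printf 'hello, %s\n' "$1"
}

shout() {
  printf '%s!\n' "$1"
  # A comment here comes too late to count as the docstring.
}
```

Telling these two apart requires the comment to survive parsing, which is exactly what the stock AST does not offer.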

The fix is a splice: keep comments around, attach them to their nearest following AST node, and expose them through an accessor for plugin authors.

The splice

Three pieces:

A new AST node, T_Comment id text, with all the standard Token machinery (positions, IDs).
A post-parse pass — attachComments — that walks the comment list and the AST in parallel and slips T_Comment nodes into the body lists they belong to.
An accessor — getDocCommentsBefore :: Token -> [Token] — that returns the comments immediately preceding a given token, with no blank line separating them from the token.

The splice is post-parse rather than mid-parse because the parser is Parsec-based and rewiring the existing rules to thread comments around would touch hundreds of productions. A post-pass that walks the AST once is cheap and isolated.

Two bugs in the splice

The first version of the splice passed all unit tests but produced reordered output for any function with more than one statement.

-- buggy: collisions combine new-on-left
Map.fromListWith (++) [(parent, [a]), (parent, [b])]
-- result: parent → [b, a]

fromListWith f applies f new old on key collision, so (++) runs as [b] ++ [a] = [b, a]. Two siblings inserted in order ended up reversed in the output.

-- fix: flip the combine so old-on-left
Map.fromListWith (flip (++))

Order preserved.

The second bug was sneakier. The splice descended through the AST looking for nodes whose source range contained a comment, and stopped when it found a containing node. But some node types report a point range (start == end) for nodes whose children span a larger region — T_Redirecting is one. The check posInRange pos node returned false at the point-range node, so descent stopped, and the comment never reached its real target.

The fix was to remove the range filter entirely. Descend unconditionally, attach the comment at the deepest matching child, and let the absence of a matching child be the stop condition.

Both bugs survived the unit tests I wrote first. They surfaced when I ran the splice against real fixtures — a function body with three statements and a comment before the second one. The first time I saw the comment land before the wrong sibling, I knew the data structure was wrong. The second time I saw a comment disappear entirely, I knew the descent was wrong.

It took me longer to root-cause than to fix. That’s the usual ratio for problems in code you wrote yesterday.

Where this leaves the fork

ShellCheck-the-fork now has:

A pluginApiVersion constant the host and plugin agree on (currently 2; bumped from 1 when getDocCommentsBefore was added).
Dynamic loading from $XDG_DATA_HOME/shellcheck/plugins/.
Docs at docs/use-cases.md, docs/design.md, and docs/plugins.md covering the three personas: plugin author, plugin user, fork maintainer.
A worked example plugin in a separate repo — binaryphile/shellcheck-convention-plugin. That plugin is the subject of the next post.

I haven’t pitched any of this upstream. ShellCheck’s value to most users is its curated check set, and a plugin ecosystem fragments that — I’d be asking the maintainers to take on a maintenance surface that benefits a minority of users. The fork is fine. It exists so I can write checks for my conventions without convincing anyone else they’re worth maintaining.

If your conventions look like mine, both repos are on GitHub. If they don’t — write your own plugin. The ABI is two functions.

Cockburn Use Cases Guide

2026-05-10T17:00:00+00:00

A practical reference for writing use cases per Alistair Cockburn’s Writing Effective Use Cases (2001). Template, goal levels, and step-writing guidelines distilled for software teams that want to capture behavior without designing the UI.

Originally authored as a working guide; published here on 2026-05-10 as part of the binaryphile.com compliance-references set.

I keep returning to Cockburn’s framework when a team needs to write down what the system actually does, in a form that survives implementation changes. This is the version I reach for when I’m reviewing requirements drafts.

Template (Fully Dressed)

### UC-N: Active Verb Phrase (Goal)

- **Primary Actor:** Role name (singular, capitalized)
- **Goal:** What the actor wants to achieve
- **Scope:** System under design (the black box)
- **Level:** User goal | Summary | Subfunction
- **Secondary Actors:** External systems the SUD calls upon
- **Trigger:** Event that starts the use case
- **Preconditions:** What must already be true (not tested within the UC)
- **Stakeholders:**
  - Role — what they need from this use case (drives MSS, extensions, guarantees)
- **Main Success Scenario:**
  1. Triggering event / first interaction
  2. Actor does X; System responds Y
  ...
  N. Goal is achieved
- **Extensions:**
  - 3a. Condition detected as fact:
    1. Recovery step
    2. Resume step N / Fail / Separate success
- **Technology & Data Variations:** Sub-variations in how a step may be executed
- **Minimal Guarantee:** Promise to all stakeholders even on failure
- **Success Guarantee:** What must be true on completion

Goal Levels

Level	Test	Size
Summary	“That’s not just one thing” — encompasses multiple user goals	Hours
User Goal	Boss test: “Would your boss accept you did this all day?” EBP test: one person, one place, one time, measurable value	3-9 steps, minutes
Subfunction	Needed to support a user-goal UC; not independently valuable	Seconds

The Three Kinds of Action Steps

Every step must be one of:

Interaction between two actors
Validation protecting a stakeholder’s interest
Internal state change satisfying a stakeholder

Twelve Step-Writing Guidelines

Simple grammar. Subject-verb-object.
Who has the ball. Name the actor explicitly in every step.
Bird’s-eye view. Describe from above, not inside any actor’s head.
Process moves forward. Each step advances toward the goal. No step leaves the scenario unchanged.
Intent, not movements. “Customer provides address” not “Customer clicks field and types.”
Reasonable transaction size. Actor sends request+data, system validates, system updates state, system responds. One step or decomposed — use judgment.
“Validate,” don’t “check whether.” “System validates credentials” moves forward; “System checks whether credentials are valid” requires an if/else branch. Validation failures go in extensions.
Mention timing when it matters. “System responds within 3 seconds.”
“Actor has System A kick System B.” When the primary actor causes inter-system communication.
“Do steps x-y until condition.” For loops.
Condition says what was detected. Extensions state facts, not questions. “Invalid card number:” not “Is the card valid?”
Indent condition handling. Extension handling indented under the condition.

Extension Rules

Keyed to MSS step numbers: 3a, 3b, *a (any step)
State conditions as detected facts, not questions
Each extension ends one of three ways:
1. Rejoins MSS at a specific step
2. Reaches a separate success exit
3. Ends in failure
Brainstorm exhaustively — completeness comes from extensions, not the MSS
Complex extensions can be extracted into sub-use cases

Stakeholder Interests

Ask: “Who cares, and what do they want?”
The system responds to the actor while protecting the interests of all stakeholders
Every interest must be addressed somewhere in the MSS, extensions, or guarantees
This section is the key mechanism for preventing missing requirements
Stakeholder interests drive MSS steps, guarantees, and extensions

Preconditions and Guarantees

Preconditions: Assumed true, not tested. Only state what’s worth telling the reader.
Minimal Guarantee: Fewest promises even on failure (e.g., “audit trail preserved”)
Success Guarantee: What must be true on completion, meeting all stakeholder interests

Quality Tests

Boss Test: Would your boss accept you doing this all day? (user goal level)
EBP Test: One person, one place, one time, measurable value, consistent state?
Size Test: MSS has 3-9 steps. 20+ means decompose.
Purpose-content alignment: Does the goal match what the steps accomplish?

Common Mistakes

Designing the UI — intent, not widgets
Wrong goal level — apply Boss/EBP/Size tests
No primary actor — every UC needs one
Missing stakeholder interests — leads to gaps
CRUD explosion — use “Manage X” and only extract complex operations
Excessive precision — rigor beyond what’s needed wastes time
Goal-content mismatch — stated goal doesn’t match steps

Process

Find system boundary (scope)
Find actors — characterize each (technical skill, constraints, behavior patterns)
Find goals — exhaustive brainstorm per actor; produce actor-goal list table
Write stakeholder interests — the key mechanism for preventing missing requirements
Write preconditions and guarantees (minimal + success)
Write MSS (3-9 steps meeting all interests)
Brainstorm extension conditions exhaustively — completeness comes from here
Write extension handling — each ends in rejoin, separate success, or failure
Extract/merge sub-use cases as needed
Readjust the set

Shostack Threat Modeling Guide

2026-05-10T17:00:00+00:00

A practical guide to threat modeling principles, extracted from Adam Shostack’s Threat Modeling: Designing for Security (2014).

Originally authored as a working guide; published here on 2026-05-10 as part of the binaryphile.com compliance-references set.

Threat modeling replaces reactive security (“whack-a-mole”) with systematic, focused defense. This guide distills Shostack’s comprehensive framework into actionable patterns for software teams.

What this guide covers:

The four-question framework for all threat models
STRIDE mnemonic for systematic threat discovery
Data flow diagrams for visualizing systems
Mitigations mapped to each threat category
Practical worked examples and checklists

What it doesn’t cover:

Extended case studies (Acme-DB)
Full appendices and attack trees
STRIDE variants in detail (STRIDE-per-interaction, DESIST)
Extended privacy framework coverage
Historical context

1. The Goal: Focused Defense Over Whack-a-Mole

Security without structure is firefighting. You patch one vulnerability, another appears. You chase the latest exploit, missing the architectural flaw. Threat modeling breaks this cycle.

“Threat modeling is the key to a focused defense. Without threat models, you can never stop playing whack-a-mole.”

“In short, threat modeling is the use of abstractions to aid in thinking about risks.”

What threat modeling accomplishes:

Outcome	How It Helps
Find bugs early	Design issues found before code is written
Clarify requirements	“Is that really a requirement?” becomes answerable
Better products	Fewer redesigns, predictable schedules
Unique discoveries	Finds issues other tools miss (omissions, novel threats)

“If you think about building a house, decisions you make early will have dramatic effects on security. Wooden walls and lots of ground-level windows expose you to more risks than brick construction. Once you’ve chosen, changes will be expensive.”

Who it’s for: Software developers, architects, operations, security professionals. You don’t need to be a security expert to benefit.

The real value: Threat modeling finds issues other techniques won’t find—errors of omission like forgetting to authenticate a connection. Code analysis tools can’t find these. Your unique design may have unique threats that only systematic analysis will reveal.

2. The Four Questions

Every threat model answers four questions:

┌─────────────────────────────────────────┐
│ 1. What are you building?               │
│    → Draw diagrams, identify components │
├─────────────────────────────────────────┤
│ 2. What can go wrong?                   │
│    → Use STRIDE, attack trees, etc.     │
├─────────────────────────────────────────┤
│ 3. What should you do about it?         │
│    → Mitigate, accept, transfer         │
├─────────────────────────────────────────┤
│ 4. Did you do a decent job?             │
│    → Validate completeness              │
└─────────────────────────────────────────┘

You start and end with familiar tasks: drawing on a whiteboard and managing bugs. Everything in between is structured analysis.

Why these four questions work:

Question 1 (what are you building?) forces shared understanding
Question 2 (what can go wrong?) finds threats systematically
Question 3 (what to do?) produces actionable bugs
Question 4 (did we do a good job?) validates completeness

The framework is recursive: you can apply it to a whole system, a component, a feature, or even a single function.

3. Drawing Your System (Data Flow Diagrams)

“All models are wrong. Some models are useful.”

Data flow diagrams (DFDs) are the foundation. They show:

Element	Symbol	Description
External Entity	Rectangle	People, systems outside your control
Process	Circle/Rounded	Code that transforms data
Data Store	Parallel lines	Databases, files, caches
Data Flow	Arrow	Movement of data
Trust Boundary	Dashed line	Where privilege changes

Trust boundaries are critical—they show where threats concentrate. A trust boundary exists wherever:

Privilege levels change
Different principals interact
Data crosses network/machine/process limits

Trust boundaries and attack surfaces are very similar views of the same thing. An attack surface is a trust boundary plus a direction from which an attacker could launch an attack.

Diagram rules:

Number each process, data flow, and data store
Data can’t move itself—show the process that moves it
If a component has a trust boundary, it’s a candidate for its own diagram
Don’t draw an eye chart—break complex systems into sub-diagrams
The diagram should tell a story and support you telling stories while pointing at it

Updating diagrams (validation questions):

Can we tell a story without changing the diagram?
Can we tell that story without using “sometimes” or “also”?
Can we see exactly where the software makes security decisions?
Does the diagram show all trust boundaries (UIDs, roles, network interfaces)?
Does it reflect current or planned reality?
Can we see where all data goes and who uses it?

4. Where to Start: Three Approaches

What drives your analysis?
  │
  ├─ ASSETS → "What are we protecting?"
  │           Best when: Clear valuable targets
  │           Risk: May miss stepping-stone assets
  │
  ├─ ATTACKERS → "Who's attacking us?"
  │              Best when: Known threat actors
  │              Risk: Attackers not on list still attack
  │
  └─ SOFTWARE → "What are we building?"
                Best when: Development teams
                Risk: May miss operational context

Recommendation: Start with software (what you’re building), use STRIDE to find threats, then validate against known attacker motivations. This combines the benefits of all three.

The Cautionary Tale of Zero-Knowledge Systems

“Zero-Knowledge Systems didn’t have a clear answer to ‘what’s your threat model?’ Because there was no clear answer, there wasn’t consistency in what security features were built.”

Without a clear threat model, the company invested heavily in preventing governments from spying—a fun technical challenge but one that had significant performance impacts. The emotional appeal of fighting government surveillance made it hard to make practical business decisions. Eventually, a clearer threat model let them invest in mitigations that all addressed the same subset of threats.

The lesson: Without answering “what’s your threat model?”, you may build elaborate defenses against unlikely attacks while ignoring common ones.

Standard Answers to “What’s Your Threat Model?”

Answer	Meaning
“A thief who could steal your money”	Financial motivation, external
“Untrusted network”	Assume network traffic can be read/modified
“Malicious insiders”	Employees, contractors with access
“An attacker who could steal your cookie”	Session hijacking, web app threats
“Script kiddie”	Low-skill attacker using automated tools
“Nation-state actor”	High-skill, well-resourced attacker

Having a clear answer focuses your defense investments.

5. STRIDE: The Six Threat Categories

STRIDE is a mnemonic for finding threats. It was developed at Microsoft and has been refined over more than a decade of use. Each letter represents a threat that violates a security property:

Threat	Property Violated	Definition	Typical Victims
Spoofing	Authentication	Pretending to be something/someone else	Processes, external entities, people
Tampering	Integrity	Modifying data (disk, network, memory)	Data stores, data flows, processes
Repudiation	Non-repudiation	Claiming you didn’t do something	Processes
Info Disclosure	Confidentiality	Exposing data to unauthorized parties	Processes, data stores, data flows
Denial of Service	Availability	Absorbing resources needed for service	Processes, data stores, data flows
Elevation of Privilege	Authorization	Doing things you’re not authorized to do	Processes

“STRIDE is a tool to guide you to threats, not to ask you to categorize what you’ve found; it makes a lousy taxonomy, anyway.”

Usage: Walk through each element in your diagram and ask “How could an attacker achieve S? T? R? I? D? E?” Don’t worry about categorization—if you find a threat, record it.

Detailed Threat Examples

Spoofing:

Spoofing a process on the same machine (creating a file before the real process)
Spoofing a file (creating in local directory, changing links)
Spoofing a machine (ARP, IP, DNS spoofing)
Spoofing a person (phishing, account takeover)
Spoofing a role (declaring themselves to be that role)

Tampering:

Tampering with a file (modify files on disk, servers, or remote includes)
Tampering with memory (modify running code or API data by reference)
Tampering with a network (redirect traffic, modify packets, especially wireless)

Repudiation:

Claiming to have not clicked/received/ordered
Claiming to be a fraud victim
Attacking the logs (no logs, filling logs, injecting attacks into logs)

Information Disclosure:

Extracting secrets from error messages
Reading files with inappropriate ACLs
Finding crypto keys on disk or in memory
Reading network traffic (sniffing)
Analyzing traffic metadata (DNS, social network connections)

Denial of Service:

Absorbing memory (RAM or disk)
Absorbing CPU
Using process as an amplifier
Filling data stores
Consuming network resources

Elevation of Privilege:

Sending inputs the code doesn’t handle properly (buffer overflow, injection)
Gaining inappropriate memory access
Bypassing authorization checks
Data/code confusion (treating data as executable code)

Focus on Feasible Threats

“Along the way, you might come up with threats like ‘someone might insert a back door at the chip factory.’ These are real possibilities but not very likely compared to using an exploit to attack a vulnerability for which you haven’t applied the patch.”

Good threat modeling focuses on threats you can actually address. If you can’t do anything about motherboard backdoors, acknowledge them and move on.

6. STRIDE-per-Element

Not all threats apply to all elements. This matrix focuses your analysis:

Element	S	T	R	I	D	E
External Entity	✓		✓
Process	✓	✓	✓	✓	✓	✓
Data Flow		✓		✓	✓
Data Store		✓	?	✓	✓

(? = Logs are data stores involved in addressing repudiation)

Exit criteria: You have at least one threat per checked cell in your diagram.
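One way to keep the matrix at hand while walking a diagram is to encode it as a lookup table. This is only a sketch (it needs bash 4 or later for associative arrays); the key names and the threatsFor helper are invented, and the ? cell for data stores is written as a lowercase r.

```shell
#!/usr/bin/env bash
# Sketch: the STRIDE-per-element matrix as a lookup table (bash 4+).
# Keys and the helper name are illustrative. The "?" cell for data
# stores (logs and repudiation) is encoded as a lowercase r.

declare -A strideByElement=(
  [external-entity]='S R'
  [process]='S T R I D E'
  [data-flow]='T I D'
  [data-store]='T r I D'
)

threatsFor() {
  # Print the threat letters to consider for the element type in $1.
  printf '%s\n' "${strideByElement[$1]}"
}

threatsFor data-flow   # prints "T I D"
```

Looping over the numbered elements of a diagram and calling threatsFor on each gives the list of cells that need at least one recorded threat.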

Customization: This matrix is somewhat Microsoft-specific. Adapt it to your context. For example, if privacy matters, add “Information Disclosure by External Entity.”

STRIDE-per-element weaknesses:

Similar issues crop up repeatedly in a given threat model
The chart may not represent your specific issues

“If you want to be comprehensive, this is helpful; if you want to focus on the most likely issues, it may be a distraction.”

Variants:

STRIDE-per-interaction: Consider (origin, destination, interaction) tuples. Same number of threats but may be easier to understand.
DESIST: Dispute, Elevation, Spoofing, Information disclosure, Service denial, Tampering. Same concepts, different acronym.

7. Attack Trees

Attack trees decompose a goal into sub-goals:

Goal: Steal credentials
├─ [OR] Phish user
│   ├─ [AND] Create fake login page
│   └─ [AND] Send convincing email
├─ [OR] Compromise database
│   ├─ [OR] SQL injection
│   └─ [OR] Stolen backup
└─ [OR] Intercept network traffic
    └─ [AND] Man-in-the-middle attack

OR nodes: Any child achieves the goal
AND nodes: All children required

When to use:

Organizing threats found with STRIDE
Deep-diving a specific attack scenario
Communicating threats to stakeholders

Trees can be created per-project or reused across similar systems.

Creating an attack tree:

Decide on a representation (AND or OR tree, most are OR)
Create a root node (the attacker’s goal)
Create subnodes (ways to achieve that goal)
Consider completeness (are there other paths?)
Prune the tree (remove irrelevant branches)
Check the presentation (is it understandable?)

Exit criteria: When you have threats for each leaf node that applies to your system.

8. Attack Libraries (CAPEC, OWASP)

Attack libraries provide pre-built threat catalogs:

Library	Scope	Best For
CAPEC	475+ attack patterns	Comprehensive coverage, training
OWASP Top Ten	Web application risks	Web projects, quick reference

CAPEC trade-off: Comprehensive but time-intensive (40+ hours for full review). Consider category-level review instead of entry-by-entry.

CAPEC exit criteria: At least one issue in each of categories 1-11:

Data Leakage
Resource Depletion
Injection
Spoofing
Time and State
Abuse of Functionality
Probabilistic Techniques
Exploitation of Authentication
Exploitation of Privilege/Trust
Data Structure Attacks
Resource Manipulation

Categories 12-15 (Network Reconnaissance, Social Engineering, Physical Security, Supply Chain) may be relevant depending on your system.

OWASP Top Ten (2013 example):

Injection
Broken Authentication/Session Management
Cross-Site Scripting
Insecure Direct Object References
Security Misconfiguration
Sensitive Data Exposure
Missing Function-Level Access Control
Cross-Site Request Forgery
Components with Known Vulnerabilities
Unvalidated Redirects and Forwards

“CAPEC is a classification of common attacks, whereas STRIDE is a set of security properties. CAPEC may have more promise than STRIDE for many populations of threat modelers.”

Using OWASP for threat modeling:

The OWASP Top Ten works well as an adjunct to STRIDE for web projects. To turn it into a methodology:

Create a “Top Ten per Element” approach (like STRIDE-per-element)
Look for risks at each point where data crosses a trust boundary

Trade-off: Cross-site scripting and CSRF may be overly specific for threat modeling; they work better as input to test planning. The Top Ten is revised periodically based on volunteer input, so its value varies over time.

When to Use Which

Situation	Approach
New system design	STRIDE (comprehensive, principle-based)
Web application	OWASP Top Ten + STRIDE
Deep-dive on specific attack	Attack trees
Unknown domain	CAPEC categories (structured exploration)
Privacy-sensitive	LINDDUN or Solove taxonomy
Quick review	STRIDE-per-element on key components

9. Privacy Threats (Brief Overview)

Privacy threat modeling is an emergent field. Key frameworks:

LINDDUN (mirror of STRIDE for privacy):

Linkability, Identifiability, Non-repudiation, Detectability, Disclosure of information, Unawareness, Non-compliance

Solove’s Taxonomy:

Information collection (surveillance, interrogation)
Information processing (aggregation, identification, secondary use)
Information dissemination (disclosure, breach)
Invasion (intrusion, decisional interference)

Practical approach: Treat privacy as complementary to security threat modeling. Focus on data flows involving personal information.

The nymity slider (Ian Goldberg):

Less Privacy ←────────────────────────────→ More Privacy
Verinymity    Persistent    Linkable    Unlinkable
(Gov't ID,    Pseudonym     Anonymity   Anonymity
Credit Card)  (Pen name)    (Prepaid    (Tor, mixnets)
                            phone)

Key insight: It’s easy to move toward more nymity (more identifying), extremely difficult to move toward less. Design for privacy from the start.

10. From Threats to Bugs

Every threat needs action. Track them as bugs in your existing system. The key question: “Did I do something with each unique threat I found?”

“You really don’t want to drop stuff on the floor. This is ‘turning the crank’ sort of work. It’s rarely glamorous or exciting until you find the thing you overlooked.”

Bug template:

Title: [STRIDE category] [Element] - [Threat description]
Description: [How the attack works]
Mitigation: [Proposed defense]
Priority: [Based on impact and likelihood]
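If threats are filed programmatically, the template can be captured as a small type. This is a sketch only: the `ThreatBug` type, its field names, and the choice to keep square brackets around the category and element are assumptions, not part of the template.

```go
package main

import "fmt"

// ThreatBug mirrors the fields of the bug template.
type ThreatBug struct {
	Category    string // STRIDE category, e.g. "Tampering"
	Element     string // diagram element the threat applies to
	Threat      string // short threat description
	Description string // how the attack works
	Mitigation  string // proposed defense
	Priority    string // based on impact and likelihood
}

// Title renders the "[STRIDE category] [Element] - [Threat description]" line.
func (b ThreatBug) Title() string {
	return fmt.Sprintf("[%s] [%s] - %s", b.Category, b.Element, b.Threat)
}

// String renders the whole template, ready to paste into a tracker.
func (b ThreatBug) String() string {
	return fmt.Sprintf("Title: %s\nDescription: %s\nMitigation: %s\nPriority: %s",
		b.Title(), b.Description, b.Mitigation, b.Priority)
}

func main() {
	bug := ThreatBug{
		Category:    "Elevation of Privilege",
		Element:     "Login Process",
		Threat:      "SQL injection in user lookup",
		Description: "Attacker-controlled username is concatenated into the query.",
		Mitigation:  "Use prepared statements.",
		Priority:    "High",
	}
	fmt.Println(bug)
}
```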

Prioritization approaches:

Method	Complexity	Best For
Simple triage	Low	Most teams
DREAD scoring	Medium	Quantitative comparison
Bug bars	Medium	Consistent thresholds
Risk matrices	High	Compliance requirements

Shostack recommends simple approaches. Elaborate risk scoring often provides false precision.

Validation checklist:

Have we written down or filed a bug for each threat?
Is there a proposed/planned/implemented way to address each threat?
Do we have a test case per threat?
Has the software passed the test?

11. The Three Responses

How do you respond to a threat?
  │
  ├─ MITIGATE → Make attack harder
  │             Your go-to approach
  │             Example: Add authentication
  │
  ├─ ACCEPT → Acknowledge the risk
  │           When: Low probability OR low impact
  │           Warning: Can't accept on behalf of users
  │
  └─ TRANSFER → Let someone else handle it
                To: OS, framework, customer, insurer
                Warning: Transferred risk still exists

Anti-pattern: IGNORE

“A traditional approach to risk in information security is to ignore it… This approach is becoming less effective as contracts, lawsuits, and laws increase the risk of ignoring risks.”

Decision guidance:

If there’s an easy fix, just fix it (skip strategizing)
Mitigation is generally easiest and best for customers
Document accepted risks explicitly

The “ignoring risks” trap:

If you create a list of security problems you decide not to address, be aware:

Breach disclosure laws may require action
Whistleblowers may expose the list
Legal discovery in lawsuits may reveal it
Regulatory requirements continue to increase

“If you are threat modeling and create a list of security problems that you decide not to address, please send a copy of the list to the author, care of the publisher. There will be quarterly auctions to sell them to plaintiff’s attorneys.”

12. Mitigations Mapped to STRIDE

Threat	Mitigation Strategy	Techniques
Spoofing	Authentication	Passwords, tokens, biometrics, digital signatures, HTTPS/SSL
Tampering	Integrity protection	ACLs, digital signatures, MACs, HTTPS/SSL
Repudiation	Logging/Auditing	Comprehensive logs, protected log storage, log over TCP/SSL
Info Disclosure	Confidentiality	Encryption (SSL, IPsec), ACLs, careful API design
Denial of Service	Availability	Elastic resources, rate limiting, quotas
Elevation	Authorization	Type-safe languages, sandboxing, input validation, prepared statements

Detailed Mitigation Techniques

Addressing Spoofing:

Spoofing a person → Unique usernames + authentication (passwords, tokens, biometrics)
Spoofing a file → Use full paths (not ./file), check ACLs after opening
Spoofing a network address → DNSSEC, SSL, IPsec
Spoofing a program → Leverage OS application identifiers

Addressing Tampering:

Tampering with a file → ACLs, digital signatures, keyed MACs
Racing to create a file → Protected directories, private directory structures
Tampering with network packets → HTTPS/SSL, IPsec
Anti-pattern: Network isolation doesn’t work long-term
- “The isolated United States SIPRNet was thoroughly infested with malware, and the operation to clean it up took 14 months.”

Addressing Repudiation:

No logs → Log all security-relevant information
Logs under attack → Send over network (TCP/SSL, not UDP), use ACLs
Logs as attack channel → Tightly specify log format early in development

Addressing Information Disclosure:

Network monitoring → Encryption (HTTPS/SSL, IPsec)
Sensitive filenames → Create innocuous parent directory with ACLs
File contents → ACLs or file/disk encryption
APIs revealing info → Be selective about what you return

Addressing Denial of Service:

Network flooding → Elastic resources, ensure attacker effort ≥ yours, network ACLs
Program resources → Careful design, proof of work, require work before expensive operations
System resources → Use OS quotas and limits

Addressing Elevation of Privilege:

Data/code confusion → Prepared statements, clear separators, late validation
Memory corruption → Type-safe languages, ASLR, sandboxes (AppArmor, AppContainer)
Command injection → Validate input size and form; don’t sanitize—log and discard weird input

Key principles:

“Validate, don’t sanitize. Know what you expect to see, how much you expect to see, and validate that that’s what you’re receiving. If you get something else, throw it away.”

“Trust the operating system. The OS provides security features so you can focus on your unique value proposition.”
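To make "validate, don't sanitize" concrete, here is a minimal Go sketch. The username rule (3 to 32 characters of lowercase letters, digits, or underscore) and the function name are hypothetical. What it illustrates is that input is either accepted unchanged or thrown away, never repaired.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// usernamePattern states exactly what is expected and how much of it.
// The rule itself is an assumption made for this example.
var usernamePattern = regexp.MustCompile(`^[a-z0-9_]{3,32}$`)

// ErrUnexpectedInput is returned for anything that is not the expected form.
var ErrUnexpectedInput = errors.New("input does not match expected form")

// ValidateUsername returns the input unchanged or rejects it outright.
// It never strips or escapes "bad" characters.
func ValidateUsername(raw string) (string, error) {
	// The length check bounds the work before the pattern is applied.
	if len(raw) > 32 || !usernamePattern.MatchString(raw) {
		return "", ErrUnexpectedInput
	}
	return raw, nil
}

func main() {
	for _, in := range []string{"alice_01", "alice; rm -rf /"} {
		if _, err := ValidateUsername(in); err != nil {
			// Log and discard; do not try to clean the value up.
			fmt.Printf("rejected %q: %v\n", in, err)
			continue
		}
		fmt.Printf("accepted %q\n", in)
	}
}
```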

13. ⚠️ Taking It Too Far

Over-modeling

Threat modeling every component of a well-understood framework wastes effort. Focus on your unique code and architecture, not commodity components.

Paralysis by Analysis

Don’t wait for the “complete” threat model. Start with what you know, iterate as you learn. An 80% threat model today beats a 100% model never delivered.

Category Obsession

“If you’ve already come up with the attack, why bother putting it in a category? The goal of STRIDE is to help you find attacks. Categorizing them might help you figure out the right defenses, or it may be a waste of effort.”

If you find yourself debating whether “unauthorized database access” is spoofing or information disclosure, stop. Record the threat and move on. STRIDE is a finding tool, not a taxonomy.

Security That Creates Insecurity

Shostack dedicates an entire chapter (Chapter 15) to human factors because cumbersome security creates its own vulnerabilities.

“People are not, as is often claimed, the weakest link, or beyond help. The weakest link is almost always a vulnerability in Internet-facing code.”

The compliance budget: Angela Sasse’s research found that workers allocate a limited “budget” to security tasks. They spend time and energy until exhausted, then move on. Exceed the budget, and compliance drops.

“People do listen. They don’t act on security advice because it’s often bizarre, time consuming, and sometimes followed by, ‘Of course, you’ll still be at risk.’ You need to craft advice that works for the people who are listening to you.”

Warning fatigue:

“Given a choice between ignoring a warning that they’ve clicked through a thousand times before without apparent ill effects and without being entertained, people will bypass a warning every time.”

The fix: Minimize what you ask of people. They should only be involved when they have information the system can’t determine (e.g., “Is this a home or coffee shop network?”).

“You can also transfer risk to customers, for example, by asking them to click through lots of hard-to-understand dialogs before they can do the work they need to do. That’s obviously not a great solution.”

Ignoring Easy Fixes

“When there is an easy way to address a problem, you should skip strategizing and just address it.”

“The diagram is intended to help ensure that you understand and can discuss the system. Don’t ask ‘Is this the right way to do it?’ Ask ‘Does this help me think about what might go wrong?’”

Letting Perfect Be the Enemy of Good

Start practicing now. You’re not going to get good at threat modeling by reading—you have to do it.

“You’re not going to get to Carnegie Hall if you don’t practice, practice, practice.”

Pick a system you’re working on and threat model it:

Draw a diagram
Use STRIDE to find threats
Address each threat in some way
Check your work with checklists
Celebrate and share your work

What to threat model next:

What you’re working on now (if it has trust boundaries)
Something not too simple (trivial systems won’t be satisfying)
Something not too complex (don’t chew off more than you can handle)
Something you can collaborate on with trusted colleagues

Starting small: If you’re working on a large team or across organizational boundaries, start with a component you own. Build your skills before tackling complex cross-team systems.

Context: Web application login endpoint

Step 1: Draw the diagram

[Browser] --(credentials)--> [Login Process] --(query)--> [User DB]
                                    |
                                    v
                             [Session Store]

Trust Boundary: -------- Internet --------

Step 2: Apply STRIDE to Login Process

Threat	Question	Finding
S	Can someone pretend to be a legitimate user?	Yes—stolen credentials, session hijacking
T	Can data be modified?	Yes—MITM attack on credentials
R	Can user deny actions?	Yes—if no session logging
I	Can credentials leak?	Yes—error messages, timing attacks
D	Can login be blocked?	Yes—flood attacks, account lockout abuse
E	Can attacker gain admin?	Yes—SQL injection in query

Step 3: Prioritize and mitigate

Threat	Priority	Mitigation
Credential theft	High	HTTPS, MFA, session timeouts
SQL injection	High	Prepared statements
Session hijacking	High	Secure cookies, session binding
Account lockout abuse	Medium	Captcha, IP rate limiting
Credential timing	Low	Constant-time comparison
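The constant-time comparison named in the last row could look like the sketch below, which uses Go's `crypto/subtle`. Hashing both sides first is an assumption added here so that both inputs have the same length; the function name `EqualSecret` is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// EqualSecret compares two secrets without returning early at the first
// differing byte. Hashing both sides first gives equal-length inputs, so the
// comparison time does not depend on the length of the stored value.
func EqualSecret(supplied, stored string) bool {
	a := sha256.Sum256([]byte(supplied))
	b := sha256.Sum256([]byte(stored))
	return subtle.ConstantTimeCompare(a[:], b[:]) == 1
}

func main() {
	fmt.Println(EqualSecret("token-123", "token-123")) // true
	fmt.Println(EqualSecret("token-123", "token-124")) // false
}
```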

Step 4: Validate

Did we address every STRIDE threat for every element?
Do we have tests for each mitigation?
Is anything still concerning?

Why this worked:

The diagram made the system concrete and discussable
STRIDE provided systematic coverage (no guessing what to look for)
Each threat got a specific mitigation (not “improve security generally”)
Tests will verify mitigations work

What could go wrong with this threat model:

Missing trust boundaries (are there admin roles we didn’t show?)
Missing data flows (are there logs, metrics, or debugging interfaces?)
Assumptions about network security (is HTTPS really used everywhere?)

15. Quick Reference

The Four Questions

What are you building?
What can go wrong?
What should you do about it?
Did you do a decent job?

STRIDE Threats

Letter	Threat	Property	Defense
S	Spoofing	Authentication	Auth tokens, signatures
T	Tampering	Integrity	MACs, ACLs
R	Repudiation	Non-repudiation	Logging
I	Info Disclosure	Confidentiality	Encryption, ACLs
D	Denial of Service	Availability	Rate limits, quotas
E	Elevation	Authorization	Sandboxing, validation

STRIDE-per-Element Quick Check

Element	Check For
External Entity	S, R
Process	All (S, T, R, I, D, E)
Data Flow	T, I, D
Data Store	T, I, D (R for logs)
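The quick-check table translates directly into a lookup. The following Go sketch expands a list of diagram elements into the threats to consider for each; the type names and the `IsLog` flag are assumptions made for illustration.

```go
package main

import "fmt"

// ElementKind is one of the four DFD element types.
type ElementKind string

const (
	ExternalEntity ElementKind = "External Entity"
	Process        ElementKind = "Process"
	DataFlow       ElementKind = "Data Flow"
	DataStore      ElementKind = "Data Store"
)

// applicable encodes the table: which STRIDE letters apply to each kind.
var applicable = map[ElementKind]string{
	ExternalEntity: "SR",
	Process:        "STRIDE",
	DataFlow:       "TID",
	DataStore:      "TID",
}

// Element is one item on the diagram.
type Element struct {
	Name  string
	Kind  ElementKind
	IsLog bool // data stores that hold logs also get Repudiation
}

// Checklist returns one "letter: element" entry per threat to consider.
func Checklist(elems []Element) []string {
	var out []string
	for _, e := range elems {
		letters := applicable[e.Kind]
		if e.Kind == DataStore && e.IsLog {
			letters += "R"
		}
		for _, l := range letters {
			out = append(out, fmt.Sprintf("%c: %s", l, e.Name))
		}
	}
	return out
}

func main() {
	diagram := []Element{
		{Name: "Browser", Kind: ExternalEntity},
		{Name: "Login Process", Kind: Process},
		{Name: "credentials", Kind: DataFlow},
		{Name: "User DB", Kind: DataStore},
		{Name: "Session Store", Kind: DataStore},
	}
	for _, line := range Checklist(diagram) {
		fmt.Println(line)
	}
}
```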

Threat Response Checklist

Can we eliminate the feature?
Can we mitigate with standard patterns?
Is the risk acceptable? (Document why)
Can we transfer to a trusted component?
Is our mitigation testable?

DFD Validation

Validation Checklist

Diagram tells a story without “sometimes” or “also”
All trust boundaries, data flows, and stores visible
STRIDE checked for each element
Bug filed for each threat
Test case per threat

16. Connection to Go Development Guide

Shostack (Threat Modeling)	Go Development Guide
Tampering with memory	Value semantics prevent unexpected mutation
Data/code confusion (EoP)	Type safety, prepared statements
Input validation	“Validate, don’t sanitize”
Trust the OS	Use Go’s standard library security features
Information disclosure	Careful API design, minimal return values
Denial of service	Bounded resources, context timeouts

Shared insight: Both emphasize leveraging existing, trusted infrastructure rather than custom solutions.

Why trust the OS:

The OS provides security features so you can focus on your unique value proposition
The OS runs with privileges not available to your program or attacker
If the attacker controls the OS, you’re in a world of hurt anyway

STRIDE maps directly to defensive coding:

S → Authentication handled by OS/framework, not custom code
T → Integrity through immutability (value semantics)
I → Confidentiality through minimal exposure (return only needed data)
E → Authorization through type safety and sandboxing

Example: Context timeouts and DoS:

Go’s context.Context with deadlines directly addresses denial-of-service threats:

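// Request, expensiveOperation, and expensiveOperationWithContext are
// placeholders for application code. Requires the "context" and "time" imports.

// Without timeout: a slow operation can tie up resources indefinitely.
func handleRequestUnbounded(r *Request) {
    result := expensiveOperation(r.Data)
    _ = result // ... use result
}

// With timeout: bounded resource consumption.
func handleRequest(ctx context.Context, r *Request) error {
    ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
    defer cancel()

    result, err := expensiveOperationWithContext(ctx, r.Data)
    if err != nil {
        return err // includes context.DeadlineExceeded once the 30s budget is spent
    }
    _ = result // ... use result
    return nil
}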

17. Glossary

Term	Definition
Attack surface	Trust boundary + direction of potential attack
Attack tree	Hierarchical decomposition of attack goals
DFD	Data Flow Diagram—visual model showing data movement
STRIDE	Spoofing, Tampering, Repudiation, Info Disclosure, DoS, Elevation
Trust boundary	Where more than one principal interacts
Principal	Entity that can take action (user, process, system)
Mitigation	Action that makes an attack harder
Threat	Potential violation of a security property
Vulnerability	Specific weakness that enables a threat
CAPEC	Common Attack Pattern Enumeration and Classification
LINDDUN	Privacy threat framework (STRIDE mirror for privacy)
Elevation of Privilege	Both a STRIDE threat and a card game for threat modeling

18. Key Quotes

“Threat modeling is the key to a focused defense. Without threat models, you can never stop playing whack-a-mole.”

“In short, threat modeling is the use of abstractions to aid in thinking about risks.”

“Your instincts are insufficient, and you’d need tools to help tackle the questions.”

“If you think about building a house, decisions you make early will have dramatic effects on security.”

“STRIDE is a tool to guide you to threats, not to ask you to categorize what you’ve found.”

“Validate, don’t sanitize. Know what you expect to see… If you get something else, throw it away.”

“Trust the operating system. The OS provides security features so you can focus on your unique value proposition.”

“When there is an easy way to address a problem, you should skip strategizing and just address it.”

“Any technical professional can learn to threat model. Threat modeling involves the intersection of two models: a model of what can go wrong (threats), applied to a model of the software you’re building.”

“With a whiteboard diagram and a copy of Elevation of Privilege, developers can threat model software that they’re building, systems administrators can threat model software they’re deploying, and security professionals can introduce threat modeling to those with skillsets outside of security.”

“The question ‘what’s your threat model?’ is a great one because in just four words, it can slice through many conundrums to determine what you are worried about.”

It’s Been Eight Years Since NIST Said to Stop Rotating Passwords

2026-04-07T00:00:00+00:00

In June 2017, NIST published SP 800-63B Rev 3 and told the world to stop requiring periodic password changes. Eight years later, most organizations still do it. In August 2025, NIST published Rev 4 and upgraded that guidance from “you should stop” to “you must stop.”

This is the story of what changed, what it means for systems you build, and what the actual requirements look like when you play them out as scenarios.

The old world

Before 2017, password policy was a checklist everyone knew by heart:

Change your password every 90 days
Must contain uppercase, lowercase, digit, and special character
Minimum 8 characters
Can’t reuse any of your last 12 passwords

Security teams enforced it. Auditors checked for it. Users hated it. And it made passwords worse, not better.

Why it made passwords worse

Every one of those rules has a specific failure mode. Here’s what actually happens when you enforce them.

Forced rotation breeds predictable mutations

A company requires 90-day password changes. Sarah, an account manager, has been through this twelve times. Her current password is Summer2024!. In October, the system forces a change. She types Fall2024!. In January, Winter2025!.

An attacker obtains Summer2024! from a breach. They don’t try it directly — they try the obvious seasonal mutations. Fall2024!, Winter2024!, Summer2025!. They’re in within a handful of guesses.

But the damage starts before the breach. Sarah chose Summer2024! in the first place because she knew it would expire. Why invest in memorizing something strong when it’s gone in 90 days? Rotation discourages the upfront investment in password quality that NIST is now explicitly trying to protect.

There’s a subtler cost too. Each rotation produces a “retired” password the subscriber considers spent. At scale, retired passwords get recycled on personal accounts, shared with colleagues, or written on sticky notes that outlive the rotation window. This sounds like an edge case — and for any one user it is. But this is security, where edge cases become certainties across ten thousand accounts. Every rotation cycle produces a fresh crop of unmanaged credentials floating in the wild. That exposure exists solely because of the rotation policy.

NIST’s response: SHALL NOT require periodic password changes. Change only on evidence of compromise.

(NIST uses RFC 2119 requirement keywords: SHALL, SHALL NOT, SHOULD, SHOULD NOT, MAY. Uppercase indicates a formal requirement level, not emphasis.)

Composition rules produce a monoculture

A site requires uppercase, lowercase, digit, and special character. The minimum is 8 characters. What does the average user type?

Password1!

Or Welcome1!. Or Company1!. Composition rules don’t increase entropy — the randomness that makes a password hard to guess — they constrain the search space into a predictable shape. Attackers know the shape. They try [Word][Digit][Special] patterns first.

NIST’s response: SHALL NOT impose composition rules.

Short minimums invite brute force

An 8-character password using the full ASCII printable set has about 52.6 bits of entropy. That sounds like a lot until you consider that a modern GPU cluster can test tens or hundreds of billions of guesses per second against a stolen database of fast, unsalted hashes. At 100 billion guesses per second, the entire 8-character space falls in under a day.
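The arithmetic is easy to check. The sketch below computes the entropy of a uniformly random password over the 95 printable ASCII characters and the time to exhaust that space. The guess rate of 100 billion per second is an assumption standing in for an attacker running against a fast hash; slower password hashes reduce it by orders of magnitude.

```go
package main

import (
	"fmt"
	"math"
)

// EntropyBits returns log2(alphabet^length) for a uniformly random password.
func EntropyBits(alphabet, length int) float64 {
	return float64(length) * math.Log2(float64(alphabet))
}

// HoursToExhaust returns how long it takes to try every candidate
// at the given guess rate.
func HoursToExhaust(bits, guessesPerSecond float64) float64 {
	return math.Exp2(bits) / guessesPerSecond / 3600
}

func main() {
	const printableASCII = 95
	const rate = 1e11 // assumed: 100 billion guesses per second
	for _, n := range []int{8, 15} {
		bits := EntropyBits(printableASCII, n)
		fmt.Printf("%2d chars: %.1f bits, %.3g hours to exhaust\n",
			n, bits, HoursToExhaust(bits, rate))
	}
}
```

Under that assumed rate, 8 characters comes out at roughly 18 hours. Each additional character multiplies the work by 95, which is why the jump to 15 characters matters so much.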

NIST’s response: SHALL require minimum 15 characters for single-factor authentication. 8 characters only if a second factor is also required.

Blocking paste punishes the right behavior

A site disables paste in the password field “for security.” The subscriber who was about to paste a 40-character random string from their password manager now has to type something they can remember. The security outcome gets worse, not better.

NIST’s response: SHALL allow password managers and autofill. SHOULD permit paste.

No blocklist means the attacker’s job is easy

A subscriber picks 123456 or password or qwerty. The system accepts it because it meets the 8-character minimum (well, password does) and the composition rules (it doesn’t, but many systems don’t actually enforce them consistently).

Meanwhile, an attacker with a collection of 500 million passwords leaked from previous breaches tries the top 10,000. Most systems have at least a few accounts using them.

NIST’s response: SHALL compare prospective passwords against a blocklist of breached passwords, dictionary words, sequential characters, and context-specific terms.
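A blocklist check along these lines might be sketched as follows. The in-memory map stands in for a real breach corpus, and the type, function names, and rejection messages are hypothetical; a production list would be far larger and loaded from an external source.

```go
package main

import (
	"fmt"
	"strings"
)

// Blocklist holds the comparison sources for this sketch.
type Blocklist struct {
	Known   map[string]bool // breached passwords and dictionary words
	Context []string        // service name, username, and similar terms
}

// isRepetitive reports whether every character is the same ("aaaaaa").
func isRepetitive(p string) bool {
	r := []rune(p)
	for i := 1; i < len(r); i++ {
		if r[i] != r[0] {
			return false
		}
	}
	return len(r) > 1
}

// isSequential reports whether each character is exactly one step above
// or below the previous one ("123456", "fedcba").
func isSequential(p string) bool {
	r := []rune(p)
	if len(r) < 2 {
		return false
	}
	step := r[1] - r[0]
	if step != 1 && step != -1 {
		return false
	}
	for i := 2; i < len(r); i++ {
		if r[i]-r[i-1] != step {
			return false
		}
	}
	return true
}

// Reason returns why the password is rejected, or "" if it passes.
func (b Blocklist) Reason(password string) string {
	switch {
	case b.Known[password]:
		return "appears in a list of known passwords"
	case isRepetitive(password):
		return "repeats a single character"
	case isSequential(password):
		return "is a simple sequence"
	}
	lower := strings.ToLower(password)
	for _, term := range b.Context {
		if term != "" && strings.Contains(lower, strings.ToLower(term)) {
			return "contains a context-specific term"
		}
	}
	return ""
}

func main() {
	bl := Blocklist{
		Known:   map[string]bool{"password": true, "qwerty": true},
		Context: []string{"acme", "sarah"},
	}
	for _, p := range []string{"password", "123456", "Sarah2025", "correct horse battery staple"} {
		fmt.Printf("%q: %q\n", p, bl.Reason(p))
	}
}
```

Returning a reason rather than a bare boolean lets the system explain the rejection to the subscriber.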

Rev 3 vs Rev 4: from recommendation to mandate

Rev 3 (June 2017) said “SHOULD NOT”: discouraged, but permitted if you had a documented reason. Rev 4 (August 2025) says “SHALL NOT”: prohibited, no exceptions.

Requirement	Rev 3 (2017)	Rev 4 (2025)
Periodic rotation	SHOULD NOT	SHALL NOT
Composition rules	SHOULD NOT	SHALL NOT
Minimum length (single-factor)	8 characters	15 characters
Password managers	SHOULD permit paste	SHALL allow managers + autofill
Blocklist checking	SHALL	SHALL
Strength guidance	SHOULD offer	SHALL offer

The progression: “stop doing harmful things” became “you must stop doing harmful things.”

What the requirements look like as scenarios

I turned the Rev 4 guidance into use cases to see what a team actually needs to build. Not a checklist of SHALLs — a set of scenarios showing what happens when things go right and wrong, driven by how real subscribers and real attackers behave.

NIST defines three Authentication Assurance Levels. AAL1 is password-only. AAL2 requires two factors — a password plus something like a time-based one-time-password (TOTP) app or a hardware security key. AAL3 requires two factors where one is a hardware cryptographic device that resists phishing.

Setting a password

The happy path: A subscriber opens the password field and pastes a 64-character random string from their password manager. The system accepts it, hashes it, stores the hash. Done.

The attacker’s path: A different subscriber types Company2025!, a predictable pattern that satisfies every legacy composition rule. The system checks it against a blocklist of breached passwords. Found. Rejected. The system explains why and suggests trying a passphrase. The subscriber tries correct horse battery staple (28 characters counting the spaces, no special characters, no uppercase). The system accepts it because length and unpredictability matter more than character variety.

The edge case: A subscriber tries to set a 6-character password. Rejected: below the 15-character minimum for single-factor, or the 8-character minimum with MFA. They try aaaaaaaaaaaaaaa, which is 15 characters but repetitive. Rejected. They try their username with digits appended. Rejected as context-specific.

The infrastructure failure: The blocklist service is down. The system cannot verify the password against breached corpuses. Rather than accept a potentially compromised password (fail-open), the system refuses the change and asks the subscriber to try again later.
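The four paths above can be sketched as one function. `BlocklistChecker` is a hypothetical interface over the blocklist service and `staticList` is a test double. Counting length in Unicode code points is an assumption about how the minimum is measured; hashing and storage are left out.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

var (
	ErrTooShort    = errors.New("password is too short")
	ErrBlocked     = errors.New("password is on the blocklist")
	ErrUnavailable = errors.New("cannot verify password right now, try again later")
)

// BlocklistChecker is a hypothetical interface over the blocklist service.
// It returns an error when the service cannot answer.
type BlocklistChecker interface {
	Blocked(password string) (bool, error)
}

// MinLength returns 15 for single-factor accounts and 8 when MFA is required.
func MinLength(mfa bool) int {
	if mfa {
		return 8
	}
	return 15
}

// CheckNewPassword runs the length and blocklist steps in order.
func CheckNewPassword(password string, mfa bool, bl BlocklistChecker) error {
	if utf8.RuneCountInString(password) < MinLength(mfa) {
		return ErrTooShort
	}
	blocked, err := bl.Blocked(password)
	if err != nil {
		// Fail closed: an unverified password is never accepted.
		return ErrUnavailable
	}
	if blocked {
		return ErrBlocked
	}
	return nil
}

// staticList is an in-memory stand-in for the real service.
type staticList struct {
	entries map[string]bool
	down    bool
}

func (s staticList) Blocked(p string) (bool, error) {
	if s.down {
		return false, errors.New("blocklist service down")
	}
	return s.entries[p], nil
}

func main() {
	bl := staticList{entries: map[string]bool{"Company2025!": true}}
	fmt.Println(CheckNewPassword("abc123", false, bl))
	fmt.Println(CheckNewPassword("Company2025!", true, bl))
	fmt.Println(CheckNewPassword("correct horse battery staple", false, bl))
	fmt.Println(CheckNewPassword("correct horse battery staple", false, staticList{down: true}))
}
```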

Authentication

The happy path: Subscriber submits username and password. The system runs the submitted password through the same one-way hashing process used when the password was stored, and compares the results. Match. Session created.

The attacker’s path — credential stuffing: An attacker has a list of username/password pairs from a breach at another service. They try each one. After 100 consecutive failures on a single account, the system requires additional verification — a CAPTCHA, a temporary lockout with recovery, or escalating delays. The account is never permanently locked, because permanent lockout is a denial-of-service weapon the attacker can use against legitimate users.
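A minimal sketch of the failure counter follows. `FailureTracker` is a hypothetical in-memory type; a real verifier would need persistence and concurrency control. The point it shows is that crossing the threshold triggers extra verification rather than a permanent lock, and that a successful login resets the count.

```go
package main

import "fmt"

// MaxConsecutiveFailures is the threshold used in the scenario.
const MaxConsecutiveFailures = 100

// FailureTracker counts consecutive failed logins per account.
// Not safe for concurrent use; for illustration only.
type FailureTracker struct {
	failures map[string]int
}

// NewFailureTracker returns an empty tracker.
func NewFailureTracker() *FailureTracker {
	return &FailureTracker{failures: map[string]int{}}
}

// RecordFailure notes one failed attempt for the account.
func (t *FailureTracker) RecordFailure(account string) {
	t.failures[account]++
}

// RecordSuccess resets the counter after a successful login.
func (t *FailureTracker) RecordSuccess(account string) {
	delete(t.failures, account)
}

// NeedsExtraVerification reports whether the next attempt must first pass
// an additional check such as a CAPTCHA or an enforced delay.
func (t *FailureTracker) NeedsExtraVerification(account string) bool {
	return t.failures[account] >= MaxConsecutiveFailures
}

func main() {
	tr := NewFailureTracker()
	for i := 0; i < MaxConsecutiveFailures; i++ {
		tr.RecordFailure("sarah")
	}
	fmt.Println(tr.NeedsExtraVerification("sarah")) // true
	tr.RecordSuccess("sarah")
	fmt.Println(tr.NeedsExtraVerification("sarah")) // false
}
```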

The attacker’s path — user enumeration: The attacker tries a username that doesn’t exist. The system performs a dummy hash computation so the response time is identical to a real account. The error message is generic — “invalid username or password.” The attacker learns nothing about whether the account exists.
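The dummy-hash idea can be sketched as below. `slowHash` is a stand-in built from iterated SHA-256 so the example needs only the standard library; that choice is an assumption for illustration, and a real verifier would use a purpose-built password hashing function. The `Verifier` type and its methods are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// record is a stored credential: a salt and the hash of salt+password.
type record struct {
	salt []byte
	hash []byte
}

// slowHash is a placeholder for a real password hashing function.
func slowHash(password string, salt []byte) []byte {
	sum := sha256.Sum256(append(append([]byte{}, salt...), password...))
	for i := 0; i < 10000; i++ {
		sum = sha256.Sum256(sum[:])
	}
	return sum[:]
}

// dummy is hashed against when the username is unknown, so the same
// amount of work is done either way.
var dummy = record{salt: []byte("dummy-salt"), hash: make([]byte, sha256.Size)}

// Verifier holds enrolled accounts in memory for this sketch.
type Verifier struct {
	users map[string]record
}

// Enroll stores a salted hash for the user.
func (v *Verifier) Enroll(user, password string, salt []byte) {
	v.users[user] = record{salt: salt, hash: slowHash(password, salt)}
}

// Authenticate performs the same hashing and comparison whether or not
// the account exists. Callers should return one generic error message.
func (v *Verifier) Authenticate(user, password string) bool {
	rec, found := v.users[user]
	if !found {
		rec = dummy
	}
	match := subtle.ConstantTimeCompare(slowHash(password, rec.salt), rec.hash) == 1
	return found && match
}

func main() {
	v := &Verifier{users: map[string]record{}}
	v.Enroll("sarah", "correct horse battery staple", []byte("per-user-salt"))
	fmt.Println(v.Authenticate("sarah", "correct horse battery staple")) // true
	fmt.Println(v.Authenticate("nobody", "anything"))                    // false
}
```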

The MFA path: Account is AAL2. Password validates. The system prompts for a second factor. The subscriber provides a TOTP code from their authenticator app. Valid. Session created. If the subscriber’s device is lost, they use a recovery code or alternative factor — the system doesn’t fall back to password-only.

Sessions

The happy path: After authentication, the system generates a session token — a random identifier that proves “this browser is logged in” — with enough randomness to be unguessable. It’s delivered over an encrypted connection, never embedded in URLs. The subscriber works. When done, they log out. The system invalidates the session server-side — not just deleting the cookie.
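Token generation and delivery might be sketched like this in Go. The 32-byte token size, the cookie name, and the `SameSite` setting are assumptions chosen for the example, not values taken from the standard.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"net/http"
)

// NewSessionToken returns 32 bytes from the operating system's
// cryptographic random source, encoded as URL-safe base64.
func NewSessionToken() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// SessionCookie carries the token in a cookie rather than in a URL.
func SessionCookie(token string) *http.Cookie {
	return &http.Cookie{
		Name:     "session", // hypothetical cookie name
		Value:    token,
		Path:     "/",
		Secure:   true, // only sent over encrypted connections
		HttpOnly: true, // not readable from page scripts
		SameSite: http.SameSiteLaxMode,
	}
}

func main() {
	token, err := NewSessionToken()
	if err != nil {
		fmt.Println("could not generate token:", err)
		return
	}
	fmt.Println(SessionCookie(token))
}
```

Logout then deletes the server-side record keyed by this token, so a copied cookie stops working.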

The absent subscriber: The subscriber walks away. After 30 minutes of inactivity, the session expires. After 12 hours regardless of activity, the session expires. Both timeouts are adjustable by assurance level — higher-risk systems use shorter windows.
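The two timeouts reduce to a pair of comparisons. In this sketch the `Session` type and constant names are hypothetical, and the limits are the 30-minute and 12-hour values from the scenario.

```go
package main

import (
	"fmt"
	"time"
)

const (
	IdleTimeout     = 30 * time.Minute // expires after this much inactivity
	AbsoluteTimeout = 12 * time.Hour   // expires this long after login, regardless
)

// Session records when the session started and when it was last used.
type Session struct {
	CreatedAt  time.Time
	LastActive time.Time
}

// Expired reports whether either limit has been reached at time now.
func (s Session) Expired(now time.Time) bool {
	return now.Sub(s.LastActive) >= IdleTimeout ||
		now.Sub(s.CreatedAt) >= AbsoluteTimeout
}

func main() {
	start := time.Now()
	s := Session{CreatedAt: start, LastActive: start}
	fmt.Println(s.Expired(start.Add(10 * time.Minute))) // false
	fmt.Println(s.Expired(start.Add(31 * time.Minute))) // true
}
```

Making the two constants configuration values is one way to let higher-assurance systems use shorter windows.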

The attacker’s path — session hijacking: An attacker obtains a session token (perhaps through a compromised network or XSS vulnerability). They replay it from a different IP and user-agent. The system flags the anomaly and may invalidate the session or require reauthentication.

Compromise response

The detection path: A breach monitoring service flags a subscriber’s password as appearing in a newly published breach corpus. The system marks the account for mandatory password change.

The subscriber’s path: Next login, the subscriber authenticates (the compromised password works this one last time), then is forced to choose a new password before getting a session. They cannot reuse the compromised password. The system does not just suggest a change — it requires one.

The absent subscriber: The subscriber doesn’t log in for weeks. The account stays flagged. Whenever they return, the forced change applies. The system doesn’t age out the flag.

The worst case: The attacker already used the compromised password to change it. The subscriber can’t log in. Account recovery kicks in — and recovery must not bypass the account’s assurance level. An AAL2 account requires two-factor recovery, not just an email link.

Why rotation doesn’t appear here

Notice what’s absent from every scenario: periodic expiration. No 90-day timer. No “your password is about to expire” banner. The only forced change is on evidence of compromise — a specific, concrete signal that the current password is no longer secret.

Rotation is absent because it makes every other scenario worse. It makes subscribers choose weaker passwords. It makes their passwords more predictable. It trains them to make minimal changes. And it provides zero protection against the actual threat — an attacker who already has the password.

What’s still missing from most organizations

Eight years after Rev 3, here’s what I still see:

90-day rotation policies
Composition rules (uppercase + digit + special)
Paste disabled in password fields
8-character minimums with no blocklist checking
“Security questions” as account recovery

Every one of these is now explicitly prohibited or deprecated by the current NIST standard. Not “not recommended.” Prohibited.

If your organization follows NIST — and if you’re a federal agency or contractor, you must — Rev 4 leaves no room for interpretation. If you don’t follow NIST but use it as a reference, Rev 4 is still the strongest signal available that these practices are counterproductive.

The standard is free and online. The password verifier section is the part that matters most. Read it. Then go check what your systems actually enforce.

References

NIST SP 800-63B Rev 4 (August 2025) — the current standard
NIST SP 800-63B Rev 3 (June 2017) — the paradigm shift
Password Verifiers section — the specific requirements

Appendix: formal use cases

The scenarios above, formalized as Cockburn-style use cases. These are designed to be cut and pasted as a standalone requirements document. Each NIST requirement appears as the scenario that motivated it — an attacker exploiting a weakness, a subscriber hitting a wall, or a system failing to protect its users.

Derived from NIST SP 800-63B Rev 4 (August 2025).

System Scope

System: Verifier — the authentication subsystem that validates subscriber credentials, manages sessions, and enforces credential policy.

Actors

Subscriber: End user who authenticates. May memorize passwords or use a password manager.

Verifier: The system under design. Validates credentials, manages sessions.

Attacker: Adversary with breach corpuses, password lists, and knowledge of common user behavior. Methods: credential stuffing, brute force, mutation guessing, phishing, session hijacking, social engineering of recovery flows.

UC-1: Set an Appropriate Secret

Primary Actor: Subscriber
Goal: Set a password the subscriber can use to authenticate
Scope: Verifier
Level: User goal
Trigger: Subscriber creates an account or changes their password
Preconditions: Identity proofed (enrollment) or authenticated session (change)
Stakeholders:
- Subscriber — wants a password they can use to get in
- Verifier — wants a password that resists guessing even if the hash database is stolen
- Attacker — wants subscribers to choose predictable passwords or reuse breached ones
Main Success Scenario:
1. Subscriber enters a password
2. Verifier validates the password length (15+ for single-factor, 8+ with MFA)
3. Verifier validates the password against the blocklist (UC-2)
4. Verifier hashes and stores the password (UC-3)
5. Verifier confirms the password is set
Extensions:
- 1a. Subscriber pastes from a password manager: Verifier accepts paste and autofill. The password is random and non-memorizable — the manager stores it. Continue step 2.
- 2a. Password is too short: Verifier rejects and provides guidance. Resume step 1.
- 2b. Verifier imposes composition rules (uppercase, digit, special): This forces predictable patterns — Password1!, Company2025!. Attacker exploits the pattern with mutation lists. Composition rules are prohibited. Verifier accepts any character mix.
- 3a. Password found in a breach corpus: Attacker already has this password. Verifier rejects and explains why. Resume step 1.
- 3b. Password is a dictionary word, sequential, or contains the username: Attacker tries these first. Verifier rejects. Resume step 1.
- 3c. Blocklist service unavailable: Accepting the password would leave the account vulnerable to credential stuffing. Verifier refuses the change and asks subscriber to retry later. Fail.
- 4a. Storage fails: No password stored. Resume step 1.
- *a. System requires periodic rotation (90-day policy): Subscriber mutates Summer2024! to Fall2024!. Attacker who has the old password guesses the new one in a handful of tries. Forced rotation is prohibited; change only on evidence of compromise.
Technology & Data Variations:
- Password manager: subscriber generates a random, non-memorizable password. The secret is persisted, not memorized. Failure mode is lost manager, not forgotten password.
- Unicode normalization: NFKC or NFKD before hashing
Minimal Guarantee: No password is stored unless it passes all validation.
Success Guarantee: Password is stored as a salted hash; subscriber can authenticate with it.

UC-2: Validate Password Against Blocklist

Primary Actor: Verifier (automated)
Goal: Reject passwords an attacker already knows
Scope: Verifier
Level: Subfunction (called by UC-1)
Trigger: Subscriber submits a new password
Preconditions: Blocklist sources loaded
Stakeholders:
- Subscriber — wants clear feedback if rejected
- Attacker — has breach corpuses with hundreds of millions of passwords; tries the top candidates first
Main Success Scenario:
1. Verifier normalizes the password for comparison
2. Verifier checks against breach corpuses, dictionary words, sequential/repetitive strings, and context-specific terms (service name, username)
3. Password not found; verifier accepts it
Extensions:
- 2a. Password found in breach corpus: This password is in the attacker’s list. Verifier rejects and explains why. UC-1 resumes at step 1.
- 2b. Password is a common dictionary word: Attacker tries dictionary words early. Verifier rejects. UC-1 resumes at step 1.
- 2c. Password is sequential or repetitive (123456, aaaaaa): Trivially guessable. Verifier rejects. UC-1 resumes at step 1.
- 2d. Password contains the username or service name: Attacker targets context-specific passwords. Verifier rejects. UC-1 resumes at step 1.
- 2e. Blocklist service unavailable, no cache: Verifier cannot ensure the password isn’t compromised. Rejects and asks subscriber to retry. Fail.
Minimal Guarantee: No password an attacker already has is accepted.
Success Guarantee: Only passwords absent from all blocklist sources proceed to storage.
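
Extensions 2c and 2d are mechanical enough to sketch in Go. The helper names below are hypothetical, and the checks are deliberately narrow: they test the whole password only, so a real blocklist would also need the breach-corpus and dictionary lookups from 2a and 2b.

```go
package main

import (
	"fmt"
	"strings"
)

// isRepetitive reports whether the password is one character repeated,
// such as "aaaaaa".
func isRepetitive(p string) bool {
	r := []rune(p)
	for i := 1; i < len(r); i++ {
		if r[i] != r[0] {
			return false
		}
	}
	return len(r) > 1
}

// isSequential reports whether each character is exactly one code point
// above (or below) its predecessor, such as "123456" or "fedcba".
func isSequential(p string) bool {
	r := []rune(p)
	if len(r) < 2 {
		return false
	}
	step := r[1] - r[0]
	if step != 1 && step != -1 {
		return false
	}
	for i := 2; i < len(r); i++ {
		if r[i]-r[i-1] != step {
			return false
		}
	}
	return true
}

// containsContext reports whether the password contains any context term
// (username, service name), compared case-insensitively.
func containsContext(p string, terms ...string) bool {
	lower := strings.ToLower(p)
	for _, t := range terms {
		if t != "" && strings.Contains(lower, strings.ToLower(t)) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isRepetitive("aaaaaa"))                                       // true
	fmt.Println(isSequential("123456"))                                       // true
	fmt.Println(containsContext("Alice-winter-river", "alice", "examplecorp")) // true
}
```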

UC-3: Store a Password

Primary Actor: Verifier (automated)
Goal: Store the password so it resists offline cracking if the database is stolen
Scope: Verifier
Level: Subfunction (called by UC-1)
Trigger: Password passed validation
Preconditions: Password in memory, not yet persisted
Stakeholders:
- Subscriber — wants their credential safe even if the database is breached
- Attacker — has stolen the hash database and will attempt offline cracking with GPU clusters
Main Success Scenario:
1. Verifier generates a random salt
2. Verifier hashes the password using an approved hashing scheme with a high cost factor
3. Verifier stores the hash and salt
Extensions:
- 2a. Attacker steals the hash database: With a weak hash (MD5, SHA-1, or PBKDF2 with a low iteration count), the attacker cracks most passwords in hours. With a memory-hard scheme and high cost factor, each guess is expensive. The cost factor should be as high as practical without degrading login performance.
- 2b. Pepper available: Verifier applies an additional keyed hash with a secret stored separately. Even if the database is stolen, the attacker also needs the pepper. Continue step 3.
- 3a. Database write fails: Password not stored. Subscriber informed. UC-1 may retry.
Technology & Data Variations:
- Approved hashing schemes per NIST SP 800-132
- Salt: at least 32 bits from approved random source
- Pepper: optional, stored in HSM or separate key store
Minimal Guarantee: Plaintext password is never persisted.
Success Guarantee: Password stored as salted hash that resists offline cracking.

UC-4: Authenticate with Password

Primary Actor: Subscriber
Goal: Prove identity to the verifier
Scope: Verifier
Level: User goal
Trigger: Subscriber initiates login
Preconditions: Subscriber has a registered password; connection is encrypted
Stakeholders:
- Subscriber — wants to log in quickly
- Verifier — wants to confirm identity without leaking information to attackers
- Attacker — has breached credential lists; wants to stuff, guess, or enumerate
Main Success Scenario:
1. Subscriber submits username and password
2. Verifier retrieves stored hash and salt
3. Verifier validates the submitted password against the stored hash
4. Verifier establishes an authenticated session (UC-7)
Extensions:
- 2a. Account does not exist: Attacker is enumerating usernames. Verifier performs a dummy hash computation so response time is identical to a real account. Returns generic error. UC-5 applies. Resume step 1.
- 3a. Password does not match: Generic error — does not reveal whether the username or password was wrong. UC-5 rate limiting applies. Resume step 1.
- 3b. Account requires MFA (AAL2+): Password alone isn’t enough. Verifier prompts for second factor (UC-6). Session created after UC-6 succeeds.
- 3c. Account is temporarily locked (UC-5): Attacker triggered the lockout with repeated guesses. Verifier informs subscriber of recovery options. Fail.
- 3d. Attacker uses credential stuffing (username/password pairs from another breach): Rate limiting (UC-5) caps attempts per account. Attacker cannot scale beyond the threshold without triggering lockout or CAPTCHA.
Minimal Guarantee: Failed attempts are logged and rate-limited. No information leaked about account existence or which factor failed.
Success Guarantee: Subscriber is authenticated; session established at the required AAL.

UC-5: Rate-Limit Authentication Attempts

Primary Actor: Verifier (automated)
Goal: Make online guessing impractical without permanently locking out legitimate subscribers
Scope: Verifier
Level: Subfunction (called by UC-4)
Trigger: Failed authentication attempt
Preconditions: Per-account failure counter maintained
Stakeholders:
- Subscriber — does not want to be permanently locked out of their own account
- Attacker — wants unlimited guessing attempts; also wants to weaponize lockout as denial-of-service
Main Success Scenario:
1. Verifier increments the per-account failure counter
2. Verifier evaluates the counter against the threshold and allows the attempt
3. Subscriber eventually authenticates; counter resets
Extensions:
- 2a. Threshold reached (100 consecutive failures): Verifier applies throttling — escalating delays, CAPTCHA, or temporary lockout. Resume step 2 after throttle clears.
- 2b. Attacker uses lockout as denial-of-service: Permanent lockout would let the attacker lock out any account by failing 100 times. Account is never permanently locked. Recovery mechanism always available.
Minimal Guarantee: Account is never permanently locked.
Success Guarantee: Online guessing is impractical within the rate limits.

UC-6: Authenticate with Second Factor

Primary Actor: Subscriber
Goal: Provide a second authentication factor for AAL2+ access
Scope: Verifier
Level: User goal
Trigger: Verifier requires MFA after password verification
Preconditions: First factor verified; second factor registered
Stakeholders:
- Subscriber — wants convenient but secure second factor
- Attacker — wants to bypass the second factor via phishing, SIM swap, or device theft
Main Success Scenario:
1. Verifier prompts for second factor
2. Subscriber provides a cryptographic assertion, OTP code, or push approval
3. Verifier validates the second factor
4. Verifier confirms authentication intent — subscriber consciously approved
5. Authentication succeeds; session established (UC-7)
Extensions:
- 2a. Subscriber’s device is lost or broken: Subscriber uses an alternative registered factor or initiates recovery (UC-9). Fail for this UC.
- 3a. OTP code reused (replay): Attacker intercepted a valid code and replays it. Each code is single-use. Verifier rejects. Resume step 1.
- 3b. Attacker phishes the second factor: At AAL2, phishing may succeed with OTP codes. At AAL3, hardware cryptographic authenticators with verifier impersonation resistance make phishing structurally impossible.
- 3c. Attacker SIM-swaps to intercept SMS OTP: SMS OTP is permitted at AAL2 but restricted — should not be the sole option where alternatives exist. Prohibited at AAL3.
- 4a. No authentication intent: Subscriber must consciously approve, not just possess the device. Verifier rejects without intent. Resume step 1.
Technology & Data Variations:
- AAL2: password + any second factor (TOTP, hardware key, push)
- AAL3: password + hardware cryptographic authenticator providing verifier impersonation resistance
- SMS OTP: permitted at AAL2 (restricted), prohibited at AAL3
Minimal Guarantee: Authentication does not succeed without a valid second factor at AAL2+.
Success Guarantee: Two distinct factors verified; authentication intent confirmed.

UC-7: Use an Authenticated Session

Primary Actor: Subscriber
Goal: Maintain authenticated access for the duration of a work session
Scope: Verifier
Level: User goal
Trigger: Successful authentication
Preconditions: Authentication completed at the required AAL
Stakeholders:
- Subscriber — wants persistent access; wants to log out when done
- Attacker — wants to steal, replay, or fixate session tokens
Main Success Scenario:
1. Verifier generates a session token with enough randomness to be unguessable
2. Verifier delivers the token over an encrypted connection
3. Subscriber makes authenticated requests
4. Subscriber logs out
5. Verifier invalidates the session server-side
Extensions:
- 3a. Subscriber walks away (inactivity timeout): Session expires. Subscriber must reauthenticate (UC-4). Resume step 1.
- 3b. Absolute timeout reached (e.g., 12 hours): Session expires regardless of activity. Prevents stolen tokens from being useful indefinitely. Resume step 1.
- 3c. Attacker steals the session token: Token was embedded in a URL and leaked via referrer header, or extracted via XSS. Token must never be in URLs. Session tokens must be delivered only over encrypted connections.
- 3d. Attacker replays token from different context: Verifier flags anomalous IP or user-agent. May invalidate session or require reauthentication.
- 5a. Subscriber only deletes the cookie client-side: Session remains valid server-side. Attacker who obtained the token can still use it. Logout must invalidate server-side.
Minimal Guarantee: Session is always invalidated server-side on logout or timeout.
Success Guarantee: Session is maintained while active, terminated cleanly on logout or timeout.

UC-8: Restore Account Security After Compromise

Primary Actor: Subscriber
Goal: Replace a compromised password and restore the account to a secure state
Scope: Verifier
Level: User goal
Trigger: Subscriber is informed their password must be changed
Preconditions: Verifier has flagged the password as compromised
Stakeholders:
- Subscriber — wants to regain security without losing access
- Attacker — wants to use the compromised credential before it’s changed; may have already changed it
Main Success Scenario:
1. Subscriber attempts to log in
2. Verifier authenticates the subscriber
3. Verifier forces password change before granting session
4. Subscriber chooses a new password (UC-1)
5. Verifier invalidates the compromised password and prevents its reuse
6. Verifier grants session with new password
Extensions:
- 1a. Attacker already changed the password: Subscriber is locked out. Account recovery (UC-9) required. Fail for this UC.
- 1b. Subscriber doesn’t log in for weeks: Flag persists. Forced change applies whenever they return.
- 4a. Subscriber tries to reuse the compromised password: Attacker who obtained the old password could guess the subscriber would try to keep it. Reuse is prohibited. Resume step 4.
- *a. System triggers this change on a 90-day timer instead of breach evidence: This is forced rotation, which produces the mutation problem described in UC-1 ext *a. Change is forced only on evidence of compromise, never on a calendar.
Minimal Guarantee: Compromised password cannot be used after the forced-change login.
Success Guarantee: New password set; compromised credential permanently invalidated.

UC-9: Recover Account

Primary Actor: Subscriber
Goal: Regain access when the primary authenticator is lost or forgotten
Scope: Verifier
Level: User goal
Trigger: Subscriber cannot authenticate
Preconditions: Recovery mechanism registered
Stakeholders:
- Subscriber — wants to regain access without excessive friction
- Attacker — wants to hijack the account by social-engineering the recovery flow
Main Success Scenario:
1. Subscriber initiates recovery
2. Verifier presents recovery challenge appropriate to the account’s AAL
3. Subscriber provides recovery codes or alternative second factor
4. Verifier validates and grants limited access (password change only)
5. Subscriber sets new password (UC-1) and registers new authenticators if needed
6. Verifier notifies subscriber that authenticators were changed
Extensions:
- 2a. AAL2+ account, attacker tries email-only recovery: Email alone would bypass the second factor. Recovery must match the account’s assurance level. AAL2 requires recovery codes or alternative MFA. Fail for email-only at AAL2+.
- 3a. Recovery code already used: Codes are single-use. Attacker who obtained one code cannot reuse it. Resume step 3 with another code.
- 3b. All recovery codes exhausted: Subscriber contacts support. Re-enrollment at original identity proofing level. Fail for automated recovery.
- 3c. Attacker attempts social-engineering: Recovery requires a registered mechanism, not human judgment. Automated flow rejects. Fail.
- 6a. Subscriber did not initiate the change: Notification alerts subscriber to potential takeover. Subscriber can lock account.
Technology & Data Variations:
- AAL1: email-based recovery acceptable
- AAL2+: recovery codes or alternative MFA required
Minimal Guarantee: Recovery never downgrades the account’s assurance level.
Success Guarantee: Subscriber regains access with fresh credentials at the original AAL.

Why 95% Utilization Feels Broken: A Queue Demo, Three Review Rounds, and a Better Model

2026-03-28T00:00:00+00:00

A queue at 95% target load is mathematically stable. A dashboard says fine. Watch it run and your gut says broken. That gap is where queuing intuition fails.

I built a terminal demo with Claude to show this. I designed the teaching progression and the analogies. Claude wrote the implementation. The demo looked right after the first draft. Three rounds of adversarial external review proved it was teaching wrong lessons confidently.

What the demo teaches

Target load is the ratio of arrival rate to service rate, written ρ (rho) in queuing theory.

Three metrics tell you how a queue behaves. Throughput is how many customers walk out the door per hour. Flow time is how long you’re on premises — from the moment you get in line to the moment you leave with your order. WIP (work in process) is everyone currently in the building — waiting in line plus being served. Little’s Law ties them together: flow time = WIP / throughput. When one gets worse, the others move with it.

The sparklines below show WIP over time. The number at the end is average flow time. Those are the metrics to watch as we add complexity.

Each step removes one simplification: the gate, perfect regularity, randomness on one side, both sides, the remaining headroom.

Start with no randomness. A sushi boat. The chef places a plate, it circles to you, you grab it, the empty spot comes back. Nobody arrives until there’s room. No queue is possible because arrivals are gated by departures. That’s lockstep — a gated handoff, not a standard open queue.

Now remove the gate. A merry-go-round: kids show up every 3.3 minutes whether or not a horse is free, but each ride takes exactly 3. Arrivals are independent of departures for the first time. A queue could form — arrivals no longer wait for an opening. It doesn’t, because the timing is still perfectly regular. Queuing theory calls this D/D/1 — deterministic arrivals, deterministic service, one server. This system stays stable as long as arrivals come slower than service completes. That condition — arrival rate below service rate, or ρ < 1 — is what makes any queuing model stable. When it holds, the queue doesn’t grow without bound. When it doesn’t, no amount of buffering saves you.

In the sparklines below, the low bar (▁) is the baseline — zero WIP. Taller blocks mean more customers in the system.

                         WIP over time                                TP      avg WIP  avg flow
Lockstep:               ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁  20/hr   0.0      —
Fixed Schedule (D/D/1): ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁  16.5/hr 0.0      0.0min

Flat lines. No waiting. Simple and predictable, but nothing in production looks like this.

Add randomness to one side. A coffee shop. Every drink takes exactly 3 minutes. But customers arrive unpredictably: two walk in together, then nobody for ten minutes. The server can’t absorb the bursts instantly, so a queue forms and drains. That’s variable arrivals, fixed service (M/D/1).

Flip it. A dentist with appointments every 30 minutes. Most visits take 25. Some run to 40. The patient who arrives on time for the next slot waits because the previous one ran over. That’s fixed arrivals, variable service (D/M/1). Either source of variability alone creates queues, even when the server is fast enough on average.

                          WIP over time                                TP      avg WIP  avg flow
Random Arrivals (M/D/1): ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▃▂▁▁  16.1/hr 0.6      2.1min
Random Service (D/M/1):  ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▁▁  17.0/hr 0.6      2.0min

Average demand is 10% below capacity. Occasional queuing is nevertheless visible.
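
A small Go sketch makes the mechanism concrete. It uses Lindley's recursion, a standard textbook formula for single-server waits, and is not the demo's own code. The arrival gaps are hand-picked for illustration: both patterns keep average demand below capacity, but only the bursty one makes anyone wait.

```go
package main

import (
	"fmt"
	"math"
)

// lindleyWaits returns each customer's wait in queue for a single FIFO
// server, using Lindley's recursion:
//
//	wait[n+1] = max(0, wait[n] + service[n] - gap[n])
//
// where gap[n] is the time between the arrivals of customers n and n+1.
func lindleyWaits(gaps, services []float64) []float64 {
	waits := make([]float64, len(gaps)+1)
	for n := range gaps {
		waits[n+1] = math.Max(0, waits[n]+services[n]-gaps[n])
	}
	return waits
}

func main() {
	services := []float64{3, 3, 3, 3, 3, 3, 3} // every drink takes exactly 3 minutes

	regular := []float64{3.3, 3.3, 3.3, 3.3, 3.3, 3.3} // evenly spaced arrivals
	bursty := []float64{1, 1, 8, 1, 1, 8}              // hand-picked bursts, mean gap about 3.3

	fmt.Println(lindleyWaits(regular, services)) // [0 0 0 0 0 0 0]
	fmt.Println(lindleyWaits(bursty, services))  // [0 2 4 0 2 4 0]
}
```

With regular gaps the server always finishes before the next arrival. With bursts, the second and third customers of each cluster wait 2 and 4 minutes even though the long gap afterwards leaves the server idle.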

Add randomness to both sides. A food truck. Customers show up whenever. Some order a taco, some a custom burrito. Neither side is predictable.

                            WIP over time                                TP      avg WIP  avg flow
Random Everything (M/M/1): ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▂▃▂▁▁▁▁▁▁▁▁▁▁▁▂▂▂▄▃▃▁▁  15.7/hr 0.8      3.2min

That’s M/M/1. Same target load. Average flow time jumped from ~2 min to 3.2.

Push the load. Same model, target load raised from 0.90 to 0.95. Then past capacity to 1.5 — demand exceeds service and the backlog grows.

                              WIP over time                                TP      avg WIP  avg flow
Near Full (M/M/1, ρ=0.95):  ▁▁▁▁▁▁▁▁▁▁▂▃▂▁▁▁▁▁▁▁▃▃▄▂▁▂▃▄▃▂▅▁▃▃▁▁▁▂▁▁  16.2/hr 1.6      5.8min
Overloaded (M/M/1, ρ=1.5):  ▁▂▂▂▃▃▃▂▁▂▂▂▁▁▁▂▂▂▃▅▅▅▃▃▃▃▃▃▃▂▄▅▆▇▇▇▅▅▅▇  21.5/hr 4.0      7.4min*

* Overloaded wait counts only completed customers. Those still queued at the time horizon are excluded. This understates congestion.

Five percentage points of load. Nearly 2x the flow time. “95% utilized” sounds like 5% less headroom. It is half the headroom: spare capacity drops from 10% to 5%.

The overloaded sparkline climbs and doesn’t come back.

In steady state, near-full is far worse than this demo shows. M/M/1 theory predicts about 57 minutes of average waiting time in queue at ρ=0.95 with 3-minute mean service (60 minutes counting the service itself). The demo’s 5.8 minutes reflects a short cold-start run that never reaches that regime. The nonlinear pain is real. The demo understates it.

Stable scenarios run all customers to completion before measuring. Overloaded runs for a fixed time horizon. The full comparison:

Scenario                        │ target ρ │ peak WIP │ avg WIP │ avg flow
─────────────────────────────────────────────────────────────────────
Lockstep                        │      —   │      0 │   0.0 │        —
Fixed Schedule (D/D/1)          │    0.90  │      0 │   0.0 │   0.0min
Random Arrivals (M/D/1)         │    0.90  │      4 │   0.6 │   2.1min
Random Service (D/M/1)          │    0.90  │      4 │   0.6 │   2.0min
Random Everything (M/M/1)       │    0.90  │      5 │   0.8 │   3.2min
Near Full (M/M/1)               │    0.95  │      6 │   1.6 │   5.8min
Overloaded (M/M/1)              │    1.50  │     10 │   4.0 │   7.4min*

These lessons are only as trustworthy as the simulation behind them. The first version looked plausible and was subtly wrong.

Three review rounds that made it trustworthy

Each round: I sent the current plan to an external AI reviewer for adversarial grading, evaluated the feedback, decided what to change, and had Claude implement the fix.

Round 1: target load 1.0 has no steady state

I’d chosen target load 1.0 as baseline. Capacity equals demand. Natural starting point.

M/M/1 at load 1.0 has no stationary distribution. Mean queue length is infinite. In a 50-customer run, the specific random path dominates the results, not the underlying process. We were demonstrating seed sensitivity, not queuing theory.

I changed it to target load 0.9 for stochastic scenarios. Added the near-full scenario at 0.95. Overloaded at 1.5, where the demo doesn’t claim steady state.

Principle: The obvious parameter made validation impossible.

Round 2: you can’t verify what you assumed

Two catches.

Circular Little’s Law. The implementation computed flow time from WIP / throughput, then “verified” that WIP = throughput * flow time. That’s algebra, not verification.

The fix: timestamp each customer independently. Compute flow time from timestamps. Compute average WIP from event-time integration. Check whether WIP = throughput * flow time. The ratio is 1.00 (within rounding) for every stable scenario:

Little's Law consistency check (WIP ≈ TP × FT):

Random Arrivals (M/D/1)          WIP=0.55  TP×FT=0.55  ratio=1.00
Random Service (D/M/1)           WIP=0.58  TP×FT=0.58  ratio=1.00
Random Everything (M/M/1)        WIP=0.84  TP×FT=0.84  ratio=1.00
Near Full (M/M/1, ρ=0.95)        WIP=1.57  TP×FT=1.57  ratio=1.00

A consistency check, not external validation. But when one side was derived from the other, even this check was impossible.

// Flow time -- filter completed, map to duration, average.
completed := slice.From(r.customers).KeepIf(customer.IsCompleted)
flowTimes := completed.ToFloat64(customer.FlowTime)
m.avgFlow = flowTimes.Sum() / float64(completed.Len())

// integrateWIP accumulates area under the WIP curve.
type wipState struct{ area, prevTime float64; prevWIP int }
integrateWIP := func(s wipState, e logEntry) wipState {
    dt := e.time - s.prevTime
    return wipState{s.area + float64(s.prevWIP)*dt, e.time, e.systemSize}
}

// WIP -- fold over event log, then divide by total time.
final := slice.Fold(r.log, wipState{}, integrateWIP)
m.avgWIP = final.area / r.endTime

Flow time from timestamps. WIP from integration. Neither derived from the other.

“Common seeds” aren’t matched traces. Different scenarios consume random numbers differently. The fixed-schedule scenario uses none. The random-arrivals scenario draws only from the arrival sequence. Sharing a seed doesn’t mean scenarios see the same arrivals. Fix: pre-generate one interarrival sequence and one service sequence. Each scenario slices what it needs.
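
A minimal sketch of the pre-generated trace idea, assuming exponential draws and hypothetical names (traces, newTraces, scenario); the demo's actual types may differ. Each sequence gets its own generator, so drawing service times can never shift the arrival sequence.

```go
package main

import (
	"fmt"
	"math/rand"
)

// traces holds one interarrival sequence and one service sequence,
// generated once and shared by every scenario.
type traces struct {
	interarrival []float64
	service      []float64
}

// newTraces draws n exponential interarrival gaps and n exponential
// service times from two separately seeded generators.
func newTraces(seed int64, n int, meanGap, meanService float64) traces {
	arrivals := rand.New(rand.NewSource(seed))
	services := rand.New(rand.NewSource(seed + 1))
	t := traces{
		interarrival: make([]float64, n),
		service:      make([]float64, n),
	}
	for i := 0; i < n; i++ {
		t.interarrival[i] = arrivals.ExpFloat64() * meanGap
		t.service[i] = services.ExpFloat64() * meanService
	}
	return t
}

// scenario pairs a name with the sequences it consumes.
type scenario struct {
	name           string
	gaps, services []float64
}

func main() {
	const n = 50
	t := newTraces(42, n, 3.0/0.9, 3.0)

	fixed := make([]float64, n)
	for i := range fixed {
		fixed[i] = 3.0 // deterministic service for M/D/1
	}

	md1 := scenario{"M/D/1", t.interarrival, fixed}
	mm1 := scenario{"M/M/1", t.interarrival, t.service}

	same := true
	for i := range md1.gaps {
		if md1.gaps[i] != mm1.gaps[i] {
			same = false
		}
	}
	fmt.Println("matched arrivals:", same) // matched arrivals: true
}
```

Because both scenarios read the same slice, any difference in their results comes from the service side, not from the arrivals they happened to see.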

Principle: Verification that travels the same code path as computation isn’t verification.

Round 3: simulation is not animation

The first implementation used real-time sleeps with 500ms terminal ticks. The refresh rate was the simulation clock.

Two customers arriving 0.3 simulated minutes apart land in the same tick. We weren’t simulating random arrivals. We were simulating whatever the tick granularity permits.

I decided on discrete-event simulation in virtual time. Run instantly. Record everything. Animate playback separately.

func runSim(cfg simConfig) simResult {
    var (
        customers []customer
        log       []logEntry
        eq        eventQueue
        queue     []int // FIFO
        busy      bool
    )
    heap.Init(&eq)

    record := func(t float64, typ eventType, custIdx, qDepth int, serverBusy bool) {
        log = append(log, logEntry{
            time: t, typ: typ, custIdx: custIdx,
            queueDepth: qDepth, serverBusy: serverBusy,
        })
    }
    // ... process events in simulated time, record everything
}

Playback at 360x. All metrics in simulated units — “Avg wait: 5.8 min” means simulated minutes, not wall-clock.

Principle: Coupling simulation to rendering makes both unreliable.

Three questions from these reviews. Is your baseline valid? Is your verification independent of your computation? Is your clock decoupled from your display? Believable output is not the same as a trustworthy model.

Source code

Two Rules for Readable Density

2026-03-26T00:00:00+00:00

Most readability advice resists mechanical checking. “Use good names.” “Keep functions short.” You need the whole function, maybe the whole module, to evaluate those. These two rules you can check by reading a single line. The examples are in Go, but the rules apply to any language with nested expressions.

The uniform comma rule

Every comma in an expression should belong to the same argument list.

result := append(append(items, extra), overflow...)

Two commas, but they belong to different calls. items, extra feed the inner append. append(items, extra) and overflow... feed the outer. Your eye has to match each comma to its call to parse this.

combined := append(items, extra)
result := append(combined, overflow...)

Every comma on each line belongs to one call.

The shallow nesting rule

No more than two opening delimiters — parentheses, brackets, or braces — before a corresponding close.

name := strings.ToLower(strings.TrimSpace(strings.ReplaceAll(raw, "_", " ")))

strings.ToLower( is one open. strings.TrimSpace( is two. strings.ReplaceAll( is three. Three levels deep before anything resolves, all to clean up a string.

spaced := strings.ReplaceAll(raw, "_", " ")
name := strings.ToLower(strings.TrimSpace(spaced))

Neither line nests past two.

Brackets count. Map lookups are delimiter pairs:

name := users[groups[ids[index]]]

Three opens.

id := groups[ids[index]]
name := users[id]

Why two rules

They catch different things.

result := process(transform(x, y), z)

Two opens — nesting is fine. But x, y belongs to transform while transform(x, y), z belongs to process. Commas at two levels. Only the uniform comma rule flags this.

value := outer(middle(inner()))

No commas. Three opens before the first close. Only the shallow nesting rule flags this.

Some real offenders trip both:

parts = append(parts, strconv.FormatFloat(math.Abs(val), 'f', 2, 64))

Three opens and commas at two levels.

formatted := strconv.FormatFloat(math.Abs(val), 'f', 2, 64)
parts = append(parts, formatted)

One extraction and both rules are satisfied. The remaining lines are still dense — but neither nests past two, and every comma belongs to one call. Judge their legibility for yourself.

The fix is always the same: extract to a named variable. Naming the variable documents what the expression computes. The outer expression reads in terms of a word instead of a computation.

Both rules work at the smallest scale: one line, one expression. You can check them in review without understanding what the program does. As far as I can tell, no existing linter enforces either rule. Tools like nestif, gocognit, and ESLint’s max-depth check control-flow nesting — if inside if inside if. None check expression-level delimiter depth or mixed comma membership.
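
To show how mechanical the two checks are, here is a rough Go sketch of a single-line scanner. It is an illustration under stated assumptions, not an existing tool: analyze is a hypothetical name, the scanner skips quoted literals but otherwise knows nothing about Go syntax, and commas outside any delimiter (as in a, b := f()) are ignored.

```go
package main

import "fmt"

// analyze scans one line and reports the deepest delimiter nesting and the
// number of distinct delimiter pairs that directly contain a comma.
func analyze(line string) (maxDepth, commaLists int) {
	var stack []int            // ids of the currently open delimiters
	withComma := map[int]bool{} // ids of delimiter pairs that contain a comma
	nextID := 0
	var quote rune // active quote character, or 0 outside a literal
	escaped := false

	for _, r := range line {
		if quote != 0 {
			switch {
			case escaped:
				escaped = false
			case r == '\\' && quote != '`':
				escaped = true
			case r == quote:
				quote = 0
			}
			continue
		}
		switch r {
		case '"', '\'', '`':
			quote = r
		case '(', '[', '{':
			nextID++
			stack = append(stack, nextID)
			if len(stack) > maxDepth {
				maxDepth = len(stack)
			}
		case ')', ']', '}':
			if len(stack) > 0 {
				stack = stack[:len(stack)-1]
			}
		case ',':
			if len(stack) > 0 {
				withComma[stack[len(stack)-1]] = true
			}
		}
	}
	return maxDepth, len(withComma)
}

func main() {
	lines := []string{
		`result := process(transform(x, y), z)`,
		`value := outer(middle(inner()))`,
		`parts = append(parts, strconv.FormatFloat(math.Abs(val), 'f', 2, 64))`,
		`formatted := strconv.FormatFloat(math.Abs(val), 'f', 2, 64)`,
	}
	for _, line := range lines {
		depth, lists := analyze(line)
		fmt.Printf("nesting ok: %-5v commas ok: %-5v %s\n", depth <= 2, lists <= 1, line)
	}
}
```

A line passes the shallow nesting rule when the depth is at most two, and the uniform comma rule when at most one delimiter pair contains commas.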

They came from an itch. Certain lines have always struck me as harder to read than they should be, given how little they do. These rules are the closest I’ve come to saying why.

Bash Style Guide

2026-02-27T00:00:00+00:00

Prescriptive conventions for bash code under IFS=$'\n'; set -o noglob. Techniques are general; examples use standalone script style unless demonstrating library conventions.

1. Shebang and Version

#!/usr/bin/env bash. Bash 4.4+ minimum (for ${var@Q}).

File extensions: .bash for libraries, no extension for executables.

2. Safety Preamble

Two tiers: libraries and scripts.

Libraries: expect IFS=$'\n' and noglob from their callers, no set -e — callers own error policy. The library files themselves don’t set these; consumers do after sourcing (see boilerplate below). Some libraries handle IFS internally per-function with IFS='' read -r.

Consumers set this after sourcing:

IFS=$'\n'
set -o noglob

Scripts: defer strict mode until after option parsing. Option parsing uses $* unquoted and tests ${1:-}, which interact poorly with set -eu before args are validated.

Standard for new scripts: set -euo pipefail. Add the f flag (set -efuo pipefail) if noglob is not already set; set -f is equivalent to set -o noglob. Libraries should not force strict mode on their consumers.

The return 2>/dev/null line before strict mode enables interactive debugging by sourcing the script without executing main.

Library consumer boilerplate:

source ~/.local/lib/mylib.bash 2>/dev/null || { echo 'fatal: mylib.bash not found' >&2; exit 1; }

# enable safe expansion
IFS=$'\n'
set -o noglob

return 2>/dev/null    # stop if sourced, for interactive debugging
main $*               # entry point — library consumers may strip parsed options first

Script bottom:

# strict mode
return 2>/dev/null
set -euo pipefail
set -o noglob

main "$@"

3. Naming

Every file has a Naming Policy header comment (see template below). The rules:

Functions (libraries): namespace.PascalCase (public), namespace.camelCase (private). Namespace is the project name lowercase (e.g., lib.). Libraries are sourced by others and need namespace collision protection; standalone scripts use plain PascalCase/camelCase (see Standalone scripts below).
Locals: camelCase — begin with lowercase. Compound words that are single semantic concepts stay lowercase: filename, testname, fieldname (not fileName, testName, fieldName). Arrays use plural names (testnames, filenames, requestedTests); scalars use singular. Unpack positional parameters on one local line: local got=$1 want=$2, local msg=$1 rc=${2:-$?}.
Globals: PascalCase — begin with uppercase. Libraries append a randomly-chosen project-specific suffix letter (e.g., DebugQ, ShowProgressQ, TimeFuncQ) to prevent namespace collisions. Globals are not public — create accessor functions if consumers need them. Standalone scripts omit the suffix. Associative and indexed arrays that are global must use declare -gA or declare -ga, not declare -A or declare -a. Without -g, declare inside a function creates a local variable regardless of naming convention. This matters when a library is sourced inside a function (e.g., a convergence wrapper) — the arrays go out of scope when the sourcing function returns.
Namerefs: local -n UPPERCASE=$1 — borrows the environment variable namespace (all-caps). Namerefs point to the caller’s variable, so they need names that won’t collide with any local. UPPERCASE is safe because locals are always camelCase.
“List” in names: functions that serialize arrays into newline-separated strings use “List” – ListOf(), StreamList(). Variables holding serialized lists use the *List suffix (e.g., groupList, commandList). The *List suffix signals multi-value content (implies IFS characters), so no trailing _ is needed – the two conventions are mutually exclusive.
Standard globals (suffix exceptions): NL=$'\n' for string interpolation in double quotes. Prog=$(basename "$0") is standard in scripts that report their own name. These are conventional exceptions to the suffix rule.
Standalone scripts: no namespace prefix on functions, no suffix letter on globals — not sourced by others, so no collision risk.

Example header (library):

# Naming Policy:
#
# All function and variable names are camelCased.
#
# Private function names begin with lowercase letters.
# Public function names begin with uppercase letters.
# Function names are prefixed with "lib." (always lowercase) so they are namespaced.
#
# Local variable names begin with lowercase letters, e.g. localVariable.
#
# Global variable names begin with uppercase letters, e.g. GlobalVariable.
# Since this is a library, global variable names are also namespaced by suffixing them with
# the randomly-generated letter Q, e.g. GlobalVariableQ.
# Global variables are not public.  Library consumers should not be aware of them.
# If users need to interact with them, create accessor functions for the purpose.
#
# Variable declarations that are name references borrow the environment namespace, e.g.
# "local -n ARRAY=$1".

4. Namespace Suffix

Single letter per library appended to all globals and DI vars. Prevents collisions when libraries are sourced together. Choose a random letter per library — described as “randomly-generated” in headers.

Standalone scripts omit the suffix — not sourced by others, so no collision risk.

TimeFuncQ=UnixMilli   # DI variable
ShowProgressQ=1       # global
DebugQ=0              # global

5. Quoting

_ suffix on variables means “must quote on expansion.” Two reasons qualify a variable for the suffix:

Contains IFS characters (newlines under IFS=$'\n') — unquoted expansion splits into multiple words.
Can be empty — unquoted expansion disappears entirely, breaking positional argument pairing.

In practice: commands_ (trap output), content_ (user input, may contain newlines), usage_ (multiline heredoc), tags_ (optional flag, empty when not provided).

The *List suffix is an alternative convention for multi-value variables: groupList, commandList. The suffix signals IFS content (implies must-quote). _ and *List are mutually exclusive on the same variable.

Variables without _ or *List are safe unquoted under IFS=$'\n'; set -o noglob.
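
A short sketch of the two reasons for the _ suffix, run under the IFS and noglob settings the guide assumes. countArgs is a hypothetical helper that only prints how many arguments it received; the variable values are made up for the illustration.

```shell
#!/usr/bin/env bash
IFS=$'\n'; set -o noglob   # the environment the conventions assume

NL=$'\n'
countArgs() { echo $#; }   # hypothetical helper: prints its argument count

tags_=''                            # may be empty, so it carries the suffix
content_="line one${NL}line two"    # contains an IFS character
name='two words'                    # newline-free and never empty: no suffix

emptyUnquoted=$(countArgs --tag $tags_ after)    # 2: the empty value vanished
emptyQuoted=$(countArgs --tag "$tags_" after)    # 3: the empty word is kept
splitUnquoted=$(countArgs $content_)             # 2: split at the newline
splitQuoted=$(countArgs "$content_")             # 1
plainUnquoted=$(countArgs $name)                 # 1: a space is not in IFS

echo "$emptyUnquoted $emptyQuoted $splitUnquoted $splitQuoted $plainUnquoted"
```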

Nameref collision avoidance uses a separate strategy: UPPERCASE names (see Naming).

printf %q escapes a value for shell re-evaluation (eval-safe):

printf -v output '%q ' "$@"    # output is safe to eval

${var@Q} renders a human-readable quoted literal. Used for debug output and test copy-paste lines:

CMD="sudo -u ${RunAsUser@Q} bash -c ${CMD@Q}"    # readable in logs
echo "want=${got@Q}"                                # tests — paste to update expected value
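
A minimal round-trip sketch for the two quoting tools above. The restore function and the sample values are hypothetical; ${var@Q} assumes bash 4.4 or later. Both forms produce text that evaluates back to the original value, and the sketch checks that rather than relying on the exact escaped spelling.

```shell
#!/usr/bin/env bash
args=('two words' 'semi;colon' "it's")

# printf %q: escape every argument so the string can be passed to eval.
printf -v escaped '%q ' "${args[@]}"

# Hypothetical receiver that records what it was called with.
restore() { restoredCount=$#; restoredSecond=$2; }
eval "restore $escaped"

# ${var@Q}: a quoted literal, handy for logs and copy-paste lines.
value="it's a value"
literal=${value@Q}
eval "copy=$literal"

echo "restored $restoredCount arguments, second is: $restoredSecond"
echo "literal form: $literal"
```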

read -r discipline: always use read -r to avoid backslash interpretation. Prefer IFS='' read -r when consuming raw lines (see FP Pipeline Helpers for the canonical pattern).

Avoid braces in expansion. $var, not ${var} — braces add noise for no benefit when the variable name is unambiguous. For disambiguation when text follows the name, prefer quotes over braces: "$var"Suffix concatenates the quoted expansion with the literal. Use braces when the variable is embedded mid-string and quotes can’t delimit it: "prefix${var}suffix".

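Array/positional expansion: "${array[@]}" and "$@" preserve element boundaries, so each element stays a separate word. "$*" joins elements with the first character of IFS (useful for serialization). Unquoted, both ${array[@]} and $@ undergo word splitting on IFS, so elements containing newlines get broken apart. Under set -u, bash versions before 4.4 treat an empty array as unset, so expanding it needs a fallback such as ${args[@]:-}. Note that "${args[@]:-}" yields one empty word; where an empty array must expand to zero words, use ${args[@]+"${args[@]}"}.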
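
The array expansion forms above, counted side by side. This is a small sketch with a hypothetical countArgs helper and made-up element values, run under the guide's IFS and noglob settings.

```shell
#!/usr/bin/env bash
IFS=$'\n'; set -o noglob
NL=$'\n'
countArgs() { echo $#; }   # hypothetical helper: prints its argument count

items=('alpha' "beta${NL}gamma")   # two elements, the second spans two lines

quotedAt=$(countArgs "${items[@]}")     # 2: element boundaries preserved
unquotedAt=$(countArgs ${items[@]})     # 3: second element split at the newline
quotedStar=$(countArgs "${items[*]}")   # 1: joined into a single word
serialized="${items[*]}"                # joined with the first character of IFS

echo "$quotedAt $unquotedAt $quotedStar"
```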

Quoting decision tree. Walk this algorithm for any expansion you’re unsure about:

No-split context? Assignment RHS, [[ ]] (except RHS of == and =~), (( )), case, array subscripts, ${...} operators, redirections, here-strings — quoting is unnecessary. These contexts never split or glob regardless of IFS/noglob settings.
_-suffixed or *List variable? Contains IFS characters (newlines). Must quote in non-assignment contexts: echo "$usage_", hasGroup "$groupList".
Required-quoting context? Array expansion ("${arr[@]}"), RHS of == in [[ (for literal match), eval arguments, trap strings, external command arguments, process substitution with multi-line content — must quote. See the full list below.
Otherwise — safe unquoted under IFS=$'\n'; set -o noglob. The variable has no _ suffix (newline-free by convention), and the context is a shell builtin or function call with scalar arguments.

Why not quote everything? Under IFS+noglob, selective quoting signals intent. Quotes mean “this value needs protection” — either it contains IFS characters, or the context demands exact word boundaries. Quoting every expansion adds noise without adding safety, and obscures which values actually require care. When a reviewer sees quotes, they should be able to trust that those quotes are there for a reason.

When to quote. Under IFS=$'\n'; set -o noglob, most scalar expansions are safe unquoted. Quotes are required in these contexts:

Trust boundaries and the _ suffix — assigning a parameter to a non-_ variable documents that it won’t contain IFS characters: local command=$1 means “I expect single-line input.” If a parameter may contain newlines, assign to a _-suffixed variable and quote from there.
"${array[@]}" / "$@" / "$*" — quote to preserve element boundaries (see above). Unquote only when IFS splitting is intentional (e.g., populating arrays from command output: local arr=( $(command) )).
RHS of == in [[ — [[ $x == "$y" ]] for literal match. Unquoted RHS is a glob pattern: *, ?, [ become wildcards. Leave unquoted for intentional pattern matching: [[ $OSTYPE == darwin* ]].
RHS of =~ in [[ — quoting disables regex metacharacter interpretation in bash 3.2+ (. becomes literal dot, * loses repetition meaning), though the regex engine is still in use. Leave unquoted for regex matching (the common case): [[ $x =~ ^[0-9]+$ ]]. For complex patterns, store in a variable: local pattern='^[0-9]+$'; [[ $x =~ $pattern ]].
_-suffixed variables in non-assignment contexts — contain IFS characters (newlines), must quote: eval "$testSource_", echo "$Usage_".
eval arguments — eval "$CMD". Without quotes, newlines become argument separators; eval joins arguments with spaces, changing multi-line code semantics.
Command substitution as argument — a judgment call. func "$(command)" when the result should be a single word. Unquoted $(command) splits on newlines, which is sometimes desired: local arr=( $(listItems) ).
trap command strings — trap "$command$NL$(existing)" EXIT. The string is stored for later eval; must be a single coherent argument.
Process substitution with multi-line content — diff <(echo "$got") <(echo "$want"). Unquoted echo $var splits on newlines into separate arguments; echo outputs them space-separated, destroying line structure.
External command arguments: mkdir -p "$dir", install -m "$mode", ssh-keygen -f "$file". Without noglob, unquoted values undergo pathname expansion before the command sees them. Scripts that run set -euo pipefail without -f (noglob) need these quotes; code following these conventions quotes external command args consistently regardless.
Positional pairing arguments — APIs that consume arguments in key-value pairs (jq --arg name value, custom key value key value functions) break when an empty variable expands to nothing, shifting all subsequent pairs. Quote empty-possible values: --arg t "$type_". Better: design pair-consuming APIs to accept key=value as single arguments so empty values produce key= (one word) rather than disappearing.
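
The =~ item in the list above is easy to verify. A minimal sketch with made-up inputs, assuming bash 3.2 or later with default compatibility settings:

```shell
#!/usr/bin/env bash
input='1x2'

# Unquoted: the dot is a regex metacharacter and matches the x.
if [[ $input =~ 1.2 ]]; then unquoted=match; else unquoted=nomatch; fi

# Quoted: the dot is a literal dot, so 1x2 no longer matches.
if [[ $input =~ "1.2" ]]; then quoted=match; else quoted=nomatch; fi

# Pattern stored in a variable and expanded unquoted: still a regex.
pattern='^[0-9]+$'
if [[ 12345 =~ $pattern ]]; then digits=match; else digits=nomatch; fi
if [[ 12a45 =~ $pattern ]]; then mixed=match; else mixed=nomatch; fi

echo "unquoted=$unquoted quoted=$quoted digits=$digits mixed=$mixed"
```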

When quoting is unnecessary. These contexts never split or glob — quoting is harmless but adds no safety:

Assignment RHS — local var=$value, var=$(command), var=${1:-default}. Bash assigns the full expansion without splitting.
[[ ]] operands (except RHS of == and =~) — [[ -e $file ]], [[ $var == pattern ]] (LHS). The conditional command suppresses splitting.
(( )) arithmetic — (( rc == 0 )), (( ${#array[@]} )). Arithmetic context, not string context.
case word — case $var in. No splitting.
Array subscripts — ${map[$key]}, array[$idx]=val. Inside brackets, no splitting.
Inside ${...} operators — ${1:-$default}, ${var#$prefix}. Nested expansions are protected.
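Redirection targets: >$file, <$file, <<<$var. Nothing is split silently here. A here-string takes the expansion as one word, and a file target that expands to more than one word makes bash fail with an "ambiguous redirect" error rather than redirect to the wrong file.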
Scalar command arguments — func $simplevar, printf $fmt $val. No word-splitting surprises for newline-free values under IFS=$'\n'; set -o noglob. This is the default assumption for variables without the _ suffix. Note: commands still interpret values (printf parses its format string) — quoting controls splitting, not command semantics.
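
A few of the no-split contexts from the list above, exercised with a value that would split and glob as an unquoted command argument under default shell settings. The variable names and the value are hypothetical; the sketch deliberately runs with the default IFS and with globbing enabled to show that these contexts do not depend on the guide's settings.

```shell
#!/usr/bin/env bash
value='a b *'   # spaces and a glob character
prefix='a '

copy=$value                      # assignment RHS: the full value is assigned

case $value in                   # case word: no splitting, no globbing
  'a b *') caseResult=literal;;
  *)       caseResult=other;;
esac

if [[ $value == 'a b *' ]]; then testResult=equal; else testResult=different; fi

declare -A map
map[$value]=stored               # array subscript: the whole value is the key
lookup=${map[$value]}

trimmed=${value#$prefix}         # nested expansion inside ${...}

echo "copy=[$copy] case=$caseResult test=$testResult lookup=$lookup trimmed=[$trimmed]"
```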

6. Variable Scoping

Bash has dynamic scoping: a function can read and modify variables in its caller’s scope, even local variables. This is the opposite of lexical scoping (C, Python, Go) where a function can only see its own locals and globals.

Mechanism. When bash resolves a variable name, it walks up the call stack. A callee’s local x shadows the caller’s x, but without local, the callee accesses the caller’s variable directly. This applies to both reads and writes.

Deliberate use — callback counting. A test runner can exploit dynamic scoping intentionally. The callback modifies passCount and failCount, which are locals in the calling function:

(( passCount += 1 ))   # in caller's scope

The comment # in caller's scope documents the intentional cross-scope access. Without this pattern, the runner would need to pass counters through return values or globals.
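
A minimal sketch of the callback-counting pattern. The runner and callback names (runAll, report, testOk, testBad) are hypothetical, not the guide's test framework. The counters are incremented arithmetically; a plain += on a variable without the integer attribute would append the character 1 instead.

```shell
#!/usr/bin/env bash
# runAll owns the counters as locals and calls report for each test.
runAll() {
  local passCount=0 failCount=0 testname
  for testname in "$@"; do
    report "$testname"
  done
  echo "$passCount passed, $failCount failed"
}

# report declares no counters of its own, so the names resolve to runAll's locals.
report() {
  if "$1"; then
    (( passCount += 1 ))   # in caller's scope
  else
    (( failCount += 1 ))   # in caller's scope
  fi
}

testOk()  { return 0; }
testBad() { return 1; }

summary=$(runAll testOk testBad testOk)
echo "$summary"
```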

Accidental shadowing — the collision risk. If a callee declares local x and the caller also has local x, the callee gets its own copy. But if the callee doesn’t declare local and uses x, it silently modifies the caller’s x. This is especially dangerous with namerefs: local -n REF=$1 — if $1 is REF, the nameref points to itself (circular reference).

Defenses:

Naming conventions are the primary protection. camelCase locals and PascalCase + suffix globals occupy separate namespaces. Two callees in the same chain are unlikely to collide if they follow conventions.
UPPERCASE namerefs (local -n ARRAY=$1) borrow the environment variable namespace, which never collides with camelCase locals in the caller.
Subshell () function bodies provide hard isolation when dynamic scoping is unwanted. Changes to variables, working directory, and shell options are discarded when the subshell exits:

createCloneRepo() (     # () not {} — subshell isolates side effects
  git init clone
  cd clone              # doesn't affect caller's pwd
  echo hello >hello.txt
  git add hello.txt && git commit -m init
) >/dev/null

Use () when a helper needs to cd or modify shell state; use {} (the default) when the caller needs to see the function’s side effects.

7. Conditionals

[[ exclusively. [[ is bash’s compound command with pattern matching, no word splitting, and &&/|| inside.

(( )) for arithmetic and booleans. Boolean flags are 0/1 integers tested bare: (( failed )) && return 1, (( hasSubtests )) && echo .... Numeric variables use explicit comparison: (( rc == 0 )), (( pid != 0 )). Arithmetic expansion: $(( endTime - startTime )).

8. Error Handling

Two patterns coexist.

fatal() with message + optional exit code. Default rc is $?:

fatal() {
  local msg=$1 rc=${2:-$?}
  echo "fatal: $msg"
  exit $rc
}

Libraries namespace this (e.g., lib.Fatal) and typically print to stderr.

Return code 128 as fatal signal. A test framework can detect 128 and report “fatal” distinct from regular failure:

case $rc in
  0   ) printf $columns $Pass $duration $testname; (( passCount += 1 ));;
  128 ) printf $columns $Fatal $duration $Yellow$testname$Reset;;
  *   ) printf $columns $Fail $duration $Yellow$testname$Reset;;
esac

RC capture: cmd && rc=$? || rc=$? preserves the exit code that set -e would otherwise lose. It is safe under set -e because the || makes the overall compound command always succeed; set -e only triggers on unchecked failures.

output=$(eval "$cmd" 2>&1) && rc=$? || rc=$?
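
A small sketch of the capture line in use with set -e active. runCaptured is a hypothetical wrapper written for this illustration; set -e is enabled inside a command substitution so the demonstration does not change the options of the surrounding shell.

```shell
#!/usr/bin/env bash
# runCaptured evaluates a command string and reports its exit code and output.
runCaptured() {
  local cmd=$1 output rc
  output=$(eval "$cmd" 2>&1) && rc=$? || rc=$?
  echo "rc=$rc output=$output"
}

# The failing command does not abort the function, and its code is kept.
failing=$(set -e; runCaptured 'echo oops; exit 3')
passing=$(set -e; runCaptured 'echo fine')

echo "$failing"
echo "$passing"
```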

Trailing && at end of function: a function whose last command is [[ test ]] && cmd returns the test’s exit code when the test is false. Under set -e at the call site, that propagates as a non-zero return and terminates the caller — even when the function did exactly what it was meant to do (skip the conditional action).

# Bug: when $stashRef is empty, the [[ -n ]] test fails (rc 1), the
# function returns 1, and a caller running under `set -e` aborts.
gitUpdate() {
  local stashRef
  stashRef=$(git stash list | head -1)
  [[ -n $stashRef ]] && git stash drop $stashRef
}

# Fix 1 (preferred): invert the test so the no-op branch returns success.
[[ -z $stashRef ]] || git stash drop $stashRef

# Fix 2: explicit conditional.
if [[ -n $stashRef ]]; then git stash drop $stashRef; fi

# Fix 3: catch-all trailing return.
[[ -n $stashRef ]] && git stash drop $stashRef
return 0

The gotcha generalizes: any compound where the failure branch is “do nothing” needs the function to still return zero. Inverting the test with || is usually the cleanest form — the conditional reads as “skip unless” rather than “do if.”
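
The difference in return codes can be observed without git. A minimal sketch with hypothetical function names, where echo stands in for the real action:

```shell
#!/usr/bin/env bash
# Trailing && form: returns the test's failure when there is nothing to do.
dropIfPresentBuggy() {
  local ref=$1
  [[ -n $ref ]] && echo "dropping $ref"
}

# Inverted form: the no-op branch returns success.
dropIfPresentFixed() {
  local ref=$1
  [[ -z $ref ]] || echo "dropping $ref"
}

dropIfPresentBuggy '' && buggyRc=$? || buggyRc=$?
dropIfPresentFixed '' && fixedRc=$? || fixedRc=$?
fixedOutput=$(dropIfPresentFixed 'stash@{0}')

echo "buggy rc=$buggyRc, fixed rc=$fixedRc, output: $fixedOutput"
```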

pipefail: standard for new scripts. set -euo pipefail.

Strict mode escape: loosely() for sourcing optional configs that may not exist or may fail benignly:

loosely() {
  set +euo pipefail
  "$@"
  set -euo pipefail
}
loosely source /etc/profile.d/optional-tool.sh

9. Dependency Injection

Assign function names to PascalCase + suffix variables. Override in tests:

# production default
TimeFuncQ=UnixMilli

# in test
TimeFuncQ=mockUnixMilli
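
A minimal sketch of how a consumer calls through the DI variable. The bodies of UnixMilli and mockUnixMilli and the elapsedSince function are assumptions made for this illustration; the default here only has second resolution to stay portable.

```shell
#!/usr/bin/env bash
# Hypothetical production default: milliseconds derived from whole seconds.
UnixMilli() { echo $(( $(date +%s) * 1000 )); }
TimeFuncQ=UnixMilli

# elapsedSince calls whatever function name TimeFuncQ currently holds.
elapsedSince() {
  local start=$1 now
  now=$($TimeFuncQ)
  echo $(( now - start ))
}

# In a test: swap in a deterministic clock.
mockUnixMilli() { echo 5000; }
TimeFuncQ=mockUnixMilli

elapsed=$(elapsedSince 1500)
echo "elapsed: $elapsed ms"
```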

10. Code Organization

Cuddling: group related lines together, separate concepts with blank lines. One concept per group — similar to golangci-lint’s wsl rules.

Scripts: option parsing near bottom, return 2>/dev/null (debug hook), strict mode, then main call as last line.

Libraries: function definitions only, no main call. Consumer scripts call the entry point.

Library consumers follow boilerplate: source → IFS → noglob → return → entry point.

Standard flags: -h/--help, -v/--version, -x/--trace (set -x for debugging). Libraries typically provide an option handler for these.

11. Comments

Three placements.

Function docs go directly above the definition, no blank line between. Start with the function name:

# lib.Main runs any test functions in the files given as arguments.
# It outputs success or failure.
lib.Main() {

Inline comments explain non-obvious flags, return codes, or surprising behavior:

local tmpname=$(mktemp -u)   # -u doesn't create a file, just a name
(( $? == 128 )) && return 128 # fatal
local NL=$'\n' # newline works with backgrounding (&) and legal semicolons, semicolon doesn't

Section markers use a hierarchical style like inverted markdown headers: # is the lowest level, ## is a level up. Rarely more than ## in practice. Preceded by a blank line:

# strict mode          ← low-level annotation

## library functions   ← major section

## logging             ← major section

12. Testing

Test framework conventions.

Associative array cases define test data:

local -A case1=(
  [name]='not run when ok'
  [command]="cmd 'echo hello'"
  [ok]=true
  [wants]="(ok 'not run when ok')"
)

Unpack with Inherit. Unset optional fields first so missing keys don’t carry over:

unset -v ok shortrun prog unchg want wanterr
eval "$(Inherit "$casename")"

Run with RunCases ${!case@} — pass all case variables at once:

RunCases ${!case@}

RunCases iterates its arguments internally and returns 1 if any case failed, 128 on fatal. For per-case error handling (e.g., early return on fatal), use a loop:

local failed=0 casename
for casename in ${!case@}; do
  RunCases $casename || {
    (( $? == 128 )) && return 128   # fatal
    failed=1
  }
done
return $failed

Assertion failure output shows a diff and a copy-paste line for easy test updates:

[[ $got == "$want" ]] || {
  echo "${NL}cmd: got doesn't match want:$NL$(Diff "$got" "$want")$NL"
  echo "use this line to update want to match this output:${NL}want=${got@Q}"
  return 1
}

Assertion helpers — the preferred pattern (replaces the manual version above):

AssertGot "$got" "$want"
AssertRC $rc 0

AssertGot compares strings, shows a diff and copy-paste update line on mismatch. AssertRC compares return codes. Both return 1 on failure.

Subshell () for directory isolation in setup helpers — changes to working directory don’t leak:

createCloneRepo() (
  git init clone
  cd clone
  echo hello >hello.txt
  git add hello.txt
  git commit -m init
) >/dev/null

MktempDir with deferred cleanup (cleanup is registered automatically via Defer; see Section 14 for the implementation):

MktempDir dir || return 128

AAA structure: ## arrange, ## act, ## assert comment sections in each subtest.

13. FP Pipeline Helpers

Stdin-based composition: command name as first arg, applied to each line via eval. Core trio: Each (side effects), Map (transform), KeepIf/RemoveIf (filter). The eval "$command $arg" pattern assumes trusted input — callers are responsible for escaping with printf %q if values originate from untrusted sources.

The pattern:

each() {
  local command=$1 arg
  while IFS='' read -r arg; do
    eval "$command $arg"
  done
}

keepIf() {
  local command=$1 arg
  while IFS='' read -r arg; do
    eval "$command $arg" && echo "$arg"
  done
  return 0
}

map() {
  local VARNAME=$1 EXPRESSION=$2
  local "$VARNAME"
  while IFS='' read -r "$VARNAME"; do
    eval "echo \"$EXPRESSION\""
  done
}

Call site:

each Ln <<'  END'
  .config         ~/config
  .local          ~/local
  .ssh            ~/ssh
  secrets/netrc   ~/.netrc
  END

Inline versions are common in standalone scripts; a shared library consolidates them with return 0 guards to prevent error propagation from the last iteration.
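
A sketch of the helpers composed into a pipeline. keepIf and map are copied from the pattern above; the predicate isLong and the sample words are hypothetical. As noted, the eval inside the helpers assumes trusted input.

```shell
#!/usr/bin/env bash
keepIf() {
  local command=$1 arg
  while IFS='' read -r arg; do
    eval "$command $arg" && echo "$arg"
  done
  return 0
}

map() {
  local VARNAME=$1 EXPRESSION=$2
  local "$VARNAME"
  while IFS='' read -r "$VARNAME"; do
    eval "echo \"$EXPRESSION\""
  done
}

# Hypothetical predicate: true for words longer than three characters.
isLong() { (( ${#1} > 3 )); }

# Single quotes keep $name unexpanded until map evaluates it per line.
result=$(printf '%s\n' go bash zsh fish | keepIf isLong | map name '$name has ${#name} chars')
echo "$result"
```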

14. Trap Handling

EXIT traps only — ERR, DEBUG, RETURN, and signal handlers are not used.

Two patterns coexist: single assignment (scripts) and stacked (libraries).

Single assignment — scripts and test functions that control their own trap:

dir=$(mktemp -d)
trap "rm -rf $dir" EXIT

Direct trap "..." EXIT overwrites any previous handler. Safe when the function or script owns its entire trap lifecycle.

Stacked/deferred — libraries that must not overwrite the caller’s trap:

Defer() {
  local command=$1
  local NL=$'\n'
  trap "$command$NL$(existingDeferlist)" EXIT
}

New handlers prepend to the existing chain. existingDeferlist extracts the current handler via trap -p EXIT and strips the wrapper syntax. Because each new command is placed ahead of the existing ones, commands execute in LIFO order: the most recently deferred command runs first. Use newlines (not semicolons) as separators, since semicolons interact poorly with backgrounding (&).
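
The ordering that prepending produces can be seen in a simplified sketch. Since existingDeferlist is not shown here, this version swaps in a different technique: it keeps the chain in a hypothetical global, DeferListQ, instead of parsing trap -p output. It is an illustration of the ordering only, not the library's implementation.

```shell
#!/usr/bin/env bash
DeferListQ=''   # hypothetical global holding the deferred commands

# Defer prepends a command to the chain and (re)installs the EXIT trap.
Defer() {
  local command=$1
  local NL=$'\n'
  DeferListQ="$command$NL$DeferListQ"
  trap 'eval "$DeferListQ"' EXIT
}

# Run in a command substitution so the EXIT trap fires when that subshell ends.
order=$(
  Defer 'echo first registered'
  Defer 'echo second registered'
)
echo "$order"
```

The command registered last is the first one to run on exit.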

Temp directory cleanup — the canonical pattern:

MktempDir() {
  local -n DIR=$1
  DIR=$(mktemp -d /tmp/bash.XXXXXX) || { echo 'could not create temporary directory'; return 1; }
  [[ $DIR == /*/* ]] || { echo 'temporary directory does not comply with naming requirements'; return 1; }
  [[ -d $DIR ]] || { echo 'temporary directory was made but does not exist now'; return 1; }
  Defer "rm -rf $DIR"
}

Validates the path before registering cleanup. The /*/* guard prevents rm -rf / if mktemp returns something unexpected.

15. Risks and Limitations

IFS=$'\n' + noglob + naming conventions eliminate most bash footguns, but not all. Each risk below describes the bash mechanism, how it bites, and the mitigation.

1. Dynamic scoping collision. A callee that omits local silently modifies the caller’s variable. A nameref whose name matches its target creates a circular reference:

outer() { local x=before; inner; echo $x; }   # prints "after" — inner modified outer's x
inner() { x=after; }                           # no local — writes to caller's scope

wrapper() { local -n REF=$1; REF=value; }
wrapper REF   # circular reference: bash prints a "circular name reference" warning

Mitigation: follow naming conventions (Section 3) — camelCase locals, UPPERCASE namerefs. Document intentional cross-scope access with # in caller's scope. See Section 6 for the full explanation.

2. Eval injection. The FP helpers execute eval "$command $arg" where $arg is a line from stdin. If arg contains shell metacharacters, they execute as code:

echo '; rm -rf /tmp/important' | each processLine   # eval runs: processLine ; rm -rf /tmp/important

Mitigation: only pass trusted input through FP pipelines. For untrusted values, escape with printf -v safe '%q' "$untrusted" before piping. The trust boundary is the eval call — everything reaching it must be safe to execute as shell words.
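
A harmless sketch of the injection and the printf %q mitigation. each is the helper from Section 13; the record callback and the payload, which only sets a variable, are hypothetical stand-ins.

```shell
#!/usr/bin/env bash
each() {
  local command=$1 arg
  while IFS='' read -r arg; do
    eval "$command $arg"
  done
}

# Hypothetical callback: remembers the argument it was given.
record() { lastArg=${1-}; }

untrusted='; injected=yes'   # harmless payload standing in for a real command

injected=no
each record <<<"$untrusted"          # eval runs: record ; injected=yes
rawInjected=$injected

injected=no
printf -v safe '%q' "$untrusted"     # escape before it reaches eval
each record <<<"$safe"
escapedInjected=$injected
escapedArg=$lastArg

echo "raw: injected=$rawInjected"
echo "escaped: injected=$escapedInjected, callback saw: $escapedArg"
```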

3. [[ RHS pattern matching. In [[ $x == $y ]], the unquoted RHS is a glob pattern — *, ?, and [ are wildcards. This is independent of set -o noglob, which only affects pathname expansion in command arguments. [[ has its own pattern-matching rules:

want='file[1]'
[[ 'file[1]' == $want ]]    # false — [1] is a character class matching the single character 1
[[ 'file[1]' == "$want" ]]  # true — literal comparison

Mitigation: quote the RHS for literal comparison: [[ $x == "$y" ]]. Leave unquoted only for intentional pattern matching: [[ $OSTYPE == darwin* ]].

4. Trailing newline stripping. Command substitution $(command) always strips trailing newlines from the output. This is a POSIX requirement, not a bash quirk:

output=$(printf 'hello\n\n')   # output is "hello" — both trailing newlines stripped
content=$(cat "$file")          # file's trailing newline(s) silently lost

Mitigation: if trailing newlines matter, append a sentinel and strip it: output=$(command; echo x); output=${output%x}. In practice, this rarely matters — most values are single-line identifiers or paths.
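
The sentinel technique as a runnable sketch, with printf standing in for the real command:

```shell
#!/usr/bin/env bash
# Plain command substitution: both trailing newlines are stripped.
stripped=$(printf 'hello\n\n')

# Sentinel: the x protects the newlines, then is removed again.
kept=$(printf 'hello\n\n'; echo x)
kept=${kept%x}

echo "stripped length: ${#stripped}"
echo "kept length:     ${#kept}"
```

The second length is two characters larger because both newlines survive.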

5. set -e propagation. By default, bash clears set -e inside command substitutions $(...), so failures inside are silently swallowed. Bash 4.4 introduced shopt -s inherit_errexit to change this, but it is off by default, so you must enable it explicitly, and it is unavailable in earlier versions. Even with inherit_errexit, compound commands inside $(...) can still behave unexpectedly. A failure inside a process substitution <(...) never triggers set -e in the parent, in any version:

set -e
result=$(false; echo "still runs")    # "still runs" executes — errexit not inherited without inherit_errexit
while read -r line; do
  process "$line"
done < <(failing_command)              # failure undetected — process substitution ignores set -e

Mitigation: don’t rely on set -e inside command substitutions. Use explicit RC capture: result=$(command) && rc=$? || rc=$?. For critical operations, check $? after every command substitution. Alternatively, add shopt -s inherit_errexit to the preamble (bash 4.4+) to propagate set -e into command substitutions — but process substitutions remain unaffected.

6. Pipeline subshell variable loss. Each stage of a pipeline runs in a subshell. Variables modified inside a pipeline stage are lost when it exits:

count=0
command | while read -r line; do (( count += 1 )); done
echo $count   # still 0: the while loop ran in a subshell

Mitigation: use process substitution instead: while read -r line; do (( count += 1 )); done < <(command). This runs the loop in the current shell while the command runs in the subshell. The increment is arithmetic because count+=1 on a variable without the integer attribute appends the character 1 (0 becomes 01). Code following these conventions avoids piping into loops.
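
Both forms side by side, as a minimal sketch with printf standing in for the command. It assumes the default lastpipe setting (off). The counter is incremented arithmetically so the result is a number rather than a string of appended digits.

```shell
#!/usr/bin/env bash
# Pipeline: the loop runs in a subshell, so the increments are lost.
count=0
printf 'a\nb\nc\n' | while read -r line; do (( count += 1 )); done
pipeCount=$count

# Process substitution: the loop runs in the current shell.
count=0
while read -r line; do (( count += 1 )); done < <(printf 'a\nb\nc\n')
procCount=$count

echo "pipeline: $pipeCount, process substitution: $procCount"
```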

7. loosely() hardcoded restore. The loosely() wrapper does set +euo pipefail then set -euo pipefail after the command. It doesn’t capture the previous shell options — it assumes the caller always uses -euo pipefail:

set -eu              # no pipefail yet
loosely source lib   # sets +euo pipefail, then -euo pipefail
# now pipefail is ON even though caller never set it

Mitigation: loosely() is safe only after set -euo pipefail is set. For library code that needs to temporarily relax options, save and restore with set +o:

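local prevOpts hadErrexit=0
[[ $- != *e* ]] || hadErrexit=1   # record errexit separately (see below)
prevOpts=$(set +o)        # captures restore commands for the other options
set +eu; set +o pipefail
command
eval "$prevOpts"           # restores the captured option state
if (( hadErrexit )); then set -e; fi

set +o outputs set -o/set +o commands that reproduce the current option state. This handles options including pipefail without fragile string matching, with one exception: bash clears -e inside command substitutions (unless inherit_errexit is enabled), so $(set +o) reports errexit as off. Errexit therefore has to be read from $- and restored separately.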
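
The save-and-restore approach can be wrapped in a function. This is a hedged sketch: the name relaxed is hypothetical, and it reads errexit from $- as a precaution, because bash clears -e inside command substitutions by default and $(set +o) may therefore not report it.

```shell
#!/usr/bin/env bash
# relaxed runs a command with -e, -u and pipefail off, then restores
# whatever option state the caller had, and returns the command's status.
relaxed() {
  local prevOpts hadErrexit=0 rc=0
  [[ $- != *e* ]] || hadErrexit=1
  prevOpts=$(set +o)
  set +eu; set +o pipefail
  "$@" || rc=$?
  eval "$prevOpts"
  if (( hadErrexit )); then set -e; fi
  return $rc
}

# Caller state for the demonstration: nounset on, pipefail off.
set -u; set +o pipefail

relaxed eval ': "$neverDefinedVariable"' && relaxedRc=$? || relaxedRc=$?
relaxed false && failRc=$? || failRc=$?

nounsetRestored=no;   [[ $- == *u* ]] && nounsetRestored=yes
pipefailStillOff=yes; [[ -o pipefail ]] && pipefailStillOff=no
set +u

echo "rc=$relaxedRc failRc=$failRc nounset=$nounsetRestored pipefailOff=$pipefailStillOff"
```

Unlike the hardcoded restore, pipefail stays off here because the caller never enabled it.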

Breadcrumbs for Humans and AI: How Pattern Docs Guide Developers to Correct Code

2026-02-02T00:00:00+00:00

A backend returns 200 OK with a JSON error body when downloads fail. This may seem unexpected at first, since 200 indicates success. Arguably it is a protocol adherence issue, but the behavior remains. Every new developer who works on downloads must learn this, one way or another. Every code review catches someone checking response.ok. The knowledge exists, but only in some developers’ heads.

This is tribal knowledge. It doesn’t scale. People leave, context-switch, or just forget. Code review becomes an oral tradition.

Pattern docs fix this. They externalize institutional knowledge into structured documentation that lives alongside the code. And because they’re structured, AI assistants benefit too—but that’s a bonus, not the point.

The Problem: Knowledge That Doesn’t Scale

Every codebase has conventions that aren’t obvious from the code:

Why we check Content-Type instead of response.ok
When to use the cache freshness indicator (and when not to)
Which ESLint rules we wrote ourselves and why

This knowledge lives in people’s heads. It transfers through:

Code review comments (repeated endlessly)
Slack threads (unsearchable after a month)
Onboarding conversations (different every time)
Trial and error (expensive)

The result: inconsistent code, repeated mistakes, slow onboarding, and knowledge that walks out the door when people leave.

The Solution: Pattern Documentation

Pattern docs capture the “why” behind conventions. They live in docs/patterns/ alongside the codebase.

Each pattern doc answers:

What’s the problem? Code example of what fails
What’s the solution? Working code with comments
When do I use this? Decision criteria
How do I find existing usages? Grep command

Example: Defensive File Download

Problem:

// PROBLEMATIC - Don't use
const response = await fetch(downloadPath);
if (!response.ok) throw new Error('Download failed');
// This misses errors! The backend returns 200 OK with JSON error body

Solution:

// Check Content-Type, not status code
const response = await fetch(downloadPath);
const contentType = response.headers.get('Content-Type');
if (contentType?.includes('application/json')) {
    const errorData = await response.json();
    throw new Error(errorData.error || 'Failed to download file');
}

When to use: User-initiated downloads needing error feedback

When NOT to use: Static CDN files, streaming large files (>100MB)

Human Benefits

Onboarding and knowledge preservation: New developers read the pattern doc instead of discovering conventions through trial and error. When someone leaves, the knowledge stays. “Why do we do it this way?” has a documented answer that doesn’t depend on who’s in the room.

Code review: Instead of explaining the same convention repeatedly, link to the pattern doc. Review comments become “See docs/patterns/defensive-file-download.md” instead of a paragraph of explanation.

Consistency: When the pattern is documented, people follow it. When it’s tribal knowledge, they reinvent it—differently each time.

Discoverability: Comments in code point to pattern docs:

// See: docs/patterns/defensive-file-download.md
const response = await fetch(downloadPath);

Developers see the comment, follow the link, understand the context. The breadcrumb is right where they need it.

AI Benefits (The Bonus)

If you document patterns for humans, AI assistants benefit automatically.

When an AI coding assistant reads code with a // See: docs/patterns/... comment, it follows the path. LLMs gather context before suggesting changes—a file path is an unambiguous signal.

The pattern doc answers what the AI implicitly asks: “Why is this code written this way? What constraints apply?”

Before pattern docs: AI suggests if (!response.ok)—correct generically, wrong for this codebase. Developer corrects it manually.

After pattern docs: AI reads the pattern doc, suggests the Content-Type check. No correction needed.

Same docs, two audiences. Write once, benefit twice.

AI Assists (The Accelerator)

AI assistants don’t just consume pattern docs—they help create them.

The grade/improve loop:

Describe the problem to the AI, show examples, let it draft
Ask the AI: “Grade this pattern doc—is it clear? Complete? Are the examples concrete?”
Prompt: “Improve” → the AI addresses its own critique
Repeat until satisfied
Apply your codebase knowledge, deploy, refine when reality reveals gaps

The AI handles the structure; you provide the institutional knowledge. Documentation that used to get postponed indefinitely now gets written.

Patterns Evolve

Pattern docs aren’t static. They evolve as real-world use reveals gaps.

Example: A custom ESLint rules pattern evolved over a few days:

Initial version flagged a specific accessor option
Refined to “all accessors should be suspect”—the initial scope was too narrow

The update workflow:

Discovery: Real-world use reveals the pattern is incomplete
Update the doc (source of truth)
Run Find References: grep -rn "docs/patterns/your-pattern" src/
Update code comments if needed

Bidirectional traceability—code points to docs, docs find code—makes updates systematic rather than “hope everyone got the memo.”

When This Doesn’t Work

Patterns requiring judgment: “Choose appropriate log level” doesn’t help anyone—human or AI. You need: “Use ERROR for user-facing failures, WARN for recoverable issues, DEBUG for everything else.”

Unstable conventions: Patterns that change weekly create maintenance churn. Start with stable, mechanical conventions.

Overhead: Doc renames require updating all reference sites. Worth it for stable patterns; consider this before frequent reorganization.

Getting Started

Start with work you just finished: You just fixed a bug or implemented a feature. Was there something non-obvious? A gotcha you discovered? Document it now while the context is fresh. That’s your first pattern doc.

Template:

Problem Statement - code example of what fails (and why)
Solution - working code with comments
When to Use / When NOT to Use - decision criteria
Find References - grep command to locate usages

Add the breadcrumb: Put // See: docs/patterns/your-pattern.md in the relevant code. Now it’s discoverable.

Use AI to draft: Describe the problem, let AI draft, grade/improve until satisfied.

The Payoff

Document conventions for humans. AI assistants benefit automatically. AI assistants help you write the docs faster.

The knowledge that used to exist only in people’s heads—now it scales.

The G/I Cycle: How Specific Deductions Beat ‘Try Harder’

2026-02-02T00:00:00+00:00

You write something with AI. It’s 70% right. Now what?

Most people accept it. That leaves quality on the table: improvements that take little effort to extract now, but typically cost far more once they are deferred to implementation.

The G/I cycle fixes this.

The G/I Cycle

G/I stands for Grade/Improve. The cycle is simple:

Work → Grade → Improve → Re-grade → Repeat until stuck

Grade means assigning a letter grade with specific point deductions. Not “this is pretty good” — that tells you nothing. Instead: “B+ (86/100). Deductions: -5 for not checking X, -4 for missing baseline, -3 for unverified assumption.”

Improve means addressing those deductions. Each “-5 for X” becomes a task. Do the task, then grade again.

Repeat until you can’t identify concrete improvements, or remaining deductions total less than 5 points.

The test: “If asked to improve right now, what would I do?” If you have an answer, you’re not done.

Why It Works

Three mechanisms:

1. Provides attention bandwidth. Each iteration lets the model focus on concerns it couldn’t address earlier. It genuinely improves itself across passes. These are free wins — you just say “improve” and the LLM follows its own judgment based on its grade. Most G/I cycles are just this: low-effort extraction of quality the model already knows how to deliver.

2. Exposes thinking for course correction. Grading externalizes the model’s assessment. You can see what it thinks is wrong. Most of the time, you let it run. But occasionally you notice something off — a wrong assumption, a misguided priority. That’s when you redirect. A single course correction can prevent entire avenues of wasted inquiry.

3. Surfaces unknown unknowns. Grading forces the model to ask “what didn’t I check?” — questions it wouldn’t ask if just told to “improve.” For deeper blind spots, use “grade your analysis” to grade at a meta level: the thinking process, not just the output.

A note on self-grading: LLMs grade themselves leniently. If you find gaps after an A, the A was wrong. B is not “acceptable” — B is incomplete work. Push past it.

The Economics

Stand on the LLM’s shoulders, not vice versa.

Your attention is expensive. The LLM’s iterations are cheap. Let it do its best work first — then invest your attention in evaluating the result.

Wrong: You guide every step → LLM executes → you fix gaps
Right: LLM iterates to its best → you evaluate final output → you build on that foundation

When to step in: Remaining deductions under 5 points, grade stabilizes across iterations, or gaps require information you have and it doesn’t. Don’t stop just because you “improved once” or it “feels complete.” Use the point threshold.
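The two mechanical stop conditions can be written down as a small check. This is a sketch under stated assumptions: `should_stop` is a hypothetical helper, and "stabilizes" is read here as the deduction total staying identical across two consecutive improve passes (the figure of 2 iterations comes from the reference further down). The third condition, gaps that need information only you have, still takes human judgment and is not modeled.

```python
def should_stop(remaining_by_iteration: list[int],
                threshold: int = 5,
                stable_window: int = 2) -> tuple[bool, str]:
    """Decide whether to stop, given the deduction total from each grade so far."""
    if not remaining_by_iteration:
        return False, "no grade yet"

    latest = remaining_by_iteration[-1]
    if latest < threshold:
        return True, "remaining deductions under threshold"

    # Assumption: "stable" means the total did not move for
    # `stable_window` improve passes in a row.
    recent = remaining_by_iteration[-(stable_window + 1):]
    if len(recent) == stable_window + 1 and len(set(recent)) == 1:
        return True, "grade stabilized"

    return False, "actionable deductions remain"


print(should_stop([12]))
print(should_stop([12, 7, 3]))
print(should_stop([12, 8, 8, 8]))
```

Note what the function never looks at: the letter grade, how many passes have already run, or whether the work "feels complete."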

One Caveat

Asking the model to run the whole G/I cycle by itself inside a single response isn’t worthwhile, except that it exposes thinking for course correction. The value is in the separate prompts: you see the thinking, you can redirect if needed, then you say “improve.” Ignore the grade itself and focus on the deductions. If there are actionable deductions you find valuable, the work isn’t done, even if the model gave itself an A+. It wanted to be done, but shouldn’t be. For deeper blind spots, say “grade your analysis” to surface unknown unknowns.

When G/I Works

Structured content, documentation, analysis, code review prep.

Why: These domains have verifiable criteria. You can objectively assess completeness, accuracy, and coverage. The grade has meaning.

When G/I Doesn’t Work

Creative work — no objective grading standard
Unstable requirements — criteria change faster than iterations
Time pressure under 5 minutes — overhead exceeds benefit

Getting Started

Try it on your next draft:

1. Ask the AI to “grade the plan” when planning, or “grade your work” after implementation
2. Glance at the deductions and redirect only if something looks off
3. Say “improve” (nothing specific)
4. Repeat until deductions total less than 5 points
5. Now invest your attention in the result

In most cycles, step 2 is just a glance; you barely have to look. The AI follows its own judgment, and that’s usually fine. Just say “improve” (or configure a shortcut like /i). The value is in the accumulated improvement across iterations, plus the occasional checkpoint where you catch something before it goes sideways.
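For readers who drive a model through a script rather than a chat window, the steps can be sketched as a loop of separate prompts. Everything named here is an assumption for illustration: `ask` stands for whatever function sends one prompt to your chat session and returns the reply, `on_grade` is the hook where you glance at the deductions, and the scripted replies after the first one are invented so the example runs without a real model.

```python
import re
from typing import Callable


def total_deductions(grade_text: str) -> int:
    """Sum every '-N' deduction found in a grade response."""
    return sum(int(n) for n in re.findall(r"(?<!\w)-(\d+)\b", grade_text))


def run_gi_cycle(ask: Callable[[str], str],
                 grade_prompt: str = "grade your work",
                 threshold: int = 5,
                 max_rounds: int = 10,
                 on_grade: Callable[[str], None] = print) -> list[int]:
    """Alternate grade and improve prompts; return the deduction total per grade."""
    totals: list[int] = []
    for _ in range(max_rounds):
        grade = ask(grade_prompt)
        on_grade(grade)  # the glance: a human can read the deductions here
        totals.append(total_deductions(grade))
        if totals[-1] < threshold:
            break
        ask("improve")  # nothing specific; the model follows its own grade
    return totals


def scripted_model() -> Callable[[str], str]:
    """Stand-in for a real chat session: replays canned grades in order."""
    grades = iter([
        "B+ (86/100). Deductions: -5 for not checking X, "
        "-4 for missing baseline, -3 for unverified assumption.",
        "A- (93/100). Deductions: -4 for missing baseline, "
        "-3 for unverified assumption.",
        "A (97/100). Deductions: -3 for unverified assumption.",
    ])

    def ask(prompt: str) -> str:
        return "ok, revised." if prompt == "improve" else next(grades)

    return ask


print(run_gi_cycle(scripted_model()))
```

The loop keeps grading and improving as separate prompts, which is where the value lies. A fully unattended run would give up the occasional course correction, so treat `on_grade` as a place to pause rather than a log line.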

Example: Catching a Fabrication

A coaching report claimed “Research supports iteration for exploration and idea generation” — citing “Zhang et al. (2024).”

Grading would have caught:

-10: Citation mismatch — actual source says TDD remediation for local errors, not “exploration”
-5: Phantom citation — “Zhang et al. (2024)” doesn’t exist

Without G/I, the claim survived to the final report as unsourced “common wisdom.” With G/I, it would have been flagged and fixed in iteration 1.

The Payoff

The G/I cycle lets you extract the LLM’s best work before investing your attention. You stand on its shoulders rather than having it stand on yours.

A plan that has been through the cycle stands alone: the grading passes evaluated its dependencies once and baked the conclusions in. That frees attention for implementation, because you’re not carrying unresolved planning concerns forward.

The Reference

Copy this into your LLM’s system prompt or project instructions:

# G/I Cycle Reference

## The Cycle

Work → Grade → Improve → Re-grade → Repeat until stuck

**Grade:** Assign a letter grade with specific point deductions.
**Improve:** Address the deductions (or just say "improve" and let the LLM follow its judgment).
**Repeat:** Until remaining deductions <5 points or you hit a wall.

## Why It Works (Practical)

### 1. Attention Bandwidth (Primary Benefit)

Each iteration lets the model focus on concerns it couldn't address earlier. Most G/I cycles are just this: low-effort wins you'd otherwise defer to implementation.

### 2. Course Correction (Occasional)

Grading externalizes the model's thinking. Most of the time, you let it run. Occasionally you notice something off and redirect. A single course correction can prevent entire avenues of wasted inquiry.

### 3. Surfaces Unknown Unknowns

Grading forces the model to ask "what didn't I check?" — questions it wouldn't ask if just told to "improve." For deeper blind spots, use "grade your analysis" to grade at a meta level.

## Why Complexity Requires G/I (Theory)

One theory that aligns with observed results: LLMs have limited coherent attention for evaluating plans. Single-shot has enough budget for trivial changes but not complex ones. G/I works around this limit through:

1. **Output extends thinking** — writing the grade surfaces concerns that wouldn't fit in the attention window otherwise
2. **Synthesis reduces dependencies** — evaluation collapses conceptual complexity (like substituting y for f(x) — the evaluation happens once, not repeatedly)
3. **Addressed concerns free capacity** — each iteration doesn't re-attend to what's already fixed
4. **Surfaces what the LLM doesn't know it doesn't know** — LLMs have blind spots they can't see. Grading at a meta level (grading the thinking process, not just the output) can knock these loose

**The phasing effect:** G/I shifts planning work to the planning phase, where it belongs. Without G/I, unresolved planning concerns bleed into implementation, competing for attention and context needed for implementation details.

**Self-contained plans:** Planning evaluation produces a plan that stands alone — it no longer requires the context of the dependencies you evaluated to create it. The synthesis baked them in.

This reframes the economics: it's not just that fixing things later costs more effort. Unresolved planning work *actively degrades* implementation by consuming resources needed for implementation details.

## Grading Format

**Weak:** "I did a good job but could have done better."

**Strong:** "B+ (86/100). Deductions: -5 for not checking X, -4 for no baseline, -3 for unverified assumption."

## Watch for Inflated Grades

LLMs grade themselves leniently. If you find gaps after an A, the A was wrong. B is not "acceptable" — B is incomplete work. Push past it.

If you're getting As but the deductions feel real, they are real. Address them.

## The Test

> "If asked to improve right now, what would I do?"

If you have an answer, you're not done.

## When to Stop (Valid)

| Condition | Action |
|-----------|--------|
| Remaining deductions <5 points | Stop — diminishing returns |
| Gaps require unavailable data | Stop — document as limitation |
| Next iteration would repeat searches | Stop — exhausted the approach |
| Grade stabilizes across 2 iterations | Stop — no new gaps surfacing |

## When NOT to Stop (Invalid)

- "I improved once already" — one iteration is minimum, not maximum
- "Feels complete" — subjective; use point threshold
- "This is taking too long" — time estimates unreliable
- "User hasn't complained" — user doesn't know what you didn't check

## Economics

**Stand on the LLM's shoulders, not vice versa.**

LLM iterations are cheap. Your attention is expensive. Let the LLM do its best work first — then invest your attention.

**When to step in:** Remaining deductions <5 points, grade stabilizes, or gaps require data you have and it doesn't.

## Observed Limitation

Self-run G/I cycles in a single response aren't worthwhile — except that they expose thinking for course correction. The value is in the separate prompts: you see the thinking, you can redirect if needed, then you say "improve." Ignore the grade — focus on the deductions. If there are actionable deductions you find valuable, it's not done, even with an A+. It wanted to be done, but shouldn't be. For deeper blind spots, "grade your analysis" can surface unknown unknowns.

## When G/I Works

- Structured content
- Documentation
- Analysis
- Code review prep

Why: Verifiable criteria exist. You can objectively assess completeness, accuracy, coverage.

## When G/I Doesn't Work

- **Creative work** — no objective grading standard
- **Unstable requirements** — criteria change faster than iterations
- **Time pressure <5 minutes** — overhead exceeds benefit

## Quick Start

1. "grade the plan" (when planning) or "grade your work" (after implementation)
2. Glance at deductions — redirect only if something looks off
3. "improve" (nothing specific)
4. Repeat until <5 points remaining
5. Invest your attention in the final result
