close
Skip to content

MillaFleurs/N184

Repository files navigation

N184

N184 uses multi-model AI consensus (Claude, DeepSeek, ChatGPT, Gemini) to find bugs and improve software stability. Multiple agents independently analyze code and vote on findings, reducing false positives with actionable PRs. Named after element 184's island of stability — because everyone deserves stable software.

AI-powered bug discovery and software stability analysis


Migration Notice:

N184 is migrating from the original single-container NanoClaw setup to a multi-agent podman + compose architecture: Honoré + Redis + ChromaDB run as compose services, and a host-side controller spawns specialist sub-agents on demand. The main branch may be unstable during this transition. For the old stable release:

git clone https://github.com/MillaFleurs/N184.git
cd N184
git checkout v1.0.0   # original NanoClaw/Podman setup (./init.sh)

Current main is brought up with ./start.sh — see Quick Start. See the ROADMAP for what's changing.


What is N184?

N184 is an AI-powered bug discovery platform that deploys multiple AI agents to analyze codebases for stability issues, logic errors, and security vulnerabilities.

Its power comes from a few unique features that allow it to find bugs and security vulnerabilities that are often missed:

  • Codebase Mapping: An entire codebase is mapped and cross-referenced with documentation. Agents flag behavior that doesn't match docs, so you can fix code or documentation.
  • Git Archaeology: Agents analyze git history to flag repeated errors. If a contributor makes the same mistake over and over, we catch it.
  • Multi-Model Consensus: Multiple models (Claude, GPT, DeepSeek, Gemini) vote on findings. 2/3 consensus threshold filters false positives.
  • Devil's Advocate: A systematic challenge process pushes for clear PRs with steps to reproduce.
  • Documentation Librarian: Checks documentation to confirm where code differs from documented behavior.
  • Memory Palace: An institutional knowledge store (SQLite + ChromaDB) that accumulates patterns, lessons learned, and culture profiles across analysis sessions.

N184 is not theoretical. It has found and fixed bugs in OpenBSD, Apple MLX, Apache httpd, Docker CLI, and ClickHouse. See SCOREBOARD.md for the full track record.

Key Features

  • Stability First: Finds crashes, logic errors, memory bugs, and stability issues — not just security vulnerabilities
  • Multi-Model Swarm: Claude, GPT, DeepSeek, Gemini, plus local models via Ollama/MLX (coming soon)
  • Structured Output: JSON findings with suggested fixes, PoC code, and optional CVSS/CWE metadata
  • Memory Palace: Seven-hall knowledge store that makes agents smarter over time
  • Kubernetes-Native: Autoscaling agent swarms via k3s + KEDA (migration in progress)
  • Multi-Channel: Telegram, Slack, and Email for human-in-the-loop communication
  • Maintainer-Friendly: Frames findings as stability improvements, not security alarms. Security framing used only when genuinely warranted.

Architecture

Human (HIL) <── Telegram / Slack / Email ──> Controller Pod
                                                  |
                                              Redis (pub/sub)
                                                  |
                                         Honoré (orchestrator)
                                       /   |    \     \        \      \
                                Rastignac  | Bianchon Goriot Lousteau  Fil-de-Soie
                                 (recon)   |  (docs)  (cons.) (memory)  (memory bugs,
                                           |                              C/C++ only)
                                     Vautrin Swarm
                                    (KEDA autoscaled,
                                     multiple AI models)
                                             |
                                             v
                                      Memory Palace
                                     /              \
                               SQLite DB         ChromaDB Server
                            (relationships)    (7 halls of verbatim
                                                knowledge)

Standalone fast-path:

  Operator ──> ./action --pull-the-thread --target <repo>
                  |
                  v
            Fil-de-Soie (local Claude CLI or k8s Job)
                  |
                  v
          ~/.n184/scan-cache/<scan_id>-report.md

Agent Naming Convention

Characters from Honoré de Balzac's La Comédie Humaine:

  • Honoré: The orchestrator. Coordinates analysis, applies Devil's Advocate, presents findings.
  • Vautrin: The vulnerability hunter. Runs in swarms with different AI models.
  • Rastignac: Reconnaissance specialist. Maps codebases, identifies hotspots, builds code maps.
  • Bianchon: Documentation librarian. Checks findings against docs, filters features from bugs.
  • Lousteau: Memory Palace custodian. Maintains the seven halls, provides historical context, predicts maintainer responses. Cynical, world-weary, has seen every bug before.
  • Goriot: Consensus validator. Patient, methodical, brings agents together.
  • Fil-de-Soie (Sélérier): Memory-bug specialist. A pickpocket of the heap — light, quiet, focused on C/C++ allocation patterns. Baseline is OpenBSD-hardened libc. Runs standalone so non-LLM-fluent operators can get a clean memory-safety report without dancing with Honoré.

Each character's traits map to their function. "Vautrin found it, but Goriot rejected it in consensus" is easier to parse than "Agent-001 found it, but Agent-004 rejected it."


Requirements

System Requirements

  • Kubernetes: k3s (Linux) or k3d (macOS) for the new architecture
    • OR Podman 4.0+ for the v1.0.0 NanoClaw setup
  • Python: 3.11 or higher
  • Node.js: 20 or higher (for agent containers)
  • Git: For cloning repositories to analyze
  • Docker: Required on macOS for building container images
  • Helm: For installing KEDA (auto-installed by setup script)

API Keys (Required)

At minimum, you need an Anthropic key. Additional providers enable multi-model consensus:

Messaging Channels (at least one required)

  • Telegram: Get a bot token from @BotFather
  • Slack: Create a Slack app with Socket Mode enabled
  • Email: Any IMAP/SMTP server (Gmail, Fastmail, self-hosted)

Quick Start

Podman + Compose (default)

Requirements: a running podman machine (podman machine start), podman compose, and python3.12 (for the host controller).

# 1. Clone
git clone https://github.com/MillaFleurs/N184.git
cd N184

# 2. Configure
cp .env.example .env
# Edit .env: CLAUDE_CODE_OAUTH_TOKEN (or ANTHROPIC_API_KEY) + a channel token
# (e.g. TELEGRAM_BOT_TOKEN). Optionally DEEPSEEK_API_KEY for the multi-model swarm.

# 3. Start everything
./start.sh            # add --build to force a fresh agent-image build

# 4. Talk to Honoré via your configured channel (Telegram)

./start.sh brings up the two layers and is safe to re-run:

  1. Compose stack (podman): Honoré + Redis + ChromaDB.
  2. Controller (host process): the Telegram↔Redis bridge that spawns specialist sub-agents (Rastignac, Vautrin, …) on demand via podman run.

Stop everything with ./stop.sh. All runtime state lives under ./build/ on the host (git-ignored, so the repo stays source-only) — see Data & Persistence.

Standalone — ./action CLI

For operators who don't run the full k8s deployment (or who want a quick focused scan without engaging the orchestrator dialogue), N184 ships a generalized verb-dispatch CLI: ./action. One verb per invocation, one target codebase, one report at the end.

# Memory-safety scan with Fil-de-Soie (C/C++ codebase, OpenBSD-hardened
# libc baseline). Produces a clean Markdown report a non-LLM-fluent
# maintainer can read and act on.
./action --pull-the-thread --target ./my-codebase

# Report lands at ~/.n184/scan-cache/<scan_id>-report.md
# The CLI prints the exact path on completion.

By default ./action runs the agent locally via the claude CLI (no k8s required). To dispatch through the full N184 deployment instead:

./action --pull-the-thread --target ./my-codebase --mode k8s

Other verbs (wired to the same dispatch system; some require their soul file to be present in souls/):

Verb Agent What it does
--pull-the-thread Fil-de-Soie C/C++ memory-bug scan (heap/UAF/double-free/secret-wipe/etc.)
--reconnoiter Rastignac Codebase reconnaissance and hotspot map
--hunt Vautrin General vulnerability hunt
--consult-docs Bianchon Documentation cross-check
--remember Lousteau Memory-palace pattern lookup

./action --help lists every verb with its summary. To add a new verb, edit the VERBS registry at the top of action and drop the soul into souls/claude-<agent>.md.

Monitoring

podman ps                                  # Service status (honoré, redis, chromadb)
podman logs -f n184-honore-1               # Honoré logs
tail -f logs/controller.log                # Controller logs (host process)
podman ps --filter name=vautrin            # Running sub-agents

Configuration (.env)

# Claude auth — provide ONE (OAuth token for a Claude subscription, recommended,
# or an API key). Honoré + claude-backed sub-agents use whichever is set.
CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat...
#ANTHROPIC_API_KEY=sk-ant-...

# At least one messaging channel
TELEGRAM_BOT_TOKEN=123456:ABC...
SLACK_BOT_TOKEN=xoxb-...
SLACK_APP_TOKEN=xapp-...
EMAIL_IMAP_HOST=imap.gmail.com
EMAIL_IMAP_USER=you@gmail.com
EMAIL_IMAP_PASS=app-password
EMAIL_SMTP_HOST=smtp.gmail.com
EMAIL_POLL_INTERVAL=60

# Optional multi-model providers (for the swarm)
OPENAI_API_KEY=sk-...
DEEPSEEK_API_KEY=sk-...
GEMINI_API_KEY=AI...

Capacity controls (set on the honoré service in compose.yaml): the cap is token-based, because under a Claude subscription (OAuth) dollar cost is effectively meaningless — tokens are what you actually spend.

Env Default Meaning
N184_DAILY_TOKEN_CAP 20000000 Cumulative tokens/day (persisted in Redis, survives restarts). Queries are refused past it.
N184_MAX_RESTARTS / N184_RESTART_WINDOW_SEC 5 / 600 Restart breaker — after N restarts in the window Honoré comes up idle and waits for /resume, instead of crash-looping through your capacity.

Usage is metered for Claude and DeepSeek/Ollama agents.

Local Ollama (or any OpenAI-compatible endpoint): declare it in providers/registry.local.yaml (git-ignored). Agents reach the host's Ollama at http://host.containers.internal:11434/v1 — no API key needed. Honoré picks the provider/model per sub-agent dispatch, so the swarm is genuinely multi-model.


Data & Persistence

All runtime state lives under ./build/ on the host — git-ignored, so the repo stays source-only and you never commit "everyone's N184." The compose services bind-mount it (this replaces the old single-container ./nanoclaw layout):

Path Contents
build/data/palace/ Memory Palace — findings, lessons, and the /sorrow pot still
build/data/sessions/ Claude Code session continuity for Honoré
build/data/chroma/ ChromaDB vector store (semantic search over the palace)
build/data/redis/ Redis — IPC queues, token-budget counters, restart-breaker state
build/workspace/ Shared workspace: target repos the whole swarm (and you) analyze
build/logs/ Controller + service logs

Because it's a plain host directory, it survives ./stop.sh / compose down and a podman system reset. Back it up by copying ./build/.

⚠️ ./build holds the pot still. Don't rm -rf ./build without first running ./export.sh --to-git to preserve the distilled lessons.


Reincarnation Memory — /sorrow, /joy, and the Pot Still

Honoré is a long-lived agent, but his Claude context eventually drifts and the operator reinstantiates him. The pot still (build/data/palace/potstill.md) carries his hard-won judgment across those reincarnations so a successor doesn't re-learn or re-obsess (e.g. a past Honoré once wasted a scan checking impossible 16 EiB disk limits — that became a banked lesson).

  • /sorrow (operator command, via your channel) — Honoré distills his validated lessons (post-mortem dispositions, confirmed false-positive shapes, craft rules) and merges them into the pot still. Run it before a planned reinstantiation. It never fires on its own.
  • /joy (operator command) — a freshly reincarnated Honoré reads the pot still and adopts every lesson as a standing constraint. It's idempotent: a boot-id check against build/data/palace/lifecycle.json stops it re-firing on every message.
  • ./export.sh — share or back up the distilled lessons:
    • ./export.sh — print the pot still (+ lifecycle ledger) to stdout
    • ./export.sh --to-git — copy the lessons into the tracked potstill.md and stage it, so you can git commit to version and share lessons across deployments
    • ./export.sh --file — write a timestamped copy; --path prints the location

Continuity is protected in code: a query error no longer kills the agent (it stays alive with its session preserved and persisted on init), so Honoré is not reincarnated involuntarily — only when you deliberately reinstantiate him.


Project Structure

N184/
├── n184/                    # Memory Palace Python package
│   ├── palace.py            #   Unified API (N184MemoryPalace)
│   ├── sqlite_store.py      #   SQLite relational graph
│   ├── chromadb_store.py    #   ChromaDB vector store (7 halls)
│   └── config.py            #   Paths and constants
├── n184_palace_cli.py       # CLI wrapper for agents (n184-palace)
├── agent-runner/            # TypeScript agent runner (Claude Code SDK)
│   └── src/
│       ├── index.ts         #   Core query loop (Redis + file IPC)
│       ├── redis-ipc.ts     #   Redis pub/sub adapter
│       ├── ipc-mcp-stdio.ts #   MCP tools (send_message, schedule_task)
│       ├── honore-entrypoint.ts   # Persistent Honoré loop + lifecycle boot marker
│       ├── vautrin-entrypoint.ts  # Queue consumer for Vautrin
│       ├── openai-entrypoint.ts   # OpenAI-compat runtime (DeepSeek, Ollama)
│       └── budget-guard.ts        #   Token budget cap + restart breaker (Redis-backed)
├── controller/              # Python control plane (runs on the host)
│   ├── main.py              #   Asyncio entry point
│   ├── channel.py           #   Channel protocol + router
│   ├── telegram_bot.py      #   Telegram channel (+ /sorrow, /joy commands)
│   ├── slack_channel.py     #   Slack channel (Socket Mode)
│   ├── email_channel.py     #   Email channel (IMAP poll + SMTP)
│   ├── redis_bridge.py      #   Task watcher + message relay + sub-agent report-back
│   ├── podman_runner.py     #   Sub-agent dispatch via `podman run` (the live backend)
│   ├── providers.py         #   AI provider registry resolver
│   └── job_manager.py       #   Legacy k8s Job dispatch (superseded by podman_runner)
├── container/               # Agent container image
│   ├── Dockerfile           #   node:22 + Python + chromadb + Claude Code
│   ├── entrypoint.sh        #   Standard stdin entry
│   ├── k8s-entrypoint.sh    #   k8s Job entry (fetches from Redis)
│   └── build.sh             #   Build + import to k3d
├── k8s/                     # Legacy Kubernetes manifests (superseded by compose)
│   ├── base/                #   Namespace, RBAC, Redis, ChromaDB, …
│   ├── overlays/            #   local (k3d) / production (k3s) patches
│   └── setup.sh             #   Old k8s deploy (kept for reference)
├── souls/                   # Agent persona definitions
│   ├── claude-honore.md     #   Lead orchestrator
│   ├── claude-vautrin.md    #   Vulnerability hunter
│   ├── claude-rastignac.md  #   Reconnaissance specialist
│   ├── claude-bianchon.md   #   Documentation librarian
│   ├── claude-lousteau.md   #   Memory Palace custodian
│   ├── claude-fil-de-soie.md#   Memory-bug specialist (C/C++, OpenBSD baseline)
│   └── refs/                #   Shared reference docs (bundled into pods
│                            #   via the n184-refs ConfigMap)
│       └── malloc-hardening.md  # OpenBSD malloc reference for Fil-de-Soie
├── action                   # Standalone CLI: ./action --pull-the-thread ...
├── start.sh / stop.sh        # Bring the swarm up / down (podman + compose)
├── compose.yaml             # Podman-compose: Honoré + Redis + ChromaDB
├── export.sh                # Export pot-still lessons (stdout / --to-git / --file)
├── providers/               # AI provider registry (registry.yaml + local override)
├── potstill.md              # Shared distilled lessons (committed via export.sh --to-git)
├── build/                   # Runtime state, git-ignored (data, workspace, logs)
├── SCOREBOARD.md            # Verified bugs found by N184
├── ROADMAP.md               # Feature roadmap
├── FAQ.md                   # Frequently asked questions
├── OVERVIEW.md              # Detailed file-by-file overview
└── LICENSE                  # AGPL v3

How It Works

  1. Human gives Honoré a repository to analyze (via Telegram, Slack, or Email)
  2. Honoré dispatches Rastignac to map the codebase
  3. Lousteau searches the Memory Palace for prior analysis of this codebase — known patterns, past false positives, culture profile — and feeds this context to Honoré and Rastignac so the swarm doesn't repeat old mistakes
  4. Rastignac produces a code map with prioritized files, git history patterns, and Lousteau's historical context
  5. Honoré dispatches the Vautrin swarm — KEDA autoscales pods using different AI models
  6. Bianchon checks findings against documentation (feature vs bug)
  7. Lousteau cross-references every finding against the Memory Palace: has this pattern been seen before? In which codebases? What was the outcome? Was it a false positive last time? He annotates findings with genealogy and confidence adjustments.
  8. Goriot validates consensus across models (2/3 threshold)
  9. Honoré applies Devil's Advocate filtering and presents findings to the human
  10. Lousteau takes the human's feedback — confirmed bugs, false positives, near-misses — and archives everything in the Memory Palace. He evolves detection patterns, updates culture profiles, records statistics, and links cross-codebase tunnels so the next analysis starts smarter than this one ended.

Design Philosophy

Stability over security theater. N184's primary goal is making software more stable and correct. When we find a bug, we describe what breaks and how to fix it — not how an attacker might exploit it. Security vulnerabilities are a subset of bugs, and they get flagged clearly when warranted, but the default posture is "here's a bug, here's a fix."

N184 is convergent evolution, not competition. Glasswing proved the category exists. AISLE proved small models can outperform large ones with the right system design. N184 proves you don't need $100M to make software more stable.

The adding machine didn't eliminate accountants. LLMs won't eliminate engineers. They're force multipliers. N184 is the adding machine moment for bug detection.


MyMilla

Back in October 2025, Dan got involved in the ARM AI Developer Hackathon. It was fun, and he experimented with a personal assistant called MyMilla, named after his late cat. He learned a lot, and explored some Lisp-specific techniques like homoiconicity and how they applied to AI agents.

He then realized there was no shortage of AI agent projects that did what MyMilla did — better, and easier. Several patterns from MyMilla have been integrated into N184:

  • Memory DSL (fact/desire/opinion/backlog) — ported to n184/memory_dsl.py, gives agents structured vocabulary for storing knowledge about the HIL and project context
  • Self-edit safety model — ported to n184/self_edit.py, provides 5-layer guarded self-modification (disabled by default, intent matching, path whitelisting, user confirmation, atomic backups)

The MyMilla project is now archived. Rest well, Milla.


Contributing

  1. Submit PRs improving agent prompts, adding validation checks, or addressing open issues
  2. Report false positives to help improve filtering
  3. Add support for new LLM providers
  4. Improve documentation or create tutorials
  5. Share bug patterns you've discovered
  6. Financial support if you can't contribute time

Attributions

N184 builds on ideas from a number of academic and industry sources. This section will be expanded with proper citations.


Authors

N184 was created through the cowork of A.L. Figaro and Dan Anderson (github.com/MillaFleurs)

License

See LICENSE. This software is distributed under the terms of the GNU Affero General Public License v. 3.0.

About

N184 uses multi-model AI consensus (Claude, DeepSeek, GPT-5, etc.) to discover bugs and security vulnerabilities in codebases. Multiple agents independently analyze code and vote on findings, reducing false positives with actionable PRs. Named after element 184's island of stability.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors