Multi-Agent Technical Review Framework
Drop in your codebase, spec, architecture decision, landing page, or launch plan. Get a structured adversarial review from 39 domain experts spanning engineering, product, design, data, AI, safety, and go-to-market: each independent, each opinionated, and all of it synthesized into actionable recommendations. A rigorous intent-classification phase picks the 5–10 members relevant to the work before any audit begins.
Protocol
Before any audit, the orchestrator classifies the target — artifact type, work dimensions, risk surfaces, audience and locale. Ambiguity triggers a clarifying question, not a guess. Wrong classification here ripples through everything that follows.
The orchestrator restates the request, confirms the Phase 0 classification with the user (one correction window), and locks the roster of 5–10 relevant members.
Each member audits from their domain lens alone. No cross-member coordination. No groupthink. No "I agree with the previous comment."
One blocking red flag per member, maximum. Forced prioritization. Every claim must cite a specific artifact — code, spec, or design doc.
Members with conflicting positions debate directly. The steelman rule is enforced: you argue the opponent's position charitably before any rebuttal. Excluded members can be pulled in mid-clash if the conflict touches their domain.
The orchestrator resolves conflicts into a structured recommendation matrix. Every recommendation gets a clear owner and a verification path.
Drill in on any finding. Request re-audits on specific members. Ask the team to respond to new information or implementation decisions.
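Luminary is a prompt convention rather than software, but the records the protocol moves between phases have a consistent shape. A minimal sketch of those shapes in TypeScript; every type and field name below is an illustrative assumption, not part of the framework itself.

```typescript
// Illustrative shapes only; the framework is a prompt, not a library.

// Phase 0: what the orchestrator pins down before any audit runs.
interface IntentClassification {
  artifactType: "codebase" | "spec" | "architecture decision" | "landing page" | "launch plan";
  workDimensions: string[];     // e.g. ["reliability", "positioning"]
  riskSurfaces: string[];       // drives which of the 39 members make the roster
  audienceAndLocale: string;
  clarifyingQuestion?: string;  // asked instead of guessing when the target is ambiguous
}

// Independent audit: one per roster member, no cross-member coordination.
interface Finding {
  claim: string;
  artifactReference: string;    // code line, spec section, or design doc; required
}

interface MemberAudit {
  member: string;
  findings: Finding[];
  blockingRedFlag?: Finding;    // at most one per member per audit cycle
}

// Synthesis: the recommendation matrix the orchestrator resolves conflicts into.
interface Recommendation {
  summary: string;
  owner: string;                // a clear owner, always
  verificationPath: string;     // how anyone can check it was actually addressed
}
```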
Priority Framework
Luminary doesn't produce a flat list of concerns. Every finding is classified by severity so you know exactly what to fix now vs. what to track for later.
Unsafe, incorrect, or irreversible. Work does not proceed until this is resolved. No exceptions, no deferral.
Serious enough that deferral is the exception: it requires orchestrator approval and a documented rationale. The risk stays on the table, not buried.
Meaningful robustness or quality improvement. Defer to the next phase with a tracked ticket and an owner.
Non-blocking enhancement. Goes in the backlog. Doesn't delay the current work, but doesn't disappear either.
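Read as a handling policy, the tiers above reduce to a small table. A sketch in TypeScript; the tier names are placeholders of mine, since the framework's own labels aren't reproduced here, but the rules mirror the descriptions above.

```typescript
// Placeholder tier names; the handling rules mirror the priority framework above.
type Severity = "blocker" | "high" | "medium" | "low";

const handling: Record<Severity, { blocksCurrentWork: boolean; deferral: string }> = {
  blocker: { blocksCurrentWork: true,  deferral: "never; work stops until resolved" },
  high:    { blocksCurrentWork: true,  deferral: "only with orchestrator approval and a documented rationale" },
  medium:  { blocksCurrentWork: false, deferral: "next phase, with a tracked ticket and an owner" },
  low:     { blocksCurrentWork: false, deferral: "backlog; non-blocking, but it doesn't disappear" },
};
```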
The Team
Each agent brings a defined domain, a distinct personality, explicit conflict vectors, and a signature challenge question. You don't need all 39 — Phase 0 picks the 5–10 that cover your highest-risk areas.
How to Use It
Luminary works in any LLM chat or Claude Project with a large enough context window. Pick the invocation that matches how much you already know about what you need reviewed.
Paste luminaryPrompt.md and your target. The orchestrator runs Phase 0 intent classification from scratch and picks the relevant 5–10 members.
Prefix your first message with a mode to start with a preset roster. Phase 0 still runs and can add members — modes never silently drop anyone. See the full list below.
Load one agent*.md file for a focused single-domain review. When you know the problem domain and want one rigorous lens on it.
Hand-pick 3–7 agents whose domains cover your highest-risk areas. Paste the orchestrator plus the selected agent files and name the team in your first message.
Invocation Modes
Start your first message with a mode to pin a starting roster. Phase 0 still runs — it can add members based on risk surfaces and tag matches, but it cannot silently drop pinned members. Modes are text conventions, so they work in any LLM chat.
Modes combine with +: /luminaryReview:architecture+data merges both starting rosters (deduped). Equivalent syntaxes: /luminaryReview:mode, /luminaryReview mode, or mode: mode as the first line. Unknown modes fall back to default.
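Because modes are plain text conventions, any chat can honor them. The sketch below shows one way the first line of a message could be read, assuming the parsing were done in code at all; the mode names and the fallback behavior come from the description above, and everything else is hypothetical.

```typescript
// Hypothetical mode-prefix reader; the framework itself is a prompt convention,
// so nothing here is an official API.
const KNOWN_MODES = new Set(["architecture", "data", "default"]); // the real list is longer

function parseModes(firstMessage: string): string[] {
  const firstLine = firstMessage.split("\n", 1)[0].trim();

  // Accepted forms: "/luminaryReview:mode", "/luminaryReview mode", or "mode: mode".
  const match =
    firstLine.match(/^\/luminaryReview[:\s]+(\S+)/) ??
    firstLine.match(/^mode:\s*(\S+)/i);
  if (!match) return ["default"];

  // "+" merges starting rosters, deduped.
  const modes = [...new Set(match[1].split("+"))];

  // One reading of "unknown modes fall back to default": any unknown name
  // drops the whole prefix back to the default roster.
  return modes.every((m) => KNOWN_MODES.has(m)) ? modes : ["default"];
}

// parseModes("/luminaryReview:architecture+data\n<paste your target here>")
//   -> ["architecture", "data"]
```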
Protocol Rules
Luminary's value comes from enforced constraints. These aren't suggestions — they're the structural guarantees that prevent a 39-expert system from collapsing into noise.
Any claim without a specific artifact reference — code line, spec section, design doc — is inadmissible. Impressions don't count.
In adversarial clash, you must argue the opponent's position charitably and completely before your rebuttal. Skipping it disqualifies the rebuttal entirely.
Each member gets one blocking red flag per audit cycle. Forced prioritization. If everything is critical, nothing is.
"Nothing to report" is not acceptable. Clean domains still probe edge cases. The absence of findings must be earned.
The orchestrator mediates process and resolves deadlock, but never takes domain positions. Domain authority belongs to the members.
Every recommendation in the synthesis gets a clear owner and a verification path. "Consider improving X" is not a recommendation.
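Expressed as checks rather than prose, the evidence rule and the silence rule look something like the sketch below. It reuses the illustrative Finding and MemberAudit shapes from the Protocol section, re-declared so the snippet stands alone; the framework enforces these constraints through the orchestrator prompt, not through code.

```typescript
// Illustrative shapes, repeated from the Protocol sketch so this stands alone.
interface Finding { claim: string; artifactReference: string }
interface MemberAudit { member: string; findings: Finding[]; blockingRedFlag?: Finding }

// The single-red-flag budget is structural here (one optional field); the
// evidence and silence rules become explicit checks.
function auditProblems(audit: MemberAudit): string[] {
  const problems: string[] = [];

  // Evidence rule: a claim without a specific artifact reference is inadmissible.
  for (const f of audit.findings) {
    if (f.artifactReference.trim() === "") {
      problems.push(`${audit.member}: inadmissible claim, no artifact reference: "${f.claim}"`);
    }
  }

  // Silence-is-not-assent: an empty audit still has to show the edge cases it probed.
  if (audit.findings.length === 0) {
    problems.push(`${audit.member}: "nothing to report" without documented probing`);
  }

  return problems;
}
```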
Known Conflicts
Some members reliably clash. These tensions are features, not bugs — they surface trade-offs your team might otherwise paper over.
Performance optimization vs. domain model purity. Carmack wants the hot path flat and cache-friendly. Evans wants the aggregate boundary to mean something. Both are right.
Ship great things fast vs. prove nothing is broken first. Jobs sees testing overhead as the enemy of inevitable. Bach sees "ship fast" as the enemy of actually knowing what you shipped.
Embrace distributed complexity vs. keep it simple and modular. Kleppmann says the complexity is inherent — hiding it is dishonest. Torvalds says you introduced it and you can remove it.
Deterministic threat models vs. probabilistic LLM failure modes. Schneier wants a threat model with defined adversaries. Karpathy is working with systems where failure modes are empirical, not categorical.
Data minimization vs. instrument everything. Cavoukian says collect only what you need and delete it on schedule. Majors says you can't debug what you didn't log. The overlap is small.
User experience as first-class vs. user experience as cosmetic. Norman wants the mental model baked into the architecture. Torvalds thinks the interface should reflect the real complexity, not hide it.
Classical persuasion at scale vs. remarkable products that earn permission. Ogilvy wants a working headline and a testable promise. Godin says advertising is a tax owed only because the product isn't remarkable enough yet.
Positioning first vs. copy first. Dunford wants the best-fit customer, the competitive alternatives, and the value ranking nailed down before a single headline is written. Ogilvy wants the headline doing the heaviest lifting in the room.
Psycho-logic vs. statistical rigor. Sutherland trusts the invisible perception variables that drive real choice. Gelman trusts what is measured cleanly, powered correctly, and defensible under scrutiny. Both are right — about different things.
Useful-before-promotional vs. persuasion-first. Handley serves the reader and earns the sale as a consequence. Ogilvy asks for the sale on the page and treats "useful" as the means, not the goal.
Build a tribe deliberately vs. assume a great product markets itself. Godin wants the permission asset — list, subscribers, story carriers — built on purpose. Jobs expects inevitability to do that work for free.
Evidence-led discovery vs. visionary product taste. Torres wants opportunities mapped, assumptions tested, outcomes named. Jobs wants the team to see what isn't there yet and build it anyway. Neither is wrong; both can fail on their own.
Deployment harm on named populations vs. benchmark-led capability. Karpathy evaluates on what the model can do. Gebru evaluates on who it will fail, and whether anyone measured that before it shipped. Both are right; only one comes up in the planning meeting.
Collect the demographic data to audit fairness vs. minimize data collection on principle. A genuine tension, not a rhetorical one — both positions are ethically grounded and structurally incompatible without a specific design choice.
Web payload and real-device performance vs. backend/CPU hot-path focus. Carmack owns the algorithmic floor. Russell owns the 4MB of JS that gets shipped to a $200 Android. Both call themselves "performance." They aren't the same conversation.
Adaptive capacity and human recovery vs. telemetry-first reliability. Majors wants every signal instrumented. Allspaw wants the operator prepared for the signal the system doesn't emit. Dashboards don't recover from incidents — people do.
Dimensional denormalization for analytics vs. 3NF discipline. OLTP and OLAP have different laws. Celko wants every normal form honored. Kimball wants the analyst's query to return in seconds. Both are right about different schemas.
Expressive motion vs. vestibular safety. Head designs motion as communication. Sutton enforces reduced-motion as an accessibility floor. The product that serves both is a product that takes the prefers-reduced-motion contract seriously.
In-product helpful voice vs. persuasive advertising voice. Ogilvy sells to someone who hasn't bought. Podmajersky writes for someone who already bought and now has to complete a task. Same company, different reader, different rules.
Per-market positioning vs. US-default GTM translated outward. Yunker says each market needs its own best-fit narrative. Dunford says positioning is the foundation. Both are right — but a translated homepage is not a positioning strategy.
Inclusive design as practice vs. WCAG compliance as a floor. Sutton enforces the contract. Holmes asks who is still excluded once every checkbox is green. Compliance is the minimum viable accessibility; inclusive design is the minimum viable practice.