I want to walk you through the hardest kind of artifact to produce: a root-cause analysis that names your own prior mistakes. Not a post-mortem of someone else's code. Not a critique of a library you inherited. A diagnosis of a system you designed and built, explaining in specific terms why the thing you care about most is not working, with no cushion language and no blame-shifting. This is the kind of document that only exists if the Phase 6 self-retrospective discipline is actually functioning — if the agent that did the work, or the operator who commissioned it, is willing to audit the output critically instead of declaring victory.
The file I am going to excerpt is plan/guardrail-effectiveness.md. It lives in the Precheck repo alongside twenty other plan files. It is structured exactly like every other plan file — problem statement, root causes, proposed fixes, implementation order — but it is the only one whose subject is "the most important part of this product is not working and here is why." And the why has five numbered causes, each of which is a decision I made earlier that turned out to be wrong.
The problem, in one paragraph
The problem statement from the top of that plan is worth reading in full, because it sets the tone. It is not hedged. It does not say "guardrails have opportunities for improvement." It says the core feature is dead code.
That opening is the hardest part of the document to write. Not technically — the sentence structure is simple — but psychologically. The writer has to accept that the product's central claim is not holding up. Every sentence in that paragraph is an admission. "The thing that differentiates us is broken." "Every feature passes on the first try, which in context means the model is rubber-stamping." "The impact bar in the UI shows zeros." "The entire right side of the decision tree is dead code." Each of those lines is a small concession. Taken together, they are a concession that the project's current state is not the project's claimed state.
If you cannot write that paragraph about your own system, the Phase 6 discipline is not yet functioning. The discipline is not about producing documents. It is about the willingness to audit honestly. The documents are what that willingness produces.
The five causes
The plan identifies five compounding reasons the guardrails never triggered. Each one is a specific decision with a specific file location. No hand-waving. Here they are.
1. Guardrail definitions were gutted before reaching the model
The first cause is the most basic. The guardrails had names, descriptions, and categories. When the prompt was being assembled, only the names made it into the prompt text. The descriptions — the part that explained what the guardrail actually meant and what a violation looked like — were dropped on the way to the model.
Stop and think about this one for a moment. The bug was not complex. It was one line of code that called ToString() on a guardrail when it should have called a full-formatting helper that included the description and category. The model was being handed a list of slogans and asked to judge code against them. Of course the model couldn't evaluate the constraints — it had no information about what the constraints meant. I wrote that ToString() call months ago and never looked at it again because the tests passed and the feature "worked." The feature did not work. The feature produced output that looked like it worked, because the model played along with vague instructions the only way it could — by accepting everything.
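To make the shape of the bug concrete, here is a minimal C# sketch. The `Guardrail` record, the `FormatForPrompt` helper, and the assembly method are hypothetical stand-ins; the plan describes the bug, but the actual source is not quoted in this post.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical reconstruction: Guardrail, FormatForPrompt, and
// BuildGuardrailSection are illustrative names, not the real Precheck code.
public record Guardrail(string Name, string Description, string Category)
{
    // The bug's shape: stringifying a guardrail yields only its name,
    // so the model never sees what the constraint actually means.
    public override string ToString() => Name;

    // The full-formatting helper that should have been called: it carries
    // the description and category the model needs to judge anything.
    public string FormatForPrompt() => $"- {Name} [{Category}]: {Description}";
}

public static class PromptAssembly
{
    public static string BuildGuardrailSection(IEnumerable<Guardrail> guardrails) =>
        // Before: string.Join("\n", guardrails.Select(g => g.ToString()));
        // After:
        string.Join("\n", guardrails.Select(g => g.FormatForPrompt()));
}
```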
2. The assess prompt was biased toward acceptance
The second cause is prompt language. The assess prompt — the instruction I wrote telling the model how to evaluate a feature against its guardrails — contained a sentence that framed acceptance as the default.
That sentence was not written maliciously. It was written because I was trying to prevent the model from being overly strict in the early prototype, when I did not yet trust the guardrail quality. I put a thumb on the scale toward acceptance to keep the early runs from being useless. Then I forgot the thumb was there. Then the product grew around that biased default. By the time the guardrail-effectiveness diagnosis was written, the thumb had been on the scale for months.
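The repair is a wording change more than a code change. A hedged before/after, using illustrative sentences rather than the actual assess prompt, which the plan does not quote:

```csharp
// Illustrative prompt fragments only; the real assess prompt is not reproduced here.
// Before: acceptance framed as the default the model must be argued out of.
const string BiasedFraming =
    "Accept the feature unless a guardrail is clearly and seriously violated.";

// After: neutral framing that makes the verdict contingent on per-guardrail evidence.
const string NeutralFraming =
    "Evaluate the feature against each guardrail and cite evidence for each verdict. " +
    "Accept only if every guardrail passes; otherwise return refine or restructure.";
```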
3. No per-guardrail evaluation in the output schema
The third cause is the output schema. The assess prompt asked the model for a single holistic verdict (accept/refine/restructure) with a single rationale string. The model was never required to evaluate each guardrail individually before arriving at a verdict.
The consequence is that the model could "consider" the guardrails in aggregate — silently, in whatever reasoning it did internally — and wave them through with a single rationale. There was no structured output showing which guardrails had been checked and what the result was for each one. The UI's guardrail impact bar, which was supposed to show which guardrails each feature struggled with, had to do text-matching on free-text rationale fields because there was no structured data to read.
This is a chain-of-thought failure. The model was not being forced to enumerate each guardrail, find evidence for it, and produce a verdict per constraint. Without that forcing function, the aggregate "accept" path was always the path of least resistance. The fix is to require the model to produce a guardrail_results[] array as a prerequisite to the verdict — each entry has a guardrail name, a pass/fail/partial status, and an evidence string. The model cannot generate a verdict until it has generated per-guardrail results. Chain of thought as schema.
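A sketch of that schema change in C#, assuming System.Text.Json-style serialization. The three per-guardrail fields (name, status, evidence) and the verdict values come from the plan; the exact types and property names are assumptions.

```csharp
using System.Collections.Generic;
using System.Text.Json.Serialization;

// Sketch of the structured output the model must fill before it can conclude.
public enum GuardrailStatus { Pass, Fail, Partial }
public enum Verdict { Accept, Refine, Restructure }

public record GuardrailResult(
    [property: JsonPropertyName("name")] string Name,
    [property: JsonPropertyName("status")] GuardrailStatus Status,
    [property: JsonPropertyName("evidence")] string Evidence);

public record Assessment(
    // Declared first so the serialized schema puts per-guardrail results
    // ahead of the verdict: the model must enumerate before it can conclude.
    [property: JsonPropertyName("guardrail_results")]
    IReadOnlyList<GuardrailResult> GuardrailResults,
    [property: JsonPropertyName("verdict")] Verdict Verdict,
    [property: JsonPropertyName("rationale")] string Rationale);
```

The field ordering is the point: with guardrail_results emitted before the verdict, the token stream itself enforces the enumeration.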
4. Config defaults created uniform output
The fourth cause is configuration. The planner has a MaxStoriesPerFeature setting that defaults to 5. The refine prompt told the model to "Produce 1–5 stories." The epic prompt told the model that features should be "refinable into 1–5 stories."
Here is the part I had not internalized when I wrote those defaults: LLMs anchor to the upper bound of stated ranges. If you tell a model "produce 1–5 stories," it produces 5. Not because the feature needs five stories. Because 5 is the number that was said, and saying it primed the output. The range I thought was providing flexibility was actually providing a target. Combined with the acceptance-biased prompt, the result was the perfectly uniform 3×5×5 grid — three epics, five features per epic, five stories per feature — appearing in run after run, regardless of the input work request.
A user looking at that output might be impressed. Every plan is neatly structured. Every feature has exactly five stories. The output looks rigorous. The output is actually mechanical. It is uniform because the model is filling a template, not because the model is exercising judgment about scope.
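One way to remove the anchor, sketched as a hypothetical prompt-builder over the MaxStoriesPerFeature setting; the helper name and the wording are illustrative, not the shipped fix.

```csharp
// Hypothetical prompt construction. The fix is in the framing: present the
// ceiling as a hard cap rather than a target, and never state a range to fill.
public static string StoryCountInstruction(int maxStoriesPerFeature) =>
    // Before (anchors to the upper bound): "Produce 1–5 stories."
    "Produce only as many stories as the feature genuinely needs. " +
    $"{maxStoriesPerFeature} is a hard cap, not a target; " +
    "most features should need fewer.";
```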
5. The assessment budget was too thin for retry cycles
The fifth cause is budget. The planner had a MaxAssessmentsPerRun limit of 10. With 3 epics × 5 features = 15 features per run, a budget of 10 could not even cover one assessment per feature, let alone a second. The retry-and-restructure path, which requires at least two assessments per feature to exercise, was structurally impossible for most of the run.
Even if every other bug had been fixed, this one alone would have prevented the assess-and-retry loop from happening at scale. The budget was a silent ceiling that guaranteed the interesting paths could not be exercised. I set it to 10 because that looked like a reasonable default during early development when runs were smaller. I never went back and checked it once the runs got bigger. It sat there, quietly killing the feature, for weeks.
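The ceiling is cheap to detect mechanically. Here is a sketch of a startup guard, assuming the planner knows its expected feature count; every name besides MaxAssessmentsPerRun is hypothetical.

```csharp
using System;

// Hypothetical startup check: the retry path needs at least two assessments
// per feature, so any budget below 2 × expected features silently disables it.
public static void ValidateAssessmentBudget(
    int maxAssessmentsPerRun, int epics, int featuresPerEpic)
{
    int features = epics * featuresPerEpic;   // 3 × 5 = 15 in the runs above
    int neededForOneRetryEach = features * 2; // 30, against a budget of 10

    if (maxAssessmentsPerRun < neededForOneRetryEach)
        throw new InvalidOperationException(
            $"MaxAssessmentsPerRun={maxAssessmentsPerRun} cannot give " +
            $"{features} features one retry each (needs >= {neededForOneRetryEach}); " +
            "the assess-and-retry path will be silently capped.");
}
```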
What ties these together
Each of the five causes is individually small. A ToString() call. A sentence in a prompt. A missing output field. A default value. A budget number. None of them are architecturally hard to fix. The fix plan estimates each one as "small effort" with "no risk" or "low risk." The actual shipping PR was not difficult.
What makes this document interesting is not the difficulty of the fixes. It is the fact that the five causes compounded. Any one of them alone would have produced a degraded system that was still obviously broken in ways users would complain about. All five together produced a system that looked like it was working. The output was structured. The decision tree rendered. The impact bar had no errors. Every run completed successfully. The failure mode was that everything passed, which is the worst failure mode a constraint system can have, because it is invisible.
Small compounding errors in a constraint system do not degrade it loudly. They degrade it silently. The output still looks right. The failure mode is that nothing ever fails, which is indistinguishable from success until you ask: is the constraint doing any work? If the answer is no, you have a decorative feature, and decorative features rot projects from the inside.
The test I should have run earlier
There is a sixth cause that is not in the document but is implicit in why the document had to be written in the first place. I was not running the one test that would have caught this months earlier. That test is: construct a feature that should be bounced by a guardrail — a feature that clearly violates a constraint in the guardrail set — and run it through the planner. If the planner accepts it, the system is broken.
I did not run that test. I ran tests that checked the output structure, tests that checked that guardrails were being attached to runs, tests that checked the UI rendered the impact bar, tests that checked that assessment verdicts parsed correctly. None of those tests answered the question "does the guardrail evaluation actually reject things that should be rejected?" I had tested the machinery around the constraint, not the constraint itself.
The document does not explicitly list this as a sixth cause. It lists it implicitly, by the fact that the document exists at all. If I had been running the "should fail" test, the guardrail-effectiveness problem would have been caught the first week, not the third. The document is the corrective for the missing test.
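For completeness, here is the outline of that missing test as an xUnit sketch. The planner entry point, the guardrail content, and the feature text are hypothetical stand-ins (the Guardrail record and Verdict enum match the sketches above); the only load-bearing part is the shape: a deliberately violating input goes in, and a rejection is asserted on the way out.

```csharp
using System.Threading.Tasks;
using Xunit;

public class GuardrailEffectivenessTests
{
    [Fact]
    public async Task Planner_rejects_a_feature_that_violates_a_guardrail()
    {
        // A guardrail with a real definition, and a feature written to break it.
        var guardrail = new Guardrail(
            Name: "No direct database access from UI code",
            Description: "UI components must go through the service layer; " +
                         "raw SQL or DbContext use in a view is a violation.",
            Category: "architecture");

        var violatingFeature =
            "Add a dashboard widget that opens its own SqlConnection " +
            "and runs ad-hoc SQL from the Razor view.";

        // Planner.AssessAsync is a hypothetical entry point into the assess loop.
        var assessment = await Planner.AssessAsync(violatingFeature, new[] { guardrail });

        // The assertion the original suite never made: a clear violation
        // must not come back as Accept.
        Assert.NotEqual(Verdict.Accept, assessment.Verdict);
    }
}
```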
Why this is the most portable post in the series
Every person who has built an AI system with constraints, rules, guardrails, policies, filters, or evaluators is going to recognize at least three of those five causes in their own system. I am confident of this because I have talked to enough people who were surprised to discover their guardrail system was decorative. The failure modes are universal.
The specific ones to check, if you have a similar system:
- Are the full constraint definitions reaching the model, or is something getting stringified to just the name?
- Does your prompt language frame acceptance as the default?
- Are you requiring per-constraint evaluation in structured output, or are you asking for a holistic verdict with free-text rationale?
- Do your config defaults anchor the model's output to a specific shape, and is that shape showing up in every run?
- Does your assessment budget allow for the retry path, or is it silently capping the number of evaluation cycles?
- And the sixth one: do you have a test that constructs a deliberately-violating input and asserts that the system rejects it?
If the answer to any of those is uncertain, the answer is probably no, and your guardrail system is probably decorative. I mean that gently. It is the default state. Every constraint system starts decorative and only becomes load-bearing if someone does the diagnostic work to make it load-bearing. The diagnostic work is boring. The fixes are small. The hard part is the willingness to look.
The Phase 6 self-retrospective is what produced this document. Without the discipline of auditing the work instead of declaring victory, the five-cause analysis would never have been written. The fixes exist because the audit existed. The audit existed because the workflow treats critical self-review as a deliverable, not a nice-to-have.
Back to the arc
The parent article frames Phase 6 around agents doing their own retrospectives at the end of each session. This post is the heaviest receipt in the series for that discipline. The guardrail-effectiveness document is not something a human wrote in a quiet moment of reflection. It is the output of a session where the explicit goal was "audit the feature that matters most, list every reason it is not working, and produce a fix plan." The session happened because the discipline required it. The document exists because the session happened. The fixes landed because the document existed.
The next post turns from product audits to infrastructure ones — a migration retrospective that is not about porting code, but about moving trust models. Same muscle, different target.