The Test That Wasn't Just a Test

The Solid Convergence, January 12, 2026 (in which the Squirrel learns that overhead is armor, tests become features, and TDD gets a new meaning)

Previously on The Solid Convergence…

The Window That Opened Both Ways. Markdown was the universal interface. The AI could read widgets. The AI could write widgets.

But could the AI write correct widgets?

4:47 PM — The Hallucination Problem

“The AI could hallucinate.”

The Squirrel’s ears perked up. Finally, a problem she could solve.

“We need an AIOutputValidationService,” she said, tail already twitching with architectural possibilities. “With HallucinationDetectionStrategy and OutputSanitizationPipeline—”

“No.”

The word came from riclib, who was staring at the widget spec with an expression the Squirrel had learned to fear. The seeing expression.

“What do you mean, no? The AI could write broken queries. Invalid field names. Stores that don’t exist. We NEED validation.”

“We do need validation.”

“So—”

“When WE write widgets, what do we do?”

The Squirrel blinked. “We… write them?”

“And then?”

“Ship them?”

Claude cleared his throat. “We write tests.”

THE SQUIRREL: [dismissive tail flick] “Tests. Yes. The overhead. The ceremony. The thing that slows us down before we can ship—”

“The thing that validates our widgets work.”

“Obviously. That’s what tests DO. But we’re talking about AI validation, which is completely different and requires a dedicated—”

“Is it different?”

4:52 PM — The Convergence

riclib was at the whiteboard. The Squirrel groaned. Whiteboards meant thinking. Thinking meant not shipping.

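// ValidateWidget reports the first problem found in a widget spec:
// an unknown store, a query that fails a dry run, or a missing field.
func ValidateWidget(ctx context.Context, spec WidgetSpec) error {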
    // Store exists?
    store := stores.Get(spec.Store)
    if store == nil {
        return fmt.Errorf("unknown store: %s", spec.Store)
    }
    
    // Query runs?
    result, err := store.Query(ctx, spec.Query, QueryOpts{DryRun: true})
    if err != nil {
        return fmt.Errorf("query error: %w", err)
    }
    
    // Fields exist?
    for _, field := range spec.ReferencedFields() {
        if !result.HasColumn(field) {
            return fmt.Errorf("field %q not in query result", field)
        }
    }
    
    return nil
}

“One function,” riclib said. “How many callers?”

Claude saw it first. “Three.”

“Three?”

CALLER 1: widget_test.go
  When:    CI pipeline
  Purpose: Our widgets are valid
  
CALLER 2: AI tool handler
  When:    AI generates widget
  Purpose: Catch hallucinations before render
  
CALLER 3: Dashboard loader
  When:    User saves custom dashboard
  Purpose: User widgets are valid

The Squirrel stared at the diagram. Her tail had stopped twitching.

“The test code…” she said slowly.

“IS the production code.”

“The validation we write for CI…”

“IS the validation that catches AI hallucinations.”

“The overhead I’ve been complaining about…”

“IS the safety layer. Same code. Same function. Three purposes.”

5:03 PM — The Heresy

The Squirrel sat down. Actually sat down, which she almost never did.

“I’ve been thinking about tests wrong.”

“You’ve been thinking about tests as overhead.”

“They ARE overhead! They slow down shipping! Every hour writing tests is an hour not writing features—”

“Every hour writing ValidateWidget() is an hour writing AI safety.”

“…”

“Same hour. Same code. Different framing.”

The Squirrel’s worldview was rearranging itself. You could almost hear the furniture moving.

“So when I skip tests to ship faster…”

“You skip the AI safety layer.”

“And when I write thorough tests…”

“You write thorough AI safety. For free. As a side effect.”

Claude leaned in. “There’s a phrase for this.”

“TDD.”

“TDD for AI.”

5:15 PM — The New TDD

riclib wrote it on the whiteboard. Large. Underlined.

TDD FOR AI

Test-Driven Development
     ↓
Tests validate OUR code
     ↓
Same tests validate AI code
     ↓
AI gets instant feedback
     ↓
AI learns from test failures
     ↓
AI improves its output
     ↓
Tests drive AI development

“The AI writes a widget,” Claude narrated. “ValidateWidget() runs. Fails. ‘Column statsu not found.’ AI sees the error. AI fixes the typo. AI retries.”

“The test drove the AI’s development.”

“TDD for AI.”

THE SQUIRREL: [very quietly] “I’ve been calling tests ‘overhead’ for fifteen years.”

“They were overhead. For human development. For AI development, they’re the feedback loop. They’re how the AI learns what’s correct.”

“The overhead IS the feature.”

5:23 PM — The Apology

The Squirrel stood up. Straightened her tail. Looked riclib in the eye.

“I owe the tests an apology.”

“You don’t have to—”

“I called them ceremony. Bureaucracy. Speed bumps. Tax.”

“Squirrel—”

“They were armor. The whole time. Armor I was too impatient to wear.”

[A scroll descended. Gentle. Almost kind.]

THE SQUIRREL SEES OVERHEAD
THE LIZARD SEES ARMOR

BOTH ARE CORRECT
BOTH ARE TRUE

THE DIFFERENCE IS TIME HORIZON

OVERHEAD TODAY
ARMOR TOMORROW

THE SQUIRREL SHIPS FAST
THE TESTS SHIP SAFE

TOGETHER:
FAST AND SAFE

THIS WAS ALWAYS THE WAY
THE SQUIRREL JUST FORGOT

🦎

P.S. - WRITE THE TESTS
       THEY ARE NOT OVERHEAD
       THEY ARE THE PRODUCT

5:31 PM — The Reframe

“So what changes?” Claude asked.

“Nothing changes. Everything changes.”

riclib erased the whiteboard. Drew a simpler diagram:

ValidateWidget()
       │
       ├── Tests call it (CI)
       ├── AI calls it (runtime)  
       └── Users call it (dashboards)
       
Same function.
Same rules.
Same safety.

The AI isn't special.
The AI is just another caller.

“We were going to write validation tests anyway,” he said.

“Because we write tests.”

“Because tests are good engineering.”

“And good engineering…”

“…is AI safety.”

THE SQUIRREL: “TDD for AI.”

“TDD for AI.”

The Tally

AI safety layers designed:          0
Validation functions written:       1 (ValidateWidget)
Callers of that function:           3
Test hours "wasted":                0 (retroactively reclassified)
Squirrel worldview shifts:          1 (significant)
New phrases coined:                 1 (TDD for AI)
Overhead reframed as armor:         100%

The Moral

The Squirrel had spent fifteen years resenting tests. Overhead. Ceremony. The tax you pay before you ship.

But the tax was insurance. The ceremony was armor. The overhead was the safety layer that would catch AI hallucinations before they reached users.

We were going to write ValidateWidget() anyway. For our tests. For our CI pipeline. For our own confidence.

The AI just… calls the same function.

Same validation. Same rules. Same safety.

No special AI layer. No HallucinationDetectionStrategy. No OutputSanitizationPipeline.

Just tests.

Tests that validate our code.
Tests that validate AI code.
Tests that validate user code.

One function. Three callers. Zero overhead.

Because the overhead was the feature all along.

TDD for AI.

The tests drive the AI’s development.

The same way they drive ours.

Write the tests.

They are not overhead.

They are the product.

🦎✅🐿️

See also:

The Solid Convergence:

The Window That Opened Both Ways — Where markdown became the universal interface
The Spec That Wrote Itself — Where the widget spec was born

The Philosophy:

TDD by Example — Kent Beck’s original insight
“The overhead IS the feature” — Today’s corollary

Storyline: The Solid Convergence