Heisenbug

The Bug That Knows You're Looking
Phenomenon · First observed 1985 (it had been observed earlier, but vanished before anyone could document it) · Severity: Inversely proportional to the number of people watching

Heisenbug (noun): a software bug that disappears or alters its behaviour when one attempts to observe, debug, or study it. Named after Werner Heisenberg’s uncertainty principle, which states that you cannot simultaneously know a particle’s position and momentum to arbitrary precision. The Heisenbug extends this to software: you cannot simultaneously know what the bug is doing and have the bug doing it.

The Heisenbug is not shy. Shyness implies self-awareness and a preference for solitude. The Heisenbug is not aware of you. It simply ceases to exist under the specific conditions you create by trying to observe it, and resumes existing under the conditions you create by giving up. The bug is not hiding. You are, by looking, creating a universe in which it was never there.

“I do not chase Heisenbugs. I sit very still. I do nothing. Eventually the bug forgets I am here, and resumes. Patience is a debugger that changes no memory layout.”
The Lizard, who has never attached a debugger to anything

The Observation Problem

The fundamental mechanism of a Heisenbug is that the act of observation changes the system being observed. In quantum physics, this involves photons and wave function collapse. In software, the mechanisms are more mundane and more infuriating:

Timing changes. Adding a log statement takes time. Printing to stdout takes time. That time (sometimes microseconds, sometimes milliseconds) is enough to change the interleaving of concurrent threads. In the race condition that crashes the program every third run, thread B now wins the race, because fprintf gave it just enough time to finish first. The developer sees clean logs. The developer removes the logs. Thread B loses again. The crash returns.

Memory layout. Running under a debugger changes the process’s memory layout. gdb disables address space layout randomisation by default and adds its own environment variables, which shifts the stack. Debug builds lay out the stack differently and often add padding. Breakpoints insert trap instructions. The buffer overflow that was stomping on a function pointer is now stomping on padding bytes that nobody reads. The program works perfectly under the debugger. The developer runs it without the debugger. The function pointer gets stomped again. The program crashes in a function it has never called.

Optimiser behaviour. Debug builds disable optimisation. The variable that the optimiser had kept in a register is now on the stack. The uninitialised read that was picking up a stale register value is now picking up a stack slot that happens to hold zero. The release build crashes. The debug build does not. The developer adds -O0 to the release build and ships it, which is the software equivalent of leaving the lights on so the monsters stay under the bed.

The Heisenbug Hunt

The standard Heisenbug debugging session follows a ritual as predictable as it is futile:

  1. The Report. “It crashes every third request.” The developer cannot reproduce it.
  2. The Logging. log.Printf("HERE 1"), log.Printf("HERE 2"), log.Printf("HERE 3"). The crash stops.
  3. The Removal. The developer removes the logging. The crash returns.
  4. The Targeted Logging. The developer adds logging only near the suspected area. The crash stops.
  5. The Minimal Logging. The developer adds a single log statement. The crash stops.
  6. The Empty Logging. The developer adds log.Printf(""). The crash stops.
  7. The Debugger. The developer attaches gdb. The crash stops. The developer detaches gdb. The crash returns. The developer reattaches gdb. The crash stops. The developer considers leaving gdb attached in production.
  8. Lunch. The developer goes to lunch. The crash happens forty-seven times while they are eating a sandwich. The monitoring dashboard is on fire. The developer returns, opens the code, and the crash stops.

“I added a print statement and the bug went away. I removed the print statement and the bug came back. I added a comment — just a comment, no code — and the bug went away again. I am now mass-commenting my codebase for structural integrity.”
The Caffeinated Squirrel, who has 400 log statements in a 200-line function

Taxonomy of Observer-Dependent Bugs

The Heisenbug is the most famous member of a family of bugs named after physicists, a tradition that lends unearned dignity to what are fundamentally programming mistakes:

The Bohrbug. The opposite of a Heisenbug. A Bohrbug is deterministic, reproducible, and consistent. It crashes every time, in the same place, for the same reason. Bohrbugs are named after Niels Bohr’s model of the atom — stable, predictable, and orbiting the same path forever. Bohrbugs are easy to find and fix. Nobody writes articles about Bohrbugs. Nobody tells war stories about Bohrbugs. Bohrbugs are boring. Boring is good.

The Mandelbug. A bug whose causes are so complex and layered that the behaviour appears chaotic. Named after Benoit Mandelbrot. Fix one symptom and another appears, fractally, at a different scale. The Mandelbug is not one bug. It is an ecosystem.

The Schrödinbug. A bug that has existed in the code for years but only manifests after someone reads the code and realises it should never have worked in the first place. The code was simultaneously working and broken until observed. Upon observation, it collapses into “broken” and has been broken ever since, retroactively. The Schrödinbug is the most philosophically troubling member of the family, because it implies that code works by consensus rather than by logic.

The Compiler’s Role

The Compiler is the Heisenbug’s most reliable accomplice. The compiler optimises code differently at -O0 (debug) and -O2 (release). Variables that exist in debug mode are optimised away in release mode. Memory that a debug build zeroes or fills with a known pattern contains whatever was there before in release mode. Loops that execute in order in debug mode are vectorised and reordered in release mode.

The developer who says “it works in debug but crashes in release” is not reporting a bug. They are reporting that the debug build and the release build are different programs that happen to be compiled from the same source code. The compiler has written two programs. One works. The other is the one you ship.
