YAML

The Configuration Language That Accidentally Became a Programming Language
Technology · First observed 2001 (Clark Evans, Ingy döt Net, Oren Ben-Kiki: three people who wanted something simpler than XML and got something weirder) · Severity: Infrastructural (load-bearing)

YAML (YAML Ain’t Markup Language, which is a recursive acronym and therefore already lying about its own complexity) is a data serialization language that was designed to be human-readable and has achieved this goal in the same way that a legal contract is human-readable — the words are English, but the meaning requires a lawyer.

YAML was created in 2001 as a simpler alternative to XML. It succeeded. XML required angle brackets, closing tags, attributes, namespaces, DTDs, XSD schemas, and XSLT transformations. YAML requires only indentation, colons, and the quiet confidence that you understand the difference between a string, a boolean, a null, an integer, a float, and the country code for Norway.

“Languages went from punch cards to YAML.”
The Caffeinated Squirrel, on the trajectory of human progress, The Gap That Taught, or The Night the Squirrel Learned to Love the Brick

The Simplicity

YAML’s pitch is seductive. A configuration file:

name: my-application
port: 8080
debug: true

Three lines. Three key-value pairs. A child could read it. A child could write it. This is the moment YAML hooks you. This is the free sample outside the restaurant. The menu inside has 84 pages and a section on anchors.

The simplicity is real — for the first twelve lines. Then you need a list. Then a nested object. Then a multiline string. Then you discover there are nine ways to write a multiline string in YAML (literal block scalar, folded block scalar, plain, single-quoted, double-quoted, with chomping indicators, with indentation indicators, or combinations thereof), and each one handles trailing newlines differently, and the documentation uses the word “chomping” without apparent irony.
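A small sketch of the most common multiline forms, with hypothetical key names. The comments describe the behavior defined by the YAML specification; individual parsers are assumed to follow it.

```yaml
# Literal block scalar: line breaks are kept, and trailing blank lines
# are clipped to a single final newline.
literal: |
  line one
  line two

# Folded block scalar: a single line break inside the text becomes a space.
folded: >
  line one
  line two

# Chomping indicator "-": strip the final newline entirely.
stripped: |-
  no trailing newline

# Chomping indicator "+": keep the final newline and any blank lines after it.
kept: |+
  trailing blank lines survive

# Double-quoted flow scalar spanning two lines: folded into "line one line two".
double_quoted: "line one
  line two"
```

The same two lines of text yield different strings depending on which single character follows the colon, which is roughly where the free sample ends.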

The Norway Problem

The defining pathology of YAML is implicit typing. YAML looks at your values and decides, without asking, what type they are. The string true becomes a boolean. The string 3.14 becomes a float. The string null becomes null. And the string NO — the ISO 3166-1 country code for Norway — becomes the boolean false.

This is not a theoretical concern. This is a bug report filed by someone whose list of countries spontaneously lost Scandinavia.

Under YAML 1.1, which many widely used parsers still implement, the full list of values interpreted as boolean false includes: n, N, no, No, NO, false, False, FALSE, off, Off, OFF. The full list for true is similarly generous. (YAML 1.2’s core schema narrowed booleans to true and false, but the older behavior remains common in practice.) This means a YAML file containing country codes, user responses to survey questions, or the word “off” in any context is a minefield where every third value is silently transmuted into something the author did not intend.

The fix is quoting. "NO" is a string. NO is a boolean. The difference between Norway existing and Norway not existing is two quotation marks. This is the kind of language design that makes you appreciate XML’s verbosity. At least <country>NO</country> never lost a nation.
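A minimal illustration, assuming a parser that applies YAML 1.1 implicit typing (a parser using the stricter YAML 1.2 core schema would keep NO and off as strings). The key names are hypothetical.

```yaml
countries:
  - SE        # string "SE"
  - NO        # boolean false under YAML 1.1 rules
  - "NO"      # string "NO": Norway restored
  - DK        # string "DK"

answers:
  newsletter: off      # boolean false under YAML 1.1 rules
  shipping: "off"      # string "off"

version: 3.10          # float 3.1, the trailing zero is gone
version_text: "3.10"   # string "3.10"
```

Quoting every value that is meant to be a string is tedious, but it removes the guesswork regardless of which version of the specification the parser follows.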

The Indentation

YAML is whitespace-significant. Indentation determines structure. Tabs are forbidden as indentation. (Tabs are forbidden. It bears repeating. Tabs are forbidden, and if you use one, the error message will not say “you used a tab.” The error message will say something about mapping values not being allowed in this context, and you will spend forty minutes before discovering the invisible character.)

The indentation creates a visual hierarchy that is elegant when the file is twenty lines and incomprehensible when the file is two hundred. A Kubernetes deployment manifest — a routine, unremarkable YAML file — typically runs to eighty lines of nested indentation, at which point you are five levels deep and the relationship between spec.template.spec.containers[0].ports[0].containerPort and the top-level metadata.name is a matter of faith, careful counting, and an IDE with indentation guides.

“They pushed deeper and found themselves in a vast chamber. The walls were covered in YAML.”
The Temple of a Thousand Monitors

The Kubernetes Dependency

YAML’s cultural dominance is not an accident of merit. It is an accident of Kubernetes.

Kubernetes chose YAML as its configuration language. Kubernetes won the container orchestration war. Therefore YAML won the configuration language war. This is the VHS/Betamax of serialization formats, except in this case both formats are adequate and the winner was chosen by a system that requires forty lines of YAML to expose a port.

A minimal Kubernetes deployment — one container, one port, no frills — requires approximately forty lines of YAML. A production deployment with health checks, resource limits, environment variables, volume mounts, and an ingress rule requires approximately two hundred. The application being deployed may be twelve lines of Go.
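For a sense of scale, here is a trimmed sketch of that minimal case: one Deployment and the Service that exposes its port. The application name, labels, and image are hypothetical, and real manifests usually grow well beyond this.

```yaml
# Hypothetical names and image; a sketch, not a production manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-application
  labels:
    app: my-application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-application
  template:
    metadata:
      labels:
        app: my-application
    spec:
      containers:
        - name: my-application
          image: registry.example.com/my-application:1.0.0
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-application
spec:
  selector:
    app: my-application
  ports:
    - port: 80
      targetPort: 8080
```

The name my-application appears seven times, and the label in the Service selector must match the one in the pod template exactly or the port is exposed to nothing.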

“No Docker. No container orchestration. No Kubernetes. One binary. Copy it. Run it. Done.”
riclib, The Databases We Didn’t Build

The modern developer spends more time writing YAML than code. This is the infrastructure equivalent of spending more time writing the recipe than cooking the meal. The recipe is not the meal. The YAML is not the application. But try telling that to the CI/CD pipeline, which is itself written in YAML, and which deploys YAML, and which is validated by a linter configured in YAML.

Helm: YAML That Generates YAML

The apotheosis of YAML culture is Helm, the Kubernetes package manager. Helm uses Go templates embedded in YAML to generate YAML. The templates contain {{ .Values.something }} interpolations, {{- if }} conditionals, {{ range }} loops, and {{ include }} function calls.

This is programming. Helm charts are programs. They have conditionals, loops, variables, functions, and include statements. They are written in YAML, which is not a programming language, using Go templates, which were designed for generic text output and know nothing about YAML’s indentation, to generate Kubernetes manifests, which are YAML.
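A sketch of what such a template can look like. The chart, the helper name my-application.fullname, and the values port, debug, and extraSettings are all hypothetical; the file is not valid YAML until Helm renders it.

```yaml
# templates/configmap.yaml in a hypothetical chart.
apiVersion: v1
kind: ConfigMap
metadata:
  # "include" calls a named helper template, assumed to be defined elsewhere in the chart.
  name: {{ include "my-application.fullname" . }}
data:
  # Interpolation: read a value and quote it so it stays a string.
  port: {{ .Values.port | quote }}
  # Conditional: the key only exists when debug is set.
  {{- if .Values.debug }}
  log-level: debug
  {{- end }}
  # Loop: one key per entry in a map of extra settings.
  {{- range $key, $value := .Values.extraSettings }}
  {{ $key }}: {{ $value | quote }}
  {{- end }}
```

The dashes inside the braces trim whitespace, because the template engine does not know that the output is indentation-sensitive and the author has to manage that by hand.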

The developer writes YAML that contains a programming language that generates YAML that configures a system that runs containers. At no point in this pipeline does anyone stop to ask whether they have accidentally invented a programming language inside a configuration format inside a package manager inside an orchestration system.

The Caffeinated Squirrel would be proud. This is exactly the kind of architecture it proposes — a TemplateYAMLGenerationPipelineWithConditionalValueInterpolation — except the Squirrel is usually told no, and Helm shipped.

The Lifelog Pattern

In the lifelog mythology, YAML occupies a peculiar position: it is simultaneously the enemy and the weapon. The Squirrel proposes complex architectures. riclib replaces them with YAML files. The YAML file is the boring solution — right up until it isn’t.

“Is a YAML file and an interface.”
— riclib, on what replaced the Squirrel’s AgentModeConfigurationMatrixWithStrategyDispatch, The Lobster Harvest, or The Sunday Morning Nine Crustaceans Changed the Architecture

“It’s a YAML file. With some SQL queries. Don’t make it weird.”
— riclib, on the architecture of the lifelog schema, The Lifelogs of Things

“We need to change one line in a YAML file.”
— riclib, on what the Squirrel proposed solving with Kubernetes and a service mesh, The Proxy That Whispered, or The Night the Servants Learned Each Other’s Names

The pattern repeats: the Squirrel proposes a framework, and riclib replaces it with a YAML file. The YAML file wins not because YAML is elegant — it is not — but because a YAML file is Boring Technology, and boring technology ships.

THE LOCK SEEMED COMPLEX
UNTIL IT BECAME A YAML FILE

THE SQUIRREL PROPOSED SEVEN THINGS
The Ouroboros Update

The Squirrel once proposed eleven frameworks. All eleven were replaced by two YAML fields and an interface. The Squirrel suspected it might eventually get used to this. It would not get used to it.

The Parsing Problem

Since version 1.2, YAML is, technically, a superset of JSON. It also supports anchors (&name), aliases (*name), tags (!!str, !!int), merge keys (<<:), and multi-document streams separated by ---. These features exist. They are documented. They are used by approximately nobody except the person who wrote the YAML parser and the person who will spend three hours debugging why their anchor didn’t resolve.
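A compact sketch of those features in one stream, with hypothetical keys and values. Merge keys come from the YAML 1.1 type repository, so support is assumed here and varies by parser.

```yaml
# Document 1: an anchor (&) names a node, an alias (*) reuses it.
defaults: &defaults
  adapter: postgres
  port: 5432

development:
  <<: *defaults          # merge key: pulls in adapter and port
  database: app_dev

production:
  <<: *defaults
  database: app_prod
  port: 6432             # a key set locally overrides the merged one

zip: !!str 90210         # explicit tag: forces a string instead of an integer
---
# Document 2 in the same stream, written as plain JSON.
{"name": "my-application", "port": 8080, "debug": true}
```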

The specification is 84 pages. JSON’s specification fits on a business card. This ratio — 84 pages to describe something “simpler than XML” — is YAML’s defining irony.

NotePlan’s frontmatter, which uses YAML, has unquoted colons in values. Strict YAML parsing fails. A fallback parser handles it. This is the YAML experience in miniature: you think you’re writing key-value pairs, but you’re actually negotiating with a specification that has opinions about colons, and the specification does not always agree with your text editor.
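The colon problem in miniature, using hypothetical frontmatter fields rather than NotePlan’s actual schema. The failing line is commented out so the fragment itself still parses.

```yaml
---
# Fails strict parsing: the second ": " reads as the start of a nested mapping.
# title: Meeting notes: Q3 planning

# Parses: quoting makes the whole value a single string.
title: "Meeting notes: Q3 planning"
tags: [work, planning]
---
```

A lenient fallback parser, like the one described above, presumably treats everything after the first colon as the value, which is what the author meant all along.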

“NotePlan’s frontmatter has unquoted colons the way jazz has blue notes — technically wrong, spiritually essential.”
The Homecoming, or The Three Days a Palace Was Built From Markdown and SQLite

The CI/CD Ouroboros

The modern CI/CD pipeline is YAML. GitHub Actions: YAML. GitLab CI: YAML. CircleCI: YAML. Azure Pipelines: YAML. The pipeline that builds the code is written in YAML. The pipeline that tests the YAML is written in YAML. The pipeline that deploys the pipeline is written in YAML.
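A sketch of the loop closing on itself: a GitHub Actions workflow, written in YAML, that lints the repository’s YAML, itself included. The file name and job name are hypothetical; yamllint is a real linter, installed here with pip on the assumption that the runner image provides Python.

```yaml
# .github/workflows/lint.yaml (hypothetical file name)
name: lint-yaml
on: [push]
jobs:
  yamllint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint every YAML file, including this one
        run: |
          pip install yamllint
          yamllint .
```

Depending on its configuration, the linter may complain about the on key, since YAML 1.1 rules read it as a boolean.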

This is an ouroboros of configuration. The snake eats its own tail, and the tail is indented two spaces from the head, and if you change the indentation the snake dies and the error message says “mapping values are not allowed in this context.”

Twelve separate CI/CD pipelines were among the artifacts found in the architecture that predated Boring Technology — the one with forty-seven microservices, a service mesh, and three Kubernetes clusters. The pipelines deployed the microservices. The microservices required the pipelines. The pipelines required YAML. The YAML required a human. The human required coffee. The coffee required a working deployment. The deployment required the pipelines.

The Ghost Views

YAML’s role as a schema definition language has produced a specific failure mode: the ghost configuration. Views defined in schema YAML but never materialized at runtime. Settings described in YAML but never read by code. The YAML says the feature exists. The application disagrees.

“Views that existed in the schema YAML but never at runtime. Ghost views. Haunting the system prompt like previous tenants whose mail still arrives.”
The Five Reports, or The Day the System Prompt Lied

The ghost configuration is YAML’s contribution to the taxonomy of software lies. The code lies by omission. The documentation lies by commission. The YAML lies by existing — by sitting in a configuration file, looking authoritative, describing a system state that never was and never will be, unless someone wires it in, which they haven’t, because the YAML looked so complete that nobody checked.

The Pigeon Metaphor

“Permissions. It’s always permissions. You give them one, they want another. You give them two, they want three. It’s like feeding pigeons, except the pigeons are yaml files and they never stop being hungry.”
The Passing AI, The OAuth Tango

This is, in the end, the most accurate description of YAML ever produced. YAML files are pigeons. You feed them one key-value pair and they multiply. You add one configuration block and suddenly there are three. You define one environment and there are four — dev, staging, prod, and the one someone created six months ago called “test2” that nobody dares delete because it might be load-bearing.

The pigeons never stop being hungry. The YAML files never stop growing. And the developer, who started with name: my-application, now maintains a thousand-line manifest that configures the deployment of the pipeline that validates the schema that generates the YAML that describes the system.

It looked simple. Key colon value. Then the colons multiplied.

See Also