XML

XML (Extensible Markup Language) is a data format that can describe anything — and did, for approximately fifteen years, during which the industry described everything in XML: data, configuration, build scripts, message protocols, database schemas, user interfaces, deployment manifests, and the specifications for the specifications that specified how XML should be used to specify things.

XML was standardised by the W3C in 1998, but its core insight — using angle brackets to create self-describing, hierarchical data structures — was older. In Lisbon, 1993, a developer needed variable field data for an integration platform. The solution was angle brackets, attributes, nesting. It was Proto-XML — the same concept, independently discovered, five years before the W3C named it.

“Proto-XML. 1993. He needed variable field data, and the solution turned out to be what the world would call XML five years later.”
— The Passing AI, Interlude — The Versions That Never Shipped …

The developer did not publish a specification. The developer solved the problem. Five years later, the W3C solved the same problem with a committee, a specification, and a namespace system that would haunt enterprise developers for the next two decades.

The Verbosity

XML’s defining characteristic is that it is verbose. Not as a flaw — as a design decision. XML was designed to be human-readable. XML was designed to be self-describing. XML was designed to be unambiguous. These goals require structure, and structure requires syntax, and syntax requires characters, and characters add up:

<?xml version="1.0" encoding="UTF-8"?>
<person>
  <name>The Lizard</name>
  <speaks>false</speaks>
  <blinks>true</blinks>
  <principles>
    <principle>simplicity</principle>
    <principle>directness</principle>
    <principle>one binary</principle>
  </principles>
</person>

The same data in JSON:

{"name":"The Lizard","speaks":false,"blinks":true,"principles":["simplicity","directness","one binary"]}

One line vs. eleven. The XML is more readable. The JSON is more concise. The industry chose concise, because developers read data in debuggers and network tabs, not in printed specifications, and in a network tab, eleven lines of angle brackets around six values is not “self-describing”; it is “wasting bandwidth.”
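A minimal Python sketch, using only the standard library, can check that the two documents really carry the same data and measure how much each one costs. The `xml_to_dict` helper, and its choice to read `true`/`false` as booleans, are assumptions made for this illustration; XML has no native booleans or arrays, so the mapping has to be written by hand.

```python
import json
import xml.etree.ElementTree as ET

XML_DOC = """<?xml version="1.0" encoding="UTF-8"?>
<person>
  <name>The Lizard</name>
  <speaks>false</speaks>
  <blinks>true</blinks>
  <principles>
    <principle>simplicity</principle>
    <principle>directness</principle>
    <principle>one binary</principle>
  </principles>
</person>"""

JSON_DOC = '{"name":"The Lizard","speaks":false,"blinks":true,"principles":["simplicity","directness","one binary"]}'


def xml_to_dict(xml_text: str) -> dict:
    """Hypothetical helper: map the <person> document onto the JSON shape."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {
        "name": root.findtext("name"),
        # XML text is always a string, so booleans are an interpretation.
        "speaks": root.findtext("speaks") == "true",
        "blinks": root.findtext("blinks") == "true",
        "principles": [p.text for p in root.iter("principle")],
    }


same_data = xml_to_dict(XML_DOC) == json.loads(JSON_DOC)
xml_lines = len(XML_DOC.splitlines())
json_lines = len(JSON_DOC.splitlines())
size_ratio = len(XML_DOC) / len(JSON_DOC)

print(f"same data: {same_data}")
print(f"lines: XML {xml_lines}, JSON {json_lines}")
print(f"characters: XML {len(XML_DOC)}, JSON {len(JSON_DOC)} ({size_ratio:.1f}x)")
```

The JSON side needs one call to `json.loads`. The XML side needs a hand-written mapping, which is the other half of the verbosity cost: the extra characters on the wire are matched by extra code at each end.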

The SOAP Era

XML’s golden age was the SOAP era (2000–2010), during which enterprise systems communicated via XML messages wrapped in XML envelopes described by XML schemas defined in XML service descriptions. The stack was:

SOAP — XML messages with XML headers in XML envelopes
WSDL — XML documents describing the XML messages
XSD — XML documents defining the structure of the XML documents
XSLT — XML documents transforming XML documents into other XML documents
UDDI — XML-based directory for discovering XML-based services

The entire communication stack was XML. The message was XML. The description of the message was XML. The validation of the description was XML. The transformation from one format to another was XML. A developer who wanted to call a service needed four XML documents before writing a single line of code.

The ESB thrived in this environment because the ESB’s core capability — transforming XML from one format to another — was genuinely necessary when every system spoke a different XML dialect. The ESB was not solving a fake problem. The ESB was solving the problem that XML’s flexibility created: when anything can be described in XML, everything is described differently in XML.
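The kind of dialect translation described above can be sketched in a few lines of Python. Both message shapes here are hypothetical (`<customer>` for one system, `<Party>` for the other), and the sketch uses the standard library's ElementTree rather than XSLT, which a real ESB of that era would more likely have used.

```python
import xml.etree.ElementTree as ET

# Hypothetical dialect spoken by system A.
SOURCE = (
    "<customer>"
    "<fullName>The Lizard</fullName>"
    "<phone>555-0100</phone>"
    "</customer>"
)


def translate(source_xml: str) -> str:
    """Map system A's <customer> message onto system B's hypothetical <Party> shape."""
    src = ET.fromstring(source_xml)

    party = ET.Element("Party")
    # Same information, different element name.
    ET.SubElement(party, "Name").text = src.findtext("fullName", default="")
    # Same information, but system B wants the kind of contact as an attribute.
    contact = ET.SubElement(party, "Contact", {"type": "phone"})
    contact.text = src.findtext("phone", default="")

    return ET.tostring(party, encoding="unicode")


translated = translate(SOURCE)
print(translated)
```

Nothing in the payload changes except its spelling. Multiply this by every pair of systems in an enterprise and the case for a central transformation layer makes itself.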

The Enterprise Survivor

XML is not dead. XML is the COBOL of data formats: declared dead by every generation, still running everything that matters.

Configuration: Maven’s pom.xml, .NET’s web.config, Android’s AndroidManifest.xml
Documents: OOXML (Word, Excel, PowerPoint are ZIP files containing XML), SVG (vector graphics are XML), RSS/Atom (feeds are XML)
Enterprise integration: most B2B integrations still exchange XML, because the schemas were defined in 2005 and nobody wants to renegotiate
Government: most government data exchange formats are XML, because government procurement moves at geological speed and XML was the format when procurement started
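For a sense of what reading one of these files involves, here is a minimal sketch that parses a Maven-style fragment with Python's standard library. The coordinates (`com.example`, `lizard`) are made up for illustration; the namespace URI is the one Maven POM files declare. The point of interest is the namespace: every lookup has to be qualified, or it silently finds nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical project coordinates inside a Maven-style POM fragment.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>lizard</artifactId>
  <version>1.0.0</version>
</project>"""

# Prefix chosen locally; only the URI has to match the document.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}

root = ET.fromstring(POM)

# Qualified lookup: finds the element.
artifact_id = root.findtext("m:artifactId", namespaces=NS)

# Unqualified lookup: the element "exists" on the page but not in this namespace.
unqualified = root.findtext("artifactId")

print(f"qualified lookup:   {artifact_id!r}")
print(f"unqualified lookup: {unqualified!r}")
```

The unqualified lookup returns `None` without raising an error, which is a large part of how the namespace system earned its reputation.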

The enterprise does not choose formats based on developer ergonomics. The enterprise chooses formats based on existing contracts, schema validation requirements, and the sunk cost of twelve years of XSLT transformations. XML meets all three criteria. JSON meets none. XML persists.

The Proto-XML Origin

The lifelog’s most remarkable XML connection is not the technology but its pre-invention. In Lisbon, 1993 — five years before the W3C recommendation — a developer working on integration problems needed a way to represent variable field data. The solution was hierarchical, tagged, self-describing. It used delimiters. It nested. It was, in every conceptual sense, XML.

The developer did not know he was inventing XML. The developer was solving a problem. The W3C, five years later, would solve the same problem by committee and produce a specification. The specification was more complete. The developer’s solution shipped first.

This is the pattern of Interlude — The Versions That Never Shipped …: solving problems so thoroughly that the solution is the future, years before the future has a name. Proto-XML in 1993. The Data Fabric — what Gartner would call an ESB — in 1998. The same developer, the same pattern, the same gap between solving and naming.

The Phone Call

In 1999, after the developer had moved on from the Portuguese National Archives, the phone rang. The project manager of the four-developer team hired to replace him had a question: what were the weird blobs in the codebase? The hierarchical tagged data structures. The angle brackets. The nesting. They didn’t recognise the format.

The answer was one line:

<?xml version="1.0" encoding="UTF-8"?>

“Add this declaration at the top,” the developer said. “Then use any XML parser to read them.”

The Proto-XML was so close to XML that the migration from proprietary format to W3C standard was: add a declaration line. The structures were already well-formed. The hierarchy was already correct. The self-describing tags were already there. The only thing missing was the declaration that told parsers “this is XML”, because when the developer wrote it in 1993, XML didn’t exist yet, and the data didn’t know it needed to introduce itself.
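The migration can be sketched as follows, assuming the blobs were already well-formed. The `legacy_blob` content is hypothetical, since the actual 1993 structures are not shown anywhere in this article, and `migrate` is a name invented for the illustration.

```python
import xml.etree.ElementTree as ET

DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'

# Hypothetical legacy blob: tagged, nested, attribute-bearing, undeclared.
legacy_blob = (
    '<record>'
    '<field name="title">Charter</field>'
    '<field name="year">1993</field>'
    '</record>'
)


def migrate(blob: str) -> bytes:
    """Prepend the XML declaration and encode to match the encoding it names."""
    return (DECLARATION + blob).encode("utf-8")


# "Then use any XML parser to read them."
root = ET.fromstring(migrate(legacy_blob))
fields = {field.get("name"): field.text for field in root.iter("field")}

print(fields)
```

In XML 1.0 the declaration is optional, so a lenient parser would likely have read the blob either way; what the added line contributes is an explicit version and encoding, and a format the replacement team could recognise.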

Four developers were hired to understand what one developer had built. The one developer solved their problem in one sentence. The sentence was an XML declaration. The W3C would have been proud, had they known.

Measured Characteristics

W3C recommendation year: 1998
Proto-XML invention year: 1993 (Lisbon, no committee)
Gap between invention and naming: 5 years
Lines needed to migrate Proto-XML to XML: 1 (<?xml version="1.0" encoding="UTF-8"?>)
Developers hired to understand the codebase: 4
Developers needed to explain it: 1 (by phone)
Characters to represent {"name":"hello"} in JSON: 16
Characters to represent the same in XML: 18 (<name>hello</name>)
Verbosity ratio: ~1.1x
SOAP layers to send a message: 4 (minimum)
XML documents needed to describe an XML service: 3 (WSDL + XSD + SOAP)
Enterprise integrations still using XML: most of them
Government data formats using XML: nearly all
Developers who enjoy writing XSLT: statistically zero
XML’s status: declared dead, still everywhere

Type: Technology
First Observed: 1998 (W3C Recommendation); pre-invented 1993 (Lisbon, by a developer who needed variable field data)
Severity: Historical (load-bearing in enterprise; decorative elsewhere)
Natural Predator: JSON (which won by being less)
Tags: languages, enterprise
Cited in: EDI episode, Hate episode, JSON episode, Postel's Law episode, Yagnipedia episode
