JSON vs YAML: Why Does JSON Reign Supreme in Data Exchange?

"YAML is so much easier for humans to read, so why do APIs almost exclusively use JSON?"
"What are YAML's strengths, and what made JSON the standard?"



1. The Dawn of the Data Format Wars

In the past, developers needed a 'common standard' for systems to exchange data. In the early days, XML held that position.

<person>
    <name>Alice</name>
    <age>25</age>
    <skills>
        <skill>Python</skill>
        <skill>Django</skill>
    </skills>
</person>

However, XML was overly verbose and heavy, requiring every tag to be explicitly closed. It was also tedious to read. This led to the emergence of JSON and YAML as alternatives.


2. JSON: Becoming the Standard for Web Data Exchange

In 2001, Douglas Crockford created JSON, drawing inspiration from JavaScript's object literal notation. The core idea was to be "as lightweight as possible, and easily readable by machines."

The reasons for JSON's success were clear:

  • Unrivaled Lightweight Design: It transmits only data, without unnecessary embellishments.
  • Perfect Synergy with the Web: Browsers (JavaScript) could convert it directly into objects without needing separate libraries.
  • Intuitive Mapping: Its structure closely matches the built-in data structures of modern languages, such as Python's dict and JavaScript's Object.
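
To make the "intuitive mapping" point concrete, here is a minimal Python sketch that uses only the standard library's json module. It reuses the Alice record from the XML example; the variable names are illustrative, not part of any API.

```python
import json

# The same record as the XML example, written as JSON text.
payload = '{"name": "Alice", "age": 25, "skills": ["Python", "Django"]}'

# json.loads maps JSON objects to dict, arrays to list, and numbers to int/float.
person = json.loads(payload)

# json.dumps goes the other way; compact separators drop the optional whitespace.
wire_format = json.dumps(person, separators=(",", ":"))

print(type(person).__name__, person["skills"])
print(wire_format)
```

No schema or mapping layer is involved: the parsed value is already an ordinary dict that the rest of the program can use.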

Especially as REST APIs became the dominant paradigm on the web, JSON effectively became the universal language.


3. YAML: A Format Born for Human Readability

Another camp cared more about the "joy of human readability" than about machine-friendliness. The question, "Can't we write data more cleanly, without braces or quotes?" led to the creation of YAML.

name: Alice
age: 25
skills:
  - Python
  - Django

YAML's characteristics are distinct:

  • Exceptional Readability: Being indentation-based, the text is remarkably clean.
  • Comment (#) Support: A major advantage is the ability to add explanations next to the data, which JSON cannot do.
  • Configuration File Powerhouse: Thanks to this readability, it has become the dominant format for infrastructure configuration in tools like Kubernetes, Docker Compose, and GitHub Actions.
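
The comment point is easy to check from the JSON side. The sketch below, using only Python's standard json module, shows that a document with a comment in it is rejected by a strict JSON parser. The helper name parses_as_json and the retries field are made up for illustration.

```python
import json

# A JSON document with a comment added; the JSON grammar has no comment syntax.
commented = """
{
    // maximum number of retries
    "retries": 3
}
"""


def parses_as_json(text: str) -> bool:
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


comment_accepted = parses_as_json(commented)
plain_accepted = parses_as_json('{"retries": 3}')
print(comment_accepted, plain_accepted)
```

In YAML the same note would simply be written after a # and ignored by the parser, which is a large part of why it suits hand-edited configuration.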

But why has it failed to surpass JSON in data exchange (APIs)?


4. Practical Reasons Why YAML Lagged in Data Exchange

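First, Strict Indentation (Spaces, Not Tabs): JSON marks structure with explicit braces and brackets, while YAML's structure is determined by whitespace that is easy to overlook, and tab characters are not allowed for indentation at all. When exchanging complex, deeply nested data, a single misplaced space can change the structure or break parsing, which can turn into a debugging nightmare.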
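
As a small illustration of the JSON half of that comparison, the following sketch (standard library only, with illustrative variable names) parses the same record once in compact form and once with deliberately messy indentation. Because the brackets carry the structure, both parse to the same value.

```python
import json

# Compact form, as it might travel over the wire.
compact = '{"name":"Alice","skills":["Python","Django"]}'

# Same data with deliberately inconsistent indentation.
messy = """
{
        "name": "Alice",
  "skills": [
"Python",
            "Django"
  ]
}
"""

# Whitespace between tokens is insignificant in JSON, so the results match.
same_structure = json.loads(compact) == json.loads(messy)
print(same_structure)
```

YAML offers no such slack, because in YAML the indentation itself is the structure.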

Second, Parsing Speed and Resources: JSON's small grammar allows for very lightweight and fast parsers. In contrast, YAML's grammar is extensive and complex (anchors, aliases, tags, several string styles, multi-document streams), so parsers typically need more memory and CPU to interpret it. This is a serious drawback in API environments dealing with large volumes of data.

Third, Security Concerns: YAML tags can go beyond simple data. In some language bindings, a document parsed with an unsafe loader can instantiate arbitrary native objects, which has led to vulnerabilities such as Remote Code Execution (RCE). Safe loaders avoid this, but the risk makes YAML a poor default for APIs that exchange data with an untrusted public.
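
For Python specifically, the usual mitigation looks roughly like the sketch below. It assumes the third-party PyYAML package is installed; load_untrusted_yaml and MALICIOUS are hypothetical names chosen for this example, and the payload is a commonly cited pattern rather than anything from a real incident.

```python
def load_untrusted_yaml(text: str):
    """Parse YAML from an untrusted source, allowing only plain data types."""
    # PyYAML is a third-party package (pip install pyyaml). It is imported
    # here so this module still imports where PyYAML is not installed.
    import yaml

    try:
        # safe_load builds only standard types (dict, list, str, int, ...).
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # Language-specific tags such as !!python/object/apply are rejected.
        raise ValueError(f"rejected YAML document: {exc}") from exc


# A document that an unsafe loader could turn into a function call.
MALICIOUS = '!!python/object/apply:os.system ["echo pwned"]'
```

With this helper, load_untrusted_yaml("name: Alice") returns a plain dict, while load_untrusted_yaml(MALICIOUS) raises ValueError instead of running the command. JSON needs no such precaution because its grammar has no way to name a type at all.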


5. Ultimately, a Matter of "Right Tool for the Right Job"

If JSON is the king of data exchange, then YAML has firmly established itself as the king of configuration files, each solidifying its own domain.

  • When to use JSON: Web API communication, document storage in NoSQL databases, data transfer between clients and servers.
  • When to use YAML: Project configuration files (docker-compose.yml), CI/CD pipeline definitions, documents maintained directly by humans.

Even in Django REST framework (DRF), a YAMLRenderer can be added through a separate package, but JSONRenderer is the default out of the box. For an API, data needs to be parsed accurately and quickly.
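
As a rough sketch of what "adding YAMLRenderer" looks like, the settings fragment below keeps JSON first and offers YAML as an opt-in alternative. It assumes the third-party djangorestframework-yaml package is installed, which is where the rest_framework_yaml module path comes from; treat the exact layout as an example rather than a recommended configuration.

```python
# settings.py (fragment)
REST_FRAMEWORK = {
    "DEFAULT_RENDERER_CLASSES": [
        # Listed first, so JSON is used when the client states no preference.
        "rest_framework.renderers.JSONRenderer",
        # Assumption: provided by the separate djangorestframework-yaml package.
        "rest_framework_yaml.renderers.YAMLRenderer",
    ],
}
```

Clients that really want YAML can ask for it through content negotiation, while everyone else keeps getting JSON.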


Conclusion: A One-Line Summary

"For machine-to-machine communication (APIs), use JSON; for human-to-machine communication (configurations), use YAML."
