Data Languages, Part One: Object Data (Better than JSON)

published 2021-12-30

There are four things that would make for a better JSON, which could still be interpreted by JavaScript:

  1. Multi-line and inline comments as doc-strings. A comment can occur at any point in a JSON source and can be retained as a doc-string.

    • /* ... */ -- Multi-line comments are doc-strings for the following object. Leading spaces are stripped.
    • // ... -- Inline comments are doc-strings for the current line of code.

  2. A comma is allowed after the last item in a list or dict.

  3. All keys are strings, so quotes can be omitted. This is how JavaScript works, and it works fine. (Quotes are only needed in JavaScript when the key is not a valid identifier or number, for example "", "+", or "my-key". We could even relax this requirement.)

  4. String values can be multi-line. Leading spaces are stripped up to the indentation of the first non-whitespace character.
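
Rule 4 leaves a few details open, so here is a minimal Python sketch of one possible reading. The helper name dedent_jsd_string is hypothetical, and dropping the whitespace-only lines next to the opening and closing quotes is an assumption, not something the rule states.

```python
def dedent_jsd_string(raw: str) -> str:
    """Strip indentation from the raw text found between the quotes."""
    if "\n" not in raw:
        return raw  # single-line strings are left untouched
    lines = raw.split("\n")
    # Assumption: whitespace-only first and last lines are layout, not content.
    if lines and not lines[0].strip():
        lines = lines[1:]
    if lines and not lines[-1].strip():
        lines = lines[:-1]
    # The indentation to strip is that of the first non-blank line.
    first = next((ln for ln in lines if ln.strip()), "")
    indent = len(first) - len(first.lstrip(" "))
    out = []
    for ln in lines:
        # Remove at most `indent` leading spaces; deeper indentation is kept.
        leading = len(ln) - len(ln.lstrip(" "))
        out.append(ln[min(indent, leading):])
    return "\n".join(out)


raw = "\n    echo start\n      echo nested\n    echo done\n    "
script = dedent_jsd_string(raw)
# script == "echo start\n  echo nested\necho done"
```

Keeping indentation that goes deeper than the first line is what would make this usable for embedded scripts, where nesting matters.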

These four changes would make JSON more readable and usable by human beings, while still being a safe and robust data format for software to use for exchange.

Let's call the new data language "JSD" (JavaScript Data, pronounced "jazzed") and compare it with YAML and JSON.

YAML:

# The app uses nginx for static hosting: just mount
# your document root at /var/www and serve away.
apiVersion: v1  # keys don't need quotes
kind: Pod
metadata:
  name: nginx   # commas aren't really a thing
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
      ports:
        - containerPort: 80
# a non-kubernetes field for a multi-line string
description: |
    Multi-line strings in YAML have several 
    possibilities, which are somewhat hard to 
    remember. But they are often used (with |) for 
    embedded multi-line scripts. That's one of the
    reasons that YAML is popular for system config,
    CI/CD pipeline definitions, etc.

JSON:

{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "name": "nginx"
  },
  "spec": {
    "containers": [
      {
        "name": "nginx",
        "image": "nginx:1.14.2",
        "ports": [
          {
            "containerPort": 80
          }
        ]
      }
    ]
  },
  "description": "Multi-line strings in JSON\nhave to have escaped newlines / carriage returns.\nThis makes them not very readable, nor useful for\nthings like inline scripts."
}

JSD (JavaScript Data):

/* 
The app uses nginx for static hosting: just mount 
your document root at /var/www and serve away.
*/
{
  apiVersion: "v1", // keys don't need quotes
  kind: "Pod",
  metadata: {
    name: "nginx",  // comma is fine
  },
  spec: {
    containers: [
      {
        name: "nginx",
        image: "nginx:1.14.2",
        ports: [
          {
            containerPort: 80,
          }
        ]
      }
    ]
  },
  description: "
    Multi-line strings in JSD are just strings
    that go for multiple lines. Leading whitespace 
    is stripped, interior newlines are not removed.
    So it can be used just as well as YAML for 
    system config, CI/CD pipeline definitions, etc.
    ",
  : "some value", // empty keys are null.
}
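
To show that this syntax is not hard to read mechanically, here is a rough Python sketch of a JSD reader. It is not a reference implementation, and every name in it (TOKEN, parse_jsd, and so on) is hypothetical. It makes several assumptions: comments are skipped rather than retained as doc-strings, multi-line strings are dedented with textwrap.dedent as an approximation of the stripping rule, and the empty-key form shown on the last line of the example is not handled.

```python
import json
import re
import textwrap

TOKEN = re.compile(r'''
    \s+ | //[^\n]* | /\*.*?\*/        # whitespace and comments: skipped
  | (?P<str>"(?:\\.|[^"\\])*")        # string; raw newlines are allowed
  | (?P<num>-?\d+(?:\.\d+)?)          # integer or decimal number
  | (?P<word>[A-Za-z_$][\w$]*)        # bare key, or true, false, null
  | (?P<punct>[{}\[\]:,])             # structural punctuation
''', re.VERBOSE | re.DOTALL)
CONSTANTS = {"true": True, "false": False, "null": None}
END = ("end", "")  # sentinel token so the parser never runs off the list


def tokenize(src):
    pos, tokens = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if m.lastgroup:  # whitespace and comments have no named group
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens + [END]


def unquote(token):
    # strict=False lets the JSON decoder accept raw newlines inside the quotes.
    text = json.loads(token, strict=False)
    if "\n" in token:
        # Assumption: approximate the stripping rule with the shared indentation.
        text = textwrap.dedent(text).strip("\n")
    return text


def parse_value(tokens, i):
    kind, text = tokens[i]
    if kind == "str":
        return unquote(text), i + 1
    if kind == "num":
        return (float(text) if "." in text else int(text)), i + 1
    if kind == "word" and text in CONSTANTS:
        return CONSTANTS[text], i + 1
    if (kind, text) in (("punct", "{"), ("punct", "[")):
        return parse_members(tokens, i + 1, "}" if text == "{" else "]")
    raise ValueError(f"unexpected token {text!r}")


def parse_members(tokens, i, close):
    result = {} if close == "}" else []
    while tokens[i] != ("punct", close):
        if close == "}":
            kind, key = tokens[i]
            if kind == "str":
                key = unquote(key)
            elif kind not in ("word", "num"):  # bare keys stay strings
                raise ValueError(f"bad key {key!r}")
            if tokens[i + 1] != ("punct", ":"):
                raise ValueError(f"expected ':' after key {key!r}")
            value, i = parse_value(tokens, i + 2)
            result[key] = value
        else:
            value, i = parse_value(tokens, i)
            result.append(value)
        if tokens[i] == ("punct", ","):
            i += 1  # a comma may also come right before the closing bracket
        elif tokens[i] != ("punct", close):
            raise ValueError(f"expected ',' or {close!r}")
    return result, i + 1


def parse_jsd(src):
    tokens = tokenize(src)
    value, i = parse_value(tokens, 0)
    if tokens[i] != END:
        raise ValueError("unexpected data after the top-level value")
    return value


sample = '''
/* The app uses nginx for static hosting. */
{
  apiVersion: "v1",  // keys do not need quotes
  metadata: { name: "nginx", },
  ports: [ { containerPort: 80, }, ],
  description: "
    first line
      indented line
    ",
}
'''
config = parse_jsd(sample)
```

The result is an ordinary dict, so json.dumps(config) would turn the sample back into strict JSON for exchange with other software.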

YAML is the most readable, but many consider it to have too many features. Multi-line strings could be a great feature of YAML, but there are too many variants, and it's hard to keep them straight. It also has capabilities that many consider unsafe, including templating and references (anchors and aliases).

JSON is very strict. It's readable when indented, but takes more typing because all the keys have to be quoted. Newlines and carriage returns in strings have to be escaped, which makes multi-line strings as explicit as possible but also the least readable of the three formats. JSON also disallows trailing commas, which makes it a pain to edit by hand.
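
That strictness is easy to check. The sketch below uses Python's standard json module as a stand-in for a strict JSON parser; the helper is_valid_json and the sample strings are made up for illustration.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if the strict JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


samples = {
    "strict": '{"name": "nginx"}',
    "trailing comma": '{"name": "nginx",}',
    "unquoted key": '{name: "nginx"}',
    "comment": '{"name": "nginx"} // note',
    # The \n below is a real newline inside the JSON string, not an escape.
    "raw newline in string": '{"d": "line one\nline two"}',
}
results = {label: is_valid_json(text) for label, text in samples.items()}
# Only the "strict" sample is accepted; the other four are rejected.
```

Each rejected sample corresponds to one of the four relaxations proposed for JSD.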

JSD is as simple a data format as JSON, but it's more readable (multi-line strings, keys without quotes), more understandable (comments), and more usable (trailing commas, multi-line strings for things like scripts and long descriptions).