JSON is usually the least bad option for machine-readable output formats

August 24, 2024

Over on the Fediverse, I said something:

In re JSON causing problems, I would rather deal with JSON than yet another bespoke 'simpler' format. I have plenty of tools that can deal with JSON in generally straightforward ways and approximately none that work on your specific new simpler format. Awk may let me build a tool, depending on what your format is, and Python definitely will, but I don't want to.

This is re: <Royce Williams Fediverse post>

This is my view as a system administrator, because as a system administrator I deal with a lot of tools that could each have their own distinct output format, each of which I have to parse separately (for example, smartctl's bespoke output, although that output format sort of gets a pass because it was intended for people, not further processing).
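To show the difference concretely, here is a small Python sketch of both situations. The 'Key: value' text format is invented for illustration and is not smartctl's actual output; the field names in the JSON version are also hypothetical. The hand-written parser only works while the layout stays exactly as it is, and everything comes back as a string. The JSON version needs nothing beyond the standard json module.

```python
import json

# Hypothetical bespoke, human-oriented output (format invented for this sketch).
BESPOKE = """\
Device Model:     EXAMPLE-SSD 512
Serial Number:    ABC123
Power_On_Hours:   4321
"""

def parse_bespoke(text):
    """Ad-hoc parser: split each line at the first colon and strip padding.
    It quietly depends on one field per line and a ':' separator, so a
    cosmetic change in the tool's output can break it."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

# The same (hypothetical) information as JSON: one generic parser, and
# numbers arrive as numbers instead of strings.
AS_JSON = '{"device_model": "EXAMPLE-SSD 512", "serial_number": "ABC123", "power_on_hours": 4321}'

bespoke = parse_bespoke(BESPOKE)
structured = json.loads(AS_JSON)
print(bespoke["Power_On_Hours"], type(bespoke["Power_On_Hours"]).__name__)        # 4321 str
print(structured["power_on_hours"], type(structured["power_on_hours"]).__name__)  # 4321 int
```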

JSON is not my ideal output format. But it has the same virtue as gofmt does; as Rob Pike has said, "gofmt's style is no one's favorite, yet gofmt is everyone's favorite" (source, also), because gofmt is universal and settles the arguments. Everything has to have some output format, so having a single one that is broadly used and supported is better than having N of them. And jq shows the benefit of this universality, because if something outputs JSON, jq can do useful things with it.
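As a rough illustration of why universality pays off, the hypothetical helper below walks any parsed JSON document by a list of keys and indexes, loosely like jq's '.devices[0].temp_c' path syntax. Both 'tool outputs' are made-up examples. The point is that a single generic function serves both, which is what jq does on a much larger scale.

```python
import json

def get_path(doc, path):
    """Walk a parsed JSON document by a sequence of object keys and array
    indexes (a hypothetical helper, roughly jq's `.key[0].other`).
    Raises KeyError or IndexError if the path does not exist."""
    for step in path:
        doc = doc[step]
    return doc

# Two unrelated, made-up tool outputs; one helper handles both
# because both are JSON.
disk_tool = json.loads('{"devices": [{"name": "sda", "temp_c": 34}]}')
net_tool = json.loads('{"interfaces": {"eth0": {"rx_bytes": 123456}}}')

print(get_path(disk_tool, ["devices", 0, "temp_c"]))          # 34
print(get_path(net_tool, ["interfaces", "eth0", "rx_bytes"]))  # 123456
```

The jq equivalents would be 'jq .devices[0].temp_c' and 'jq .interfaces.eth0.rx_bytes'.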

(In turn, the existence of jq makes JSON much more attractive to system administrators than it otherwise would be. If I had no ready way to process JSON output, I'd be much less happy about it and it would stop being the easy output format to deal with.)

I don't have any particular objection to programs that want to output in their own format (perhaps a simpler one). But I want them to give me an option for JSON too, and most of the time I'm going to go with JSON. I've already written enough ad-hoc text processing things in awk, and a few too many heavy duty text parsing things in Python. I don't really want to write another one just for you. If your program does use only a custom output format, I want there to be a really good reason why you did it, not just that you don't like the aesthetics of JSON. As Rob Pike says, no one likes gofmt's style, but we all like that everyone uses it.
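Here is a sketch of what I'm asking for, in Python: a hypothetical tool that prints human-oriented text by default and JSON when given '--json'. The tool, its data, and the flag name are all assumptions for illustration. The design point is that both renderings come from the same underlying data, so offering the JSON option costs very little.

```python
import argparse
import json

def collect():
    """Hypothetical data; a real tool would gather this from the system."""
    return [
        {"name": "eth0", "state": "up", "mtu": 1500},
        {"name": "lo", "state": "up", "mtu": 65536},
    ]

def render_text(rows):
    # Human-oriented output: aligned columns, no stability promises.
    return "\n".join(f"{r['name']:<6} {r['state']:<4} {r['mtu']}" for r in rows)

def render_json(rows):
    # Machine-oriented output: the same data with stable keys and typed values.
    return json.dumps(rows)

def main(argv=None):
    parser = argparse.ArgumentParser(description="Report interfaces (hypothetical tool).")
    parser.add_argument("--json", action="store_true", help="emit JSON instead of text")
    args = parser.parse_args(argv)
    rows = collect()
    print(render_json(rows) if args.json else render_text(rows))

if __name__ == "__main__":
    main()
```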

(It's my view that JSON's increased verbosity over alternatives isn't a compelling reason unless there's either a really large amount of data or you have to fit into very constrained space, bandwidth, or other resources. In most environments, disk space and bandwidth are much cheaper than people's time and the liability of yet another custom tool that has to be maintained.)
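To put rough numbers on the verbosity argument, this sketch serializes the same made-up flat records as ordinary JSON, as compact JSON (no spaces after separators), and as CSV, then prints the byte counts. The exact sizes depend on the data. What the sketch shows is that the overhead of repeated field names is roughly a constant factor, which is usually cheap compared to maintaining a custom parser.

```python
import csv
import io
import json

# Made-up flat records: the case where JSON repeats every field name.
records = [{"name": f"host{i}", "uptime": 1000 + i, "load": 0.25} for i in range(1000)]

as_json = json.dumps(records)
compact_json = json.dumps(records, separators=(",", ":"))

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "uptime", "load"])
writer.writeheader()
writer.writerows(records)
as_csv = buf.getvalue()

for label, text in [("JSON", as_json), ("compact JSON", compact_json), ("CSV", as_csv)]:
    print(f"{label:>12}: {len(text.encode()):>6} bytes")
```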

PS: All of this is for output formats that are intended to be further processed. JSON is a terrible format for people to read directly, so terrible that my usual reaction to having to view raw JSON is to feed it through 'jq . | less'. But your tool should almost always also have an option for some machine-readable format (trust me, someday system administrators will want to process the information your tool generates).
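For the reading-it-yourself case, Python's standard json module can do the same re-indenting job as 'jq .'. Here is a minimal sketch with a made-up sample document:

```python
import json

# Made-up raw JSON, as a tool might emit it on one line.
RAW = '{"users":3,"uptime":10332,"load":[0.22,0.15,0.1]}'

def pretty(text, indent=2):
    """Re-indent raw JSON for human reading, similar in spirit to `jq .`."""
    return json.dumps(json.loads(text), indent=indent, sort_keys=True)

print(pretty(RAW))
```

From the command line, 'python3 -m json.tool' does much the same thing, which can help on machines that don't have jq installed.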


Comments on this page:

I agree, and you make a good point here, especially as a system administrator, where you're likely to look at the data itself and the output from the system. This applies in particular to structured logging, since JSON logging is fairly common.
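As a hedged sketch of JSON structured logging with Python's standard logging module, the formatter below emits one JSON object per line. Passing structured fields through extra={"fields": {...}} is a convention assumed for this example. The logging module itself does not define it.

```python
import json
import logging
import sys

class JSONFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object (JSON Lines)."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Extra structured fields, if the caller supplied them via
        # extra={"fields": {...}} (a convention assumed for this sketch).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)

log = logging.getLogger("example")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JSONFormatter())
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("disk nearly full", extra={"fields": {"mount": "/var", "used_pct": 93}})
```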

For configuration, though, I prefer YAML or even TOML; having to write JSON by hand is a pain.

JSON is likely the most widespread format, with `jq` definitely making it more usable.

My exception to this would be for formats with specific usages, like `protobuf` which has some machine-to-machine advantages over JSON. And even that has an option to convert to JSON.

So while JSON is good enough, at least for reading and parsing, I'd say other use cases could benefit from other formats.

Agree wholeheartedly. Also, a shout-out to another JSON reading/managing tool I stumbled on recently, https://jless.io/. It even has keybindings for copying various bits out of the document you're navigating (e.g., the raw value, the JSON value, or the path to a value that you can then turn around and stuff into a 'jq' invocation).

JSON is often the least bad for complex data structures, but sometimes even simple data structures get turned into bad JSON. See https://www.datafix.com.au/BASHing2/2024-04-12.html

100% this. I don't think tools like jq are considered enough when people critique a format in some academic fashion.

By Joker_vD at 2024-08-27 10:01:48:

As for this part of the original tweet (toot?):

Repeating every verbose field name in each record, when the schema is flat,

well, I've seen JSON data in

   {"fields": ["name", "price", "amount", "date", "legal"],
    "data": [["dog toy", 41.99, 11, 1724766809, true],
             ["aspirin", 0.99, 2000, 1724347831, false],
             ...]}

format, which is awkward to use with jq, but one can use it if one absolutely needs to.
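For what it's worth, the columnar layout above converts easily back into ordinary per-record objects. Here is a Python sketch using the two rows shown, with the elided rows omitted. The function name is hypothetical.

```python
import json

# The two sample rows from above, in columnar form.
COLUMNAR = """
{"fields": ["name", "price", "amount", "date", "legal"],
 "data": [["dog toy", 41.99, 11, 1724766809, true],
          ["aspirin", 0.99, 2000, 1724347831, false]]}
"""

def to_records(doc):
    """Pair the shared field list with each row, giving one object per record.
    Note: zip() stops at the shorter sequence, so a short row is silently
    truncated rather than rejected."""
    fields = doc["fields"]
    return [dict(zip(fields, row)) for row in doc["data"]]

for rec in to_records(json.loads(COLUMNAR)):
    print(rec)
```

In jq, one way to do the same thing is `jq '.fields as $f | .data[] | [$f, .] | transpose | map({(.[0]): .[1]}) | add'`. That is the kind of expression that makes this layout awkward rather than impossible.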

Overly specialized parsing of bespoke command output is the UNIX philosophy. It's pitiful that a data format made by taking a shitty programming language's syntax has become so popular. JSON can't properly encode integers of arbitrary length, and its processing is rife with edge cases that different tools handle differently. Bencoding as used by BitTorrent is superior in every way. S-expressions are also better, but I'm not holding my breath for people to accept those.
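To illustrate the two claims above, here is a minimal bencode encoder in Python alongside a demonstration of the integer issue. The encoder follows BitTorrent's rules: integers as i<n>e, strings as <length>:<bytes>, lists as l...e, and dictionaries as d...e with keys sorted. Strictly speaking, JSON's grammar doesn't limit integer size. The practical problem is that many parsers read numbers as IEEE 754 doubles, which RFC 8259 warns about. Below, parse_int=float simulates such a parser; Python's own json module keeps integers exact by default.

```python
import json

def bencode(value):
    """Minimal bencode encoder (sketch): int, str/bytes, list, dict only.
    Bencode has no booleans, floats, or null, so those are rejected."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value                      # any size, written in decimal
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)       # length-prefixed byte string
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys must be byte strings, sorted by their raw bytes.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

big = 2**64 + 1
print(bencode(big))  # b'i18446744073709551617e'

text = json.dumps({"id": big})
exact = json.loads(text)["id"]                        # Python ints are arbitrary precision
as_double = json.loads(text, parse_int=float)["id"]   # mimics double-based parsers
print(exact == big, as_double == big)  # True False
```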

If your program does use only a custom output format, I want there to be a really good reason why you did it, not just that you don't like the aesthetics of JSON.

This is an admission that the UNIX philosophy doesn't work, at all. I design custom formats because that is the only way to get total efficiency, although I'm referring to the on disk representation, as it's commonly called.

As Rob Pike says, no one likes gofmt's style, but we all like that everyone uses it.

In a Lisp Machine, Smalltalk system, and other better computers, everything is a language-level object, and no parsing is needed whatsoever. It's funny how that simple idea is so unpopular.

As a sysadmin, I find properly formatted output for automation great (it won't break just because another update changed some formatting). That's why I love FreeBSD's approach with libxo: https://wiki.freebsd.org/LibXo

e.g. w --libxo json | json_pp

{
  "uptime-information" : {
     "days" : 0,
     "hours" : 2,
     "load-average-1" : 0.22,
     "load-average-15" : 0.1,
     "load-average-5" : 0.15,
     "minutes" : 52,
     "seconds" : 12,
     "time-of-day" : " 3:28PM",
     "uptime" : 10332,
     "uptime-human" : "  2:52,",
     "user-table" : {
        "user-entry" : [
           {
              "command" : "w --libxo json",
              "from" : "tmux(2948).%0",
              "login-time" : "12:37PM",
              "tty" : "pts/2",
              "user" : "root"
           }
        ]
     },
     "users" : 3
  }
}
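Here is a sketch of consuming that output from Python, using a trimmed copy of the JSON above as sample input. On a FreeBSD machine you would presumably capture the text with something like subprocess.run(["w", "--libxo", "json"], capture_output=True, text=True).stdout instead. That part is an assumption and isn't exercised here. The summarize() helper is hypothetical.

```python
import json

# Trimmed copy of the `w --libxo json` output shown above.
W_OUTPUT = """
{"uptime-information": {
   "load-average-1": 0.22, "load-average-5": 0.15, "load-average-15": 0.1,
   "uptime": 10332, "users": 3,
   "user-table": {"user-entry": [
     {"command": "w --libxo json", "from": "tmux(2948).%0",
      "login-time": "12:37PM", "tty": "pts/2", "user": "root"}]}}}
"""

def summarize(text):
    """Pull a few fields out of w's libxo JSON by key, with no column parsing."""
    info = json.loads(text)["uptime-information"]
    entries = info["user-table"]["user-entry"]
    return {
        "uptime_hours": info["uptime"] / 3600,   # "uptime" is in seconds
        "load1": info["load-average-1"],
        "sessions": [(e["user"], e["tty"], e["command"]) for e in entries],
    }

print(summarize(W_OUTPUT))
```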
Written on 24 August 2024.

Last modified: Sat Aug 24 22:28:00 2024
This dinky wiki is brought to you by the Insane Hackers Guild, Python sub-branch.