2017-12-06

The Advent of Void: Day 6: jq

Day 6! Today we bring some pseudo-modern web technology into the game. jq(1) is a command-line tool for querying and manipulating JSON documents, and it works well in shell scripts.

Let’s get started with this simple JSON document:

{
 "title" : "The Advent of Void: Day 4: containers",
 "pubDate" : "2017-12-04 00:00:00",
 "link" : "http://voidlinux.eu/news/2017/12/advent-containers.html",
 "guid" : "http://voidlinux.eu/news/2017/12/advent-containers"
}

To query the title of this element you can use the following:

# jq '.title' file.json
"The Advent of Void: Day 4: containers"

Now, this is still formatted as a JSON string. To get the plain string, we can pass -r to jq:

# jq -r '.title' file.json
The Advent of Void: Day 4: containers
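
Since -r yields a plain string, the result can be captured directly in a shell variable. The following is a minimal sketch of that idea; the variable names json and title are illustrative and not from the original post, and the document is fed on standard input instead of being read from file.json so that the snippet is self-contained.

```shell
# Sample document from above, inlined (assumption: passed via stdin, not file.json).
json='{
 "title" : "The Advent of Void: Day 4: containers",
 "pubDate" : "2017-12-04 00:00:00",
 "link" : "http://voidlinux.eu/news/2017/12/advent-containers.html",
 "guid" : "http://voidlinux.eu/news/2017/12/advent-containers"
}'

# -r strips the JSON quoting, so the value can be used as-is in the script.
title=$(printf '%s\n' "$json" | jq -r '.title')

printf 'Latest post: %s\n' "$title"
```

Without -r the variable would contain the surrounding double quotes as well.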

To get both the title and the pubDate, list them separated by a comma:

# jq -r '.title, .pubDate' file.json
The Advent of Void: Day 4: containers
2017-12-04 00:00:00
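
The comma form prints each value on its own line. If both values should end up on a single line, jq's string interpolation is one way to do it. This is only a sketch: the output layout with the date in parentheses and the variable names are arbitrary choices, not something the original post prescribes.

```shell
# Reduced version of the sample document, inlined for a self-contained example.
json='{"title": "The Advent of Void: Day 4: containers", "pubDate": "2017-12-04 00:00:00"}'

# Inside a jq string, \( ... ) is replaced by the result of the enclosed filter.
line=$(printf '%s\n' "$json" | jq -r '"\(.title) (\(.pubDate))"')

printf '%s\n' "$line"
```

The jq program is wrapped in single quotes so that the shell leaves the backslashes and double quotes for jq to interpret.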

That’s quite helpful. But the real power of jq comes to light once we work with JSON arrays. Let’s look at a slightly more complex example:

[
  {
     "guid" : "http://voidlinux.eu/news/2017/12/advent-containers",
     "pubDate" : "2017-12-04 00:00:00",
     "link" : "http://voidlinux.eu/news/2017/12/advent-containers.html",
     "title" : "The Advent of Void: Day 4: containers"
  },
  {
     "link" : "http://voidlinux.eu/news/2017/12/advent-ministat.html",
     "title" : "The Advent of Void: Day 3: ministat",
     "guid" : "http://voidlinux.eu/news/2017/12/advent-ministat",
     "pubDate" : "2017-12-03 00:00:00"
  },
  {
     "link" : "http://voidlinux.eu/news/2017/12/advent-taskwarrior.html",
     "title" : "The Advent of Void: Day 2: taskwarrior and friends",
     "pubDate" : "2017-12-02 00:00:00",
     "guid" : "http://voidlinux.eu/news/2017/12/advent-taskwarrior"
  },
  {
     "link" : "http://voidlinux.eu/news/2017/12/advent-gcal.html",
     "title" : "The Advent of Void: Day 1: gcal",
     "guid" : "http://voidlinux.eu/news/2017/12/advent-gcal",
     "pubDate" : "2017-12-01 00:00:00"
  }
]

Assume we want a list of all titles and links from the document as tab-separated values. Our first step is to extract both fields from every element:

# jq -r '.[] | ( .title, .link )' file.json
The Advent of Void: Day 4: containers
http://voidlinux.eu/news/2017/12/advent-containers.html
The Advent of Void: Day 3: ministat
http://voidlinux.eu/news/2017/12/advent-ministat.html
The Advent of Void: Day 2: taskwarrior and friends
http://voidlinux.eu/news/2017/12/advent-taskwarrior.html
The Advent of Void: Day 1: gcal
http://voidlinux.eu/news/2017/12/advent-gcal.html

That’s all that’s needed to query the information. To format the output we need to generate an array from every article:

# jq -r '.[] | [.title, .link ]' file.json
[
  "The Advent of Void: Day 4: containers",
  "http://voidlinux.eu/news/2017/12/advent-containers.html"
]
[
  "The Advent of Void: Day 3: ministat",
  "http://voidlinux.eu/news/2017/12/advent-ministat.html"
]
[
  "The Advent of Void: Day 2: taskwarrior and friends",
  "http://voidlinux.eu/news/2017/12/advent-taskwarrior.html"
]
[
  "The Advent of Void: Day 1: gcal",
  "http://voidlinux.eu/news/2017/12/advent-gcal.html"
]

Now, as a last step, join each array using the tab character:

# jq -r '.[] | [.title, .link ] | join("\t")' file.json
The Advent of Void: Day 4: containers    http://voidlinux.eu/news/2017/12/advent-containers.html
The Advent of Void: Day 3: ministat http://voidlinux.eu/news/2017/12/advent-ministat.html
The Advent of Void: Day 2: taskwarrior and friends    http://voidlinux.eu/news/2017/12/advent-taskwarrior.html
The Advent of Void: Day 1: gcal http://voidlinux.eu/news/2017/12/advent-gcal.html
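
Joining with a tab gives tab-separated output. When comma-separated values with proper quoting are needed, jq's built-in @csv formatter can take the place of join. The sketch below shows that variant, followed by one possible way to consume the tab-joined form in a shell loop. It uses only two of the four entries, inlined to keep it short; the variable names and the "->" output layout are illustrative.

```shell
# Two entries from the array above, inlined so the sketch is self-contained.
json='[
  {"title": "The Advent of Void: Day 4: containers",
   "link": "http://voidlinux.eu/news/2017/12/advent-containers.html"},
  {"title": "The Advent of Void: Day 3: ministat",
   "link": "http://voidlinux.eu/news/2017/12/advent-ministat.html"}
]'

# @csv wraps every string field in double quotes and doubles embedded quotes.
csv=$(printf '%s\n' "$json" | jq -r '.[] | [.title, .link] | @csv')
printf '%s\n' "$csv"

# The tab-joined form is easy to split again: read one line per article
# and let IFS (set to a single tab) separate title and link.
printf '%s\n' "$json" | jq -r '.[] | [.title, .link] | join("\t")' |
while IFS="$(printf '\t')" read -r title link; do
  printf '%s -> %s\n' "$title" "$link"
done
```

The loop relies on neither field containing a tab itself, which holds for the data shown here.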

All in all, jq is quite a powerful tool, and we have only scratched the surface of what you can do with it. For more comprehensive documentation, consult the jq Manual, or try jq online.