2018-12-21
The Advent of Void: Day 21: fex
fex is a simple yet powerful field extraction tool for working with text.
If you spend enough time in a shell fiddling with code or data, you inevitably want to pull fields out of some text you have. For example, say you want the hex sha256sum of a file, minus the name:
$ sha256sum launch-codes
a0cd7db7343bed89416660b4f92c43fe7b556439daa2e26e9844ce82491191c6 launch-codes
You could write this output to a file and remove the filename from it with a text editor, or use cut or awk. However, fex can save you a bit of time by extracting the first field containing the checksum for you:
$ sha256sum launch-codes | fex 1
a0cd7db7343bed89416660b4f92c43fe7b556439daa2e26e9844ce82491191c6
Or, say you want a list of all home directories from passwd. To do this, we ask fex to split by colon and select the sixth field:
$ fex </etc/passwd :6
/root
/dev/null
/var/lib/colord
/var/empty
...
Or maybe we want gid:login
pairs to work with:
$ fex </etc/passwd ':{4,1}'
0:root
99:nobody
999:colord
22:dbus
...
Here we can see that fex will keep the separator we split by, letting us
reduce the data to only what we want while maintaining its format. And,
it allows us to rearrange fields using {N,M,...}
selectors.
You can alse use fex to pluck fields by narrowing the selection with different split characters. So let’s say we want to get an idea of what the most common pairs of ‘subject: verb’ are in commit messages for void-packages – maybe to see which packages receive the most updates.
To do this, we’ll use git to search for commit message subjects with
a colon and pipe that to fex. Using fex, we’ll write our first selector
(:1
) to take the first field behind the colon. We’ll then use a second
selector (:2 1
) to take the text after the colon, split that by
spaces, and select the first word from that. With this, we get a list of
subjects and verbs we can sort and count:
$ git log --pretty=%s --grep ':' | fex ':1' ':2 1' | sort | uniq -c | sort -rh | head
328 youtube-dl update
206 xbps-git bump
160 git update
151 Adapta update
149 ImageMagick update
143 exiftool update
142 firefox update
136 kernel update
133 rpi-kernel update
129 python-setuptools update
From that, we can see that updates make up all but one of the top ten pairs, with xbps-git bumps being the second in line after youtube-dl updates. This only covers basic use of fex, too – you can also select field ranges, fields matching a regular expression, and combine all of these.
There are lots of ways to use fex in day to day data munging, programming, and writing your own tools. For many field extraction tasks, fex allows you to easily get the data you want without writing small awk programs or messing with cut. Plus, it’s just fun to use.
For more information and examples, please read the fex(1) manpage.