[SOLVED] BASH How to improve performance of JSON parsing with jq


Issue

Context: I have a scenario where I need to perform copies of backups from one system to another. I want the backups list to be configurable, so I went with a JSON approach inside the script itself.

The list contains a key (name of the backup to show in the output), the user to SSH as, and the path to fetch the backup from.

Example:

backups_to_perform='[
  {
    "key": "key1",
    "user": "user1",
    "path": "path1"
  },
  {
    "key": "key2",
    "user": "user2",
    "path": "path2"
  },
  {
    "key": "key3",
    "user": "user3",
    "path": "path3"
  }
]'

The reason I’m going with JSON is that I wanted a structure similar to a Python dictionary, since Bash associative arrays can only hold flat key:value pairs, not nested ones like key{key:value; key:value} (please correct me if I’m wrong)
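For comparison, a Bash associative array holds exactly one flat level of string key/value pairs; nesting has to be simulated, for instance with composite keys. A minimal sketch (the `key1.user` naming scheme here is just an illustration, not part of the script above):

```shell
#!/bin/bash
# Bash associative arrays are flat: one string key maps to one string value.
declare -A backup
backup[key1.user]="user1"
backup[key1.path]="path1"
backup[key2.user]="user2"
backup[key2.path]="path2"

# Nesting is faked with composite keys such as "key1.user".
echo "${backup[key1.user]}"   # user1
```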

This is how I’m parsing the JSON:

  while read -r backup; do
    IFS=, read -r key user path <<<"$(jq -cr '"\(.key),\(.user),\(.path)"' <<<"$backup")"
    rsync_backup "$key" "$user" "$path"
  done < <(jq -cr '.[]' <<<"$backups_to_perform")

rsync_backup is just a function to perform rsync that accepts those args.

It’s possible there’s a better solution than mine for achieving the backup copies, but I’d like to improve this type of code so I can apply it better next time.

My problem is that this seems to take some time when the JSON is big (I’ve cut it back to 3 entries for this post). My way of parsing the JSON also looks very convoluted, but I couldn’t make it work any other way.

It’s probably bad that I’m calling jq once to feed the loop, and then call it again for each iteration.

Solution

Update

A few things to consider:

  • You can avoid using jq inside the while loop:
#!/bin/bash

while IFS=',' read -r key user path
do
#   rsync_backup "$key" "$user" "$path"
    echo "key=$key user=$user path=$path"
done < <(
    jq -cr '.[] | "\(.key),\(.user),\(.path)"' <<< "$backups_to_perform"
)
  • You should safeguard against typos in the JSON that will lead to null values (for example if you typed "usr": instead of "user":).

  • You should allow the use of commas in "key": and "user":, and of any character (but the NULL BYTE) in "path":.
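To see why the safeguard matters: when an attribute is missing (or misspelled), jq’s string interpolation renders `null` as the literal word null, which would then be passed on to rsync_backup unnoticed. A quick illustration of that failure mode:

```shell
#!/bin/bash
# "usr" instead of "user": .user evaluates to null, and the string
# interpolation renders it as the literal text "null".
jq -cr '"\(.key),\(.user),\(.path)"' \
    <<< '{"key":"key3","usr":"user3","path":"path3"}'
# prints: key3,null,path3
```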


With all that in mind, I would choose the TSV format as jq output:

#!/bin/bash

# safety check
if [ "$(jq 'any(.[]; .key and .user and .path | not)' <<< "$backups_to_perform")" = true ]
then
    jq -c '.[] | select(.key and .user and .path | not)' <<< "$backups_to_perform" |
    awk -v prefix="[WARNING] missing attribute in record: " '{print prefix $0}'
fi

# doing the backups
while IFS=$'\t' read -r key user path
do
    # unescape TSV values
    printf -v key  %b "$key"
    printf -v user %b "$user"
    printf -v path %b "$path"
#   rsync_backup "$key" "$user" "$path"
    echo "key=$key user=$user path=$path"
done < <(
    jq -r '.[] | select(.key and .user and .path) | [.key,.user,.path] | @tsv' <<< "$backups_to_perform"
)
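The printf -v var %b step is what undoes @tsv’s escaping: @tsv encodes embedded tabs, newlines, carriage returns and backslashes as the two-character sequences \t, \n, \r and \\, and %b expands those sequences back into the real characters. A small round-trip demo:

```shell
#!/bin/bash
# The key contains a real newline; @tsv escapes it as the two characters \n.
line=$(jq -r '[.key,.path] | @tsv' <<< '{"key":"key\n2","path":"path2"}')
IFS=$'\t' read -r key path <<< "$line"

# At this point $key is the 6-character escaped form 'key\n2'.
printf -v key %b "$key"

# Now $key is 5 characters again: k e y <newline> 2.
echo "${#key}"   # 5
```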

You can test it with this input:

IFS='' read -r -d '' backups_to_perform <<'EOJ'
[
  {
    "__comment__": "comma in key value",
    "key": "key,1",
    "user": "user1",
    "path": "path1"
  },
  {
    "__comment__": "newline in key value",
    "key": "key\n2",
    "user": "user2",
    "path": "path2"
  },
  {
    "__comment__": "misspelled user attribute",
    "key": "key3",
    "usr": "user3",
    "path": "path3"
  },
  {
    "__comment__": "path containing ascii range [0x01-0x7f]",
    "key": "key4",
    "user": "user4",
    "path": "\u0001\u0002\u0003\u0004\u0005\u0006\u0007\b\t\n\u000b\f\r\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u007f"
  }
]
EOJ

Answered By – Fravadona

Answer Checked By – Dawn Plyler (BugsFixing Volunteer)
