Issue
Context: I have a scenario where I need to copy backups from one system to another. I want the list of backups to be configurable, so I went with a JSON approach inside the script itself.
The list contains a key (the name of the backup to show in the output), the user to ssh with, and the path to get the backup from.
Example:
backups_to_perform='[
  {
    "key": "key1",
    "user": "user1",
    "path": "path1"
  },
  {
    "key": "key2",
    "user": "user2",
    "path": "path2"
  },
  {
    "key": "key3",
    "user": "user3",
    "path": "path3"
  }
]'
The reason I'm going with JSON is that I wanted a structure similar to a Python dictionary, since associative arrays can only hold flat key:value pairs rather than nested records like key -> {key: value, key: value}. (Please correct me if I'm wrong.)
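For illustration only (the array names below are made up, not from the original script): a Bash associative array maps one string key to one string value, so a record with several fields has to be split across parallel arrays:

#!/bin/bash
# one flat key -> value pair per entry; no nesting possible
declare -A backup_users=( [key1]=user1 [key2]=user2 )
declare -A backup_paths=( [key1]=path1 [key2]=path2 )
for key in "${!backup_paths[@]}"; do
    echo "key=$key user=${backup_users[$key]} path=${backup_paths[$key]}"
done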
This is how I’m parsing the JSON:
while read -r backup; do
    IFS=, read -r key user path <<<"$(jq -cr '"\(.key),\(.user),\(.path)"' <<<"$backup")"
    rsync_backup "$key" "$user" "$path"
done < <(jq -cr '.[]' <<<"$backups_to_perform")
rsync_backup is just a function that runs rsync with those arguments.
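The function itself isn't shown in the post; purely as a hypothetical sketch (the remote host and destination directory below are invented placeholders, not taken from the actual script), it could look something like this:

#!/bin/bash
# hypothetical sketch only; backup_host and dest_dir are placeholders
rsync_backup() {
    local key=$1 user=$2 path=$3
    local backup_host='backup.example.com' dest_dir='/srv/backups'
    echo "copying backup '$key'"
    rsync -a "$user@$backup_host:$path" "$dest_dir/$key/"
}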
It's possible there's a better solution for these backup copies than what I have, but I'd like to improve this type of code so I can apply it better next time.
My problem is that this seems to take some time when the JSON is big (I've cut it back to 3 entries for this post). It also looks like my way of parsing the JSON is very convoluted, but I couldn't make it work any other way. It's probably bad that I'm calling jq once to feed the loop and then calling it again for each iteration.
Solution
Update
A few things to consider:

- You can avoid using jq inside the while loop:
#!/bin/bash
while IFS=',' read -r key user path
do
    # rsync_backup "$key" "$user" "$path"
    echo "key=$key user=$user path=$path"
done < <(
    jq -cr '.[] | "\(.key),\(.user),\(.path)"' <<< "$backups_to_perform"
)
- You should safeguard against typos in the JSON that would lead to null values (for example, if you typed "usr": instead of "user":).
- You should allow the use of commas in "key": and "user":, and of any character (except the NUL byte) in "path":.
With all that in mind, I would choose the TSV format as the jq output:
#!/bin/bash
# safety check
if $(jq 'any(.[]; .key and .user and .path | not)' <<< "$backups_to_perform")
then
    jq -c '.[] | select(.key and .user and .path | not)' <<< "$backups_to_perform" |
    awk -v prefix="[WARNING] missing attribute in record: " '{print prefix $0}'
fi

# doing the backups
while IFS=$'\t' read -r key user path
do
    # unescape TSV values
    printf -v key  %b "$key"
    printf -v user %b "$user"
    printf -v path %b "$path"
    # rsync_backup "$key" "$user" "$path"
    echo "key=$key user=$user path=$path"
done < <(
    jq -r '.[] | select(.key and .user and .path) | [.key,.user,.path] | @tsv' <<< "$backups_to_perform"
)
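The printf '%b' lines are needed because jq's @tsv filter backslash-escapes tabs, newlines, carriage returns and backslashes inside the values so they can travel safely on a single tab-delimited line; %b interprets those escape sequences and restores the original characters. A quick standalone illustration:

# @tsv turns the real newline inside the value into the two characters '\' and 'n'
jq -r '[.key] | @tsv' <<< '{"key":"key\n2"}'    # prints: key\n2
# printf %b turns the escape sequence back into a real newline
printf -v key %b 'key\n2'
printf '%s\n' "$key"                            # prints: key and 2 on separate lines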
You can test it with this input:
IFS='' read -r -d '' backups_to_perform <<'EOJ'
[
  {
    "__comment__": "comma in key value",
    "key": "key,1",
    "user": "user1",
    "path": "path1"
  },
  {
    "__comment__": "newline in key value",
    "key": "key\n2",
    "user": "user2",
    "path": "path2"
  },
  {
    "__comment__": "misspelled user attribute",
    "key": "key3",
    "usr": "user3",
    "path": "path3"
  },
  {
    "__comment__": "path containing ascii range [0x01-0x7f]",
    "key": "key4",
    "user": "user4",
    "path": "\u0001\u0002\u0003\u0004\u0005\u0006\u0007\b\t\n\u000b\f\r\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u007f"
  }
]
EOJ
Answered By – Fravadona