[SOLVED] capture repeated groups in regular expressions

Issue

Similarly to this question, I want to capture a group that repeats more than once. However, I don't want to use findall, because I'm relying on the order in which the regex is evaluated.

My issue:
I want to parse arguments that look like this:

"(a, {b, c, d}, e)"  # arguments are 1: "a", 2: "b, c, d", 3: "e"
"({a, b}, c, {d, e}, f)" # arguments are 1: "a, b", 2: "c", 3: "d, e", 4: "f"

etc.
The arguments are separated by commas, but the contents of a pair of curly braces form a single argument.
This is the regex I tried to write:

import re

# Raw string so that \{ and \} reach the regex engine unchanged.
_SingleArg = r"(?:(\{.+?\})|(.+?))"

ArgsParse = re.compile(f"(?:{_SingleArg}, )*{_SingleArg}?$")

The _SingleArg variable first tries to match a full argument within braces, and if that fails it tries to match a regular argument.
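
One likely reason an approach like this loses arguments: in Python's re module, a capturing group inside a repeated construct keeps only the text from its last iteration. The sketch below illustrates that behaviour with a hypothetical, simplified pattern (one word character per argument), not the pattern from the question, and contrasts it with re.finditer, which returns one match per argument in order.

```python
import re

# Hypothetical simplified pattern: each argument is a single word character.
repeated = re.compile(r"(?:(\w), )*(\w)$")

match = repeated.match("a, b, c")
# Group 1 matched twice ("a", then "b"), but only the last iteration is kept.
last_only = match.groups()

# finditer yields one match object per argument, in left-to-right order.
in_order = [m.group() for m in re.finditer(r"\w", "a, b, c")]

print(last_only)
print(in_order)
```

The first result has lost "a", while the second keeps all three arguments in the order they appear.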

I can’t think of a way to do this with findall. I can do it by running multiple regular expressions – first finding the arguments within braces, and then replacing them with the empty string, and finally splitting by comma. But this is a very inelegant solution, especially since I want to know the order of the arguments as well.
Is there a better way to do this with regular expressions?
Thanks,

Solution

You can use this pattern with re.finditer to preserve the order of the arguments:

Pattern: \w+|\{([\w, ]+)\}

Code:

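import re

# Match either a bare word or a braced group of words, commas and spaces.
pattern = r"\w+|\{([\w, ]+)\}"
test_string = "({a, b}, c, {d, e}, f)"

# finditer returns matches left to right; enumerate numbers them from 1.
result = [(x, y.group().strip('{}')) for x, y in enumerate(re.finditer(pattern, test_string), start=1)]
print(result)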

Output:

[(1, 'a, b'), (2, 'c'), (3, 'd, e'), (4, 'f')]
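
The same idea can be wrapped in a small helper that reads the capture group instead of stripping the braces afterwards. This is only a sketch: the names ARG_PATTERN and parse_args are hypothetical, and it assumes that arguments contain only word characters, commas and spaces, and that braces are never nested.

```python
import re

# Assumption: arguments contain only word characters, commas and spaces,
# and braces are never nested.
ARG_PATTERN = re.compile(r"\{([\w, ]+)\}|\w+")


def parse_args(text):
    """Return (position, argument) pairs in the order they appear in text."""
    args = []
    for position, match in enumerate(ARG_PATTERN.finditer(text), start=1):
        # Group 1 is set only when the braced alternative matched.
        braced = match.group(1)
        args.append((position, braced if braced is not None else match.group()))
    return args


first = parse_args("(a, {b, c, d}, e)")
second = parse_args("({a, b}, c, {d, e}, f)")
print(first)
print(second)
```

Checking group 1 for None tells a braced argument apart from a plain one without relying on strip, which would also remove braces that happen to sit at the edge of a plain argument.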

Answered By – gajendragarg

Answer Checked By – Clifford M. (BugsFixing Volunteer)
