Issue
Note: This has to work in JavaScript RegExp
I have to parse a string like this:
yo (p:abc-123-def) meets \(p:2) \(in the cinema\) \\ (p:3) (p:4\) won't
What I need to extract are all (<entity>:<id>) markups, while ignoring escaped things like \(in the cinema\) or \\. From the above example, the regex should match only
(p:abc-123-def)
(p:3)
but not \(p:2) or (p:4\), since one of their brackets is escaped.
Now, I can still change the markup format, so if there is a simpler way to do the whole thing, I’m open to suggestions. If not, I need to be able to get those (<entity>:<id>) markups with a regex.
Something like
(?<!\\)\([^(?<!\\)\(]*\)
would work, but look-behind assertions are not supported in all browsers.
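A quick check of that attempt, as a sketch run in an engine that does support look-behind: inside [^...] the text (?<!\\)\( is not a look-behind but a set of literal characters, so the class only excludes (, ?, <, ! , \ and ). The pattern happens to give the expected result for the example string, but it requires no colon and cannot tell an escaped backslash from an escaping one. The helper name run is only for illustration.

```javascript
// The question's pattern, with the g flag so matchAll can collect every match.
// Inside [^...] the text "(?<!\\)\(" is just literal characters:
// ( ? < ! \ ) are all excluded from the bracket contents.
const attempt = /(?<!\\)\([^(?<!\\)\(]*\)/g;

// Hypothetical helper: return the full text of every match.
const run = (s) => [...s.matchAll(attempt)].map((m) => m[0]);

console.log(run(String.raw`yo (p:abc-123-def) meets \(p:2) \(in the cinema\) \\ (p:3) (p:4\) won't`));
// [ '(p:abc-123-def)', '(p:3)' ]
console.log(run("no colon (in the cinema)")); // [ '(in the cinema)' ]: no ':' is required
console.log(run(String.raw`\\(p:3)`)); // []: an escaped backslash still blocks the match
```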
Solution
It can get complex when backslashes are repeated many times, like \\\\\\\\\\\\\\(p:1). You would need to know whether the number of backslashes is even or odd in order to know whether the ( is escaped or not.
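The parity rule can be sketched as a small helper (the name isEscaped is hypothetical, not part of any library): count the consecutive backslashes directly before a position; an odd count means the character there is escaped.

```javascript
// Hypothetical helper: true when the character at `index` is escaped,
// i.e. preceded by an odd number of consecutive backslashes.
function isEscaped(text, index) {
  let count = 0;
  // Walk left from the character just before `index`, counting backslashes.
  for (let i = index - 1; i >= 0 && text[i] === "\\"; i--) {
    count++;
  }
  return count % 2 === 1;
}

const fourteen = "\\".repeat(14) + "(p:1)"; // 14 backslashes, then (p:1)
console.log(isEscaped(fourteen, 14)); // false: even count, so "(" is a real bracket
console.log(isEscaped("\\" + fourteen, 15)); // true: 15 is odd, so "(" is escaped
```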
Secondly, the colon inside the parentheses might be escaped as well, in which case it would presumably not count.
So I would suggest working with something like (?:\\.|[^:)\\])*, which deals with escaped characters (\\.) and puts some requirements on unescaped characters, like [^:)\\].
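To see what that building block accepts, here is a sketch that anchors it with ^ and $ (the anchoring is only for this demonstration) and tests a few candidate strings for the text on one side of the colon:

```javascript
// The repeated unit from the answer, anchored so it must cover the whole string.
// Each step consumes either an escaped pair (\\.) or one unescaped character
// that is not a colon, a closing bracket or a backslash.
const part = /^(?:\\.|[^:)\\])*$/;

const samples = [
  "abc-123-def", // ordinary characters
  "a\\:b",       // a\:b  escaped colon
  "a\\)b",       // a\)b  escaped closing bracket
  "a:b",         // unescaped colon
  "a)b",         // unescaped closing bracket
  "abc\\",       // abc\  dangling backslash with nothing after it
];

console.log(samples.map((s) => part.test(s)));
// [ true, true, true, false, false, false ]
```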
So this is the result:
(?<!\\)(?:\\.)*\((?:\\.|[^:)\\])*:(?:\\.|[^:)\\])*\)
This uses look-behind, which is supported in the latest versions of the popular browsers.
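Here is a sketch of the look-behind version applied to the question's example string. The g flag is added so that String.prototype.matchAll can return every match, and String.raw keeps the backslashes exactly as written.

```javascript
// The look-behind pattern from the answer, plus the g flag required by matchAll.
const markup = /(?<!\\)(?:\\.)*\((?:\\.|[^:)\\])*:(?:\\.|[^:)\\])*\)/g;

// String.raw keeps every backslash from the question's example.
const input = String.raw`yo (p:abc-123-def) meets \(p:2) \(in the cinema\) \\ (p:3) (p:4\) won't`;

const found = [...input.matchAll(markup)].map((m) => m[0]);
console.log(found); // [ '(p:abc-123-def)', '(p:3)' ]
```

When an even run of backslashes sits directly before the bracket, the (?:\\.)* part makes those backslashes part of the matched text: in x \\(p:3), the match is \\(p:3).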
If look-behind is not an option, then capture the character that precedes the potential backslashes, and make a capture group for the part you need:
(?:[^\\]|^)((?:\\.)*\((?:\\.|[^:)\\])*:(?:\\.|[^:)\\])*\))
So here you need to work with the first captured group.
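A sketch of the look-behind-free variant, using a plain RegExp.prototype.exec loop so it does not rely on newer APIs such as matchAll either (the function name extractMarkups is hypothetical):

```javascript
// Hypothetical helper around the answer's fallback pattern: the character before
// the backslashes (or the start of the string) is consumed, and group 1 holds the markup.
function extractMarkups(input) {
  const markup = /(?:[^\\]|^)((?:\\.)*\((?:\\.|[^:)\\])*:(?:\\.|[^:)\\])*\))/g;
  const found = [];
  let m;
  // With the g flag, each exec call continues from markup.lastIndex.
  while ((m = markup.exec(input)) !== null) {
    found.push(m[1]); // keep only the first captured group
  }
  return found;
}

const input = String.raw`yo (p:abc-123-def) meets \(p:2) \(in the cinema\) \\ (p:3) (p:4\) won't`;
console.log(extractMarkups(input)); // [ '(p:abc-123-def)', '(p:3)' ]
console.log(extractMarkups("(p:1)(p:2)")); // [ '(p:1)' ]
```

One consequence of consuming the preceding character: two markups written back to back, as in (p:1)(p:2), yield only the first, because the ) before the second ( was already consumed by the first match. The look-behind version finds both.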
Answered By – trincot