[SOLVED] Regex to get only combination of characters

Issue

[EDIT: I already gave an accepted answer, so I’ll spell out the question (which seemed obvious enough to me) in more technical terms: the OP wanted a regexp to match a substring consisting of only ‘a’ and ‘b’ characters, with the constraint that there must be at least one of each character present. That’s consistent with everything they said in the question, with their example, and with their comments.]

I need to match only runs that combine both ‘a’ and ‘b’ characters, not runs made of only ‘a’ or only ‘b’.
For example, from ‘aaaaabbbbsaaaaaaaa’
I need to get only the ‘aaaaabbbb’ part.

I tried

'[ab]+'

pattern, but it also matches runs such as ‘aaaaaaaa’ that contain only one of the two characters.

Solution

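Try this. "The trick" is that you need to ensure there’s at least one each of "a" and "b". That’s easy to do if you make two cases of it: one or more "a" followed by a "b" ("a+b"), or one or more "b" followed by an "a" ("b+a"). Either way both characters are present, and "[ab]*" then consumes the rest of the run. The group is non-capturing ("(?:…)") because re.findall() returns only the group’s contents when the pattern contains a capturing group, so this way it shows the part you actually care about, the whole match: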

>>> import re
>>> re.findall("(?:a+b|b+a)[ab]*", 'aaaaabbbbsaaaaaaaa')
['aaaaabbbb']
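
As a quick check of the reasoning above, here is a minimal sketch that runs the same pattern on a few more inputs and shows what happens if the group is made capturing instead. The helper name mixed_runs and the extra test strings are made up for illustration; they are not part of the original answer.

```python
import re

# Pattern from the answer: a's then a b, or b's then an a,
# followed by any further a/b characters.
PATTERN = re.compile(r"(?:a+b|b+a)[ab]*")

def mixed_runs(text):
    """Return each run of 'a'/'b' characters that contains both letters.

    Hypothetical helper, named here only for illustration.
    """
    return PATTERN.findall(text)

print(mixed_runs("aaaaabbbbsaaaaaaaa"))  # ['aaaaabbbb']
print(mixed_runs("bbbaxaaa"))            # ['bbba']
print(mixed_runs("aaaa bbbb"))           # []

# With a capturing group, findall() returns only the group's text,
# so the rest of the run is lost:
print(re.findall(r"(a+b|b+a)[ab]*", "aaaaabbbbsaaaaaaaa"))  # ['aaaaab']
```

The last line is the reason for "(?:…)": the match itself is still ‘aaaaabbbb’, but findall() reports only what the capturing group matched.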

Answered By – Tim Peters

Answer Checked By – Candace Johnson (BugsFixing Volunteer)
