I’m parsing multiple XML files with Python 2.7, and some of them contain strings like:
string = "[2,3,13,37–41,43,44,46]"
I split them to get a list of all elements, and then I have to detect elements containing “–”, like “37–41”. But it turns out this is not a regular dash; it’s a non-ASCII character (the en dash, U+2013):
elements = [u'2', u'3', u'13', u'37\u201341', u'43', u'44', u'46']
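For reference, here is a minimal sketch of the splitting step. It assumes the text is already a unicode string, as an XML parser would return it; the asker’s own split code isn’t shown, and the name raw is only illustrative. The en dash is written as the \u2013 escape, so the source itself stays pure ASCII:

```python
# Sketch: turn the bracketed string into a list of elements.
# Assumption: the text arrives as a unicode string (as from an XML
# parser); "raw" is an illustrative name, not from the original code.
raw = u"[2,3,13,37\u201341,43,44,46]"

# Drop the surrounding brackets, then split on commas.
elements = raw.strip(u"[]").split(u",")

# The en dash appears as \u2013 in the list's repr on Python 2.7.
print(elements)
```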
So I need something like
for e in elements:
    if "–" in e:
        # do something about it
        pass
If I use that non-ASCII character in this if statement, I get an error:
"SyntaxError: Non-ASCII character '\xe2' in file...".
I tried to replace the if expression with this re method:
but that doesn’t work either. So I’m looking for a way to either convert that non-ASCII character to a regular ASCII “-”, or refer to it directly by its character code (U+2013) in the search expression.
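Both options the question asks for can be sketched without putting the en dash itself in the source: the escape u'\u2013' names the character by its code point (the same value as unichr(0x2013) on Python 2, or chr(0x2013) on Python 3), so no coding declaration is needed. This is a minimal sketch; EN_DASH, ranges and normalized are hypothetical names chosen for illustration:

```python
# Written with the \u2013 escape, so this file contains only ASCII
# characters and needs no coding declaration, even on Python 2.7.
EN_DASH = u'\u2013'  # U+2013 EN DASH (hypothetical constant name)

elements = [u'2', u'3', u'13', u'37\u201341', u'43', u'44', u'46']

# Option 1: test for the character by its code point.
ranges = [e for e in elements if EN_DASH in e]

# Option 2: convert it to a regular ASCII hyphen-minus.
normalized = [e.replace(EN_DASH, u'-') for e in elements]

print(ranges)
print(normalized)
```

Option 2 is handy if later code should only ever deal with the ASCII “-”.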
# -*- coding: utf-8 -*-
import re

elements = [u'2', u'3', u'13', u'37\u201341', u'43', u'44', u'46']
for e in elements:
    # Strip all printable ASCII characters; anything left is non-ASCII.
    if re.sub('[ -~]', '', e) != "":
        # do something here
        print "-"
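re.sub('[ -~]', '', e) removes every printable ASCII character from e, that is, everything in the range from space (0x20) to tilde (0x7E), by replacing it with an empty string. Only the non-ASCII characters of e remain (along with any ASCII control characters such as tabs), so a non-empty result means e contains something like the en dash.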
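To see what the regex leaves behind, the sketch below prints the leftover for each element. It also shows an alternative check that is a different technique from the answer’s regex: testing whether any code point is above 127. The helper name has_non_ascii is hypothetical:

```python
import re

elements = [u'2', u'3', u'13', u'37\u201341', u'43', u'44', u'46']

for e in elements:
    # Remove every printable ASCII character (space 0x20 .. tilde 0x7E).
    leftover = re.sub(u'[ -~]', u'', e)
    if leftover:
        # Only u'37\u201341' reaches here; its leftover is u'\u2013'.
        print(repr(leftover))


def has_non_ascii(s):
    # Hypothetical helper: true if any code point lies outside ASCII (0-127).
    return any(ord(ch) > 127 for ch in s)


flagged = [e for e in elements if has_non_ascii(e)]
print(flagged)
```

The two checks differ slightly: the regex version also flags ASCII control characters such as a tab, while the ord() version flags only characters outside ASCII.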
Hope this helps.
Answered By – ilovecomputer