[SOLVED] How to detect non-ASCII character in Python?

Issue

I’m parsing multiple XML files with Python 2.7. Some of them contain strings like: string = "[2,3,13,37–41,43,44,46]". I split them to get a list of all elements, and then I have to detect elements containing “–”, such as “37–41”. It turns out this is not a regular dash but a non-ASCII character:

elements = [u'2', u'3', u'13', u'37\u201341', u'43', u'44', u'46']

So I need something like

for e in elements:
  if "–" in e:
      # do something about it

If I use that non-ASCII char in this if expression, I get an error: "SyntaxError: Non-ASCII character '\xe2' in file...".

I tried to replace the if expression with this re method:

re.search('\xe2', e)

but that doesn’t match either. So I’m looking for a way to either convert that non-ASCII char to a regular ASCII “-” or use its character code directly in the search expression.

Solution

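# -*- coding: utf-8 -*-

import re

elements = [u'2', u'3', u'13', u'37\u201341', u'43', u'44', u'46']

for e in elements:
    # Strip printable ASCII (space through tilde); if anything is left,
    # e contains a character outside that range.
    if re.sub('[ -~]', '', e) != "":
        # do something here
        print("-")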

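re.sub('[ -~]', '', e) strips every printable ASCII character from e (the range from space to tilde; each match is replaced with “”), so only characters outside that range remain. For the data above, that means the non-ASCII ones. Note that ASCII control characters such as tab or newline are also outside the range and would remain as well.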
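The question also asks about matching the dash directly or converting it to a regular “-”. Here is a minimal sketch of both, assuming the only character of interest is the en dash U+2013 shown in the elements list. The names EN_DASH, has_non_ascii and normalize_dash are made up for illustration and are not part of the original answer. Writing the character as the escape u'\u2013' keeps the source file pure ASCII, so no SyntaxError is raised. The re.search('\xe2', e) attempt fails because e is a unicode string holding the single code point U+2013, not the UTF-8 byte \xe2.

```python
# -*- coding: utf-8 -*-

# The en dash, written as an escape so the source file stays pure ASCII.
EN_DASH = u'\u2013'


def has_non_ascii(text):
    """Return True if any character has a code point above 127."""
    return any(ord(ch) > 127 for ch in text)


def normalize_dash(text):
    """Replace every en dash with a plain ASCII hyphen."""
    return text.replace(EN_DASH, u'-')


elements = [u'2', u'3', u'13', u'37\u201341', u'43', u'44', u'46']

# Option 1: look for the specific character.
flagged = [e for e in elements if EN_DASH in e]

# Option 2: convert it to a regular "-" before any further processing.
cleaned = [normalize_dash(e) for e in elements]

print(len(flagged))
print(cleaned)
```

Unlike the regex above, has_non_ascii treats tab and newline as ASCII, which may or may not be what you want depending on the input.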

Hope this helps.

Answered By – ilovecomputer

Answer Checked By – Senaida (BugsFixing Volunteer)
