[SOLVED] Pandas: function sometimes produces IndexError: list index out of range

Issue

I understand why this error occurs but thought I had covered my bases in the function.

This function searches a folder structure and outputs the matching line, along with the line before and the line after if they exist. It works for most terms, but for some it raises the IndexError.

def pattern_search(x,pattern):
    fname = x['Search File']
    file  = os.path.join(DATA,fname)
    match = ""
    
    if os.path.exists(file):
        match = extract_match(file,pattern)
        
    else:
        match = "File NOT FOUND"
    
    return match

def extract_match(file,pattern):
    contents = open(file, encoding="ISO-8859-1").read()
    
    if re.search(pattern, contents):
        lines       = contents.splitlines()
        match       = ""
        i = 0
        
        for index, line in enumerate(lines):
            if i < 1:
                if re.search(pattern, line):
                    i += 1
                    line = f"MATCH: ({str(index)}) {line}"
                    
                    if lines[index - 1]:
                        line = f"PREV: {lines[index - 1]}" + "\n" + line 
                    if lines[index + 1]:
                        line += "\n" + f"POST: {lines[index + 1]}"
                        
                    match = line
                    
                else:
                    pass
    else:
        match = "NF"
                
        #print(match)      
    return match

Run as follows:

df["term1"] = df.apply(pattern_search, args=[term1_pat], axis=1)

For most terms, it will return the matching line with context:

PREV: I like cake
MATCH: This is a cake related matching sentence with cake term: batter
POST: mix 3 cups of regex with butter and add cream cheese.

I assume this happens with files that have few lines, or when the match occurs at the very beginning or end of the file. How should I account for these conditions?

Solution

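This happens on the lines where you check whether the neighboring lines are falsy: the index is used before you know it is valid.

Make sure the index is above zero before you look at index - 1, and that it is below len(lines) - 1 before you look at index + 1. When the match is on the last line, lines[index + 1] raises the IndexError. When the match is on the first line, lines[index - 1] is lines[-1], which raises nothing but silently returns the last line of the file as PREV.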
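A minimal illustration of the two edge cases, using a made-up three-line list (the list contents and variable names are assumptions for the example, not taken from the original files):

```python
lines = ["first", "second", "third"]

# Match on the first line: index - 1 is -1, which Python reads as
# "last element", so no error is raised but the wrong line comes back.
wrapped = lines[0 - 1]

# Match on the last line: index + 1 is past the end of the list,
# and this is the lookup that raises IndexError.
try:
    lines[2 + 1]
    raised = False
except IndexError:
    raised = True

print(wrapped, raised)
```

So only the look-ahead raises, but both lookups need a bounds check to give correct context.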

Replace

if lines[index - 1]:
if lines[index + 1]:

With

if index > 0 and lines[index - 1]:
if index < len(lines) - 1 and lines[index + 1]:
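For reference, here is a sketch of how the bounds checks might fit into the matching loop. It is a simplified stand-in, not the original extract_match: the name match_with_context is hypothetical, it takes an already-split list of lines instead of a file path, and it returns "NF" whenever no single line matches.

```python
import re

def match_with_context(lines, pattern):
    """Return the first matching line with PREV/POST context, or "NF"."""
    for index, line in enumerate(lines):
        if re.search(pattern, line):
            parts = []
            # Look back only when a previous line exists and is non-empty.
            if index > 0 and lines[index - 1]:
                parts.append(f"PREV: {lines[index - 1]}")
            parts.append(f"MATCH: ({index}) {line}")
            # Look ahead only when a following line exists and is non-empty.
            if index < len(lines) - 1 and lines[index + 1]:
                parts.append(f"POST: {lines[index + 1]}")
            return "\n".join(parts)
    return "NF"

# A one-line "file" no longer raises and gets no bogus PREV line.
print(match_with_context(["cake term: batter"], r"batter"))
```

Returning from inside the loop replaces the i counter in the question's code, since only the first match is kept.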

Answered By – Wiktor Stribiżew

Answer Checked By – Katrina (BugsFixing Volunteer)
