[SOLVED] How to accelerate this process of JavaScript web-page scraping?

Issue

This Python function aims to scrape a specific identifier (called a PMID) from a JavaScript-rendered web page. When a URL is passed to the function, it loads the page using Selenium. The code then tries to find the class "pubmedLink" on an <a> tag in the HTML. If found, it returns the extracted PMID to another function.

This works fine, but it is really slow. Is there a way to accelerate the process, maybe by using another parser or a completely different method?

from selenium import webdriver


def _getPMIDfromURL_(url):

    driver = webdriver.Chrome('/usr/protoLivingSystematicReviews/drivers/chromedriver')
    driver.get(url)

    try:
        if driver.find_element_by_css_selector('a.pubmedLink').is_displayed():
            json_text = driver.find_element_by_css_selector('a.pubmedLink').text
            return json_text
    except:
        return "no_pmid"

    driver.quit()

Examples of the URL for the JS web-page,

Solution

Well, Selenium is fast, which is why it is a favorite of many testers. That said, you could improve your code by looking up the element once instead of twice.

The return value of the statement

 driver.find_element_by_css_selector('a.pubmedLink')

can be stored in a variable and reused. This should improve the speed of the lookup by about 1.5x.

try:
    # Look the element up once and reuse the result
    elem = driver.find_element_by_css_selector('a.pubmedLink')
    if elem.is_displayed():
        return elem.text
except Exception:
    return "no_pmid"
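For context, here is a minimal sketch of how the single lookup might sit inside a complete function. The name get_pmid and its driver and selector parameters are hypothetical, not from the question. It assumes the driver is created by the caller and exposes the same find_element_by_css_selector call used in the question (newer Selenium releases use find_element(By.CSS_SELECTOR, ...) instead). It also assumes that a hidden link should count as "no_pmid", and it uses try/finally so that driver.quit() runs even after a return.

```python
def get_pmid(driver, url, selector="a.pubmedLink"):
    """Return the PMID link text, or "no_pmid" if the link is missing or hidden.

    `driver` is assumed to be a Selenium WebDriver created by the caller.
    """
    try:
        driver.get(url)
        # Look the element up once and reuse the result.
        elem = driver.find_element_by_css_selector(selector)
        if elem.is_displayed():
            return elem.text
        # Assumption: a hidden link is treated the same as a missing one.
        return "no_pmid"
    except Exception:
        # Assumption: any load or lookup failure means "no PMID".
        return "no_pmid"
    finally:
        # Runs even after a return, so the browser is always closed.
        driver.quit()
```

It could be called as get_pmid(webdriver.Chrome(...), url), with the chromedriver path from the question passed to webdriver.Chrome. Passing the driver in keeps the lookup logic separate from browser start-up, which makes it easier to see where the time is actually spent.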

Answered By – Raydel Miranda

Answer Checked By – Marilyn (BugsFixing Volunteer)
