
I want to scrape all the data from a page that uses infinite scroll. The following Python code works.

for i in range(100):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)

This means that every time I scroll to the bottom, I wait 5 seconds, which is generally enough for the page to finish loading the newly generated content. But this may not be time-efficient: the page may finish loading the new content in well under 5 seconds. How can I detect whether the page has finished loading the new content each time I scroll down? If I could detect this, I could scroll down again as soon as the page has finished loading, which would be more time-efficient.

1 Answer


By default, the webdriver waits for a page to load when you call the .get() method.

Since you may be looking for a specific element, as @user227215 said, you should use WebDriverWait to wait for that element to be located in your page:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

browser = webdriver.Firefox()
browser.get("url")  # placeholder: replace with the real page URL
delay = 3  # seconds

try:
    # Block until the element is present in the DOM, or raise after `delay` seconds
    myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
    print("Page is ready!")
except TimeoutException:
    print("Loading took too much time!")

I have used it for checking alerts. You can use any of the other By locator strategies to find the element.

EDIT 1:

I should mention that the webdriver waits for a page to load by default. It does not wait for content loading inside frames or for Ajax requests. This means that when you call .get('url'), the browser waits until the page is completely loaded before moving on to the next command in the code. But after an Ajax request is issued, the webdriver does not wait, and it is your responsibility to wait an appropriate amount of time for the page, or part of it, to load; that is what the expected_conditions module is for.
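For the infinite-scroll case in the question, one possible approach is to wait for the page height to change after each scroll instead of sleeping for a fixed 5 seconds. The following is only a minimal sketch under a few assumptions: the page grows document.body.scrollHeight when new content arrives (not every site does), and the helper names wait_for_height_change and scroll_to_end are hypothetical, not part of Selenium. It uses a plain polling loop rather than WebDriverWait so that it depends only on the driver's execute_script method.

```python
import time


def wait_for_height_change(driver, old_height, timeout=5.0, poll=0.2):
    """Poll until document.body.scrollHeight differs from old_height.

    Returns the new height, or None if nothing changed within `timeout` seconds.
    (Hypothetical helper; assumes new content makes the page taller.)
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_height = driver.execute_script("return document.body.scrollHeight;")
        if new_height != old_height:
            return new_height
        time.sleep(poll)
    return None


def scroll_to_end(driver, timeout=5.0, poll=0.2, max_scrolls=100):
    """Scroll down until the page stops growing.

    Returns the number of scrolls that actually loaded new content.
    """
    height = driver.execute_script("return document.body.scrollHeight;")
    loaded = 0
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        new_height = wait_for_height_change(driver, height, timeout, poll)
        if new_height is None:
            break  # no growth within the timeout: assume the end of the feed
        height = new_height
        loaded += 1
    return loaded
```

With a real browser this would be called as scroll_to_end(browser) after browser.get(...). Each scroll then costs only as long as the content takes to appear, and the timeout is paid once, at the end of the feed. The same wait could likely also be written with Selenium's own WebDriverWait(browser, timeout).until(...) by passing a callable that compares the current height to the previous one.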

...