
I'm getting an error when I try to find broken links on a page using Python and Selenium.

import requests
from selenium import webdriver

chrome_driver_path = "D:\\drivers\\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path)
driver.get('https://google.co.in/')

links = driver.find_elements_by_css_selector("a")
images = driver.find_elements_by_css_selector("img")

for link in links:
    r = requests.head(link.get_attribute('href'))
    print(r.status_code == 200)

This fails with the following error:

Traceback (most recent call last):
    self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:777)

During handling of the above exception, another exception occurred:

    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='myaccount.google.com', port=443): Max retries exceeded with url: /?utm_source=OGB&utm_medium=app (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:777)'),))

1 Answer


The following code prints the HTTP status code of every link on the page.

Code:

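import requests
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')

# Newer Selenium releases take the driver path through a Service object and the options via options=
service = Service(executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver = webdriver.Chrome(service=service, options=options)
driver.get('https://google.co.in/')

links = driver.find_elements(By.CSS_SELECTOR, "a")

for link in links:
    href = link.get_attribute('href')
    if not href:
        continue  # skip <a> tags without an href; requests.head(None) would raise
    r = requests.head(href)
    print(href, r.status_code)

driver.quit()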

Output:

https://mail.google.com/mail/?tab=wm 302
https://www.google.co.in/imghp?hl=en&tab=wi 200
https://www.google.co.in/intl/en/options/ 301
https://myaccount.google.com/?utm_source=OGB&utm_medium=app 302
https://www.google.co.in/webhp?tab=ww 200
https://maps.google.co.in/maps?hl=en&tab=wl 302
https://www.youtube.com/?gl=IN 200
https://play.google.com/?hl=en&tab=w8 302
https://news.google.co.in/nwshp?hl=en&tab=wn 301
https://mail.google.com/mail/?tab=wm 302
https://www.google.com/contacts/?hl=en&tab=wC 302
https://drive.google.com/?tab=wo 302
https://www.google.com/calendar?tab=wc 302
https://plus.google.com/?gpsrc=ogpy0&tab=wX 302
https://translate.google.co.in/?hl=en&tab=wT 200
https://photos.google.com/?tab=wq&pageId=none 302
https://www.google.co.in/intl/en/options/ 301
https://docs.google.com/document/?usp=docs_alc 302
https://books.google.co.in/bkshp?hl=en&tab=wp 200
https://www.blogger.com/?tab=wj 405
https://hangouts.google.com/ 302
https://keep.google.com/ 302
https://earth.google.com/web/ 200
https://www.google.co.in/intl/en/options/ 301
https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.co.in/ 200
https://www.google.co.in/webhp?hl=en&sa=X&ved=0ahUKEwj0qNPqnqHbAhXYdn0KHXpeAo0QPAgD 200
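The 301/302 codes in this output appear because requests.head() does not follow redirects unless allow_redirects=True is passed, and the 405 from www.blogger.com means that server refused the HEAD method for that URL. Below is a minimal sketch, assuming you want the final status of each link rather than the first hop. It follows redirects, retries with GET when HEAD gets a 405, and catches requests.exceptions.RequestException. That way an SSL failure like the one in the question (requests raises it as requests.exceptions.SSLError, a RequestException subclass) is reported for that one link instead of ending the loop. link_status and is_broken are hypothetical helper names, and treating every 4xx/5xx as "broken" is an assumption you may want to tune.

```python
import requests


def link_status(url, timeout=10):
    """Return the final HTTP status code for url, or an "error: ..." string.

    link_status is a hypothetical helper, not part of requests or Selenium.
    """
    try:
        # requests.head() does not follow redirects by default, so ask it to
        r = requests.head(url, allow_redirects=True, timeout=timeout)
        if r.status_code == 405:
            # The server rejects HEAD; retry with GET, streaming so the body is not downloaded
            r = requests.get(url, allow_redirects=True, timeout=timeout, stream=True)
            r.close()
        return r.status_code
    except requests.exceptions.RequestException as exc:
        # SSLError, ConnectionError, Timeout, MissingSchema, ... all derive from RequestException
        return f"error: {type(exc).__name__}"


def is_broken(status):
    """Assumption: request errors and 4xx/5xx responses count as broken links."""
    return isinstance(status, str) or status >= 400


# Usage with the hrefs collected by Selenium in the answer above:
# for link in links:
#     href = link.get_attribute('href')
#     if href:
#         status = link_status(href)
#         print(href, status, "BROKEN" if is_broken(status) else "ok")
```

Following redirects costs one extra request per redirected link. In exchange, a link that redirects to a missing page is reported with the target's 404 rather than a harmless-looking 302.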
