0 votes
2 views
in Python by (19.9k points)

I want to download an embedded PDF from a webpage using Selenium, like the one shown in this image: [image: embedded PDF]

For example, a page like this: https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html

I tried the code below, but it did not work.

def download_pdf(lnk):
    from selenium import webdriver
    from time import sleep

    options = webdriver.ChromeOptions()
    download_folder = "/*My folder*/"
    profile = {"plugins.plugins_list": [{"enabled": False,
                                         "name": "Chrome PDF Viewer"}],
               "download.default_directory": download_folder,
               "download.extensions_to_open": ""}
    options.add_experimental_option("prefs", profile)

    print("Downloading file from link: {}".format(lnk))
    driver = webdriver.Chrome('/*Path of chromedriver*/',chrome_options = options)
    driver.get(lnk)

    imp_by1 = driver.find_element_by_id("secondaryToolbarToggle")
    imp_by1.click()
    imp_by = driver.find_element_by_id("secondaryDownload")
    imp_by.click()

    print("Status: Download Complete.")
    driver.close()

download_pdf('https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html')

Any help is appreciated.

Thanks in advance!!

3 Answers

0 votes
by (25.1k points)

First, initialize the browser and load the page. Then find the iframe tag and read its src attribute, which is the URL of the embedded PDF viewer. Load that URL, locate the download button by XPath, and click it using the click() method.

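import os
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# initialize the browser (Selenium 4 style; assumes chromedriver sits in the current directory)
browser = webdriver.Chrome(service=Service(os.path.join(os.getcwd(), 'chromedriver')))
# load the page that contains the iframe
browser.get('https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html')
# read the URL of the embedded PDF viewer from the iframe
pdf_url = browser.find_element(By.TAG_NAME, 'iframe').get_attribute("src")
# load the viewer page on its own
browser.get(pdf_url)
# find the viewer's download button and click it
download = browser.find_element(By.XPATH, '//*[@id="download"]')
download.click()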
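As an alternative to clicking the viewer's button, the PDF address can sometimes be read straight from the iframe's src. The sketch below assumes the src is either a direct link to the PDF or a viewer URL that carries the PDF location in a `file` query parameter; that layout is an assumption and is not confirmed for the page in the question. The function name `extract_pdf_url` is hypothetical.

```python
from urllib.parse import parse_qs, urljoin, urlparse


def extract_pdf_url(page_url, iframe_src):
    """Return an absolute PDF URL derived from an iframe's src attribute.

    Assumption: the src is either a direct PDF link or a viewer URL of the
    form '<viewer>?file=<pdf url>'. Other embedding schemes are not handled.
    """
    # Resolve a relative src against the page that contains the iframe.
    absolute_src = urljoin(page_url, iframe_src)
    query = parse_qs(urlparse(absolute_src).query)
    if "file" in query:
        # The 'file' value may itself be relative to the viewer URL.
        return urljoin(absolute_src, query["file"][0])
    # No 'file' parameter: treat the src itself as the document URL.
    return absolute_src
```

With Selenium, the second argument would be the value returned by get_attribute("src") on the iframe element. The resulting URL could then be fetched with any HTTP client instead of driving the viewer's toolbar.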


0 votes
by (2.9k points)

Below is code for downloading an embedded PDF from a webpage using Selenium:
import os
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# initialize browser (Selenium 4 style; assumes chromedriver sits in the current directory)
taken_browser = webdriver.Chrome(service=Service(os.path.join(os.getcwd(), 'chromedriver')))
# load page with iframe
taken_browser.get('https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html')

# find pdf url
mypdf_url = taken_browser.find_element(By.TAG_NAME, 'iframe').get_attribute("src")
# load page with pdf
taken_browser.get(mypdf_url)
# download file
to_download = taken_browser.find_element(By.XPATH, '//*[@id="download"]')
to_download.click()

0 votes
by (1.9k points)
The steps below should improve your code and help you download an embedded PDF from a webpage using Selenium. The issue is probably caused either by the way the PDF is embedded or by the way Selenium triggers the download.

Disabling the Chrome PDF viewer plugin prevents PDFs from opening automatically in the browser, but I believe it is not necessary for downloading. You can comment out the plugins.plugins_list section of the profile dictionary.
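If you do want Chrome to save PDFs to disk rather than open them in its built-in viewer, the preferences could be sketched as follows. This assumes the `plugins.always_open_pdf_externally` preference, which is commonly used for this purpose in current Chrome; its behavior may differ between Chrome versions, so treat it as an assumption to verify. The helper name `build_chrome_prefs` is hypothetical.

```python
import os


def build_chrome_prefs(download_folder):
    """Build a Chrome 'prefs' dictionary for unattended PDF downloads.

    Assumption: Chrome honors 'plugins.always_open_pdf_externally', which asks
    it to download PDFs instead of rendering them in the built-in viewer.
    """
    return {
        # An absolute path is used here to avoid ambiguity about the target folder.
        "download.default_directory": os.path.abspath(download_folder),
        # Do not show a "Save as" dialog.
        "download.prompt_for_download": False,
        # Download PDFs rather than opening them in the viewer.
        "plugins.always_open_pdf_externally": True,
    }
```

The returned dictionary would be passed to Selenium with options.add_experimental_option("prefs", build_chrome_prefs(download_folder)).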

Handling Dynamic Content: Your code assumes that the elements you interact with, secondaryToolbarToggle and secondaryDownload, have static IDs. Such IDs may be generated dynamically, so it is better to use more robust element selection, such as XPath or CSS selectors. Your code also clicks the download button but does not wait until the download has completed, so add a wait mechanism using WebDriverWait and expected_conditions from Selenium.

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.options import Options

def download_pdf(url, download_folder):
    options = Options()
    preferences = {
        "download.default_directory": download_folder,
        "download.prompt_for_download": False,
    }
    options.add_experimental_option("prefs", preferences)
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # wait up to 10 seconds for the download button to become clickable
        # (the XPath is a placeholder; adjust it to the page's actual download control)
        download_button = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, "//button[text()='Download']"))
        )
        download_button.click()
        print("Download initiated. Waiting for completion...")
        # crude fixed pause so the file can finish downloading before the browser closes
        time.sleep(10)
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        driver.quit()

download_folder = "your/download/folder"  # Replace the folder path
url="https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html"
download_pdf(url, download_folder)
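Clicking the button only starts the download, so the browser should stay open until the file has been written. One way to check for that is to poll the download folder. The sketch below assumes that Chrome marks in-progress downloads with a `.crdownload` suffix and that the folder was empty before the download started. The helper name `wait_for_download` and its parameters are hypothetical.

```python
import os
import time


def wait_for_download(folder, timeout=30.0, poll_interval=0.5):
    """Wait until `folder` holds at least one file and no '.crdownload' files.

    Returns the sorted list of file names in the folder, or raises
    TimeoutError if the download has not finished within `timeout` seconds.
    Assumption: the folder was empty before the download was triggered.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = sorted(os.listdir(folder))
        in_progress = [n for n in names if n.endswith(".crdownload")]
        if names and not in_progress:
            return names
        if time.monotonic() >= deadline:
            raise TimeoutError(f"download in {folder!r} did not finish in {timeout} s")
        time.sleep(poll_interval)
```

In download_pdf, this could be called as wait_for_download(download_folder) after download_button.click() and before driver.quit().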
