How to download embedded PDF from webpage using selenium?

Question

asked Jul 29, 2019 in Python by Rajesh Malhotra (19.9k points)

I want to download an embedded PDF (a PDF displayed inside the page) from a webpage using Selenium.

For example, a page like this: https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html

I tried the code below, but it did not work.

def download_pdf(lnk):
    from selenium import webdriver
    from time import sleep
    options = webdriver.ChromeOptions()
    download_folder = "/*My folder*/"
    profile = {"plugins.plugins_list": [{"enabled": False,
                                         "name": "Chrome PDF Viewer"}],
               "download.default_directory": download_folder,
               "download.extensions_to_open": ""}
    options.add_experimental_option("prefs", profile)
    print("Downloading file from link: {}".format(lnk))
    driver = webdriver.Chrome('/*Path of chromedriver*/',chrome_options = options)
    driver.get(lnk)
    imp_by1 = driver.find_element_by_id("secondaryToolbarToggle")
    imp_by1.click()
    imp_by = driver.find_element_by_id("secondaryDownload")
    imp_by.click()
    print("Status: Download Complete.")
    driver.close()

download_pdf('https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html')

Any help is appreciated.

Thanks in advance!!

3 Answers

Anirudh Singh · Answer 1 · 2019-07-29T11:09:37+0000

First, initialize the browser and load the web page with a get request. Then find the iframe tag that embeds the PDF and read its src attribute, which is the URL of the embedded document. Load that URL, find the download button by XPath, and click it with the click() method.

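import os
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# start Chrome with a chromedriver binary located in the current directory
browser = webdriver.Chrome(service=Service(os.path.join(os.getcwd(), "chromedriver")))
browser.get('https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html')
# the PDF is embedded in an iframe; its src attribute holds the document URL
pdf_url = browser.find_element(By.TAG_NAME, 'iframe').get_attribute("src")
browser.get(pdf_url)
# click the download button of the PDF viewer
download = browser.find_element(By.XPATH, '//*[@id="download"]')
download.click()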
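Depending on the site, the iframe's src may point to a PDF viewer page rather than to the PDF file itself. A pdf.js-style viewer, for example, receives the document through a file query parameter. In that case the direct PDF link can be recovered from the src without clicking anything. The sketch below assumes that file-parameter layout; extract_pdf_url is a hypothetical helper name and the example.com URLs are made up for illustration.

```python
from urllib.parse import parse_qs, urljoin, urlparse


def extract_pdf_url(iframe_src):
    """Return the PDF URL behind an iframe src (hypothetical helper).

    Assumes a pdf.js-style viewer that passes the document in a "file"
    query parameter; any other src is returned unchanged.
    """
    parsed = urlparse(iframe_src)
    # parse_qs also decodes percent-encoded values such as https%3A%2F%2F...
    files = parse_qs(parsed.query).get("file")
    if not files:
        # No "file" parameter: the src already points at the document
        return iframe_src
    # The parameter may be relative to the viewer page, so resolve it
    return urljoin(iframe_src, files[0])


if __name__ == "__main__":
    viewer = "https://example.com/web/?file=https://example.com/docs/order.pdf"
    print(extract_pdf_url(viewer))  # https://example.com/docs/order.pdf
```

The returned URL can then be passed to browser.get() or fetched with an ordinary HTTP client.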

Yashika · Answer 2 · 2024-11-08T11:57:23+0000

Below is code for downloading an embedded PDF from a webpage using Selenium:

import os
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# initialize browser (chromedriver binary in the current directory)
taken_browser = webdriver.Chrome(service=Service(os.path.join(os.getcwd(), "chromedriver")))
# load page with iframe
taken_browser.get('https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html')

# find pdf url (use the same browser object throughout)
mypdf_url = taken_browser.find_element(By.TAG_NAME, 'iframe').get_attribute("src")
# load page with pdf
taken_browser.get(mypdf_url)
# download file
to_download = taken_browser.find_element(By.XPATH, '//*[@id="download"]')
to_download.click()

Apurti · Answer 3 · 2025-01-07T08:13:54+0000

The following changes should improve your code and show how to download an embedded PDF from a webpage using Selenium. The issue you encountered is probably caused either by the way the PDF is embedded in the page or by the way Selenium triggers the download.

Your profile disables the Chrome PDF Viewer plugin to prevent PDFs from opening automatically in the browser. I believe this is not necessary for downloading, so you can comment out the plugins.plugins_list entry in the profile dictionary.
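If you do want Chrome to save PDFs instead of previewing them, note that the plugins.plugins_list setting is generally reported to have no effect in current Chrome versions; the preference commonly used instead is plugins.always_open_pdf_externally. A minimal sketch of such a prefs dictionary, assuming current Chrome preference names; build_download_prefs is a hypothetical helper name:

```python
import os


def build_download_prefs(download_folder):
    """Build a Chrome prefs dict that saves PDFs instead of previewing them.

    Hypothetical helper; the keys are Chrome preference names, and the
    result is meant for options.add_experimental_option("prefs", ...).
    """
    return {
        # Use an absolute path; relative paths are not reliably honoured
        "download.default_directory": os.path.abspath(download_folder),
        # Save without showing the "Save as" dialog
        "download.prompt_for_download": False,
        # Download PDFs instead of opening them in the built-in viewer
        "plugins.always_open_pdf_externally": True,
    }


if __name__ == "__main__":
    print(build_download_prefs("downloads"))
```

With Selenium, this would be applied as options = webdriver.ChromeOptions() followed by options.add_experimental_option("prefs", build_download_prefs("downloads")). With this preference set, opening a direct PDF URL with driver.get() typically triggers a download rather than a preview.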

Handling dynamic content: your code assumes that the elements you interact with (secondaryToolbarToggle and secondaryDownload) have static IDs. Such IDs may be generated dynamically, so it is safer to use robust selectors such as XPath or CSS selectors and to wait until the elements are actually clickable. In addition, your code clicks the download button but does not wait for the download to finish before closing the browser, which may interrupt it. Implement a wait mechanism with WebDriverWait and expected_conditions from Selenium.

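import glob
import os

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def download_pdf(url, download_folder):
    folder = os.path.abspath(download_folder)  # Chrome expects an absolute path
    options = Options()
    preferences = {
        "download.default_directory": folder,
        "download.prompt_for_download": False,
    }
    options.add_experimental_option("prefs", preferences)
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # The PDF viewer is embedded in an iframe, so switch into it first
        WebDriverWait(driver, 10).until(
            EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME, "iframe"))
        )
        # Wait for the button instead of assuming it is already there;
        # normalize-space() also matches a label inside a nested element
        download_button = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, "//button[normalize-space()='Download']"))
        )
        download_button.click()
        print("Download initiated. Waiting for completion...")
        # Don't quit while Chrome is still writing: wait until a PDF exists and
        # no partial *.crdownload file remains (assumes an empty download folder)
        WebDriverWait(driver, 60, poll_frequency=1).until(
            lambda d: glob.glob(os.path.join(folder, "*.pdf"))
            and not glob.glob(os.path.join(folder, "*.crdownload"))
        )
        print("Download complete.")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        driver.quit()


download_folder = "your/download/folder"  # Replace the folder path
url = "https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html"
download_pdf(url, download_folder)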
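The wait-for-download step can also be pulled out into a small, reusable helper that does not depend on the folder being empty. The sketch below is one possible approach, assuming Chrome's convention of keeping in-progress downloads as .crdownload files; wait_for_new_download is a hypothetical name. It compares the folder against a snapshot taken before the click, so files that were already there are not mistaken for the new download.

```python
import os
import time


def wait_for_new_download(folder, before, timeout=60.0, poll=0.5):
    """Wait until a new, finished file appears in folder (hypothetical helper).

    before is the set of file names present before the download started,
    e.g. set(os.listdir(folder)). Returns the new file's path or raises
    TimeoutError. Assumes Chrome's ".crdownload" suffix for partial files.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_files = set(os.listdir(folder)) - before
        # A new .crdownload file means Chrome is still writing
        in_progress = [f for f in new_files if f.endswith(".crdownload")]
        finished = sorted(f for f in new_files if not f.endswith(".crdownload"))
        if finished and not in_progress:
            return os.path.join(folder, finished[0])
        time.sleep(poll)
    raise TimeoutError(f"No completed download in {folder} after {timeout} s")


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as folder:
        before = set(os.listdir(folder))  # snapshot taken before clicking Download
        # Simulate a finished download appearing in the folder
        with open(os.path.join(folder, "order.pdf"), "wb") as fh:
            fh.write(b"%PDF-1.4")
        print(wait_for_new_download(folder, before, timeout=5))
```

In a Selenium script, take the snapshot with before = set(os.listdir(folder)) just before download_button.click(), then call wait_for_new_download(folder, before) before driver.quit().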
