
I run a query on a web page and get a result URL. If I right-click and view the HTML source, I can see the HTML generated by JS. If I simply use urllib, Python cannot get the JS-generated HTML, so I looked at solutions that use Selenium. Here's my code:

from selenium import webdriver

url = 'http://www.archives.com/member/Default.aspx?_act=VitalSearchResult&lastName=Smith&state=UT&country=US&deathYear=2004&deathYearSpan=10&location=UT&activityID=9b79d578-b2a7-4665-9021-b104999cf031&RecordType=2'

driver = webdriver.PhantomJS(executable_path=r'C:\python27\scripts\phantomjs.exe')  # raw string keeps the backslashes literal

driver.get(url)

print driver.page_source

>>> <html><head></head><body></body></html>        

Obviously, that's not right!

Here's the source I see in the right-click "view source" window (I want the INFORMATION part):

</script></div><div class="searchColRight"><div id="topActions" class="clearfix 

noPrint"><div id="breadcrumbs" class="left"><a title="Results Summary"

href="Default.aspx?    _act=VitalSearchR ...... <<INFORMATION I NEED>> ... 

to view the entire record.</p></div><script xmlns:msxsl="urn:schemas-microsoft-com:xslt">

        jQuery(document).ready(function() {

            jQuery(".ancestry-information-tooltip").actooltip({

href: "#AncestryInformationTooltip", orientation: "bottomleft"});

        });

So my question is: how do I get the information generated by JS?

1 Answer


I had the same problem and solved it with the code below:

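1. First, use execute_script to read the rendered HTML

from time import sleep
from selenium import webdriver

driver = webdriver.Chrome()
driver.get(urls)    # urls holds the address of the results page
sleep(1)            # wait one second so the page's JavaScript can run

# document.body alone returns a WebElement; innerHTML gives the markup as a string
innerHTML = driver.execute_script("return document.body.innerHTML")
#print(driver.page_source)

2. Second, parse the HTML with BeautifulSoup (you can install it with pip: pip install beautifulsoup4 lxml)

import bs4    # BeautifulSoup

root = bs4.BeautifulSoup(innerHTML, "lxml")    # parse the HTML with the lxml parser

# Find the elements you need. This class looks like it comes from a YouTube page;
# replace it with the tag and class that hold the data on your page.
viewcount = root.find_all("span", attrs={'class': 'short-view-count style-scope yt-view-count-renderer'})

3. Third, print the values you need

for span in viewcount:
    print(span.string)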
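Steps 2 and 3 can be tried without a browser by running them on a hard-coded HTML string. The sketch below is not the BeautifulSoup approach from this answer: it swaps in the standard library's html.parser so that it runs with no extra packages. The class SpanTextCollector, the function span_texts, and the sample markup are made up for illustration; only the class name short-view-count comes from the code above. It assumes Python 3.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect the text of every <span> whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # greater than 0 while inside a matching <span>
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.css_class in classes:
            if not self.depth:
                self.texts.append("")   # start a new result for this span
            self.depth += 1             # track nested spans so closing tags pair up

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def span_texts(html, css_class):
    """Return the stripped text of each matching <span> in html."""
    parser = SpanTextCollector(css_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


# Hypothetical sample markup, standing in for the innerHTML string from step 1.
sample = (
    '<div><span class="short-view-count style-scope">1.2K views</span>'
    '<span class="other">skip</span></div>'
)
for text in span_texts(sample, "short-view-count"):
    print(text)
```

With a real page, pass the string returned by execute_script instead of sample, and use whichever class marks the elements you need.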

Complete code

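from time import sleep

import bs4
from selenium import webdriver

urls = "http://www.archives.com/member/Default.aspx?_act=VitalSearchResult&lastName=Smith&state=UT&country=US&deathYear=2004&deathYearSpan=10&location=UT&activityID=9b79d578-b2a7-4665-9021-b104999cf031&RecordType=2"

# PhantomJS support was removed in Selenium 4; on newer versions use webdriver.Chrome() instead
driver = webdriver.PhantomJS()
##driver = webdriver.Chrome()

driver.get(urls)
sleep(1)    # give the page's JavaScript a moment to run before reading the DOM

# document.body alone returns a WebElement; innerHTML gives the markup as a string
innerHTML = driver.execute_script("return document.body.innerHTML")
##print(driver.page_source)
driver.quit()

root = bs4.BeautifulSoup(innerHTML, "lxml")

# This class looks like it comes from a YouTube page; replace it with the one used on your page
viewcount = root.find_all("span", attrs={'class': 'short-view-count style-scope yt-view-count-renderer'})

for span in viewcount:
    print(span.string)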
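A fixed sleep(1) may be too short on a slow page and wasteful on a fast one. One possible alternative is to poll the page until a piece of the expected markup shows up. The sketch below assumes Python 3 and a driver object that exposes execute_script, as Selenium drivers do. The helper name wait_for_rendered_html and its timeout and poll parameters are hypothetical, and the marker "searchColRight" is taken from the markup shown in the question. Selenium also ships its own WebDriverWait class for this kind of waiting, which may be a better fit in real code.

```python
import time


def wait_for_rendered_html(driver, marker, timeout=10.0, poll=0.5):
    """Poll until marker appears in the rendered body, then return the HTML.

    driver is any object with an execute_script method, such as a
    Selenium WebDriver. Raises TimeoutError if the marker never appears.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = driver.execute_script("return document.body.innerHTML")
        if html and marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "marker %r not found after %.1f seconds" % (marker, timeout)
            )
        time.sleep(poll)
```

After driver.get(urls), a call such as innerHTML = wait_for_rendered_html(driver, "searchColRight") would replace both the sleep(1) and the execute_script line.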

I hope this helps!
