I checked the site you are scraping, and the scripts appear to be included directly in the HTML page, so I don't think you need webdriver; requests and BeautifulSoup should be enough. First, get the HTML with requests:
res = requests.get(url, headers=headers, params=params)
Then parse the HTML with BeautifulSoup, collect the script tags, and find the one that contains var imgInfoData:
soup = BeautifulSoup(res.text, "html5lib")
scripts = soup.findAll('script', attrs={'type':'text/javascript'})
for script in scripts:
    if "var imgInfoData" in script.text:  # script with imgInfoData captured
        return script.text.replace("var imgInfoData =", "").strip()[:-1]
The return line simply removes the leading "var imgInfoData =" and the trailing ";" from the script text to leave the string value. Alternatively, you could use a regex to extract the JSON string from the text.
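The regex alternative mentioned above could look roughly like the sketch below. The helper name extract_img_info and the sample string are hypothetical, and the pattern assumes the assignment ends with a semicolon at the end of a line, which I have not verified against the live page.

```python
import re

# Hypothetical helper: pull out whatever is assigned to `var imgInfoData`.
# Assumes the assignment ends with a ";" at the end of a line.
PATTERN = re.compile(
    r"var\s+imgInfoData\s*=\s*(.*?);\s*$",
    re.DOTALL | re.MULTILINE,
)

def extract_img_info(script_text):
    """Return the text assigned to imgInfoData, or None if it is absent."""
    match = PATTERN.search(script_text)
    return match.group(1).strip() if match else None

# Made-up sample input, only to show the call shape
sample = 'var imgInfoData = {"list": [{"src": "a.jpg"}]};\nvar other = 1;'
print(extract_img_info(sample))
```

Compared with replace() plus [:-1], this does not rely on the semicolon being the very last character of the script, but it can still be fooled by a ";" at the end of a line inside a string value.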
Full Code:
import requests
from bs4 import BeautifulSoup

def getimgInfoData():
    url = "https://land.naver.com/info/complexGallery.nhn"
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    params = {"newComplex": "Y",
              "startImage": "Y",
              "rletNo": "102235"}
    # Fetch the page; the scripts are already embedded in the HTML
    res = requests.get(url, headers=headers, params=params)
    soup = BeautifulSoup(res.text, "html5lib")
    scripts = soup.findAll('script', attrs={'type': 'text/javascript'})
    for script in scripts:
        if "var imgInfoData" in script.text:  # script with imgInfoData captured
            # Drop the "var imgInfoData =" prefix and the trailing ";"
            return script.text.replace("var imgInfoData =", "").strip()[:-1]
    return None  # no script tag defines imgInfoData

print(getimgInfoData())
Then, if you want, convert the string returned by getimgInfoData() to JSON, for example with json.loads.
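A minimal sketch of that conversion step is below. The helper name parse_img_info and the sample string are hypothetical. It assumes the extracted text is strict JSON; a JavaScript object literal is not always valid JSON (unquoted keys, single quotes), so the sketch returns None in that case rather than raising.

```python
import json

def parse_img_info(raw):
    """Parse the extracted string; return None if it is missing or not strict JSON."""
    if raw is None:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # The JavaScript literal was not valid JSON
        return None

# Made-up stand-in for the string returned by getimgInfoData()
raw = '{"images": [{"title": "front", "src": "a.jpg"}]}'
data = parse_img_info(raw)
print(data["images"][0]["src"] if data else "not valid JSON")
```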