I checked the site you are scraping, and the scripts appear to be embedded directly in the HTML page, so I think you don't need webdriver; requests and BeautifulSoup should be enough. First, get the HTML using requests:
res = requests.get(url, headers=headers, params=params)
Then parse the HTML text to get the script tags and find the one that contains var imgInfoData:
soup = BeautifulSoup(res.text, "html5lib")
scripts = soup.findAll('script', attrs={'type':'text/javascript'})
for script in scripts:
    if "var imgInfoData" in script.text:  # script with imgInfoData captured
        return script.text.replace("var imgInfoData =","").strip()[:-1]
This just removes the "var imgInfoData =" prefix and the trailing ";" from the text to get the string value. Alternatively, you could use a regex to extract the JSON string from the text.
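The regex route mentioned above could look something like the sketch below. The sample script text and the helper name extract_img_info are made up for illustration, and the pattern assumes the imgInfoData assignment is the last statement in the script tag, which I have not verified for every page.

```python
import re

# Hypothetical script text; the real page's contents may differ.
script_text = 'var imgInfoData = {"images": [{"id": 1, "src": "a.jpg"}]};'


def extract_img_info(text):
    """Return the text assigned to imgInfoData, or None if it is absent."""
    # Capture everything between "var imgInfoData =" and the final ";".
    # Assumes the assignment is the last statement in the script.
    match = re.search(r"var\s+imgInfoData\s*=\s*(.*?);\s*$", text, re.DOTALL)
    return match.group(1) if match else None


print(extract_img_info(script_text))
```

Compared with replace() plus slicing, the regex tolerates extra whitespace around the "=" and returns None when the variable is missing instead of a mangled string.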
Full Code:
import requests
from bs4 import BeautifulSoup

def getimgInfoData():
    url = "https://land.naver.com/info/complexGallery.nhn"
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    params = {"newComplex":"Y",
              "startImage":"Y",
              "rletNo":"102235"}
    # Fetch the page; the image data is embedded in an inline script tag
    res = requests.get(url, headers=headers, params=params)
    soup = BeautifulSoup(res.text, "html5lib")
    scripts = soup.findAll('script', attrs={'type':'text/javascript'})
    for script in scripts:
        if "var imgInfoData" in script.text:  # script with imgInfoData captured
            # Drop the "var imgInfoData =" prefix and the trailing ";"
            return script.text.replace("var imgInfoData =","").strip()[:-1]
    return None

print(getimgInfoData())
Then just convert the result of getimgInfoData() to JSON if you want.
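A minimal sketch of that last step, assuming the extracted string is strict JSON. I have not confirmed that for this page: if it is a JavaScript object literal with unquoted keys or single quotes, json.loads will reject it. The sample string and the helper name parse_img_info are hypothetical.

```python
import json

# Hypothetical example of the string getimgInfoData() might return.
raw = '{"images": [{"id": 1, "src": "a.jpg"}]}'


def parse_img_info(raw_text):
    """Parse the extracted string as JSON; return None if missing or invalid."""
    if raw_text is None:
        return None
    try:
        return json.loads(raw_text)
    except json.JSONDecodeError:
        # The page may use a JavaScript literal that is not strict JSON.
        return None


data = parse_img_info(raw)
print(data["images"][0]["src"])
```

In the full code you would call parse_img_info(getimgInfoData()) and check for None before indexing into the result.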