I'm trying to scrape a page that makes little use of h1, h2, h3 headings and instead relies mostly on the strong tag. I want to search for a specific word (inside a p tag) and, if I find it, also collect the text of the levels above it (marked up with strong).
I noticed that calling .find_previous_siblings('strong') on the matching element returns an empty list, whereas soup.body.findAll('strong') works and returns a huge list of items (which is what I need).
How can I get the list of strong tags using find_previous_siblings?
Examples. This works (it prints a huge list):
import requests
from bs4 import BeautifulSoup

url = 'http://www.mpsp.mp.br/portal/page/portal/DO_Estado/2020/DO_20-06-2020.html'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
# Every strong tag anywhere under body, at any depth
for i in soup.body.findAll('strong'):
    print(i.text.strip())
This does not work (it prints an empty list):
import requests
from bs4 import BeautifulSoup, element

url = 'http://www.mpsp.mp.br/portal/page/portal/DO_Estado/2020/DO_20-06-2020.html'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
# Only the direct children of body are visited here
for i in soup.body.contents:
    # Skip bare text nodes between tags
    if isinstance(i, element.NavigableString):
        continue
    if isinstance(i, element.Tag):
        texts = i.text
        if texts == 'HELENA BONILHA DE TOLEDO LEITE':
            print(i.find_previous_siblings('h1'))
            print(i.find_previous_siblings('strong'))
            print(i)
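To narrow down the difference between the two calls, here is a minimal sketch on made-up markup. It assumes (I have not confirmed this against the real page) that each strong tag sits inside its own p tag, so it is a child of a sibling rather than a sibling of the matching p. The HTML string and the variable names are hypothetical. As far as I understand the Beautiful Soup docs, find_previous_siblings only looks at earlier elements with the same parent, while find_all_previous walks back through everything that was parsed earlier in the document.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: every <strong> is
# wrapped in its own <p>, so none of them is a sibling of the target <p>.
html = """
<body>
<p><strong>SECTION A</strong></p>
<p><strong>Subsection 1</strong></p>
<p>HELENA BONILHA DE TOLEDO LEITE</p>
<p><strong>SECTION B</strong></p>
</body>
"""

soup = BeautifulSoup(html, "html.parser")
target = soup.find("p", string="HELENA BONILHA DE TOLEDO LEITE")

# Earlier elements that share the target's parent: only <p> tags here,
# so filtering on 'strong' matches nothing.
siblings = target.find_previous_siblings("strong")

# Every earlier element in the document, nearest first, at any depth.
earlier = target.find_all_previous("strong")

sibling_texts = [s.get_text(strip=True) for s in siblings]
earlier_texts = [s.get_text(strip=True) for s in earlier]
print(sibling_texts)  # []
print(earlier_texts)  # ['Subsection 1', 'SECTION A']
```

If the real page is structured like this, that would explain the empty list, and find_all_previous('strong') (or find_previous('strong') for just the nearest one) may be closer to what I need than find_previous_siblings.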