How to use BeautifulSoup's “find_previous_siblings” to find <strong> tags?

Question

asked Jun 23, 2020 in Data Science by blackindya (18.4k points)

I'm trying to scrape a page that makes little use of heading tags (h1, h2, h3, etc.) and relies mostly on the strong tag instead. I want to search for a specific word (in a p tag) and, if I find it, also collect the text of the levels above it (tagged with strong) ...

I noticed that the lists created with i.find_previous_siblings('strong') come back empty, whereas soup.body.findAll('strong') works and returns a huge list of items (which is what I need!).

How can I get the list of strong tags using find_previous_siblings?

Examples / This worked (and printed a huge list):

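from bs4 import BeautifulSoup
import requests

url = 'http://www.mpsp.mp.br/portal/page/portal/DO_Estado/2020/DO_20-06-2020.html'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
# findAll searches every descendant of <body>, at any nesting depth
for i in soup.body.findAll('strong'):
  print(i.text.strip())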
This did not work (prints an empty list):
from bs4 import BeautifulSoup, element
import requests

url = 'http://www.mpsp.mp.br/portal/page/portal/DO_Estado/2020/DO_20-06-2020.html'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
# Walk only the direct children of <body>
for i in soup.body.contents:
  if isinstance(i, element.NavigableString):
    continue
  if isinstance(i, element.Tag):
    texts = i.text
    if texts == 'HELENA BONILHA DE TOLEDO LEITE':
      print(i.find_previous_siblings('h1'))
      print(i.find_previous_siblings('strong'))
      print(i)

1 Answer

supriya · Answer 1 · 2020-06-23T06:01:27+0000

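The strong tags are not siblings of your element, because each strong is nested inside another paragraph tag (p). find_previous_siblings only looks at elements on the same level, that is, elements that share the same parent.

I think you want find_previous (or find_all_previous, if you need every match rather than only the nearest one):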

from bs4 import BeautifulSoup, element
import requests

url = 'http://www.mpsp.mp.br/portal/page/portal/DO_Estado/2020/DO_20-06-2020.html'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
for i in soup.body.contents:
    # Skip bare text nodes between the tags
    if isinstance(i, element.NavigableString):
        continue
    if isinstance(i, element.Tag):
        texts = i.text
        if texts == 'HELENA BONILHA DE TOLEDO LEITE':
            # find_previous walks backwards through the whole document,
            # so it also finds tags nested inside earlier elements
            print(i.find_previous('h1'))
            print(i.find_previous('strong'))
            print(i)
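
To make the difference easier to see without fetching the real page, here is a minimal sketch on a small hand-written HTML snippet. The snippet and the "SECTION A" / "SECTION B" headings are made up for illustration; I am only assuming the real page wraps each strong in its own p, as described above. It compares find_previous_siblings, find_previous and find_all_previous, and also shows one possible way to keep using find_previous_siblings by searching for the sibling p tags and then looking inside them.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML (not the real page): every <strong> heading
# is wrapped in its own <p>, so it is never a direct child of <body>.
html = """
<body>
<p><strong>SECTION A</strong></p>
<p>first paragraph</p>
<p><strong>SECTION B</strong></p>
<p>HELENA BONILHA DE TOLEDO LEITE</p>
</body>
"""

soup = BeautifulSoup(html, "html.parser")
target = soup.find("p", string="HELENA BONILHA DE TOLEDO LEITE")

# The siblings of the target are the other <p> tags, so no <strong> is found.
sibling_strongs = target.find_previous_siblings("strong")

# find_previous / find_all_previous walk backwards through the whole
# document, so they also see tags nested inside earlier elements.
nearest_strong = target.find_previous("strong")
all_strongs = target.find_all_previous("strong")

# Alternative that still uses find_previous_siblings: take the sibling
# <p> tags and keep the <strong> found inside each one, if any.
sibling_headings = [
    p.strong for p in target.find_previous_siblings("p") if p.strong is not None
]

print(sibling_strongs)
print(nearest_strong.get_text(strip=True))
print([s.get_text(strip=True) for s in all_strongs])
print([s.get_text(strip=True) for s in sibling_headings])
```

Both backward searches return the nearest match first, so reverse the list if you need the headings in document order.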

