
Suppose we have an HTML pattern like this:

<p class='cls1'> Hello </p>

I want to find the enclosing tag (here <p>) by searching for the text "Hello" with BS4, without knowing in advance which tag surrounds the text.

It should be something like this:

full_string = soup.find(text=re.compile('Hello'))

full_string.get_parent_tag() # <p>

full_string.get_parent_class() # cls1

Is this possible in BS4?

1 Answer


Yes, it's possible.

import re

from bs4 import BeautifulSoup

your_html = """<p class='cls1'> Hello </p>"""

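# re.search tests the tag's text; a bare re.compile("Hello") is always truthy and would match every <p>

print(BeautifulSoup(your_html, "html.parser").find_all(lambda t: t.name == "p" and re.search("Hello", t.get_text())))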

Output:

[<p class="cls1"> Hello </p>]

If you don't know the tag you're after, you could try this:

from lxml import html

your_html = """<p class='cls1'> Hello </p>"""

print(html.fromstring(your_html).xpath("//*[contains(text(), 'Hello')]"))

Output:

[<Element p at 0x7f2b172ae5e0>]
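
If you would rather stay within BS4, the pseudocode from the question can be approximated with the text node's `.parent` attribute. The following is a minimal sketch, not the only way to do it: it assumes the same one-line HTML as above and the built-in "html.parser", and the variable name `parent` is introduced here for illustration.

```python
import re

from bs4 import BeautifulSoup

your_html = """<p class='cls1'> Hello </p>"""
soup = BeautifulSoup(your_html, "html.parser")

# Find the first text node containing "Hello"; the result is a NavigableString.
# (Older code passes text= instead of string=.)
full_string = soup.find(string=re.compile("Hello"))

# .parent is the tag that directly encloses the text node.
parent = full_string.parent

print(parent.name)          # p
print(parent.get("class"))  # ['cls1']
```

Note that `class` comes back as a list because it is a multi-valued attribute, and that `soup.find` returns `None` when nothing matches, so check for that before reading `.parent` on real pages.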
