in Data Science by (18.4k points)

Here is an example of the HTML I am trying to extract from:

    <div class="small subtle link">
        <a href="https://example.com" target="_blank" nofollow>Example</a>
        This text!
    </div>

I want to grab only "This text!", but I keep getting "Example" along with it when I do this:

    import re

    myText = soup.find('div', {'class': re.compile('small subtle link')})
    if myText:
        extractedText = myText.text.strip()

How do I leave out the text that is inside a child tag?

1 Answer

by (36.8k points)

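Here is an approach that works: search only the div's direct children for strings (recursive=False), so the text inside the nested <a> tag is never visited. Note that .text (and get_text()) joins every descendant string, which is why "Example" shows up in your result.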

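    from bs4 import BeautifulSoup

    html_src = '''
    <html>
    <body>
    <div class="small subtle link">
        <a href="https://example.com" nofollow="" target="_blank">
            Example
        </a>
        This text!
    </div>
    </body>
    </html>
    '''

    # 'lxml' requires the lxml package; 'html.parser' works without it
    soup = BeautifulSoup(html_src, 'lxml')
    print(soup.prettify())

    div_tag = soup.find(name='div', attrs={'class': 'small subtle link'})

    # recursive=False: look only at the div's direct children, so the
    # "Example" string inside the nested <a> tag is skipped.
    # string=True is the current name for the older text=True argument.
    div_content_text = []
    for curr_text in div_tag.find_all(string=True, recursive=False):
        curr_text = curr_text.strip()
        if curr_text:  # drop whitespace-only strings between tags
            div_content_text.append(curr_text)

    print(div_content_text)  # ['This text!']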
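Two other ways to get the same result, sketched as a self-contained script (the function names own_text and text_without_links are hypothetical, and this is not necessarily the solution by Sushil mentioned below): walk div.children and keep only NavigableString nodes, or remove the nested <a> tags first and call get_text() on what is left.

```python
from bs4 import BeautifulSoup, NavigableString

HTML = '''
<div class="small subtle link">
    <a href="https://example.com" target="_blank">Example</a>
    This text!
</div>
'''


def own_text(tag):
    """Join only the strings that are direct children of `tag`.

    Note: Comment is a NavigableString subclass, so HTML comments
    directly inside the tag would be included too.
    """
    parts = [
        str(child).strip()
        for child in tag.children
        if isinstance(child, NavigableString)
    ]
    return ' '.join(p for p in parts if p)


def text_without_links(tag):
    """Remove every nested <a> tag, then collect the remaining text."""
    for a in tag.find_all('a'):
        a.decompose()  # modifies the parsed tree in place
    return tag.get_text(strip=True)


soup = BeautifulSoup(HTML, 'html.parser')
div = soup.find('div', class_='small subtle link')

print(own_text(div))            # This text!
print(text_without_links(div))  # This text!
```

The first version leaves the tree untouched; the second is handy when you want to drop specific child tags but keep text from others, at the cost of mutating the soup.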

Edit: The solution by Sushil is quite clean, too.
