in Python by (16.4k points)

I'm looking for an HTML parser module for Python that can give me the tags as Python lists, dictionaries, or objects.

If I have a document of the form:

<html>
<head>Heading</head>
<body attr1='val1'>
    <div class='container'>
        <div id='class'>Something here</div>
        <div>Something else</div>
    </div>
</body>
</html>

then it should give me a way to access the nested tags via the name or id of the HTML tag, so that I can ask it to get me the content/text in the div tag with class='container' contained within the body tag, or something similar.

If you've used Firefox's "Inspect Element" feature (view HTML), you'll know that it shows all the tags in a nicely nested form, like a tree.

I'd prefer a built-in module, but that might be asking too much.

I went through a lot of questions on Stack Overflow and a few blogs on the web. Most of them suggest BeautifulSoup, lxml, or HTMLParser, but few of them detail the functionality; they simply end up as a debate over which one is faster or more efficient.
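
On the built-in option: the standard library does ship html.parser.HTMLParser, but it is event-based and does not build a tree by itself. Below is a minimal sketch of collecting its events into a nested structure. The Node and TreeBuilder classes are hypothetical helpers written for this illustration, not part of the standard library, and the sketch assumes well-formed markup in which every start tag has a matching end tag (so no bare void elements such as <br>).

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical tree node (not part of the standard library)."""

    def __init__(self, tag, attrs=(), parent=None):
        self.tag = tag
        self.attrs = dict(attrs)   # e.g. {'class': 'container'}
        self.parent = parent
        self.children = []         # Node objects and text strings, in document order

    def find(self, tag, attrs=None):
        """Return the first descendant with this tag name and these attributes."""
        wanted = attrs or {}
        for child in self.children:
            if isinstance(child, str):
                continue
            if child.tag == tag and all(child.attrs.get(k) == v for k, v in wanted.items()):
                return child
            found = child.find(tag, wanted)
            if found is not None:
                return found
        return None

    def get_text(self):
        """Concatenate all text below this node, in document order."""
        return "".join(c if isinstance(c, str) else c.get_text() for c in self.children)


class TreeBuilder(HTMLParser):
    """Hypothetical HTMLParser subclass that records events as a Node tree."""

    def __init__(self):
        super().__init__()
        self.root = Node("[document]")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, parent=self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Assumes the end tag closes the current node (well-formed input).
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.children.append(data)


html = """<html>
<head>Heading</head>
<body attr1='val1'>
    <div class='container'>
        <div id='class'>Something here</div>
        <div>Something else</div>
    </div>
</body>
</html>"""

builder = TreeBuilder()
builder.feed(html)
builder.close()

body = builder.root.find("body")
container = body.find("div", {"class": "container"})
inner = builder.root.find("div", {"id": "class"})

print(body.attrs)
print(container.get_text())
print(inner.get_text())
```

This is enough for simple, clean documents. Real-world HTML is often malformed, which is where a dedicated library such as BeautifulSoup or lxml tends to be the more practical choice.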

1 Answer

by (26.4k points)

"So that I can ask it to get me the content/text in the div tag with class='container' contained within the body tag, or something similar."

Take a look at the code below, which does this with BeautifulSoup:

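try:
    # BeautifulSoup 3 (legacy package, Python 2 only)
    from BeautifulSoup import BeautifulSoup
except ImportError:
    # BeautifulSoup 4 (pip install beautifulsoup4)
    from bs4 import BeautifulSoup

# The HTML document from the question, as a string
html = """<html>
<head>Heading</head>
<body attr1='val1'>
    <div class='container'>
        <div id='class'>Something here</div>
        <div>Something else</div>
    </div>
</body>
</html>"""

# With bs4, pass a parser name as the second argument (e.g. 'html.parser')
# to avoid the "no parser was explicitly specified" warning.
parsed_html = BeautifulSoup(html)

# Text of the first <div class='container'> inside <body>
print(parsed_html.body.find('div', attrs={'class': 'container'}).text)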
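
The question also asks about reaching tags by id and getting results back as Python lists or dictionaries. Here is a short sketch of those lookups, assuming BeautifulSoup 4 (the bs4 package) is installed and using the standard library's 'html.parser' backend. The variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = """<html>
<head>Heading</head>
<body attr1='val1'>
    <div class='container'>
        <div id='class'>Something here</div>
        <div>Something else</div>
    </div>
</body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# Attributes of a tag are exposed as a dictionary.
body_attrs = soup.body.attrs
attr1 = soup.body["attr1"]

# Look a tag up by its id.
by_id_text = soup.find(id="class").get_text()

# Collect the divs nested inside the container as a Python list.
container = soup.body.find("div", class_="container")
inner_texts = [div.get_text() for div in container.find_all("div")]

print(body_attrs)
print(attr1)
print(by_id_text)
print(inner_texts)
```

find returns the first match (or None if nothing matches), while find_all returns a list of every match, so it is worth checking for None before chaining attribute access on real pages.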

asked Oct 9, 2019 in Python by Sammy (47.6k points)
