Parsing HTML using Python

Question

asked Dec 18, 2020 in Python by laddulakshana (16.4k points)
edited Dec 18, 2020 by laddulakshana

I'm looking for an HTML parser module for Python that can help me get the tags in the form of Python lists/dictionaries/objects.

If I have a document of the form:

<html>
<head>Heading</head>
<body attr1='val1'>
<div class='container'>
<div id='class'>Something here</div>
<div>Something else</div>
</div>
</body>
</html>

then it should give me a way to access the nested tags via the name or id of the HTML tag, so that I can basically ask it to get me the content/text in the div tag with class='container' contained within the body tag, or something similar.

If you've used Firefox's "Inspect Element" feature (view HTML), you will know that it shows all the tags in a nicely nested way, like a tree.

I'd prefer a built-in module, but that might be asking a little too much.

I went through a lot of questions on Stack Overflow and a few blogs on the internet, and most of them suggest BeautifulSoup, lxml, or HTMLParser, but few of them detail the functionality; they simply end as a debate over which one is faster or more efficient.

1 Answer

hari_sh · Answer 1 · 2020-12-18T04:58:34+0000

So that I can ask it to get me the content/text in the div tag with class='container' contained within the body tag, or something similar

Take a look at the code below:

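try:
    from BeautifulSoup import BeautifulSoup  # legacy BeautifulSoup 3
except ImportError:
    from bs4 import BeautifulSoup  # BeautifulSoup 4

# The HTML document from the question, compacted onto three lines
html = """<html><head>Heading</head><body attr1='val1'>
<div class='container'><div id='class'>Something here</div><div>Something else</div></div>
</body></html>"""

# With bs4, passing a parser name such as 'html.parser' as a second argument avoids the parser-guessing warning
parsed_html = BeautifulSoup(html)
# Print the text of the div with class='container' inside the body tag
print(parsed_html.body.find('div', attrs={'class': 'container'}).text)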
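Since the question mentions a preference for a built-in module, here is a rough sketch of the same lookup using only the standard library's html.parser. It is event-based and does not build a tree for you, so you have to track nesting yourself. The names DivTextCollector and container_text are made up for this example and are not part of any library; treat it as a starting point under those assumptions rather than a full replacement for BeautifulSoup.

```python
from html.parser import HTMLParser

# Sample document from the question.
HTML_DOC = """<html>
<head>Heading</head>
<body attr1='val1'>
<div class='container'>
<div id='class'>Something here</div>
<div>Something else</div>
</div>
</body>
</html>"""


class DivTextCollector(HTMLParser):
    """Hypothetical helper: collects text inside the first <div> with a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0       # <div> nesting depth inside the target; 0 means outside
        self.done = False    # set once the target <div> has been closed
        self.chunks = []     # text fragments found inside the target

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.done:
            return
        if self.depth > 0:
            # A nested <div> inside the target.
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            # The target <div> itself.
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def container_text(html, css_class="container"):
    """Return the raw text inside the first <div> whose class list contains css_class."""
    parser = DivTextCollector(css_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


text = container_text(HTML_DOC)
# Drop the whitespace-only lines that come from the document's line breaks.
print([line for line in text.splitlines() if line.strip()])
# ['Something here', 'Something else']
```

This handles the sample document, but it assumes the div tags are properly closed. For messy real-world HTML, a tree-building parser such as BeautifulSoup or lxml is usually less work.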
