I'm looking for an HTML parser module for Python that can help me get the tags as Python lists, dictionaries, or objects.
If I have a document of the form:
<html>
<head>Heading</head>
<body attr1='val1'>
<div class='container'>
<div id='class'>Something here</div>
<div>Something else</div>
</div>
</body>
</html>
then it should give me a way to access the nested tags via the name or id of the HTML tag, so that I can basically ask it to get me the content/text in the div tag with class='container' contained within the body tag, or something similar.
If you've used Firefox's "Inspect element" feature (view HTML), you would know that it gives you all the tags in a nice nested manner, like a tree.
I'd prefer a built-in module, but that might be asking a little too much.
I went through a lot of questions on Stack Overflow and a few blogs on the internet, and most of them suggest BeautifulSoup or lxml or HTMLParser, but few of these detail the functionality, and they simply end as a debate over which one is faster/more efficient.
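To make the request concrete, here is a rough sketch of the kind of access I have in mind, built only on the standard library's html.parser. The Node and TreeBuilder classes and the find_all/text helpers are hypothetical names of my own, not part of any library, and the sketch assumes reasonably well-formed markup with no void elements such as <br>.

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical tree node: a tag, its attributes, and its children."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # mix of Node objects and plain text strings

    def find_all(self, tag, attrs=None):
        """Yield every descendant with this tag name and these attribute values."""
        wanted = attrs or {}
        for child in self.children:
            if not isinstance(child, Node):
                continue
            if child.tag == tag and all(child.attrs.get(k) == v for k, v in wanted.items()):
                yield child
            yield from child.find_all(tag, attrs)

    def text(self):
        """Concatenate all text inside this node, in document order."""
        return "".join(c.text() if isinstance(c, Node) else c for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from the start-tag, end-tag, and data events."""

    def __init__(self):
        super().__init__()
        self.root = Node("[document]")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, parent=self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


doc = """<html>
<head>Heading</head>
<body attr1='val1'>
<div class='container'>
<div id='class'>Something here</div>
<div>Something else</div>
</div>
</body>
</html>"""

builder = TreeBuilder()
builder.feed(doc)
builder.close()

body = next(builder.root.find_all("body"))
container = next(body.find_all("div", {"class": "container"}))

print(body.attrs)                                       # {'attr1': 'val1'}
print([d.text() for d in container.find_all("div")])    # ['Something here', 'Something else']
```

Something along these lines, but without having to write the tree-building part myself, is what I'm after.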