
I'm parsing some HTML with Beautiful Soup 3, but it contains HTML entities which Beautiful Soup 3 doesn't automatically decode for me:

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup("<p>&pound;682m</p>")
>>> text = soup.find("p").string
>>> print text
&pound;682m

How can I decode the HTML entities in text to get "£682m" instead of "&pound;682m"?

1 Answer


You can use html.unescape() to decode HTML entities in a Python string. It is part of the standard library in Python 3.4 and later:

import html
print(html.unescape('&pound;682m'))
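
As a rough illustration of the same call on a few more inputs, here is a minimal sketch for Python 3. The helper name decode_entities and the sample strings are made up for this example and are not part of the original question; the sketch only assumes that html.unescape() handles named, decimal, and hexadecimal character references.

```python
import html


def decode_entities(text):
    """Return text with HTML character references replaced by Unicode characters.

    decode_entities is a hypothetical wrapper used only for this example;
    it simply delegates to the standard library's html.unescape().
    """
    return html.unescape(text)


# Illustrative inputs: a named entity, a decimal reference,
# a hexadecimal reference, and an escaped ampersand.
samples = ["&pound;682m", "&#163;682m", "&#xA3;682m", "AT&amp;T"]
decoded = [decode_entities(s) for s in samples]

for raw, out in zip(samples, decoded):
    print(raw, "->", out)
```

Applied to the question, the string taken from soup.find("p").string could be passed through the same function, provided the code runs on Python 3.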
