I'm parsing some HTML with Beautiful Soup 3, but it contains HTML entities which Beautiful Soup 3 doesn't automatically decode for me:

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup("<p>&pound;682m</p>")
>>> text = soup.find("p").string
>>> print text
&pound;682m

How can I decode the HTML entities in text to get "£682m" instead of "&pound;682m"?

1 Answer

You can use html.unescape() (in the standard library since Python 3.4) to decode HTML entities in a Python string:

import html

text = "&pound;682m"
print(html.unescape(text))  # prints £682m

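As a quick check, html.unescape() decodes named entities as well as decimal and hexadecimal numeric character references, so the same call works no matter how the pound sign was encoded in the source HTML. This is a small Python 3 sketch; the sample strings are only illustrations.

```python
import html

# Raw strings containing different kinds of character references,
# mapped to the text html.unescape() should produce for each one.
samples = {
    "&pound;682m": "£682m",           # named entity
    "&#163;682m": "£682m",            # decimal numeric reference (163 == 0xA3)
    "&#xA3;682m": "£682m",            # hexadecimal numeric reference
    "Tom &amp; Jerry": "Tom & Jerry", # the escaped ampersand itself
}

for raw, expected in samples.items():
    decoded = html.unescape(raw)
    print(f"{raw!r} -> {decoded!r}")
    assert decoded == expected
```

html.unescape() takes a str and returns a new str, so it can be applied to any text you have already extracted from the parsed document.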