in Python by (41.4k points)

I'm parsing some HTML with Beautiful Soup 3, but it contains HTML entities which Beautiful Soup 3 doesn't automatically decode for me:

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup("<p>&pound;682m</p>")
>>> text = soup.find("p").string
>>> print text
&pound;682m

How can I decode the HTML entities in text to get "£682m" instead of "&pound;682m"?

1 Answer


You can use html.unescape() (available in Python 3.4 and later) to decode HTML entities in a Python string:

import html
print(html.unescape('&pound;682m'))
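html.unescape() also handles numeric character references, not just named entities such as &pound;. Here is a minimal sketch, assuming Python 3.4 or later; the helper name decode_entities and the sample strings are made up for illustration and do not come from the question:

```python
import html


def decode_entities(text):
    """Return text with HTML entities replaced by the characters they stand for.

    decode_entities is a hypothetical wrapper; it simply calls html.unescape().
    """
    return html.unescape(text)


# Illustrative inputs: a named entity, a decimal reference,
# a hexadecimal reference, and an escaped ampersand.
samples = ["&pound;682m", "&#163;682m", "&#xA3;682m", "Fish &amp; chips"]
decoded = [decode_entities(s) for s in samples]

for raw, out in zip(samples, decoded):
    print(raw, "->", out)
```

The first three samples should all decode to "£682m". With the parsed text from the question, the call would be decode_entities(text), assuming text holds the string taken from the <p> tag.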
