in Python by (47.8k points)

I have a Unicode string in Python, and I would like to remove all the accents (diacritics).

I found on the Web an elegant way to do this in Java:

  1. convert the Unicode string to its long normalized form (with a separate character for letters and diacritics)

  2. remove all the characters whose Unicode type is "diacritic".
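The two steps above can be reproduced with only the standard library's `unicodedata` module, which is available in Python 3 with no extra install. This is a minimal sketch, not a definitive implementation. It assumes that "diacritic" corresponds to the Unicode general category `Mn` (nonspacing mark), and `strip_accents` is a hypothetical helper name. Characters that have no canonical decomposition, such as 'ø' or 'ł', pass through unchanged.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove diacritics using only the standard library (hypothetical helper)."""
    # Step 1: canonical decomposition (NFD) splits 'á' into 'a' + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2: drop nonspacing marks (general category 'Mn'); treating 'Mn'
    # as "diacritic" is an assumption of this sketch.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose what remains (e.g. Hangul syllables split by NFD) into NFC form.
    return unicodedata.normalize("NFC", stripped)


print(strip_accents("Málaga"))  # prints Malaga
```

No explicit mapping table is needed, because the Unicode database itself says which code points are marks.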

Do I need to install a library such as PyICU, or is this possible with just the Python standard library? And what about Python 3?

Important note: I would like to avoid code with an explicit mapping from accented characters to their non-accented counterpart.

1 Answer

by (107k points)

A simple way to remove accents from a Python Unicode string is to use Unidecode, a third-party package (install it with `pip install Unidecode`; it is not part of the standard library). It renders any Unicode string into the closest possible representation in ASCII text.

An example that illustrates the use of unidecode:

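from unidecode import unidecode

accented_string = u'Málaga'
# unidecode is imported as a function, so call it directly (not unidecode.unidecode)
unaccented_string = unidecode(accented_string)
# unaccented_string now contains 'Malaga'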

Welcome to Intellipaat Community. Get your technical queries answered by top developers !