How to perform NER on true case, then lemmatization on lower case, with spaCy

Question

asked Jul 31, 2019 in Python by Rajesh Malhotra (19.9k points)

I am trying to lemmatize a text using spaCy 2.0.12 with the French model fr_core_news_sm. Moreover, I want to replace people's names with an arbitrary sequence of characters, detecting such names using token.ent_type_ == 'PER'. An example outcome would be "Pierre aime les chiens" -> "~PER~ aimer chien".

The problem is I can't find a way to do both. I only have these two partial options:

I can feed the pipeline with the original text: doc = nlp(text). Then the NER recognizes most people's names, but the lemmas of words starting with a capital letter are not correct. For example, the lemmas of the simple question "Pouvons-nous faire ça?" would be ['Pouvons', '-', 'se', 'faire', 'ça', '?'], where "Pouvons" is still an inflected form.

I can feed the pipeline with the lowercased text: doc = nlp(text.lower()). Then my previous example correctly gives ['pouvoir', '-', 'se', 'faire', 'ça', '?'], but most people's names are no longer recognized as entities by the NER, presumably because an initial capital is a useful indicator for finding entities.

My idea would be to run the standard pipeline (tagger, parser, NER) on the original text, then lowercase it, and lemmatize only at the end.

However, lemmatization doesn't seem to have its own pipeline component, and the documentation doesn't explain how and where it is performed.

So my question is: how can I choose when lemmatization is performed and which input it is given?

1 Answer

Anirudh Singh · Answer 1 · 2019-07-31T05:14:42+0000

The simplest fix is to upgrade spaCy to a more recent version. The French lemmatizer was improved considerably in version 2.1, so upgrading the spacy package should resolve the issue with capitalized words.
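If upgrading is not an option, one possible workaround is to run the pipeline twice, once on the original text for NER and once on the lowercased text for lemmas, and then merge the two results token by token. The sketch below is only an illustration of that idea, not spaCy's own mechanism: merge_ner_and_lemmas and the Tok stand-in are hypothetical names introduced here, and the code assumes that lowercasing does not change the tokenization (it raises an error if the token counts differ). It only relies on the ent_type_ and lemma_ attributes mentioned in the question.

```python
from collections import namedtuple

# Hypothetical stand-in exposing only the token attributes used below.
# Real spaCy Token objects provide the same attribute names.
Tok = namedtuple("Tok", ["text", "lemma_", "ent_type_"])


def merge_ner_and_lemmas(cased_tokens, lowered_tokens, label="PER", placeholder="~PER~"):
    """Take entity labels from the true-case tokens and lemmas from the lowercased ones."""
    cased_tokens = list(cased_tokens)
    lowered_tokens = list(lowered_tokens)
    # Assumption: both passes produce the same number of tokens.
    if len(cased_tokens) != len(lowered_tokens):
        raise ValueError("tokenization differs between cased and lowercased text")
    merged = []
    for cased, lowered in zip(cased_tokens, lowered_tokens):
        if cased.ent_type_ == label:
            # The name was detected on the true-case text: mask it.
            merged.append(placeholder)
        else:
            # Otherwise keep the lemma computed on the lowercased text.
            merged.append(lowered.lemma_)
    return merged


# Demo with hand-written stand-in tokens (not real model output).
cased = [Tok("Pierre", "Pierre", "PER"), Tok("aime", "aimer", ""),
         Tok("les", "le", ""), Tok("chiens", "chien", "")]
lowered = [Tok("pierre", "pierre", ""), Tok("aime", "aimer", ""),
           Tok("les", "le", ""), Tok("chiens", "chien", "")]
print(" ".join(merge_ner_and_lemmas(cased, lowered)))  # ~PER~ aimer le chien

# With spaCy itself, the call would look like this (assuming the model is installed):
#   nlp = spacy.load("fr_core_news_sm")
#   merge_ner_and_lemmas(nlp(text), nlp(text.lower()))
```

Note that a multi-token name produces one placeholder per token, and that this sketch does not drop stop words such as "les"; both would need extra handling to match the exact output format in the question.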
