It depends on the nature of your project.
Natural language processing and computational linguistics can be used for information extraction. They provide tools to extract features from text and then apply training, scoring, or classification to those features.
Typical tasks in this area are part-of-speech (POS) tagging and named entity recognition (recognizing names, places, and dates in plain text). The main part of information extraction is search. It includes:
Identifying and marking sentence, phrase, and paragraph boundaries
Acronym normalization and tagging
Lemmatization / Stemming
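
The steps listed above can be sketched with nothing but the Python standard library. This is only a rough illustration, not how a production toolkit does it: the `ACRONYMS` table and the function names `split_sentences`, `normalize_acronyms`, and `stem` are hypothetical, the sentence splitter is a simple regex, and the stemmer is crude suffix stripping rather than a real algorithm such as Porter stemming.

```python
import re

# Hypothetical acronym table (an assumption for this sketch);
# a real project would load a domain-specific one.
ACRONYMS = {"NLP": "natural language processing", "POS": "part of speech"}


def split_sentences(text):
    """Naive boundary detection: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def normalize_acronyms(sentence):
    """Tag each known acronym by appending its expansion in brackets."""
    def expand(match):
        word = match.group(0)
        return f"{word} [{ACRONYMS[word]}]" if word in ACRONYMS else word

    return re.sub(r"\b[A-Z]{2,}\b", expand, sentence)


def stem(word):
    """Crude suffix stripping; keeps at least three characters of the word."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    text = "NLP is useful. It powers search engines!"
    for sentence in split_sentences(text):
        tagged = normalize_acronyms(sentence)
        stems = [stem(w.lower()) for w in re.findall(r"[A-Za-z]+", sentence)]
        print(tagged, stems)
```

For real projects, an NLP library with trained sentence splitters, lemmatizers, and taggers will handle abbreviations, irregular word forms, and ambiguous punctuation far better than these rules.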
You can learn more about search here.