Introduction to Coreference Resolution in NLP
What is Natural Language Processing (NLP)?
You have probably used voice commands with Google Home or searched the internet with Siri on your iPhone. Both are practical applications built with NLP.
Natural Language Processing is a branch of Artificial Intelligence that enables computers to interpret, understand and manipulate human language. It aims to bridge the gap between how computers process data and how humans communicate. Speech recognition, spell checking, keyword search and many other techniques fall under NLP.
What is Coreference Resolution?
Coreference resolution is an NLP task that finds all linguistic expressions (mentions) in a text that refer to the same real-world entity. Here is how it works.
Suppose you need to find the pronouns in a sentence and replace them with the nouns they stand for. Coreference resolution can do that: it finds the expressions that refer to the same entity and groups them together, and the pronouns in each group can then be replaced with a noun phrase naming that entity. Let’s look at an example.
Consider the following sentence.
“I gave my laptop to Andrew because he told me that he needs it to do his assignment,” Peter said.
Coreference resolution first groups the words according to the entities they refer to. In this sentence, the main entities are:
- Andrew
- Peter
- Peter’s Laptop
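Based on these entities, the nouns and pronouns can be divided into three groups: {Peter, I, my, me}, {Andrew, he, he, his} and {my laptop, it}. “I”, “my” and “me” belong with Peter because he is the speaker.
Finally, each pronoun can be replaced with a noun phrase naming its entity: for example, “he” becomes “Andrew” and “it” becomes “Peter’s laptop”.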
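The grouping step can be sketched in Python under one assumption: that some model has already decided which pairs of mentions refer to the same entity. In a mention-pair approach, a classifier makes these pairwise decisions and the groups are the transitive closure of the positive links. The links below are written by hand for the example sentence, and `cluster` is a hypothetical helper that builds the groups with a small union-find structure.

```python
# Mentions from the example sentence, written by hand. Repeated words such
# as the two "he" are separate mentions, so each is identified by its index.
mentions = ["Peter", "I", "my", "me", "Andrew", "he", "he", "his", "my laptop", "it"]

# Hypothetical positive decisions of a mention-pair classifier:
# (i, j) means mention i refers to the same entity as mention j.
links = [(1, 0), (2, 1), (3, 2), (5, 4), (6, 5), (7, 6), (9, 8)]


def cluster(num_mentions, links):
    """Group mention ids into entities (the transitive closure of the links)."""
    parent = list(range(num_mentions))  # union-find forest, one tree per group

    def find(i):
        # Walk up to the root of i's tree, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for i in range(num_mentions):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


for group in cluster(len(mentions), links):
    print([mentions[i] for i in group])
# ['Peter', 'I', 'my', 'me']
# ['Andrew', 'he', 'he', 'his']
# ['my laptop', 'it']
```

Because links are followed transitively, “me” ends up in Peter’s group even though no link connects the two directly.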
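A minimal sketch of the replacement step, assuming the groups are already known. The token spans are hand-annotated for the quoted part of the example sentence, and `resolve`, `PRONOUNS` and `POSSESSIVES` are hypothetical names, not part of any library. Only single-word pronouns are replaced; possessive pronouns such as “my” and “his” become the entity name followed by “’s”.

```python
PRONOUNS = {"i", "me", "he", "him", "she", "it", "we", "us", "they", "them", "you"}
# "her" is left out because it can be either an object or a possessive pronoun.
POSSESSIVES = {"my", "his", "its", "our", "their", "your"}


def resolve(tokens, clusters):
    """Replace single-token pronoun mentions with the name of their entity.

    tokens:   list of words
    clusters: dict mapping an entity name to a list of (start, end) token
              spans (end exclusive) that refer to that entity
    """
    resolved = list(tokens)
    for entity, spans in clusters.items():
        for start, end in spans:
            if end - start != 1:
                continue  # multi-word mentions such as "my laptop" are kept
            word = tokens[start].lower()
            if word in POSSESSIVES:
                resolved[start] = entity + "'s"
            elif word in PRONOUNS:
                resolved[start] = entity
    return resolved


# The quoted part of the example. Peter is the speaker, so "I", "my" and
# "me" refer to him. The spans below are hand-annotated, not predicted.
sentence = "I gave my laptop to Andrew because he told me that he needs it to do his assignment"
tokens = sentence.split()

clusters = {
    "Peter": [(0, 1), (2, 3), (9, 10)],              # I, my, me
    "Andrew": [(5, 6), (7, 8), (11, 12), (16, 17)],  # Andrew, he, he, his
    "Peter's laptop": [(2, 4), (13, 14)],            # my laptop, it
}

print(" ".join(resolve(tokens, clusters)))
# Peter gave Peter's laptop to Andrew because Andrew told Peter that Andrew needs Peter's laptop to do Andrew's assignment
```

The “my” inside “my laptop” is handled through Peter’s group, which is why the multi-word mention itself can be left untouched.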
When do we use Coreference Resolution?
Coreference resolution is used in a variety of NLP tasks, such as:
- Text understanding
- Document summarization
- Information extraction
- Sentiment analysis
- Machine translation
Pre-processing in Coreference Resolution
Coreference resolution requires a pre-processing pipeline made up of several NLP tasks. It may include:
- Named Entity Recognition: finds named entities in a text and classifies them into predefined categories such as locations, organizations or person names.
- Entity Linking or Wikification: aligns textual mentions of named entities with their corresponding entries in a knowledge base.
- Part-of-Speech Tagging: assigns a part of speech (noun, verb, adjective and so on) to each word in a sentence.
- Stemming: cuts off the beginning or end of a word, using a list of common prefixes and suffixes found in inflected words.
- Lemmatization: performs a morphological analysis of words to remove inflectional endings and return the base or dictionary form of a word, known as its lemma.
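To make the difference between the last two steps concrete, here is a toy sketch in Python. The stemmer strips the first matching suffix from a short, hand-picked list (it ignores prefixes), and the lemmatizer is only a lookup table standing in for real morphological analysis. `stem`, `lemmatize`, `SUFFIXES` and `LEMMAS` are illustrative names, not a library API.

```python
# Naive suffix-stripping stemmer: removes the first matching suffix from a
# hand-picked list. Real stemmers (e.g. Porter's) apply many more rules.
SUFFIXES = ["ing", "ies", "ed", "es", "s"]


def stem(word):
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are not destroyed.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Toy lemmatizer: a lookup table standing in for the morphological analysis
# (dictionary plus grammatical information) a real lemmatizer performs.
LEMMAS = {"studies": "study", "studying": "study", "needs": "need", "was": "be", "better": "good"}


def lemmatize(word):
    return LEMMAS.get(word, word)


for word in ["studies", "studying", "needs", "was", "better"]:
    print(word, stem(word), lemmatize(word))
# studies stud study
# studying study study
# needs need need
# was was be
# better better good
```

The stemmer turns “studies” into “stud”, which is not a word, while the lemmatizer returns the dictionary form “study” and can also map irregular forms such as “better” to “good”, which no suffix rule can do.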