How to Perform Coreference Resolution Using Python?

Kaveesha Baddage
Jun 27, 2021

Coreference resolution is a challenging natural language processing (NLP) task used in information retrieval, machine translation, semantic search, and other decision support systems. If you are not familiar with coreference resolution, you can get a better understanding of it from this article.

The aim of coreference resolution is to find the expressions in a text that refer to the same real-world entity, group them together, and then substitute ambiguous expressions (such as pronouns) with the entities they refer to. This article focuses on how to perform coreference resolution simply, using the Python programming language.
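The ‘find’ and ‘group’ steps can be illustrated without any NLP library. The sketch below is a generic illustration, not NeuralCoref’s internal implementation: it assumes a model has already linked some mentions to an antecedent (the links here are written by hand for the first sentence of the sample text used in this article), and it groups the linked mentions into clusters by taking connected components with a small union-find. `group_mentions` is a hypothetical helper name.

```python
from collections import defaultdict


def group_mentions(links):
    """Group mentions into coreference clusters.

    links: iterable of (mention, antecedent) pairs, where each mention is a
           hashable id such as (token_index, text).
    Returns a list of clusters, each sorted by token position.
    """
    parent = {}

    def find(x):
        # Return the representative of x's cluster (with path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Merge the clusters containing a and b.
        parent[find(a)] = find(b)

    for mention, antecedent in links:
        union(mention, antecedent)

    clusters = defaultdict(list)
    for mention in list(parent):
        clusters[find(mention)].append(mention)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hand-written links for "Alex said that he wants to go his home."
# (token positions: Alex=0, he=3, his=7); illustrative, not model output.
links = [((3, "he"), (0, "Alex")), ((7, "his"), (3, "he"))]
print(group_mentions(links))  # [[(0, 'Alex'), (3, 'he'), (7, 'his')]]
```

Because ‘his’ is linked to ‘he’ and ‘he’ is linked to ‘Alex’, all three mentions end up in one cluster even though ‘his’ was never linked to ‘Alex’ directly.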

Several open-source libraries can be used to perform coreference resolution. This example uses the spaCy and NeuralCoref modules. spaCy is an open-source software library for advanced natural language processing, and NeuralCoref is a pipeline extension for spaCy introduced by Hugging Face. Note that NeuralCoref was built for spaCy 2.x and does not support spaCy 3.

I used the following sample text to try coreference resolution.

Alex said that he wants to go his home. Then Peter gave the key of his car to Alex saying that he needs it back on Monday.

I then used the following source code, written with the Python libraries mentioned above.

(Image: the sample source code)

First, the code imports the relevant libraries. It then loads the model, which returns a Language object containing all the components and data needed to process the text. Here I used the ‘en_core_web_sm’ pipeline package.

spaCy pipeline packages follow the naming convention [lang]_[name]. Under this convention, ‘en’ is the [lang] component and ‘core_web_sm’ is the [name] component. The [name] component is further divided into three parts, [type]_[genre]_[size], as follows.

1. Type: Describes the pipeline’s capabilities

Available types

  • Core — a general-purpose pipeline with vocabulary, syntax, entities and word vectors
  • Dep — a pipeline only with vocabulary and syntax

2. Genre: Describes the type of text the pipeline is trained on

Available genres

  • web
  • news

3. Size: Describes the pipeline size

Available sizes

  • sm
  • md
  • lg

According to this naming convention, the selected small pipeline package ‘en_core_web_sm’ is a general-purpose English pipeline for web text, with vocabulary, syntax and entities. (Unlike the md and lg packages, the sm packages do not include static word vectors.)
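The naming convention can be made concrete with a small parser. `parse_package_name` is a hypothetical helper written for illustration, not part of spaCy. It only recognises the type, genre and size values listed above, so it would reject other real package names (for example the multi-language `xx_ent_wiki_sm` package).

```python
# Values listed in this article; spaCy publishes other values too.
TYPES = {"core", "dep"}
GENRES = {"web", "news"}
SIZES = {"sm", "md", "lg"}


def parse_package_name(package):
    """Split a name like 'en_core_web_sm' into [lang]_[type]_[genre]_[size]."""
    parts = package.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected [lang]_[type]_[genre]_[size], got {package!r}")
    lang, type_, genre, size = parts
    # Reject any component that is not among the values listed above.
    if type_ not in TYPES or genre not in GENRES or size not in SIZES:
        raise ValueError(f"unrecognised type, genre or size in {package!r}")
    return {"lang": lang, "type": type_, "genre": genre, "size": size}


print(parse_package_name("en_core_web_sm"))
# {'lang': 'en', 'type': 'core', 'genre': 'web', 'size': 'sm'}
```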

After that, NeuralCoref is instantiated and added to spaCy’s pipeline of annotations. Finally, the code reads NeuralCoref’s extension attributes on the processed document to resolve the coreferences and get the results.

The input to the coreference resolution program is as follows.

Alex said that he wants to go his home. Then Peter gave the key of his car to Alex saying that he needs it back on Monday.

The final output after coreference resolution is shown below.

Alex said that Alex wants to go Alex home. Then Peter gave the key of Peter car to Alex saying that Peter needs it back on Monday.

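As you can see, it finds and groups the mentions that refer to the same entity and replaces the linked pronouns with the noun phrases they refer to. The replacement is purely textual, so possessive pronouns such as ‘his’ become ‘Alex’ or ‘Peter’ rather than ‘Alex’s’ or ‘Peter’s’, and ‘it’ is left unchanged because the model did not link it to any entity.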
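The substitution step can also be sketched in plain Python. `resolve` and `detokenize` are hypothetical helpers written for illustration, not NeuralCoref’s API; in NeuralCoref itself, the clusters are exposed on the processed document as `doc._.coref_clusters` and the substituted text as `doc._.coref_resolved`. The clusters below are written by hand to mirror the output above, and the sketch assumes the first mention of each cluster is the one used as the replacement.

```python
import re


def resolve(tokens, clusters):
    """Replace every mention with its cluster's main mention.

    tokens:   list of token strings.
    clusters: list of clusters; each cluster is a list of (start, end)
              token spans with `end` exclusive. The first span of each
              cluster is treated as the main mention (an assumption of
              this sketch). Spans are assumed not to overlap.
    """
    replacements = {}  # span start -> (span end, tokens to insert)
    for cluster in clusters:
        main_start, main_end = cluster[0]
        main_tokens = tokens[main_start:main_end]
        for start, end in cluster[1:]:
            replacements[start] = (end, main_tokens)

    resolved = []
    i = 0
    while i < len(tokens):
        if i in replacements:
            end, main_tokens = replacements[i]
            resolved.extend(main_tokens)
            i = end  # skip the original mention tokens
        else:
            resolved.append(tokens[i])
            i += 1
    return resolved


def detokenize(tokens):
    """Join tokens with spaces, then drop the space before punctuation."""
    return re.sub(r"\s+([.,!?;:])", r"\1", " ".join(tokens))


SAMPLE = ("Alex said that he wants to go his home . Then Peter gave the key "
          "of his car to Alex saying that he needs it back on Monday .")
tokens = SAMPLE.split()

# Hand-written clusters (token spans) mirroring the output above; not model output.
clusters = [
    [(0, 1), (3, 4), (7, 8), (19, 20)],  # Alex, he, his, Alex
    [(11, 12), (16, 17), (22, 23)],      # Peter, his, he
]

resolved_text = detokenize(resolve(tokens, clusters))
print(resolved_text)
```

This prints the same text as the output shown above. A real system would also need to handle nested or overlapping mentions, which this sketch does not.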

Please find the Google Colaboratory notebook containing the above example.

