TEI of Athens eJournals

Text Segmentation Using Named Entity Recognition and Co-Reference Resolution in Greek Texts

Pavlina Fragkou

Abstract


In this paper we examine the benefit of performing named entity recognition and co-reference resolution on a Greek corpus used for text segmentation. Each segment is a portion of one of the 300 documents published by ten different authors in the Greek newspaper "To Vima". The aim is to examine whether combining text segmentation with information extraction (more specifically, the named entity recognition and co-reference resolution steps) helps identify the various topics that appear in a document. Named entity recognition was performed using an existing tool trained on a similar corpus. The resulting annotations were manually corrected and enriched to cover four types of named entities (person name, organization, location and time). Co-reference resolution, specifically the substitution of every reference to the same instance with the same named entity identifier, was performed in a subsequent step. An evaluation using three well-known text segmentation algorithms leads to the conclusion that the benefit depends strongly on the segment's topic, the number of named entity instances appearing in it, and the segment's length.
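The co-reference substitution step can be illustrated with a minimal sketch. It assumes a hand-written, hypothetical table (`MENTION_TO_ID`) mapping surface mentions to entity identifiers such as `PER_1`; in the paper this information comes from the corrected NER annotations and co-reference resolution, not from a fixed dictionary. To show why substitution may matter for segmentation, the sketch also computes a plain bag-of-words cosine similarity between two adjacent text blocks, a generic lexical-cohesion measure chosen here for illustration; it is not claimed to be one of the three algorithms evaluated in the paper.

```python
import math
import re
from collections import Counter

# Hypothetical mention -> entity identifier table (illustration only).
# Different surface forms of the same instance share one identifier.
MENTION_TO_ID = {
    "Κώστας Σημίτης": "PER_1",
    "Σημίτης": "PER_1",
    "πρωθυπουργός": "PER_1",
    "Αθήνα": "LOC_1",
    "πρωτεύουσα": "LOC_1",
}


def substitute_entities(text, mention_to_id):
    """Replace every known mention with its entity identifier.

    Longer mentions are tried first, so "Κώστας Σημίτης" is replaced as a
    whole instead of leaving "Κώστας" behind.
    """
    mentions = sorted(mention_to_id, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(m) for m in mentions) + r")\b")
    return pattern.sub(lambda m: mention_to_id[m.group(0)], text)


def tokenize(text):
    # Lowercased word tokens; \w is Unicode-aware, so Greek letters and
    # identifiers such as PER_1 each form a single token.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Bag-of-words cosine similarity between two text blocks."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


# Two adjacent blocks that refer to the same person and place with different words.
block_a = "Ο Κώστας Σημίτης επισκέφθηκε την Αθήνα."
block_b = "Ο πρωθυπουργός μίλησε στην πρωτεύουσα."

before = cosine(block_a, block_b)
after = cosine(substitute_entities(block_a, MENTION_TO_ID),
               substitute_entities(block_b, MENTION_TO_ID))
print(f"similarity before substitution: {before:.3f}")  # 0.183
print(f"similarity after substitution:  {after:.3f}")   # 0.600
```

In this toy case the blocks share only the article "ο" before substitution, but after substitution they also share `PER_1` and `LOC_1`, so a similarity-based segmenter would be less inclined to place a topic boundary between them. This is consistent with the abstract's observation that the benefit depends on how many named entity instances a segment contains.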

Keywords


Text segmentation, Named entity recognition, Co-reference resolution, Information extraction

