Veselin Stoyanov and James Mayfield; Tan Xu and Douglas W. Oard; Dawn Lawrie; Tim Oates and Tim Finin
Abstract. Entity linking is the task of assigning mentions in documents to their corresponding knowledge base entities, and it is a central step in knowledge-base population. Current entity linking systems do not explicitly model the discourse context in which communication occurs. Yet the notion of shared context is central to the linguistic theory of pragmatics and plays a crucial role in Grice’s cooperative communication principle. Furthermore, modeling context facilitates joint resolution of entities, an important problem in entity linking that has yet to be addressed satisfactorily. This paper describes an approach to context-aware entity linking.
Given a mention of an entity in a document and a set of known entities in a knowledge base (KB), the entity linking task is to find the entity ID of the mentioned entity, or to return NIL if the entity is not yet represented in the KB. Entity linking is a crucial requirement for knowledge-base population; without it, accurately extracted attributes and relationships cannot be correctly inserted into an existing KB.
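The input and output of the task can be sketched in a few lines of Python. This is only an illustration of the task interface, not the approach described in this paper: the names `Query` and `link` are hypothetical, the KB is reduced to a mapping from entity ID to canonical name, NIL is represented as `None`, and the matching rule (case-insensitive exact name match) is a deliberately naive placeholder.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Query:
    """One entity linking query (hypothetical structure for illustration)."""
    document: str  # full text in which the mention occurs
    mention: str   # surface string to be linked


def link(query: Query, kb_names: Dict[str, str]) -> Optional[str]:
    """Return the KB entity ID for the mention, or None to signal NIL.

    Assumption: kb_names maps entity ID -> canonical name. The matching
    rule is a placeholder (case-insensitive exact match), not a real linker.
    """
    if query.mention not in query.document:
        raise ValueError("mention string must occur in the document")
    target = query.mention.casefold()
    for entity_id, name in kb_names.items():
        if name.casefold() == target:
            return entity_id
    return None  # NIL: the entity is not represented in the KB


if __name__ == "__main__":
    kb = {"E1": "Barack Obama", "E2": "Paris"}
    print(link(Query("Paris is lovely in spring.", "Paris"), kb))
    print(link(Query("Zorg arrived yesterday.", "Zorg"), kb))
```

A realistic system would replace the exact-match rule with candidate generation and ranking, but the contract stays the same: one entity ID or NIL per query.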
Shared tasks at international conferences have driven recent research in entity linking (Huang et al., 2008; McNamee and Dang, 2009). The TAC Knowledge Base Population track (Ji et al., 2011) provides a representative example. Participants are provided with a knowledge base derived from Wikipedia Infoboxes. Each query comprises a text document and a mention string found in that document.
The entity linking system must determine whether the entity referred to by the mention is represented in the KB, and if so, which entity it represents. State-of-the-art entity linking systems are quite good at linking person names (Ji et al., 2011). They rely on a variety of machine learning approaches and may also incorporate external resources such as name gazetteers (Burman et al., 2011), precompiled estimates of entity popularity (Han and Sun, 2011), and modules trained to recognize name and acronym matches (Zhang et al., 2011).
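To make the idea of combining such signals concrete, the following toy sketch scores each candidate by a weighted sum of a string-similarity score and a popularity prior, and returns NIL when no candidate clears a threshold. It does not reproduce any of the cited systems: the function names, the linear combination, the weight, and the threshold are all assumptions chosen for illustration, and string similarity comes from the standard library's `difflib.SequenceMatcher` rather than a trained name-matching module.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in for a trained matcher)."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def rank_candidates(
    mention: str,
    candidates: Dict[str, Tuple[str, float]],
    prior_weight: float = 0.3,   # assumed weight, not taken from the paper
    nil_threshold: float = 0.5,  # assumed cut-off below which NIL is returned
) -> Optional[str]:
    """Pick the best-scoring entity ID, or None (NIL) if nothing is good enough.

    candidates maps entity ID -> (canonical name, popularity prior in [0, 1]).
    """
    best_id: Optional[str] = None
    best_score = nil_threshold
    for entity_id, (name, prior) in candidates.items():
        score = (1.0 - prior_weight) * name_similarity(mention, name) + prior_weight * prior
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id


if __name__ == "__main__":
    kb = {
        "E1": ("Michael Jordan", 0.9),
        "E2": ("Michael I. Jordan", 0.1),
    }
    print(rank_candidates("Michael Jordan", kb))
    print(rank_candidates("Zzzz Qqqq", kb))
```

Note that this scorer looks only at the mention string and a static prior; it ignores the surrounding document entirely, which is the kind of context the abstract argues should be modeled.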
Complete technical paper available as a PDF.