Veselin Stoyanov and James Mayfield; Tan Xu and Douglas W. Oard; Dawn Lawrie; Tim Oates and Tim Finin
Abstract—Entity linking refers to the task of assigning mentions in documents to their corresponding knowledge base entities. Entity linking is a central step in knowledge base population. Current entity linking systems do not explicitly model the discourse context in which the communication occurs. Nevertheless, the notion of shared context is central to the linguistic theory of pragmatics and plays a crucial role in Grice’s cooperative communication principle. Furthermore, modeling context facilitates joint resolution of entities, an important problem in entity linking yet to be addressed satisfactorily. This paper describes an approach to context-aware entity linking.
Given a mention of an entity in a document and a set of known entities in a knowledge base (KB), the entity linking task is to find the entity ID of the mentioned entity, or return NIL if the mentioned entity was previously unknown. Entity linking is a key requirement for knowledge base population; without it, accurately extracted attributes and relationships cannot be correctly inserted into an existing KB.
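To make the task's inputs and outputs concrete, here is a minimal sketch of the interface it implies: a mention string and a KB go in, and either an entity ID or NIL comes out. The `Entity` class, the `link` function, the string-similarity heuristic, and the 0.85 threshold are all illustrative assumptions, not the approach described in this paper. The sketch also ignores the document context entirely.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, Tuple

NIL = "NIL"  # returned when the mentioned entity is not in the KB


@dataclass(frozen=True)
class Entity:
    """A hypothetical KB entry: an ID, a canonical name, optional aliases."""
    entity_id: str
    name: str
    aliases: Tuple[str, ...] = ()


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (a stand-in heuristic)."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def link(mention: str, kb: Iterable[Entity], threshold: float = 0.85) -> str:
    """Return the ID of the best-matching KB entity, or NIL.

    The threshold is an arbitrary illustrative value: if no entity name or
    alias is similar enough to the mention, the mention is treated as
    referring to a previously unknown entity.
    """
    best_id, best_score = NIL, 0.0
    for entity in kb:
        score = max(
            name_similarity(mention, candidate)
            for candidate in (entity.name, *entity.aliases)
        )
        if score > best_score:
            best_id, best_score = entity.entity_id, score
    return best_id if best_score >= threshold else NIL


# Toy KB with made-up IDs, for illustration only.
toy_kb = [
    Entity("E1", "Barack Obama", aliases=("Obama",)),
    Entity("E2", "Michelle Obama"),
]
```

With this toy KB, `link("barack obama", toy_kb)` resolves to `E1`, while a name with no close match, such as `link("Angela Merkel", toy_kb)`, falls below the threshold and yields NIL.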
Recent research in entity linking has been driven by shared tasks at a variety of international conferences (Huang et al., 2008; McNamee and Dang, 2009). The TAC Knowledge Base Population track (Ji et al., 2011) provides a representative example. Participants are provided with a knowledge base derived from Wikipedia infoboxes. Each query comprises a text document and a mention string found in that document.
The entity linking system must determine whether the entity referred to by the mention is represented in the KB, and if so, which entity it represents. State-of-the-art entity linking systems are quite good at linking person names (Ji et al., 2011). They rely on a variety of machine learning approaches and may incorporate external resources such as name gazetteers (Burman et al., 2011), precompiled estimates of entity popularity (Han and Sun, 2011), and modules trained to recognize name and acronym matches (Zhang et al., 2011).
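As a rough illustration of how two of these resources might be combined, the sketch below filters candidates with a rule-based name or acronym match and then orders the survivors by a precompiled popularity prior. It is not the method of any of the cited systems, which train their matchers rather than hand-coding them. The function names, the candidate format, and the popularity values are hypothetical.

```python
from typing import Dict, List, Tuple


def is_acronym_of(mention: str, name: str) -> bool:
    """True if the mention spells out the initials of the capitalized words
    in the name, e.g. "ACL" for "Association for Computational Linguistics".
    A hand-written rule standing in for a trained acronym matcher."""
    initials = "".join(word[0] for word in name.split() if word[0].isupper())
    letters = mention.replace(".", "")
    return len(letters) >= 2 and letters.upper() == initials


def rank_candidates(
    mention: str, candidates: Dict[str, Tuple[str, float]]
) -> List[Tuple[str, float]]:
    """Keep entities whose name equals the mention (ignoring case) or whose
    initials match it, then sort them by popularity prior, highest first.

    `candidates` maps entity ID -> (canonical name, popularity prior).
    """
    matched = [
        (entity_id, popularity)
        for entity_id, (name, popularity) in candidates.items()
        if mention.casefold() == name.casefold() or is_acronym_of(mention, name)
    ]
    return sorted(matched, key=lambda pair: pair[1], reverse=True)


# Made-up IDs and popularity priors, for illustration only.
toy_candidates = {
    "E1": ("Association for Computational Linguistics", 0.20),
    "E2": ("Anterior Cruciate Ligament", 0.70),
    "E3": ("Access Control List", 0.10),
    "E4": ("Barack Obama", 0.90),
}

ranking = rank_candidates("ACL", toy_candidates)
```

Here `ranking` lists `E2`, `E1`, and `E3` in that order and omits `E4`. A popularity prior alone prefers the most frequently mentioned entity whatever the document is about, so an ambiguous mention such as "ACL" can only be resolved reliably with evidence from its context.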