Within ULM-1, Rubén Izquierdo & Marten Postma focus on Word Sense Disambiguation (WSD).
Word Sense Disambiguation is the task of having a computer decide which meaning a word has in a given context. Humans are so good at this task that they do not even notice how complicated it is. For example, most people find the sentence “After going for a run, the man took a shower.” perfectly understandable. For a computer, however, this sentence alone has more than 1.3 million possible meaning combinations.
For the English sentence “After going for a run, the man took a shower”, the number of possible meaning combinations is shown. The number illustrates how complex it is for a machine to understand natural language sentences.
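The size of such a number follows from simple multiplication: if every word can independently take any of its senses, the number of sentence-level readings is the product of the per-word sense counts. The sketch below illustrates that arithmetic only. The sense counts are hypothetical placeholders, not values taken from WordNet or from the project, and the function name is an assumption made for this example.

```python
from math import prod

# Hypothetical sense counts per word; placeholders for illustration only,
# not values taken from WordNet or from the project.
ILLUSTRATIVE_SENSE_COUNTS = {
    "going": 4,
    "run": 6,
    "man": 3,
    "took": 5,
    "shower": 2,
}


def count_meaning_combinations(sense_counts):
    """Return the number of ways to pick one sense for every word."""
    # Each word's sense is chosen independently, so the counts multiply.
    return prod(sense_counts.values())


if __name__ == "__main__":
    total = count_meaning_combinations(ILLUSTRATIVE_SENSE_COUNTS)
    print(f"{total} possible sense assignments")
```

Even with these small placeholder counts the product grows quickly, and with the sense counts of a real lexicon a short sentence can reach the millions.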
The ambiguity of natural language is a problem that is not well understood: we do not have a clear idea of its size and complexity. We are investigating it systematically by determining the relation between three variables: word (W), meaning (M), and context (C). The goal is to define this complex relation more precisely and to apply that knowledge to come closer to an optimal solution for the Word Sense Disambiguation task.
Resources
To get a better understanding of the ambiguity of natural language, several resources have been created within the project:
- WSD-gold-standards-analysis: collection of IPython notebooks to analyze and visualize the gold standards of WSD test data
- WordNetMapper: repository for mapping lexical keys | offsets | ilidefs from one WordNet version to another ["16", "17", "171", "20", "21", "30"]. It makes use of the index.sense files from WordNet (http://wordnet.princeton.edu/) and the automatically generated mappings between WordNet offsets (http://nlp.lsi.upc.edu/tools/download-map.php)
- Semantic_class_manager: Python module for accessing diverse semantic classes: BLC, WordNet Domains and SuperSense
- Sval_systems: output from participating systems on WSD all-words tasks in a common format
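To make the kind of data behind WordNetMapper more concrete: each line of a WordNet index.sense file holds four whitespace-separated fields (sense key, synset offset, sense number, tag count). The following is a minimal sketch of reading such lines into a lookup from lexical key to offset. It is not WordNetMapper's own code; the names `SenseEntry` and `parse_index_sense` and the sample lines are made up for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class SenseEntry(NamedTuple):
    synset_offset: str  # kept as a string to preserve leading zeros
    sense_number: int
    tag_count: int


def parse_index_sense(lines: Iterable[str]) -> Dict[str, SenseEntry]:
    """Map each sense key (lexical key) to its index.sense entry."""
    entries: Dict[str, SenseEntry] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            # Skip blank or malformed lines instead of failing.
            continue
        sense_key, offset, sense_number, tag_count = fields
        entries[sense_key] = SenseEntry(offset, int(sense_number), int(tag_count))
    return entries


# Made-up sample lines in the index.sense layout (not real WordNet data).
SAMPLE_LINES = [
    "example%1:00:00:: 00000001 1 0",
    "example%2:00:00:: 00000002 1 3",
]

if __name__ == "__main__":
    index = parse_index_sense(SAMPLE_LINES)
    for key, entry in index.items():
        print(key, "->", entry.synset_offset)
```

With one such lookup per WordNet version, mapping a lexical key to an offset in a given version is a dictionary access. Mapping offsets across versions additionally needs the offset mappings mentioned in the list.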
Publications
The main publications from ULM-1 are:
- Marten Postma, Ruben Izquierdo, Eneko Agirre, German Rigau, and Piek Vossen. Addressing the MFS bias in WSD systems. In LREC 2016, 2016.
- Marten Postma, Ruben Izquierdo, and Piek Vossen. VUA-background: When to use background information to perform word sense disambiguation. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 345–349, Denver, Colorado, June 2015. Association for Computational Linguistics.
- Ruben Izquierdo, Marten Postma, and Piek Vossen. Topic modeling and word sense disambiguation on the AnCora corpus. In Procesamiento del Lenguaje Natural, 55:15–22, 2015.