Project 2 — Perception & Description of Images

Within ULM-2, Emiel van Miltenburg focusses on the perception of images and sounds.

If we ever want to communicate with computers, it is essential that they understand the close ties between the language we use and the world around us. In this second Spinoza project, we study how people talk about sounds and images, in order to understand what it takes for machines to do the same. In this work, we emphasize the role of world knowledge and everyday expectations. When asked to describe a sound or an image, people rarely take it at face value. Rather, they start to interpret and (re-)contextualize whatever is presented to them. One of the challenges in this project is to show how the language people use to describe sounds and images can be traced back to their perspective on the world.

Picture of a herd of sheep, with a shepherd and a mule, in a rural landscape.
 
Picture of sheep overlaid with eye-tracking data.
Photo by Jacinta Lluch Valero (CC BY-SA 2.0)

The Dutch Image Description and Eye-tracking Corpus (DIDEC) contains 307 images from the MS COCO dataset, provided with eye-tracking data and spoken descriptions.

Datasets

The person that took the picture may have a very different background or knowledge about the context than the person who’s labelling the image.

Selected publications

Semantic relations between Freesound tags — Demo by Emiel van Miltenburg

Sound similarity by Emiel van Miltenburg. The graph shows tags from the Freesound Database, clustered by their similarity.