Grand Challenges

Developing systems that can robustly understand human-human communication or respond to human input requires identifying the best algorithms and their failure modes. In fields such as computer vision, speech recognition, and computational linguistics, the availability of datasets and common tasks has led to great progress. This year we invited the ICMI community to collectively define and tackle scientific Grand Challenges in multimodal interaction for the next five years. We received a good response to the call, and we are hosting four Challenge events at the ICMI 2012 conference. Multimodal Grand Challenges are driven by ideas that are bold, innovative, and inclusive. We hope they will inspire new ideas in the ICMI community and create momentum for future collaborative work.

Grand Challenge Chairs

Daniel Gatica-Perez
Idiap Research Institute

Stefanie Tellex
Massachusetts Institute of Technology

2nd International Audio/Visual Emotion Challenge and Workshop - AVEC 2012

The Audio/Visual Emotion Challenge and Workshop (AVEC 2012) is the second competition event aimed at comparing multimedia processing and machine learning methods for automatic audio, visual, and audiovisual emotion analysis, with all participants competing under strictly the same conditions. The goal of the challenge is to provide a common benchmark test set for multimodal information processing and to bring together the audio and video emotion recognition communities, in order to compare the relative merits of the two approaches to emotion recognition under well-defined and strictly comparable conditions, and to establish to what extent fusion of the approaches is possible and beneficial. A second motivation is the need for emotion recognition systems to handle naturalistic behavior in large volumes of un-segmented, non-prototypical, and non-preselected data, as this is exactly the type of data that both multimedia retrieval and human-machine/human-robot communication interfaces face in the real world.

We are calling for teams to participate in emotion recognition from acoustic audio analysis, linguistic audio analysis, video analysis, or any combination of these. The SEMAINE database of naturalistic dialogues will serve as the benchmarking database. Emotion will have to be recognized as continuous-time, continuous-valued dimensional affect along four dimensions: arousal, expectation, power, and valence. Besides participation in the Challenge, we are calling for papers addressing the overall topics of this workshop, in particular work that addresses the differences between audio and video processing of emotive data, and the issues concerning combined audiovisual emotion recognition.

Please visit our website for more information:


Björn Schuller
Institute for Man-Machine Communication, Technische Universität München, Germany

Michel Valstar
Intelligent Behaviour Understanding Group, Imperial College London, U.K.

Roddy Cowie
School of Psychology, Queen's University Belfast, U.K.

Maja Pantic
Intelligent Behaviour Understanding Group, Imperial College London, U.K.

Important dates:

  • Paper submission (extended): July 31, 2012
  • Notification of acceptance: August 14, 2012
  • Camera ready paper: August 18, 2012
  • Workshop: October 22, 2012

Haptic Voice Recognition Grand Challenge

The Haptic Voice Recognition (HVR) Grand Challenge 2012 is a research-oriented competition designed to bring together researchers across multiple disciplines to work on HVR, a novel multimodal text entry method for modern mobile devices. HVR combines voice and touch input to achieve better efficiency and robustness. Since modern portable devices are now commonly equipped with both a microphone and a touchscreen display, it is interesting to explore ways of enhancing text entry on these devices by combining information obtained from these sensors. The purpose of this grand challenge is to define a set of common challenge tasks for researchers to work on, in order to address the challenges faced and to bring the technology to the next frontier. Basic tools and setups are also provided to lower the entry barrier, so that research teams can participate without having to work on all aspects of the system. The grand challenge will be accompanied by a workshop held at the International Conference on Multimodal Interaction (ICMI) 2012, where participants will have the opportunity to share their innovative findings and engage in discussions concerning current and future research directions. Participants will also submit papers to the workshop to report their research findings; accepted papers will be included in the proceedings of ICMI 2012.

Please visit our website for more information:


Dr. Khe Chai SIM
School of Computing, National University of Singapore

Dr. Shengdong Zhao
School of Computing, National University of Singapore

Dr. Kai Yu
Engineering Department, Cambridge University

Dr. Hank Liao
Google Inc.

Important dates:

  • Release of development data: March 1, 2012
  • Release of challenge data: July 1, 2012
  • Paper submission: July 31, 2012
  • Notification of acceptance: August 13, 2012
  • Camera ready paper: August 20, 2012

D-META Grand Challenge: Data sets for Multimodal Evaluation of Tasks and Annotations

The D-META (Data sets for Multimodal Evaluation of Tasks and Annotations) Grand Challenge sets up the basis for the comparison, analysis, and further improvement of multimodal data annotations and multimodal interactive systems. Machine learning-based challenges of this kind do not yet exist in the multimodal interaction community. The main goal of this Grand Challenge is to foster research and development in multimodal communication and to further elaborate algorithms and techniques for building various multimodal applications. Built on two coupled pillars, method benchmarking and annotation evaluation, the D-META challenge envisions a starting point for transparent and publicly available evaluation of applications and annotations on multimodal data sets.

Please visit our website for more information:


Xavier Alameda-Pineda
Perception Team, INRIA Rhône-Alpes, University of Grenoble, France

Dirk Heylen
Human Media Interaction, University of Twente, The Netherlands

Kristiina Jokinen
Department of Behavioural Sciences, University of Helsinki, Finland

Important dates:

  • Paper submission: July 31, 2012
  • Notification of acceptance: August 24, 2012
  • Camera ready paper: September 14, 2012

BCI Grand Challenge: Brain-Computer Interfaces as intelligent sensors for enhancing Human-Computer Interaction

The field of physiological computing comprises systems that use data from the human nervous system as control input to a technological system. Traditionally these systems have been grouped into two categories: one where physiological data is used as a form of input control, and a second where spontaneous changes in physiology are used to monitor the psychological state of the user. Brain-Computer Interfaces (BCIs) are traditionally conceived as a control for interfaces: a device that allows the user to "act on" external devices as a form of input control. However, most BCIs do not provide a reliable and efficient means of input control, and they are difficult to learn and use relative to other available modalities. We propose to change the conceptual use of "BCI as an actor" (input control) into "BCI as an intelligent sensor" (monitor). This shift of emphasis promotes the capacity of BCI to represent spontaneous changes in the state of the user in order to induce intelligent adaptation at the interface. BCIs can increasingly be used as intelligent sensors that "read" passive signals from the nervous system and infer user states to adapt human-computer, human-robot, or human-human interaction (HCI, HRI, HHI, respectively). This perspective challenges researchers to understand how information about the user state should support different types of interaction dynamics, from supporting the goals and needs of the user to conveying state information to other users. Which adaptation to which user state constitutes opportune support? How does the feedback of the changing HCI and HRI affect brain signals? Many research challenges need to be tackled here.

Please visit the BCI Grand Challenge website for more information:


Femke Nijboer
University of Twente, The Netherlands

Mannes Poel
University of Twente, The Netherlands

Anton Nijholt
University of Twente, The Netherlands

Egon L. van den Broek
TNO Technical Sciences, The Netherlands

Stephen Fairclough
Liverpool John Moores University, United Kingdom

Important dates:

  • Deadline for submission (extended): June 30, 2012
  • Notification of acceptance: July 7, 2012
  • Final papers due: August 15, 2012

ICMI 2012 ACM International Conference on Multimodal Interaction. 22-26 October 2012, Santa Monica, California.