The multimodal recognition of eating condition, i.e. whether a person is eating and, if so, which type of food, is a new research domain in speech and video processing. It has many promising applications for future multimodal interfaces, such as adapting speech recognition or lip reading systems to different eating conditions (e.g., dictation systems), health monitoring (e.g., ingestive behaviour), or security monitoring (e.g., when eating is not allowed).

We therefore invite participation in the first open audio-visual challenge under strictly comparable conditions on the audio-visual classification of eating conditions, based on the audio-visual iHEARu-EAT database.

We define three Sub-Challenges based on classification tasks in which participants are encouraged to use speech and/or video recordings:

  1. Food-type Sub-Challenge: Perform seven-class food classification per utterance
  2. Food-likability Sub-Challenge: Recognize the subjects' food likability rating
  3. Chew and Speak Sub-Challenge: Recognize the level of difficulty of speaking while eating

Participants are free to submit results for one or several Sub-Challenges. In all Sub-Challenges, participants may use their own acoustic/visual features and/or their own machine learning models. Standard acoustic and visual feature sets will also be provided for participants to use, featuring recent end-to-end deep learning as well as more conventional bags of cross-modal words.

For further important information, dates, and updates, please visit the EAT website:

The sixth Emotion Recognition in the Wild (EmotiW) challenge, EmotiW 2018, will be held at the ACM International Conference on Multimodal Interaction (ICMI) 2018, Colorado. EmotiW 2018 consists of three sub-challenges:

  1. Engagement in the Wild
  2. Group-based Emotion Recognition
  3. Audio-video Emotion Recognition

Please visit the EmotiW 2018 website for important dates, information, and updates:

