The ICMI Eating Analysis & Tracking Challenge (ICMI 2018 EAT)
6th Emotion Recognition in the Wild Challenge (EmotiW)

The ICMI Eating Analysis & Tracking Challenge (ICMI 2018 EAT)

The multimodal recognition of eating condition (whether a person is eating and, if so, which food type) is not yet a fully established area in speech and video processing. However, research in this domain has many promising applications for future multimodal interfaces, such as adapting speech recognition or lip-reading systems to different eating conditions (e.g., dictation systems), health monitoring (e.g., ingestive behaviour), or security monitoring (e.g., where eating is not allowed).

To this end, we propose a new classification task in the area of user data analysis, namely the audio-visual classification of user eating conditions, and leverage the audio-visual iHEARu-EAT database [1, 2], which is made available for research purposes upon request.

This database contains around 1.4k utterances (2.9 hours of audio/video recordings) from 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender balanced, German speakers) recorded while eating six types of food (apple, nectarine, banana, Haribo Smurfs, biscuit, and crisps). The data comprises both read and spontaneous speech, accompanied by transcriptions.

We define three different Sub-Challenges based on classification tasks in which participants are encouraged to use speech and/or video recordings:

  • Food-type Sub-Challenge: Perform seven-class food classification per utterance (real-world scenario: enhancement of automatic speech recognition and lip reading under eating conditions)
  • Food-likability Sub-Challenge: Recognize the degree of likability (real-world scenario: market testing in the food industry and in advertisements)
  • Chew and Speak Sub-Challenge: Recognize the level of difficulty to speak while eating (real-world scenario: detection of speaking disorders)

Participants in the ICMI 2018 EAT Challenge can take part in one or more of these Sub-Challenges. For all Sub-Challenges, a target class label has to be predicted per audio-visual clip, where each file contains one full speech utterance. Participants can employ their own features and machine learning algorithms; however, a standard feature set is provided that may be used for both audio and video data. Baseline predictions as well as the baseline code for the three Sub-Challenges will be provided.
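To make the task format concrete, the following is a minimal sketch of a per-utterance classifier, not the official baseline. It assumes one fixed-length feature vector per audio-visual clip (e.g., from a standard acoustic feature set); random data stands in for the actual features, and all sizes are hypothetical. A nearest-centroid classifier is used purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: number of training/test clips, feature dimension,
# and number of target classes (e.g., seven for the Food-type Sub-Challenge).
n_train, n_test, n_feats, n_classes = 210, 42, 88, 7

X_train = rng.normal(size=(n_train, n_feats))      # one feature vector per clip
y_train = rng.integers(0, n_classes, size=n_train) # one class label per clip
X_test = rng.normal(size=(n_test, n_feats))

# Nearest-centroid classification: compute one mean feature vector per class,
# then assign each test clip the label of the closest centroid.
centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in range(n_classes)])
dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
pred = dists.argmin(axis=1)  # one predicted label per audio-visual clip
```

In the challenge setting, the random arrays would be replaced by the provided feature set, and participants are free to substitute any classifier of their choice.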

EAT Important Dates:

Release of training data and evaluation script: April 4, 2018
Release of the test data: May 16, 2018
Paper submission deadline: May 30, 2018
Notifications: July 18, 2018
Camera ready: July 31, 2018

[1] S. Hantke, F. Weninger, R. Kurle, F. Ringeval, A. Batliner, A. El-Desoky Mousa, and B. Schuller, “I Hear You Eat and Speak: Automatic Recognition of Eating Condition and Food Types, Use-Cases, and Impact on ASR Performance,” PLoS ONE, vol. 11, no. 5, pp. 1–24, May 2016.

[2] B. Schuller, S. Steidl, A. Batliner, S. Hantke, F. Hönig, J. R. Orozco-Arroyave, E. Nöth, Y. Zhang, and F. Weninger, “The INTERSPEECH 2015 Computational Paralinguistics Challenge: Degree of Nativeness, Parkinson’s & Eating Condition,” in Proceedings INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association. Dresden, Germany: ISCA, September 2015, pp. 478–482.

6th Emotion Recognition in the Wild Challenge (EmotiW)

The sixth Emotion Recognition in the Wild (EmotiW) 2018 challenge will be held at the ACM International Conference on Multimodal Interaction (ICMI) 2018 in Colorado. EmotiW 2018 consists of three sub-challenges:

  1. Engagement in the Wild
  2. Group-based Emotion Recognition
  3. Audio-video Emotion Recognition

Please visit the EmotiW 2018 website for important dates, information, and updates.
