ICMI '18: Proceedings of the 20th ACM International Conference on Multimodal Interaction
SESSION: Keynote & Invited Talks
Human verbal and nonverbal expressions carry crucial information not only about intent
but also emotions, individual identity, and the state of health and wellbeing. From
a basic science perspective, understanding how such rich information is encoded in
How can we create technologies to help us reflect on and change our behavior,
improving our health and overall wellbeing? In this talk, I will briefly describe
the last several years of work our research team has been doing in this area. We ...
What are facial expressions for? In social-functional accounts, they are efficient
adaptations that are used flexibly to address the problems inherent to successful
social living. Facial expressions both broadcast emotions and regulate the emotions
Humans interact with the world using five major senses: sight, hearing, touch, smell,
and taste. Almost all interaction with the environment is naturally multimodal, as
audio, tactile or paralinguistic cues provide confirmation for physical actions and
SESSION: Session 1: Multiparty Interaction
We present dialogue management routines for a system to engage in multiparty agent-infant
interaction. The ultimate purpose of this research is to help infants learn a visual
sign language by engaging them in naturalistic and socially contingent ...
We address the problem of automatically predicting group performance on a task, using
multimodal features derived from the group conversation. These include acoustic features
extracted from the speech signal, and linguistic features derived from the ...
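A common way to combine features from separate modalities, as in this line of work, is early fusion: concatenating per-modality feature vectors and fitting a single predictor. The following is a minimal, illustrative sketch with synthetic data — the dimensions, variable names, and the ridge regressor are assumptions for demonstration, not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: one acoustic and one linguistic feature vector per group.
n_groups, d_acoustic, d_linguistic = 40, 6, 4
X_acoustic = rng.normal(size=(n_groups, d_acoustic))
X_linguistic = rng.normal(size=(n_groups, d_linguistic))

# Synthetic target: group performance as a noiseless linear function of both modalities.
w_true = rng.normal(size=d_acoustic + d_linguistic)
y = np.concatenate([X_acoustic, X_linguistic], axis=1) @ w_true

def fit_ridge(X, y, lam=1e-3):
    # Closed-form ridge regression: w = (X^T X + lam * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Early fusion: concatenate modality features, then train one predictor on the result.
X_fused = np.concatenate([X_acoustic, X_linguistic], axis=1)
w = fit_ridge(X_fused, y)
pred = X_fused @ w
```

Late fusion (training one predictor per modality and combining their outputs) is the usual alternative when modalities are sampled at different rates or are often missing.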
We model coordination and coregulation patterns in 33 triads engaged in collaboratively
solving a challenging computer programming task for approximately 20 minutes. Our
goal is to prospectively model speech rate (words/sec) - an important signal of ...
We explored the gaze behavior towards the end of utterances and dialogue act (DA),
i.e., verbal-behavior information indicating the intention of an utterance, during
turn-keeping/changing to estimate empathy skill levels in multiparty discussions.
SESSION: Session 2: Physiological Modeling
Automated measurement of affective behavior in psychopathology has been limited primarily
to screening and diagnosis. While useful, clinicians more often are concerned with
whether patients are improving in response to treatment. Are symptoms abating, ...
Smell is a powerful tool for conveying and recalling information without requiring
visual attention. Previous work identified, however, some challenges caused by users'
unfamiliarity with this modality and the complexity of scent delivery. We are now
Automatic emotion recognition has long been developed by concentrating on modeling
human expressive behavior. At the same time, neuroscientific evidence has shown
that the varied neuro-responses (i.e., blood oxygen level-dependent (BOLD) signals
Despite the great potential, Massive Open Online Courses (MOOCs) face major challenges
such as low retention rate, limited feedback, and lack of personalization. In this
paper, we report the results of a longitudinal study on AttentiveReview2, a ...
The aim was to study if odors evaporated by an olfactory display prototype can be
used to affect participants' cognitive and emotion-related responses to audio-visual
stimuli, and whether the display can benefit from objective measurement of the odors.
SESSION: Session 3: Sound and Interaction
The task of identifying when to take a conversational turn is an important function
of spoken dialogue systems. The turn-taking system should also ideally be able to
handle many types of dialogue, from structured conversation to spontaneous and ...
This paper presents a summary and critical reflection on ten major opportunities and
challenges for advancing the field of multimodal learning analytics (MLA). It identifies
emerging technology trends likely to disrupt learning analytics, challenges ...
Digital home assistants have an increasing influence on our everyday lives. The media
now reports how children adopt the consequential, imperious language style when talking
to real people. As a response to this behavior, we considered a digital ...
This article tackles the detection of the user's likes and dislikes in a negotiation
with a virtual agent, to help build a model of the user's preferences.
We introduce a linguistic model of user's likes and dislikes as they are ...
Automatic speech recognition can potentially benefit from the lip motion patterns,
complementing acoustic speech to improve the overall recognition performance, particularly
in noise. In this paper we propose an audio-visual fusion strategy that goes ...
SESSION: Session 4: Touch and Gesture
Body posture is a good indicator of, amongst other things, people's state of arousal,
focus of attention and level of interest in a conversation. Posture is conventionally
measured by observation and hand coding of videos or, more recently, through ...
Nearest neighbor classifiers recognize stroke gestures by computing a (dis)similarity
between a candidate gesture and a training set based on points, which may require
normalization, resampling, and rotation to a reference before processing. To ...
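The point-based matching such nearest-neighbor recognizers use can be sketched as follows — a minimal, illustrative Python sketch (function names and the 32-point resolution are assumptions, and rotation to a reference angle is omitted for brevity):

```python
import math

def resample(points, n=32):
    # Resample a stroke to n roughly equidistant points via linear interpolation.
    total = sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    if total == 0:
        return [points[0]] * n
    interval = total / (n - 1)
    out, acc, pts, i = [points[0]], 0.0, list(points), 0
    while len(out) < n and i < len(pts) - 1:
        seg = math.dist(pts[i], pts[i + 1])
        if seg > 0 and acc + seg >= interval:
            t = (interval - acc) / seg
            q = (pts[i][0] + t * (pts[i + 1][0] - pts[i][0]),
                 pts[i][1] + t * (pts[i + 1][1] - pts[i][1]))
            out.append(q)
            pts.insert(i + 1, q)  # continue measuring from the new point
            acc = 0.0
        else:
            acc += seg
        i += 1
    while len(out) < n:       # pad with the last point if rounding fell short
        out.append(pts[-1])
    return out

def normalize(points, n=32):
    # Translate the centroid to the origin and scale to a unit bounding box.
    pts = resample(points, n)
    cx, cy = sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n
    pts = [(p[0] - cx, p[1] - cy) for p in pts]
    scale = max(max(abs(p[0]) for p in pts), max(abs(p[1]) for p in pts)) or 1.0
    return [(p[0] / scale, p[1] / scale) for p in pts]

def dissimilarity(a, b):
    # Average point-to-point Euclidean distance between two normalized strokes.
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def classify(candidate, templates, n=32):
    # Nearest-neighbor: return the name of the most similar template.
    c = normalize(candidate, n)
    return min(templates, key=lambda name: dissimilarity(c, normalize(templates[name], n)))
```

For example, a roughly horizontal stroke would match a "line" template over a "vee" template, because after resampling and normalization its average point-wise distance to the flat shape is smallest.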
Combining mid-air gestures with pen input for bi-manual input on tablets has been
reported as an alternative and attractive input technique in drawing applications.
Previous work has also argued that mid-air gestural input can cause discomfort and
During medical interventions, direct interaction with medical image data is a cumbersome
task for physicians due to the sterile environment. Even though touchless input via
hand, foot or voice is possible, these modalities are not available for these ...
SESSION: Session 5: Human Behavior
A shared sense of humor can result in positive feelings associated with amusement,
laughter, and moments of bonding. If robotic companions could acquire their human
counterparts' sense of humor in an unobtrusive manner, they could improve their skills
Small group interaction occurs often in workplace and education settings. Its dynamic
progression is an essential factor in dictating the final group performance outcomes.
The personality of each individual within the group is reflected in his/her ...
Psychotic disorders are forms of severe mental illness characterized by abnormal social
function and a general sense of disconnect with reality. The evaluation of such disorders
is often complex, as their multifaceted nature is often difficult to ...
Constructing computational models of interactions during Forensic Interviews (FI)
with children presents a unique challenge in being able to maximize complete and accurate
information disclosure, while minimizing emotional trauma experienced by the ...
In human conversational interactions, turn-taking exchanges can be coordinated using
cues from multiple modalities. To design spoken dialog systems that can conduct fluid
interactions it is desirable to incorporate cues from separate modalities into ...
SESSION: Session 6: Artificial Agents
Convolutional neural networks (CNNs) are employed to estimate the visual focus of
attention (VFoA), also called gaze direction, in multiparty face-to-face meetings
on the basis of multimodal nonverbal behaviors including head pose, direction of the
In this paper we focus on detection of deception and suspicion from electrodermal
activity (EDA) measured on left and right wrists during a dyadic game interaction.
We aim to answer three research questions: (i) Is it possible to reliably distinguish
Emotion evoked by an advertisement plays a key role in influencing brand recall and
eventual consumer choices. Automatic ad affect recognition has several useful applications.
However, the use of content-based feature representations does not give ...
Laughter is a highly spontaneous behavior that frequently occurs during social interactions.
It serves as an expressive-communicative social signal which conveys a wide spectrum
of affective displays. Even though many studies have been performed on the ...
The inherent diversity of human behavior limits the capabilities of general large-scale
machine learning systems, which usually require ample amounts of data to provide robust
descriptors of the outcomes of interest. Motivated by this challenge, ...
SESSION: Poster Session 1
Cars provide drivers with task-related information (e.g. "Fill gas") mainly using
visual and auditory stimuli. However, those stimuli may distract or overwhelm the
driver, causing unnecessary stress. Here, we propose olfactory stimulation as a novel
In this work we analyze the importance of lexical and acoustic modalities in behavioral
expression and perception. We demonstrate that this importance relates to the amount
of therapy, and hence communication training, that a person received. It also ...
Older adults want to live independently and at the same time stay socially active.
We conducted contextual inquiry to understand what usability problems they face while
interacting with social media on touch screen devices. We found that it is hard for
The user experience (UX) of graphical user interfaces (GUIs) often depends on how
clearly visual designs communicate/signify "affordances", such as whether an element on
the screen can be pushed, dragged, or rotated. Especially for novice users figuring
In this paper, we extract features of head pose, eye gaze, and facial expressions
from video to estimate individual learners' attentional states in a classroom setting.
We concentrate on the analysis of different definitions for a student's attention
There are many mechanisms to sense arousal. Most of them are either intrusive, prone
to bias, costly, require skills to set up, or do not provide additional context to
the user's measure of arousal. We present arousal detection through the analysis of
We present PathWord (PATH passWORD), a multimodal digit entry method for ad-hoc authentication
based on known digit shapes and the user's relative eye movements. PathWord is a touch-free,
gaze-based input modality, which attempts to decrease shoulder surfing ...
Smart watches can enrich everyday interactions by providing both glanceable information
and instant access to frequent tasks. However, reading text messages on a 1.5-inch
small screen is inherently challenging, especially when a user's attention is ...
Despite the ubiquity and rapid growth of mobile reading activities, researchers and
practitioners today either rely on coarse-grained metrics such as click-through-rate
(CTR) and dwell time, or expensive equipment such as gaze trackers to understand ...
Motivated by the desire to give vehicles better information about their drivers, we
explore human intent inference in the setting of a human driver riding in a moving
vehicle. Specifically, we consider scenarios in which the driver intends to go to
In this paper, we introduce a novel gaze-only interaction technique called EyeLinks,
which was designed i) to support various types of discrete clickables (e.g. textual
links, buttons, images, tabs, etc.); ii) to be easy to learn and use; iii) to ...
Data Visualization has been receiving growing attention recently, with ubiquitous
smart devices designed to render information in a variety of ways. However, while
evaluations of visual tools for their interpretability and intuitiveness have been
The rising prevalence of mental illnesses is increasing the demand for new digital
tools to support mental wellbeing. Numerous collaborations spanning the fields of
psychology, machine learning and health are building such tools. Machine-learning
Quantitative analysis of gazes between a speaker and listeners was conducted from
the viewpoint of mutual activities in floor apportionment, with the assumption that
mutual gaze plays an important role in coordinating speech interaction. We conducted
The recent availability of lightweight, wearable cameras allows for collecting video
data from a "first-person" perspective, capturing the visual world of the wearer in
everyday interactive contexts. In this paper, we investigate how to exploit ...
Group meetings can suffer from serious problems that undermine performance, including
bias, "groupthink", fear of speaking, and unfocused discussion. To better understand
these issues, propose interventions, and thus improve team performance, we need to
Motivational Interviewing (MI) is a widely disseminated and effective therapeutic
approach for behavioral disorder treatment. Over the past decade, MI research has
identified client language as a central mediator between therapist skills and subsequent
We present a deep learning framework for real-time speech-driven 3D facial animation
from speech audio. Our deep neural network directly maps an input sequence of speech
spectrograms to a series of micro facial action unit intensities to drive a 3D ...
SESSION: Poster Session 2
This paper presents a novel approach in continuous emotion prediction that characterizes
dimensional emotion labels jointly with continuous and discretized representations.
Continuous emotion labels can capture subtle emotion variations, but their ...
Within the affective computing and social signal processing communities, increasing
efforts are being made in order to collect data with genuine (emotional) content.
When it comes to negative emotions and even aggression, ethical and privacy related
Autonomous systems are designed to carry out activities in remote, hazardous environments
without the need for operators to micro-manage them. It is, however, essential that
operators maintain situation awareness in order to monitor vehicle status and ...
Existing assistive technologies often capture and utilize a single remaining ability
to assist people with tetraplegia, which is insufficient for efficient complex interaction.
In this work, we developed a multimodal assistive system (MAS) to utilize ...
Affect recognition aims to detect a person's affective state based on observables,
with the goal of, e.g., improving human-computer interaction. Long-term stress is known
to have severe implications for wellbeing, which call for continuous and automated
When an automatic wheelchair or a self-carrying robot moves along with human agents,
predicting the next possible actions of the participating agents plays an important
role in realizing successful cooperation among them. In this paper, we ...
Automatic analysis of advertisements (ads) poses an interesting problem for learning
multimodal representations. A promising direction of research is the development of
deep neural network autoencoders to obtain inter-modal and intra-modal ...
Correctly interpreting an interlocutor's emotional expression is paramount to a successful
interaction. But what happens when one of the interlocutors is a machine? The facilitation
of human-machine communication and cooperation is of growing importance ...
Immersive virtual environments (IVEs) present rich possibilities for the experimental
study of non-verbal communication. Here, the 'digital chameleon' effect, which suggests
that a virtual speaker (agent) is more persuasive if they mimic their ...
This paper presents an approach for generating photorealistic video sequences of dynamically
varying facial expressions in human-agent interactions. To this end, we study human-human
interactions to model the relationship and influence of one individual's ...
This paper presents a novel approach for automatic prediction of risk of ADHD in schoolchildren
based on touch interaction data. We performed a study with 129 fourth-grade students
solving math problems on a multiple-choice interface to obtain a large ...
Creating tactile representations of visual information, especially moving images,
is difficult due to a lack of available tactile computing technology and a lack of
tools for authoring tactile information. To address these limitations, we developed
Modern smartphones are built with capacitive-sensing touchscreens, which can detect
anything that is conductive or has a dielectric differential with air. The human finger
is an example of such a dielectric, and works wonderfully with such touchscreens.
Emotion recognition is a core research area at the intersection of artificial intelligence
and human communication analysis. It is a significant technical challenge since humans
display their emotions through complex idiosyncratic combinations of the ...
Robots, virtual assistants, and other intelligent agents need to effectively interpret
verbal references to environmental objects in order to successfully interact and collaborate
with humans in complex tasks. However, object disambiguation can be a ...
Tactile information in a palm is a necessary component in manipulating and perceiving
large or heavy objects. Noting this, we investigate human sensitivity to tactile haptic
feedback in a palm for an improved user interface design. To provide ...
Social skills training, performed by human trainers, is a well-established method
for obtaining appropriate skills in social interaction. Previous work automated the
process of social skills training by developing a dialogue system that teaches social
SESSION: Doctoral Consortium (alphabetically by author's last name)
While many organizations provide a website in multiple languages, few provide a sign-language
version for deaf users, many of whom have lower written-language literacy. Rather
than providing difficult-to-update videos of humans, a more practical ...
Group meetings are often inefficient, unorganized and poorly documented. Factors including
"group-think," fear of speaking, unfocused discussion, and bias can affect the performance
of a group meeting. In order to actively or passively facilitate group ...
Augmented reality eyewear devices (e.g. glasses, headsets) are poised to become ubiquitous
in a similar way to smartphones, by providing quicker and more convenient access
to information. There is theoretically no limit to their application areas and ...
There are various real-world applications such as video ads, airport screenings, courtroom
trials, and job interviews where deception detection can play a crucial role. Hence,
there is immense demand for deception detection in videos. Videos contain ...
Analysis of student engagement in an e-learning environment would facilitate effective
task accomplishment and learning. Generally, engagement/disengagement can be estimated
from facial expressions, body movements and gaze pattern. The focus of this ...
Social robots need non-verbal behavior to make an interaction pleasant and efficient.
Most of the models for generating non-verbal behavior are rule-based and hence can
produce a limited set of motions and are tuned to a particular scenario. In contrast,...
I introduce a novel multi-modal multi-sensor interaction method between humans and
heterogeneous multi-robot systems. I have also developed a novel algorithm to control
heterogeneous multi-robot systems. The proposed algorithm allows the human operator
Multi-modal sentiment detection from natural video/audio streams has recently received
much attention. I propose to use this multi-modal information to develop a technique,
Sentiment Coloring, which utilizes the detected sentiments to generate effective ...
This work seeks to explore the potential of textile sensing systems as a new modality
of capturing social behaviour. The focus lies on evaluating the performance
of embedded pressure sensors as reliable detectors for social cues, such as ...
We converse with other people using both sounds and visuals, as our perception
of speech is bimodal. Essentially echoing the same speech structure, we manage to
integrate the two modalities and often understand the message better than with the
Automatic analysis of teacher student interactions is an interesting research problem
in social computing. Such interactions happen in both online and class room settings.
While teaching effectiveness is the goal in both settings, the mechanism to ...
This paper outlines the PhD research that aims to model empathy
in embodied conversational systems. Our goal is to determine the requirements for
implementation of an empathic interactive agent and develop evaluation methods that
SESSION: Demo and Exhibit Session
This work introduces EVA, a multimodal argumentative Dialogue System that is capable
of discussing controversial topics with the user. The interaction is structured as
an argument game in which the user and the system select respective moves in order
Tracking learners' engagement is useful for monitoring their learning quality. With
an increasing number of online video courses, a system that can automatically track
learners' engagement is expected to significantly help in improving the outcomes of
This work describes our approach to controlling lighter-than-air agents using multimodal
control via a wearable device. Tactile and gesture interfaces on a smart watch are
used to control the motion and altitude of these semi-autonomous agents. The ...
Autonomous systems in remote locations have a high degree of autonomy and there is
a need to explain what they are doing and why, in order to increase transparency
and maintain trust. This is particularly important in hazardous, high-risk scenarios.
SESSION: EAT Grand Challenge
The multimodal recognition of eating condition - whether a person is eating or not,
and if so, which food type - is a new research domain in the area of speech and
video processing that has many promising applications for future multimodal interfaces
Automatic recognition of eating conditions of humans could be a useful technology
in health monitoring. The audio-visual information can be used in automating this
process, and feature engineering approaches can reduce the dimensionality of audio-visual
In this paper, we mainly investigate subjects' food likability based on audio-related
features as a contribution to EAT - the ICMI 2018 Eating Analysis and Tracking challenge.
Specifically, we conduct 4-level Double Tree Complex Wavelet Transform ...
The use of Convolutional Neural Networks (CNN) pre-trained for a particular task,
as a feature extractor for an alternate task, is a standard practice in many image
classification paradigms. However, to date there have been comparatively few works
This paper presents the novel Functional-based acoustic Group Feature Selection (FGFS)
method for automatic eating condition recognition addressed in the ICMI 2018 Eating
Analysis and Tracking Challenge's Food-type Sub-Challenge. The Food-type Sub-...
SESSION: EmotiW Grand Challenge
Emotion recognition (ER) based on natural facial images/videos has been studied for
some years and considered a comparatively hot topic in the field of affective computing.
However, it remains a challenge to perform ER in the wild, given the noises ...
This paper presents a light-weight and accurate deep neural model for audiovisual
emotion recognition. To design this model, the authors followed a philosophy of simplicity,
drastically limiting the number of parameters to learn from the target datasets,...
This paper elaborates the winning approach for engagement intensity prediction in the
EmotiW Challenge 2018. The task is to predict the engagement level of a subject when
he or she is watching an educational video in diverse conditions and different ...
In this paper, we propose an automatic engagement prediction method for the Engagement
in the Wild sub-challenge of EmotiW 2018. We first design a novel Gaze-AU-Pose (GAP)
feature taking into account the information of gaze, action units and head pose ...
Engagement is the holy grail of learning, whether it is in a classroom setting or an
online learning platform. Studies have shown that engagement of the student while
learning can benefit students as well as the teacher if the engagement level of the
In this paper we propose a new approach for classifying the global emotion of images
containing groups of people. To achieve this task, we consider two different and complementary
sources of information: i) a global representation of the entire image (...
Precise detection and localization of learners' engagement levels are useful for monitoring
their learning quality. In the EmotiW Challenge's engagement detection task, we proposed
a series of novel improvements, including (a) a cluster-based framework ...
Group-level Emotion Recognition (GER) in the wild is a challenging task gaining lots
of attention. Most recent works utilized two channels of information, a channel involving
only faces and a channel containing the whole image, to solve this problem. ...
In this paper, we present our latest progress in Emotion Recognition techniques, which
combines acoustic features and facial features in both non-temporal and temporal mode.
This paper presents the details of our techniques used in the Audio-Video ...
This paper presents a hybrid deep learning network submitted to the 6th Emotion Recognition
in the Wild (EmotiW 2018) Grand Challenge, in the category of group-level emotion
recognition. Advanced deep learning models trained individually on faces, ...
This paper presents our approach for group-level emotion recognition sub-challenge
in the EmotiW 2018. The task is to classify an image into one of the group emotions
such as positive, negative, and neutral. Our approach mainly explores three cues,
The difficulty of emotion recognition in the wild (EmotiW) is how to train a robust
model to deal with diverse scenarios and anomalies. The Audio-video Sub-challenge
in EmotiW contains audio-video short clips with several emotional labels and the task
This paper details the sixth Emotion Recognition in the Wild (EmotiW) challenge. EmotiW
2018 is a grand challenge in the ACM International Conference on Multimodal Interaction
2018, Colorado, USA. The challenge aims at providing a common platform to ...
SESSION: Workshop Summaries
This is the introduction paper to the third version of the workshop on 'Multisensory
Approaches to Human-Food Interaction' organized at the 20th ACM International Conference
on Multimodal Interaction in Boulder, Colorado, on October 16th, 2018. This ...
Analysis of group interaction and team dynamics is an important topic in a wide variety
of fields, owing to the amount of time that individuals typically spend in small groups
for both professional and personal purposes, and given how crucial group ...
Multimodal signals allow us to gain insights into internal cognitive processes of
a person, for example: speech and gesture analysis yields cues about hesitations,
knowledgeability, or alertness; eye tracking yields information about a person's focus
This paper presents an introduction to the "Human-Habitat for Health (H3): Human-habitat
multimodal interaction for promoting health and well-being in the Internet of Things
era" workshop, which was held at the 20th ACM International Conference on ...
This paper gives a brief overview of the third workshop on Multimodal Analyses enabling
Artificial Agents in Human-Machine Interaction. The paper focuses on the main
aspects intended to be discussed in the workshop, reflecting the main scope of the