26th ACM International Conference on Multimodal Interaction
(4-8 Nov 2024)

Home

Registration

Important dates

Keynote Speakers

Conference Program

Proceedings

Companion Proceedings

Awards

Workshops

Grand Challenges

Tutorials

Special Sessions

Call for Papers

Doctoral Consortium

Blue Sky

Demos

Late-Breaking Results

New Initiatives

Presentation Guidelines

Camera Ready Instructions

Author Guidelines

Reviewer Guidelines

Call for Sponsors

People

Travel, Accommodation and Venue

About Costa Rica

For Families

Visa Information

Student Volunteers

Steering Committee

Platinum Sponsor
Silver Sponsors
Bronze Sponsor
Blue Sky Sponsor
Institutional Sponsor

Sessions

This is a tentative schedule for the Conference Program; we expect to make a few changes soon.

 

Adjunct Events Day 1 08:00-17:30 Doctoral Consortium
Session Chair: Yukiko Nakano

09:25 Opening and Welcome
Speaker: Micol Spitale

Session 1: Education
Chair: Micol Spitale

09:30 Enhancing Collaboration and Performance among EMS Students through Multimodal Learning Analytics
Vasundhara Joshi
Mentor: Catharine Oertel

09:55 Video Game Technologies Applied for Teaching Assembly Language Programming
Ernesto Rivera
Mentor: Heloisa Candello

10:20 Coffee Break

Session 2: Robots and Touch
Chair: Micol Spitale

10:40 A Musical Robot for People with Dementia
Paul Raingeard de la Bletiere
Mentor: Muneeb Ahmad

11:05 Real-Time Trust Measurement in Human-Robot Interaction: Insights from Physiological Behaviours
Abdullah Alzahrani
Mentor: Sean Andrist

11:30 Designing Digital Multisensory Textile Experiences
Shu Zhong
Mentor: Heloisa Candello

12:00 Lunch and Interaction with Mentors

Session 3: Social Interaction Modeling
Chair: Micol Spitale

13:30 Towards Automatic Social Involvement Estimation
Zonghuan Li
Mentor: Alessandro Vinciarelli

13:55 Investigating Multi-Reservoir Computing for EEG-based Emotion Recognition
Anubhav Anubhav
Mentor: Albert Ali Salah

14:15 Modelling Social Intentions in Complex Conversational Settings
Ivan Kondyurin
Mentor: Alessandro Vinciarelli

Session 4: Multimodal and Trustworthy AI
Chair: Micol Spitale

14:40 A Multimodal Understanding of the Eye-Mind Link
Megan Caruso
Mentor: Stacy Marsella

15:05 Towards Trustworthy and Efficient Diffusion Models
Jayneel Vora
Mentor: Albert Ali Salah

15:30 Coffee Break

15:45 Early Career Stories: mentors share early-career stories and advice; students can ask the mentors questions on anything from career paths to work-life balance
Chair: Micol Spitale

17:00 Closing

Adjunct Events Day 1 13:30-17:30 Grand Challenge 1 (ERR)
Session Chair: Maria Teresa Parreira

13:30 Introduction to the ERR@HRI challenge and baseline paper presentation
Maria Teresa Parreira

13:45 A Time Series Classification Pipeline for Detecting Interaction Ruptures in HRI Based on User Reactions
Peter Tisnikar

14:05 PRISCA at ERR@HRI 2024: Multimodal Representation Learning for Detecting Interaction Ruptures in HRI
Silvia Rossi

14:25 Predicting Errors and Failures in Human-Robot Interaction from Multi-Modal Temporal Data
Ruben Janssens

14:45 Coffee Break

15:00 Keynote Speaker: TBD
Chair: Maia Stiber

15:30 Final discussion: Future Directions
Chair: Maia Stiber

16:00 Closing

Day 1 10:30-12:00 Oral Session 1: Multimodal and cross-modal learning
Session Chair: Yukiko Nakano

10:30-10:50 Mitigation of gender bias in automatic facial non-verbal behaviors generation for interactive social agents (nominated for Best Paper Award)
A. Delbosc, M. Ochs, N. Sabouret, B. Ravenet, and S. Ayache

10:50-11:10 DoubleDistillation: Enhancing LLMs for Informal Text Analysis using Multistage Knowledge Distillation from Speech and Text (nominated for Best Paper Award)
F. Hasan, Y. Li, J. Foulds, S. Pan, B. Bhattacharjee

11:10-11:30 Do We Need To Watch It All? Efficient Job Interview Video Processing with Differentiable Masking
H. Le, S. Li, C. O. Mawalim, H. H. Huang, C. W. Leong, and S. Okada

11:30-11:50 A Model of Factors Contributing to the Success of Dialogical Explanations
M. Booshehri, H. Buschmeier, and P. Cimiano

Day 1 13:30-15:00 Oral Session 2: Human Communication Dynamics
Session Chair: Hendrik Buschmeier

13:30-13:50 Online Multimodal End-of-Turn Prediction for Three-party Conversations (nominated for Best Paper Award)
M. C. Lee and Z. Deng

13:50-14:10 Decoding Contact: Automatic Estimation of Contact Signatures in Parent-Infant Free Play Interactions (nominated for Best Paper Award)
M. Doyran, A. A. Salah, and R. Poppe

14:10-14:30 Leveraging Prosody as an Informative Teaching Signal for Agent Learning: Exploratory Studies and Algorithmic Implications
M. Knierim, S. Jain, M. H. Aydoğan, K. Mitra, K. Desai, A. Saran, and K. Baraka

14:30-14:50 SEMPI: A Database for Understanding Social Engagement in Video-Mediated Multiparty Interaction
M. Siniukov, Y. Yin, E. Fast, Y. Qi, A. Monga, A. Kim, and M. Soleymani

Day 1 15:30-16:30 Panel: Multimodal Research in Latin America
Session Chair: Prof. Daniel Gatica-Perez

Prof. Carlos Busso, University of Texas Dallas, USA
Dr. Heloisa Candello, IBM Brazil
Prof. Monica Perusquia-Hernandez, Nara Institute of Science and Technology, Japan
Prof. Laura Cabrera-Quiros, TEC, Costa Rica

Day 1 16:30-18:00 Poster Presentations 1 (including DC posters)
Session Chair: Tariq Iqbal

Exploring the Alteration and Masking of Everyday Noise Sounds using Auditory Augmented Reality
I. A. Bustoni, M. McGill, and S. Brewster

The Plausibility Paradox on Interactions with Complex Virtual Objects in Virtual Environments
D. Alvarado-Chou and Y. Law

First-Person Perspective Induces Stronger Feelings of Awe and Presence Compared to Third-Person Perspective in Virtual Reality
H. Otsubo, A. Marquardt, M. Steininger, M. Lehnort, F. Dollack, Y. Hirao, M. Perusquia-Hernandez, H. Uchiyama, E. Kruijff, B. Riecke, and K. Kiyokawa

Poke Typing: Effects of Hand-Tracking Input and Key Representation on Mid-Air Text Entry Performance in Virtual Reality
M. Akhoroz and C. Yildirim

Is Distance a Modality? Multi-Label Learning for Speech-Based Joint Prediction of Attributed Traits and Perceived Distances in 3D Audio Immersive Environments
E. Fringi, N. Alareef, L. Picinali, S. Brewster, T. Guha, and A. Vinciarelli

Feeling Textiles through AI: An exploration into Multimodal Language Models and Human Perception Alignment
S. Zhong, E. Gatti, Y. Cho, and M. Obrist

SemanticTap: A Haptic Toolkit for Vibration Semantic Design of Smartphone
R. Zhang, Y. Li, and Y. Jiao

QuietSync: Integrating Multimodal Signals for Accurate Silent Speech Interaction with Head-Worn Devices
T. Srivastava, R. M. Winters, T. Gable, Y. T. Wang, T. LaScala, and I. Tashev

NearFetch: Enhancing Touch-Based Mobile Interaction on Public Displays with an Embedded Programmable NFC Array
Q. Cao, J. Zhang, S. Fan, J. Rong, M. Qi, Z. Duan, P. Zhao, L. Liu, Z. Zhou, and W. Chen

ScentHaptics: Augmenting the Haptic Experiences of Digital Mid-Air Textiles with Scent
C. Dawes, J. Xue, G. Brianza, P. Cornelio, R. Montano Murillo, E. Maggioni, and M. Obrist

LLM-powered Multimodal Insight Summarization for UX Testing
K. Turbeville, J. Muengtaweepongsa, S. Stevens, J. Moss, A. Pon, K. Lee, C. Mehra, J. Gutierrez Villalobos, and R. Kumar

Generalization Boost in Bimodal Classification via Data Fusion Trained on Sparse Datasets
W. Yu, D. Kolossa, and R. Nickel

A multimodal analysis of environmental stress experienced by older adults during outdoor walking trips: Implications for designing new intelligent technologies to enhance walkability in low-income Latino communities
R. Yupanqui, J. Sohn, Y. Kim, R. Flores, H. Lee, J. Kim, S. Lee, Y. Ham, C. Lee, and T. Chaspari

Day 1 16:30-18:00 Doctoral Consortium Papers Poster Session
Session Chair: Micol Spitale

Towards Trustworthy and Efficient Diffusion Models
Jayneel Vora

Video Game Technologies Applied for Teaching Assembly Language Programming
Ernesto Rivera

Towards Automatic Social Involvement Estimation
Zonghuan Li

A Musical Robot for People with Dementia
Paul Raingeard de la Bletiere

Investigating Multi-Reservoir Computing for EEG-based Emotion Recognition
Anubhav Anubhav

A Multimodal Understanding of the Eye-Mind Link
Megan Caruso

Real-Time Trust Measurement in Human-Robot Interaction: Insights from Physiological Behaviours
Abdullah Alzahrani

Designing Digital Multisensory Textile Experiences
Shu Zhong

Modelling Social Intentions in Complex Conversational Settings
Ivan Kondyurin

Enhancing Collaboration and Performance among EMS Students through Multimodal Learning Analytics
Vasundhara Joshi

Day 2 10:30-12:00 Oral Session 3: Affective Computing
Session Chair: Carlos Busso

10:30-10:50 Relating Students Cognitive Processes and Learner-Centered Emotions: An Advanced Deep Learning Approach
A. T S and G. Biswas

10:50-11:10 On Multimodal Emotion Recognition for Human-Chatbot Interaction in the Wild
N. Kovacevic, C. Holz, M. Gross, and R. Wampfler

11:10-11:30 Towards Automated Annotation of Infant-Caregiver Engagement Phases with Multimodal Foundation Models
D. Withanage Don, D. Schiller, T. Hallmen, S. Mertes, T. Baur, F. Lingenfelser, M. Müller, L. Kaubisch, C. Reck, and E. André

11:30-11:50 Emotion Recognition for Multimodal Recognition of Attachment in School-Age Children
A. Buker and A. Vinciarelli

Day 2 13:30-14:30 Oral Session 4: Special session on Personalization of Robot’s Multimodal Behavior
Session Chair: Silvia Rossi

13:30-13:50 Multimodal User Enjoyment Detection in Human-Robot Conversation: The Power of Large Language Models
A. Pereira, L. Marcinek, J. Miniotaite, S. Thunberg, E. Lagerstedt, J. Gustafson, G. Skantze, and B. Irfan

13:50-14:10 Predicting Human Intent to Interact with a Public Robot: The People Approaching Robots Database (PAR-D)
S. Thompson, A. Lew, Y. Li, E. Stanish, A. Huang, R. Phanse, and M. Vázquez

14:10-14:30 M2RL: A Multimodal Multi-Interface Dataset for Robot Learning from Human Demonstrations
S. Hasan, M. Yasar, T. Iqbal

Day 2 15:00-17:00 Poster Presentations 2 & Demo Session
Session Chair: Monica Perusquia

Perceived Text Relevance Estimation Using Scanpaths and GNNs
A. Mohamed Selim, O. S. Bhatti, M. Barz, and D. Sonntag

Juicy Text: Onomatopoeia and Semantic Text Effects for Juicy Player Experiences
E. Fabre, K. Seaborn, A. Verhulst, Y. Itoh, and J. Rekimoto

Learning Co-Speech Gesture Representations in Dialogue through Contrastive Learning: An Intrinsic Evaluation
E. Ghaleb, B. Khaertdinov, W. Pouw, M. Rasenberg, J. Holler, A. Ozyurek, and R. Fernandez

Multilingual Dyadic Interaction Corpus NoXi+J: Toward Understanding Asian-European Non-verbal Cultural Characteristics and their Influences on Engagement
M. Funk, S. Okada, and E. André

Exploring Interlocutor Gaze Interactions in Conversations based on Functional Spectrum Analysis
A. Tashiro, M. Imamura, S. Kumano, and K. Otsuka

Predictability of Understanding in Explanatory Interactions Based on Multimodal Cues
O. Turk, S. Lazarov, Y. Wang, H. Buschmeier, A. Grimminger, and P. Wagner

Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing?
T. Feng, D. Yang, D. Bose, and S. Narayanan

Automatic mild cognitive impairment estimation from the group conversation of coimagination method
S. Li, K. Kumagai, M. Otake-Matsuura, and S. Okada

Lip Abnormality Detection for Patients with Repaired Cleft Lip and Palate: A Lip Normalization Approach
K. Rosero, A. Salman, R. R. Hallac, and C. Busso

“Uh, This One?”: Leveraging Behavioral Signals for Detecting Confusion during Physical Tasks
M. Stiber, D. Bohus, and S. Andrist

Understanding Non-Verbal Irony Markers: Machine Learning Insights Versus Human Judgment
M. Spitale, F. Catania, and F. Panzeri

Day 2 15:00-17:00 Demo Session
Session Chair: Raj Tumuluri

An Adaptive GPT-4-powered Socially Interactive Agent for Conversing about Health
J. Molto, U. Visser, J. Fields, and C. Lisetti

An AI-Powered Interactive Interface to Enhance Accessibility of Interview Training for Military Veterans
R. C. Yarlagadda, P. Aggarwal, V. Jamadagni, G. Mahajani, P. Malasani, E. H. Nirjhar, and T. Chaspari

Combining Generative and Discriminative AI for High-Stakes Interview Practice
C. W. Leong, N. Jawahar, V. Basheerabad, T. Wörtwein, A. Emerson, and G. Sivan

Enhancing Biodiversity Monitoring: An Interactive Tool for Efficient Identification of Species in Large Bioacoustics Datasets
H. Kath, I. Troshani, B. Lüers, T. S. Gouvêa, and D. Sonntag

ARCADE: An Augmented Reality Display Environment for Multimodal Interaction with Conversational Agents
C. Schindler, D. Mayumi, Y. Matsuda, N. Rach, K. Yasumoto, and W. Minker

Let’s Dance Together! AI Dancers Can Dance to Your Favorite Music and Style
R. Ishii, S. Eitoku, S. Matsuo, M. Makiguchi, A. Hoshi, and L. P. Morency

Human Contact Annotator: Annotating Physical Contact in Dyadic Interactions
M. Doyran, A. A. Salah, and R. Poppe

Bespoke: Using LLM agents to generate just-in-time interfaces by reasoning about user intent
P. Nandy, S. O. Adalgeirsson, A. K. Sinha, T. Kraljic, M. Cleron, L. Shi, A. Singh, A. Chaudhary, A. Ganti, C. A. Melancon, S. Zhang, D. Robishaw, H. Ciurdar, J. Secor, K. A. Robertsen, K. Climer, M. Le, M. Venkatesan, P. Chi, P. Li, P. F. McDermott, R. Shim, S. Onsan, S. Vaishnav, and S. Guamán

 

Day 3 10:30-12:00 Oral Session 5: Biomedical Data Processing
Session Chair: Ali Etemad

Putting the “Brain” Back in the Eye-Mind Link: Aligning Eye Movements and Brain Activations During Naturalistic Reading
M. Caruso, R. Southwell, L. Hirshfield, and S. D’Mello

Distinguishing Target and Non-Target Fixations with EEG and Eye Tracking in Realistic Visual Scenes
M. Sharma, C. Martínez, B. Wirth, A. Krüger, and P. Müller

Detecting Deception in Natural Environments Using Incremental Transfer Learning
M. Ahmad, A. Alzahrani, and S. Ahmad

Stressor Type Matters! — Exploring Factors Influencing Cross-Dataset Generalizability of Physiological Stress Detection
P. Prajod, B. Mahesh, and E. André

Day 3 14:30-15:00 Challenge Overview
Session Chair: Ronald Böck

Introduction to Grand Challenges 2024
Ronald Böck

Empathic Virtual Agent Challenge: Appraisal-based Recognition of Affective States
Safaa Azzakhnini

ERR@HRI 2024 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Interactions
Micol Spitale

Day 3 15:30-16:30 Blue Sky Papers
Session Chair: Ali Etemad

15:30-15:50 AI as Modality in Human Augmentation: Toward New Forms of Multimodal Interaction with AI-Embodied Modalities
R.-D. Vatavu

15:50-16:10 RealSeal: Revolutionizing Media Authentication with Real-Time Realism Scoring
B. Radharapu and H. Krishna

16:10-16:30 Everything We Hear: Towards Tackling Misinformation in Podcasts
S. P. Cherumanal, U. Gadiraju and D. Spina

Day 3 16:30-18:00 Poster Presentations 3 & Late Breaking Results
Session Chair: Radoslaw Niewiadomski

Generating Facial Expression Sequences of Complex Emotions with Generative Adversarial Networks
Z. Belmekki, D. Gómez Jáuregui, P. Reuter, J. Li, J. C. Martin, K. Jenkins, and N. Couture

Envisioning Futures: How the Modality of AI Recommendations Impacts Conversation Flow in AR-enhanced Dialogue
S. Villa, Y. Weiss, M. Y. Lu, M. Ziarko, A. Schmidt, and J. Niess

Across Trials vs Subjects vs Contexts: A Multi-Reservoir Computing Approach for EEG Variations in Emotion Recognition
A. Anubhav and K. Fujiwara

Detecting Aware and Unaware Mind Wandering During Lecture Viewing: A Multimodal Machine Learning Approach Using Eye Tracking, Facial Videos and Physiological Data
B. Bühler, E. Bozkir, H. Deininger, P. Goldberg, P. Gerjets, U. Trautwein, and E. Kasneci

MR-Driven Near-Future Realities: Previewing Everyday Life Real-World Experiences Using Mixed Reality
F. Mathis, B. Myers, B. Lafreniere, M. Glueck, and D. Porpino Sobreira Marques

Integrating Multimodal Affective Signals for Stress Detection from Audio-Visual Data
D. Ghose, O. Gitelson, and B. Scassellati

Anonymous-Corpus: A Multimodal Database for Understanding Video-Learning Experience
A. Salman, N. Wang, L. Martinez-Lucas, A. Vidal, and C. Busso

NapTune: Prompt-tuning for Mood Classification with Wearable Time-series along with Previous Night’s Sleep-related Measures
D. Shome, N. Montazeri Ghahjaverestan, and A. Etemad

Improving Usability of Data Charts in Multimodal Documents for Low Vision Users
Y. Prakash, A. Kolgar Nayak, S. Alyaan, P. A. Khan, H. N. Lee, and V. Ashok

Participation Role-Driven Engagement Estimation of ASD Individuals in Neurodiverse Group Discussions
K. Stefanov, Y. Nakano, C. Kobayashi, I. Hoshina, T. Sakato, F. Nihei, C. Takayama, R. Ishii, and M. Tsujii

Detecting Autism from Head Movements using Kinesics
M. Gokmen, E. Sariyanidi, L. Yankowitz, C. J. Zampella, R. T. Schultz, and B. Tunc

Perception of Stress: A Comparative Multimodal Analysis of Time-Continuous Stress Ratings from Self and Observers
E. H. Nirjhar, W. Arthur Jr., and T. Chaspari

Day 3 16:30-18:00 Late Breaking Results
Session Chair: Ronald Böck

User-Defined Interaction for Very Low-Cost Head-Mounted Displays
Y. C. Law, H. Mendieta-Dávila, D. García-Fallas, R. G. Quiros, and M. Chacón-Rivas

Effects of Incoherence in Multimodal Explanations of Robot Failures
P. Pramanick, N. Federico, L. Raggioli, A. Rossi, and S. Rossi

Design and Preliminary Evaluation of a Stress Reflection System for High-Stress Training Environments
S. Akiri, V. Joshi, S. Taherzadeh, G. Williams, H. M. Mentis, and A. Kleinsmith

Haptic Feedback to Reduce Individual Differences in Corrective Actions for Skill Learning
S. Ono, N. Ninomiya, and H. Kanai

Towards Multimodality: Comparing Quantifications of Movement Coordination
C. Fan, V. Romero, A. Paxton, and T. Chowdhury

Unlocking the Potential of Multimodal Compositionality for Enhanced Recommendations through Sentiment Analysis
S. Nazir and M. Sadrzadeh

Enhancing Autism Spectrum Disorder Screening: Implementation and Pilot Testing of a Robot-Assisted Digital Tool
A. Di Nuovo and A. Kay

Understanding LLMs Ability to Aid Malware Analysts in Bypassing Evasion Techniques
M. Y. Wong, K. Valakuzhy, M. Ahamad, D. Blough, and F. Monrose

“Is This It?”: Towards Ecologically Valid Benchmarks for Situated Collaboration
D. Bohus, S. Andrist, Y. Bao, E. Horvitz, and A. Paradiso

An Audiotactile System for Accessible Graphs on a Coordinate Plane
C. Yang and P. Taele

Levels of Multimodal Interaction
A. K. Sinha, A. Olwal, and C. Kulkarni

Comparing Subjective Measures of Workload in Video Game Play: Evaluating the Test-Retest Reliability of the VGDS and NASA-TLX
E. Pretty, R. L. Martins Guarese, H. Fayek, and F. Zambetta

Towards Investigating Biases in Spoken Conversational Search
S. P. Cherumanal, J. R. Trippas, and D. Spina

Crossmodal Correspondences between Piquancy/Spiciness and Visual Shape
Y. Wang, M. Ohno, T. Narumi, and Y. ah Seong

Adjunct Events Day 2 09:00-12:00 Grand Challenge 2 (EVAC)
Session Chair: Safaa Azzakhnini

09:00 EVAC’2024 Opening & Challenge Introduction

09:40 EVAC’2024 Contribution – Johns Hopkins Center for Language and Speech Processing

10:30 EVAC’2024 Challenge Results

10:40 EVAC’2024 Panel Discussion

11:30 Closing

Papers not presented at the conference (included in the main proceedings)

SMURF: Statistical Modality Uniqueness and Redundancy Factorization (nominated for Best Paper Award)
T. Wörtwein, N. Allen, J. Cohn, and L. P. Morency

The impact of auditory warning types and emergency obstacle avoidance takeover scenarios on takeover behavior
X. Li and Z. Xu

Low-Rank Adaptation of Time Series Foundational Models for Out-of-Domain Modality Forecasting
D. Gupta, A. Bhatti, S. Parmar, C. Dan, Y. Liu, B. Shen, and S. Lee

Nonverbal Dynamics in Dyadic Videoconferencing Interaction: The Role of Video Resolution and Conversational Quality
C. Diao, S. Arevalo Arboleda, and A. Raake