Sessions
This is a tentative schedule for the Conference Program; we expect to make a few changes soon.
Session Chair: Yukiko Nakano
09:25 Opening and Welcome
Speaker: Micol Spitale
Session 1: Education
Chair: Micol Spitale
09:30 Enhancing Collaboration and Performance among EMS Students through Multimodal Learning Analytics
Vasundhara Joshi
Mentor: Catharine Oertel
09:55 Video Game Technologies Applied for Teaching Assembly Language Programming
Ernesto Rivera
Mentor: Heloisa Candello
10:20 Coffee Break
Session 2: Robots and Touch
Chair: Micol Spitale
10:40 A Musical Robot for People with Dementia
Paul Raingeard de la Bletiere
Mentor: Muneeb Ahmad
11:05 Real-Time Trust Measurement in Human-Robot Interaction: Insights from Physiological Behaviours
Abdullah Alzahrani
Mentor: Sean Andrist
11:30 Designing Digital Multisensory Textile Experiences
Shu Zhong
Mentor: Heloisa Candello
12:00 Lunch and Interaction with Mentors
Session 3: Social Interaction Modeling
Chair: Micol Spitale
13:30 Towards Automatic Social Involvement Estimation
Zonghuan Li
Mentor: Alessandro Vinciarelli
13:55 Investigating Multi-Reservoir Computing for EEG-based Emotion Recognition
Anubhav Anubhav
Mentor: Albert Ali Salah
14:15 Modelling Social Intentions in Complex Conversational Settings
Ivan Kondyurin
Mentor: Alessandro Vinciarelli
Session 4: Multimodal and Trustworthy AI
Chair: Micol Spitale
14:40 A Multimodal Understanding of the Eye-Mind Link
Megan Caruso
Mentor: Stacy Marsella
15:05 Towards Trustworthy and Efficient Diffusion Models
Jayneel Vora
Mentor: Albert Ali Salah
15:30 Coffee Break
15:45 Early Career Stories: Mentors share early-career stories and advice; students can ask the mentors questions on anything from career paths to work-life balance
Chair: Micol Spitale
17:00 Closing
Adjunct Events Day 1 13:30-17:30 Grand Challenge 1 (ERR@HRI)
Session Chair: Maria Teresa Parreira
13:30 Introduction to the ERR@HRI challenge and baseline paper presentation
Maria Teresa Parreira
13:45 A Time Series Classification Pipeline for Detecting Interaction Ruptures in HRI Based on User Reactions
Peter Tisnikar
14:05 PRISCA at ERR@HRI 2024: Multimodal Representation Learning for Detecting Interaction Ruptures in HRI
Silvia Rossi
14:25 Predicting Errors and Failures in Human-Robot Interaction from Multi-Modal Temporal Data
Ruben Janssens
14:45 Coffee Break
15:00 Keynote Speaker: TBD
Chair: Maia Stiber
15:30 Final discussion: Future Directions
Chair: Maia Stiber
16:00 Closing
Day 1 10:30-12:00 Oral Session 1: Multimodal and Cross-Modal Learning
Session Chair: Yukiko Nakano
10:30-10:50 Mitigation of gender bias in automatic facial non-verbal behaviors generation for interactive social agents (nominated for best paper award)
A. Delbosc, M. Ochs, N. Sabouret, B. Ravenet, and S. Ayache
10:50-11:10 DoubleDistillation: Enhancing LLMs for Informal Text Analysis using Multistage Knowledge Distillation from Speech and Text (nominated for best paper award)
F. Hasan, Y. Li, J. Foulds, S. Pan, B. Bhattacharjee
11:10-11:30 Do We Need To Watch It All? Efficient Job Interview Video Processing with Differentiable Masking
H. Le, S. Li, C. O. Mawalim, H. H. Huang, C. W. Leong, and S. Okada
11:30-11:50 A Model of Factors Contributing to the Success of Dialogical Explanations
M. Booshehri, H. Buschmeier, and P. Cimiano
Day 1 13:30-15:00 Oral Session 2: Human Communication Dynamics
Session Chair: Hendrik Buschmeier
13:30-13:50 Online Multimodal End-of-Turn Prediction for Three-party Conversations (nominated for best paper award)
M. C. Lee and Z. Deng
13:50-14:10 Decoding Contact: Automatic Estimation of Contact Signatures in Parent-Infant Free Play Interactions (nominated for best paper award)
M. Doyran, A. A. Salah, and R. Poppe
14:10-14:30 Leveraging Prosody as an Informative Teaching Signal for Agent Learning: Exploratory Studies and Algorithmic Implications
M. Knierim, S. Jain, M. H. Aydoğan, K. Mitra, K. Desai, A. Saran, and K. Baraka
14:30-14:50 SEMPI: A Database for Understanding Social Engagement in Video-Mediated Multiparty Interaction
M. Siniukov, Y. Yin, E. Fast, Y. Qi, A. Monga, A. Kim, and M. Soleymani
Day 1 15:30-16:30 Panel: Multimodal Research in Latin America
Session Chair: Prof. Daniel Gatica-Perez
Prof. Carlos Busso, University of Texas at Dallas, USA
Dr. Heloisa Candello, IBM Brazil
Prof. Monica Perusquia-Hernandez, Nara Institute of Science and Technology, Japan
Prof. Laura Cabrera-Quiros, TEC, Costa Rica
Day 1 16:30-18:00 Poster Presentations 1 (including DC posters)
Session Chair: Tariq Iqbal
Exploring the Alteration and Masking of Everyday Noise Sounds using Auditory Augmented Reality
I. A. Bustoni, M. McGill, and S. Brewster
The Plausibility Paradox on Interactions with Complex Virtual Objects in Virtual Environments
D. Alvarado-Chou and Y. Law
First-Person Perspective Induces Stronger Feelings of Awe and Presence Compared to Third-Person Perspective in Virtual Reality
H. Otsubo, A. Marquardt, M. Steininger, M. Lehnort, F. Dollack, Y. Hirao, M. Perusquia-Hernandez, H. Uchiyama, E. Kruijff, B. Riecke, and K. Kiyokawa
Poke Typing: Effects of Hand-Tracking Input and Key Representation on Mid-Air Text Entry Performance in Virtual Reality
M. Akhoroz and C. Yildirim
Is Distance a Modality? Multi-Label Learning for Speech-Based Joint Prediction of Attributed Traits and Perceived Distances in 3D Audio Immersive Environments
E. Fringi, N. Alareef, L. Picinali, S. Brewster, T. Guha, and A. Vinciarelli
Feeling Textiles through AI: An exploration into Multimodal Language Models and Human Perception Alignment
S. Zhong, E. Gatti, Y. Cho, and M. Obrist
SemanticTap: A Haptic Toolkit for Vibration Semantic Design of Smartphone
R. Zhang, Y. Li, and Y. Jiao
QuietSync: Integrating Multimodal Signals for Accurate Silent Speech Interaction with Head-Worn Devices
T. Srivastava, R. M. Winters, T. Gable, Y. T. Wang, T. LaScala, and I. Tashev
NearFetch: Enhancing Touch-Based Mobile Interaction on Public Displays with an Embedded Programmable NFC Array
Q. Cao, J. Zhang, S. Fan, J. Rong, M. Qi, Z. Duan, P. Zhao, L. Liu, Z. Zhou, and W. Chen
ScentHaptics: Augmenting the Haptic Experiences of Digital Mid-Air Textiles with Scent
C. Dawes, J. Xue, G. Brianza, P. Cornelio, R. Montano Murillo, E. Maggioni, and M. Obrist
LLM-powered Multimodal Insight Summarization for UX Testing
K. Turbeville, J. Muengtaweepongsa, S. Stevens, J. Moss, A. Pon, K. Lee, C. Mehra, J. Gutierrez Villalobos, and R. Kumar
Generalization Boost in Bimodal Classification via Data Fusion Trained on Sparse Datasets
W. Yu, D. Kolossa, and R. Nickel
A multimodal analysis of environmental stress experienced by older adults during outdoor walking trips: Implications for designing new intelligent technologies to enhance walkability in low-income Latino communities
R. Yupanqui, J. Sohn, Y. Kim, R. Flores, H. Lee, J. Kim, S. Lee, Y. Ham, C. Lee, and T. Chaspari
Day 1 16:30-18:00 Doctoral Consortium Papers Poster Session
Session Chair: Micol Spitale
Towards Trustworthy and Efficient Diffusion Models
Jayneel Vora
Video Game Technologies Applied for Teaching Assembly Language Programming
Ernesto Rivera
Towards Automatic Social Involvement Estimation
Zonghuan Li
A Musical Robot for People with Dementia
Paul Raingeard de la Bletiere
Investigating Multi-Reservoir Computing for EEG-based Emotion Recognition
Anubhav Anubhav
A Multimodal Understanding of the Eye-Mind Link
Megan Caruso
Real-Time Trust Measurement in Human-Robot Interaction: Insights from Physiological Behaviours
Abdullah Alzahrani
Designing Digital Multisensory Textile Experiences
Shu Zhong
Modelling Social Intentions in Complex Conversational Settings
Ivan Kondyurin
Enhancing Collaboration and Performance among EMS Students through Multimodal Learning Analytics
Vasundhara Joshi
Day 2 10:30-12:00 Oral Session 3: Affective Computing
Session Chair: Carlos Busso
10:30-10:50 Relating Students' Cognitive Processes and Learner-Centered Emotions: An Advanced Deep Learning Approach
A. T S and G. Biswas
10:50-11:10 On Multimodal Emotion Recognition for Human-Chatbot Interaction in the Wild
N. Kovacevic, C. Holz, M. Gross, and R. Wampfler
11:10-11:30 Towards Automated Annotation of Infant-Caregiver Engagement Phases with Multimodal Foundation Models
D. Withanage Don, D. Schiller, T. Hallmen, S. Mertes, T. Baur, F. Lingenfelser, M. Müller, L. Kaubisch, C. Reck, and E. André
11:30-11:50 Emotion Recognition for Multimodal Recognition of Attachment in School-Age Children
A. Buker and A. Vinciarelli
Day 2 13:30-14:30 Oral Session 4: Special Session on Personalization of Robot's Multimodal Behavior
Session Chair: Silvia Rossi
13:30-13:50 Multimodal User Enjoyment Detection in Human-Robot Conversation: The Power of Large Language Models
A. Pereira, L. Marcinek, J. Miniotaite, S. Thunberg, E. Lagerstedt, J. Gustafson, G. Skantze, and B. Irfan
13:50-14:10 Predicting Human Intent to Interact with a Public Robot: The People Approaching Robots Database (PAR-D)
S. Thompson, A. Lew, Y. Li, E. Stanish, A. Huang, R. Phanse, and M. Vázquez
14:10-14:30 M2RL: A Multimodal Multi-Interface Dataset for Robot Learning from Human Demonstrations
S. Hasan, M. Yasar, T. Iqbal
Day 2 15:00-17:00 Poster Presentations 2 & Demo Session
Session Chair: Monica Perusquia-Hernandez
Perceived Text Relevance Estimation Using Scanpaths and GNNs
A. Mohamed Selim, O. S. Bhatti, M. Barz, and D. Sonntag
Juicy Text: Onomatopoeia and Semantic Text Effects for Juicy Player Experiences
E. Fabre, K. Seaborn, A. Verhulst, Y. Itoh, and J. Rekimoto
Learning Co-Speech Gesture Representations in Dialogue through Contrastive Learning: An Intrinsic Evaluation
E. Ghaleb, B. Khaertdinov, W. Pouw, M. Rasenberg, J. Holler, A. Ozyurek, and R. Fernandez
Multilingual Dyadic Interaction Corpus NoXi+J: Toward Understanding Asian-European Non-verbal Cultural Characteristics and their Influences on Engagement
M. Funk, S. Okada, and E. André
Exploring Interlocutor Gaze Interactions in Conversations based on Functional Spectrum Analysis
A. Tashiro, M. Imamura, S. Kumano, and K. Otsuka
Predictability of Understanding in Explanatory Interactions Based on Multimodal Cues
O. Turk, S. Lazarov, Y. Wang, H. Buschmeier, A. Grimminger, and P. Wagner
Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing?
T. Feng, D. Yang, D. Bose, and S. Narayanan
Automatic mild cognitive impairment estimation from the group conversation of coimagination method
S. Li, K. Kumagai, M. Otake-Matsuura, and S. Okada
Lip Abnormality Detection for Patients with Repaired Cleft Lip and Palate: A Lip Normalization Approach
K. Rosero, A. Salman, R. R. Hallac, and C. Busso
“Uh, This One?”: Leveraging Behavioral Signals for Detecting Confusion during Physical Tasks
M. Stiber, D. Bohus, and S. Andrist
Understanding Non-Verbal Irony Markers: Machine Learning Insights Versus Human Judgment
M. Spitale, F. Catania, and F. Panzeri
Day 2 15:00-17:00 Demo Session
Session Chair: Raj Tumuluri
An Adaptive GPT-4-powered Socially Interactive Agent for Conversing about Health
J. Molto, U. Visser, J. Fields, and C. Lisetti
An AI-Powered Interactive Interface to Enhance Accessibility of Interview Training for Military Veterans
R. C. Yarlagadda, P. Aggarwal, V. Jamadagni, G. Mahajani, P. Malasani, E. H. Nirjhar, and T. Chaspari
Combining Generative and Discriminative AI for High-Stakes Interview Practice
C. W. Leong, N. Jawahar, V. Basheerabad, T. Wörtwein, A. Emerson, and G. Sivan
Enhancing Biodiversity Monitoring: An Interactive Tool for Efficient Identification of Species in Large Bioacoustics Datasets
H. Kath, I. Troshani, B. Lüers, T. S. Gouvêa, and D. Sonntag
ARCADE: An Augmented Reality Display Environment for Multimodal Interaction with Conversational Agents
C. Schindler, D. Mayumi, Y. Matsuda, N. Rach, K. Yasumoto, and W. Minker
Let’s Dance Together! AI Dancers Can Dance to Your Favorite Music and Style
R. Ishii, S. Eitoku, S. Matsuo, M. Makiguchi, A. Hoshi, and L. P. Morency
Human Contact Annotator: Annotating Physical Contact in Dyadic Interactions
M. Doyran, A. A. Salah, and R. Poppe
Bespoke: Using LLM agents to generate just-in-time interfaces by reasoning about user intent
P. Nandy, S. O. Adalgeirsson, A. K. Sinha, T. Kraljic, M. Cleron, L. Shi, A. Singh, A. Chaudhary, A. Ganti, C. A. Melancon, S. Zhang, D. Robishaw, H. Ciurdar, J. Secor, K. A. Robertsen, K. Climer, M. Le, M. Venkatesan, P. Chi, P. Li, P. F. McDermott, R. Shim, S. Onsan, S. Vaishnav, and S. Guamán
Day 3 10:30-12:00 Oral Session 5: Biomedical Data Processing
Session Chair: Ali Etemad
Putting the “Brain” Back in the Eye-Mind Link: Aligning Eye Movements and Brain Activations During Naturalistic Reading
M. Caruso, R. Southwell, L. Hirshfield, and S. D’Mello
Distinguishing Target and Non-Target Fixations with EEG and Eye Tracking in Realistic Visual Scenes
M. Sharma, C. Martínez, B. Wirth, A. Krüger, and P. Müller
Detecting Deception in Natural Environments Using Incremental Transfer Learning
M. Ahmad, A. Alzahrani, and S. Ahmad
Stressor Type Matters! — Exploring Factors Influencing Cross-Dataset Generalizability of Physiological Stress Detection
P. Prajod, B. Mahesh, and E. André
Day 3 14:30-15:00 Challenge Overview
Session Chair: Ronald Böck
Introduction to Grand Challenges 2024
Ronald Böck
Empathic Virtual Agent Challenge: Appraisal-based Recognition of Affective States
Safaa Azzakhnini
ERR@HRI 2024 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Interactions
Micol Spitale
Day 3 15:30-16:30 Blue Sky Papers
Session Chair: Ali Etemad
15:30-15:50 AI as Modality in Human Augmentation: Toward New Forms of Multimodal Interaction with AI-Embodied Modalities
R.-D. Vatavu
15:50-16:10 RealSeal: Revolutionizing Media Authentication with Real-Time Realism Scoring
B. Radharapu and H. Krishna
16:10-16:30 Everything We Hear: Towards Tackling Misinformation in Podcasts
S. P. Cherumanal, U. Gadiraju and D. Spina
Day 3 16:30-18:00 Poster Presentations 3 & Late Breaking Results
Session Chair: Radoslaw Niewiadomski
Generating Facial Expression Sequences of Complex Emotions with Generative Adversarial Networks
Z. Belmekki, D. Gómez Jáuregui, P. Reuter, J. Li, J. C. Martin, K. Jenkins, and N. Couture
Envisioning Futures: How the Modality of AI Recommendations Impacts Conversation Flow in AR-enhanced Dialogue
S. Villa, Y. Weiss, M. Y. Lu, M. Ziarko, A. Schmidt, and J. Niess
Across Trials vs Subjects vs Contexts: A Multi-Reservoir Computing Approach for EEG Variations in Emotion Recognition
A. Anubhav and K. Fujiwara
Detecting Aware and Unaware Mind Wandering During Lecture Viewing: A Multimodal Machine Learning Approach Using Eye Tracking, Facial Videos and Physiological Data
B. Bühler, E. Bozkir, H. Deininger, P. Goldberg, P. Gerjets, U. Trautwein, and E. Kasneci
MR-Driven Near-Future Realities: Previewing Everyday Life Real-World Experiences Using Mixed Reality
F. Mathis, B. Myers, B. Lafreniere, M. Glueck, and D. Porpino Sobreira Marques
Integrating Multimodal Affective Signals for Stress Detection from Audio-Visual Data
D. Ghose, O. Gitelson, and B. Scassellati
Anonymous-Corpus: A Multimodal Database for Understanding Video-Learning Experience
A. Salman, N. Wang, L. Martinez-Lucas, A. Vidal, and C. Busso
NapTune: Prompt-tuning for Mood Classification with Wearable Time-series along with Previous Night’s Sleep-related Measures
D. Shome, N. Montazeri Ghahjaverestan, and A. Etemad
Improving Usability of Data Charts in Multimodal Documents for Low Vision Users
Y. Prakash, A. Kolgar Nayak, S. Alyaan, P. A. Khan, H. N. Lee, and V. Ashok
Participation Role-Driven Engagement Estimation of ASD Individuals in Neurodiverse Group Discussions
K. Stefanov, Y. Nakano, C. Kobayashi, I. Hoshina, T. Sakato, F. Nihei, C. Takayama, R. Ishii, and M. Tsujii
Detecting Autism from Head Movements using Kinesics
M. Gokmen, E. Sariyanidi, L. Yankowitz, C. J. Zampella, R. T. Schultz, and B. Tunc
Perception of Stress: A Comparative Multimodal Analysis of Time-Continuous Stress Ratings from Self and Observers
E. H. Nirjhar, W. Arthur Jr., and T. Chaspari
Day 3 16:30-18:00 Late Breaking Results
Session Chair: Ronald Böck
User-Defined Interaction for Very Low-Cost Head-Mounted Displays
Y. C. Law, H. Mendieta-Dávila, D. García-Fallas, R. G. Quiros, and M. Chacón-Rivas
Effects of Incoherence in Multimodal Explanations of Robot Failures
P. Pramanick, N. Federico, L. Raggioli, A. Rossi, and S. Rossi
Design and Preliminary Evaluation of a Stress Reflection System for High-Stress Training Environments
S. Akiri, V. Joshi, S. Taherzadeh, G. Williams, H. M. Mentis, and A. Kleinsmith
Haptic Feedback to Reduce Individual Differences in Corrective Actions for Skill Learning
S. Ono, N. Ninomiya, and H. Kanai
Towards Multimodality: Comparing Quantifications of Movement Coordination
C. Fan, V. Romero, A. Paxton, and T. Chowdhury
Unlocking the Potential of Multimodal Compositionality for Enhanced Recommendations through Sentiment Analysis
S. Nazir and M. Sadrzadeh
Enhancing Autism Spectrum Disorder Screening: Implementation and Pilot Testing of a Robot-Assisted Digital Tool
A. Di Nuovo and A. Kay
Understanding LLMs Ability to Aid Malware Analysts in Bypassing Evasion Techniques
M. Y. Wong, K. Valakuzhy, M. Ahamad, D. Blough, and F. Monrose
“Is This It?”: Towards Ecologically Valid Benchmarks for Situated Collaboration
D. Bohus, S. Andrist, Y. Bao, E. Horvitz, and A. Paradiso
An Audiotactile System for Accessible Graphs on a Coordinate Plane
C. Yang and P. Taele
Levels of Multimodal Interaction
A. K. Sinha, A. Olwal, and C. Kulkarni
Comparing Subjective Measures of Workload in Video Game Play: Evaluating the Test-Retest Reliability of the VGDS and NASA-TLX
E. Pretty, R. L. Martins Guarese, H. Fayek, and F. Zambetta
Towards Investigating Biases in Spoken Conversational Search
S. P. Cherumanal, J. R. Trippas, and D. Spina
Crossmodal Correspondences between Piquancy/Spiciness and Visual Shape
Y. Wang, M. Ohno, T. Narumi, and Y. ah Seong
Adjunct Events Day 2 09:00-12:00 Grand Challenge 2 (EVAC)
Session Chair: Safaa Azzakhnini
09:00 EVAC’2024 Opening & Challenge Introduction
09:40 EVAC’2024 Contribution – Johns Hopkins Center for Language and Speech Processing
10:30 EVAC’2024 Challenge Results
10:40 EVAC’2024 Panel Discussion
11:30 Closing
Papers not presented at the conference (included in the main proceedings)
SMURF: Statistical Modality Uniqueness and Redundancy Factorization (nominated for best paper award)
T. Wörtwein, N. Allen, J. Cohn, and L. P. Morency
The impact of auditory warning types and emergency obstacle avoidance takeover scenarios on takeover behavior
X. Li and Z. Xu
Low-Rank Adaptation of Time Series Foundational Models for Out-of-Domain Modality Forecasting
D. Gupta, A. Bhatti, S. Parmar, C. Dan, Y. Liu, B. Shen, and S. Lee
Nonverbal Dynamics in Dyadic Videoconferencing Interaction: The Role of Video Resolution and Conversational Quality
C. Diao, S. Arevalo Arboleda, and A. Raake


