25th ACM International Conference on Multimodal Interaction
(9-13 October 2023)

ICMI 2023 Conference Program

Please note that the program may still change due to unforeseen circumstances.

Program at a glance

Workshops and Tutorials

Each event will start at 9:00 at the earliest and end at 18:00 at the latest. The detailed schedule for each event can be found on its website.

Main Conference

Detailed Program

Doctoral Consortium (Monday, 09 October 2023)

Tuesday, 10 October 2023

Wednesday, 11 October 2023

Thursday, 12 October 2023

Papers not presented in-person

Tuesday, 10 October

All sessions will take place in the Auditorium, Sorbonne University International Conference Centre, except for the Poster Session, which will be in the Foyer of the Auditorium.

09:00-09:15	Welcome ICMI 2023 General Chairs
09:15-10:15	Keynote 1: Multimodal information processing in communication: the nature of faces and voices *Prof. Sophie Scott* Session Chair: Louis-Philippe Morency
10:15-10:45	Break
10:45-12:05	Oral Session 1: Social and Physiological Signals Session Chair: Zakia Hammal
10:45-11:05	EEG-based Cognitive Load Classification using Feature Masked Autoencoding and Emotion Transfer Learning D.Pulver, P.Angka, P.Hungler and A.Etemad
11:05-11:25	Representation Learning for Interpersonal and Multimodal Behavior Dynamics: A Multiview Extension of Latent Change Score Models A.Vail, J.M.Girard, L.Bylsma, J.Fournier, H.Swartz, J.Cohn and L.-P.Morency
11:25-11:45	Crucial Clues: Investigating Psychophysiological Behaviors for Measuring Trust in Human-Robot Interaction M.Ahmad and A.Alzahrani
11:45-12:05	Understanding the Social Context of Eating with Multimodal Smartphone Sensing: The Role of Country Diversity N.D.Kammoun, L.Meegahapola and D.Gatica-Perez
12:05-14:00	Lunch
14:00-15:20	Oral Session 2: Bias and Diversity Session Chair: Chloé Clavel
14:00-14:20	Using Explainability for Bias Mitigation: A Case Study for Fair Recruitment Assessment G.Sogancioglu, H.Kaya and A.A.Salah
14:20-14:40	Multimodal Bias: Assessing Gender Bias in Computer Vision Models with NLP Techniques A.Mandal, S.Little and S.Leavy
14:40-15:00	Recognizing Intent in Collaborative Manipulation Z.Rysbek, K-H.Oh and M.Zefran
15:00-15:20	Evaluating Outside the Box: Lessons Learned on eXtended Reality Multi-modal Experiments Beyond the Laboratory B.Marques, S.Silva, R.Maio, J.Alves, C.Ferreira, P.Dias, B.Sousa Santos
15:20-15:50	Break
15:20-17:20	Poster Session 1 (including Doctoral Consortium posters) Session Chair: TBA
	Analyzing and Recognizing Interlocutors’ Gaze Functions from Multimodal Nonverbal Cues A.Tashiro, M.Imamura, S.Kumano and K.Otsuka
	Multimodal Fusion Interactions: A Study of Human and Automatic Quantification P.P.Liang, Y.Cheng, R.Salakhutdinov and L.-P.Morency
	HIINT: Historical, Intra- and Inter- personal Dynamics Modeling with Cross-person Memory Transformer Y.Kim, D.W.Lee, P.P.Liang, S.Alghowinem, C.Breazeal and H.W.Park
	Deciphering Entrepreneurial Pitches: A Multimodal Deep Learning Approach to Predict Probability of Investment P.van Aken, M.M.Jung, W.Liebregts and I.O.Ertugrul
	Identifying Interlocutors’ Behaviors and its Timings Involved with Impression Formation from Head-Movement Features and Linguistic Features S.Otsuchi, K.Ito, Y.Ishii, R.Ishii, S.Eitoku and K.Otsuka
	Evaluating the Potential of Caption Activation to Mitigate Confusion Inferred from Facial Gestures in Virtual Meetings M.Heck, J.Jeong and C.Becker
	Towards Autonomous Physiological Signal Extraction From Thermal Videos Using Deep Learning K.Das, M.Abouelenien, M.G.Burzo, J.Elson, K.Prakah-Asante and C.Maranville
	Exploring Feedback Modality Designs to Improve Young Children’s Collaborative Actions A.Melniczuk and E.Vrapi
	Breathing New Life into COPD Assessment: Multisensory Home-monitoring for Predicting Severity Z.Xiao, M.Muszynski, R.Marcinkevičs, L.Zimmerli, A.D.Ivankay, D.Kohlbrenner, M.Kuhn, Y.Nordmann, U.Muehlner, C.Clarenbach, J.E.Vogt and T.Brunschwiler
	Analyzing Synergetic Functional Spectrum from Head Movements and Facial Expressions in Conversations M.Imamura, A.Tashiro, S.Kumano and K.Otsuka
	Do I Have Your Attention: A Large Scale Engagement Prediction Dataset and Baselines M.Singh, X.Hoque, D.Zeng, Y.Wang, K.Ikeda and A.Dhall
	Implicit Search Intent Recognition using EEG and Eye Tracking: Novel Dataset and Cross-User Prediction M.Sharma, S.Chen, P.Müller, M.Rekrut and A.Krüger
	Multimodal Analysis and Assessment of Therapist Empathy in Motivational Interviews T.Tran, Y.Yin, L.Tavabi, J.Delacruz, B.Borsari, J.D.Woolley, S.Scherer and M.Soleymani
	Multimodal Turn Analysis and Prediction for Multi-party Conversations M-C.Lee, M.Trinh and Z.Deng
	Explainable Depression Detection via Head Motion Patterns M.Gahalawat, R.Fernandez Rojas, T.Guha, R.Subramanian, R.Goecke
	Early Classifying Multimodal Sequences A.Cao, J.Utke and D.Klabjan
	Predicting Player Engagement in Tom Clancy’s The Division 2: A Multimodal Approach via Pixels and Gamepad Actions K.Pinitas, D.Renaudie, M.Thomsen, M.Barthet, K.Makantasis, A.Liapis and G.Yannakakis
	On Head Motion for Recognizing Aggression and Negative Affect during Speaking and Listening S.Fitrianie and I.Lefter
	SHAP-based Prediction of Mother’s History of Depression to Understand the Influence on Child Behavior M.Bilalpur, S.Hinduja, L.Cariola, L.Sheeber, N.Allen, L.-P.Morency and J.Cohn
	Computational analyses of linguistic features with schizophrenic and autistic traits along with formal thought disorders T.Saga, H.Tanaka and S.Nakamura
	Acoustic and Visual Knowledge Distillation for Contrastive Audio-Visual Localization E.Yaghoubi, A.P.Kelm, T.Gerkmann and S.Frintrop
	Performance Exploration of RNN Variants for Recognizing Daily Life Stress Levels by Using Multimodal Physiological Signals Y.Said Can and E.André
	Enhancing Resilience to Missing Data in Audio-Text Emotion Recognition with Multi-Scale Chunk Regularization W-C.Lin, L.Goncalves and C.Busso
	Interpreting Sign Language Recognition using Transformers and MediaPipe Landmarks C.Luna-Jiménez, M.Gil-Martín, R.Kleinlein, R.San-Segundo and F.Fernández-Martínez
	Expanding the Role of Affective Phenomena in Multimodal Interaction Research L.Mathur, M.Mataric and L.-P.Morency
15:20-17:20	Doctoral Consortium posters Session Chair: TBA
	Smart Garments for Immersive Home Rehabilitation Using VR L.A.Magre
	Crowd Behavior Prediction Using Visual and Location Data in Super-Crowded Scenarios A.B.M.Wijaya
	Recording Multimodal Pair-Programming Dialogue for Reference Resolution by Conversational Agents C.Domingo
	Modeling Social Cognition and Its Neurologic Deficits with Artificial Neural Networks L.P.Mertens
	Come Fl.. Run with me: Understanding the Utilization of Drones to Support Recreational Runner’s Well Being A.Balasubramaniam
	Conversational Grounding in Multimodal Dialog Systems B.Mohapatra
	Explainable Depression Detection using Multimodal Behavioural Cues M.Gahalawat
	Enhancing Surgical Team Collaboration and Situation Awareness Through Multimodal Sensing A.Allemang-Trivalle
	Bridging Multimedia Modalities: Enhanced Multimodal AI Understanding and Intelligent Agents S.Gautam

Wednesday, 11 October

All sessions will take place in the Auditorium, Sorbonne University International Conference Centre, except for the Poster Session, whose location is TBA, and the Demo Session, which will be in the Foyer of the Auditorium.

09:15-10:15	Keynote 2: A Robot Just for You: Multimodal Personalized Human-Robot Interaction and the Future of Work and Care *Prof. Maja Mataric* Session Chair: Tanja Schultz
10:15-10:45	Break
10:45-12:05	Oral Session 3: Affective Computing Session Chair: Dirk Heylen
10:45-11:05	Neural Mixed Effects for Nonlinear Personalized Predictions T.Wörtwein, N.Allen, L.Sheeber, R.Auerbach, J.Cohn and L.-P.Morency
11:05-11:25	Detecting When the Mind Wanders Off Task in Real-time: An Overview and Systematic Review V.Kuvar, J.W.Y.Kam, S.Hutt and C.Mills
11:25-11:45	Annotations from speech and heart rate: impact on multimodal emotion recognition K.Sharma and G.Chanel
11:45-12:05	Toward Fair Facial Expression Recognition with Improved Distribution Alignment M.Kolahdouzi and A.Etemad
12:05-14:00	Lunch
14:00-15:20	Oral Session 4: Multimodal Interfaces Session Chair: Sean Andrist
14:00-14:20	Ether-Mark: An Off-Screen Marking Menu For Mobile Devices H.Rateau, Y.Rekik and E.Lank
14:20-14:40	Embracing Contact: Detecting Parent-Infant Interactions M.Doyran, R.Poppe and A.Ali Salah
14:40-15:00	Cross-Device Shortcuts: An Interaction Technique that Creates Deep Links between Apps Across Devices for Content Transfer M.Beyeler, Y.F.Cheng and C.Holz
15:00-15:20	Component attention network for multimodal dance improvisation recognition J.Fu, J.Tan, W.Yin, S.Pashami and M.Björkman
15:20-15:40	Challenge Overview Talks
15:40-16:10	Break Overlapping with the poster session
15:40-17:40	Poster Session 2 (and Demo Session) Session Chair: TBA
	TongueTap: Multimodal Tongue Gesture Recognition with Head-Worn Devices T.Gemicioglu, R.Michael Winters, Y-T.Wang, T.Gable and I.J.Tashev
	Using Augmented Reality to Assess the Role of Intuitive Physics in the Water-Level Task R.Abadi, LM.Wilcox and R.Allison
	Classification of Alzheimer’s Disease with Deep Learning on Eye-tracking Data H.Sriram, C.Conati and T.Field
	Video-based Respiratory Waveform Estimation in Dialogue: A Novel Task and Dataset for Human-Machine Interaction T.Obi and K.Funakoshi
	The Role of Audiovisual Feedback Delays and Bimodal Congruency for Visuomotor Performance in Human-Machine Interaction A.Dix, C.Sabrina and A.M.Harkin
	Can empathy affect the attribution of mental states to robots? C.Gena, F.Manini, A.Lieto, A.Lillo and F.Vernero
	AIUnet: Asymptotic inference with U2-Net for referring image segmentation M.Heck, J.Jeong and C.Becker
	Using Speech Patterns to Model the Dimensions of Teamness in Human-Agent Teams E.Doherty, C.Spencer, L.Eloy, N.R.Dickler and L.Hirshfield
	Robot Duck Debugging: Can Attentive Listening Improve Problem Solving? M.T.Parreira, S.Gillet and I.Leite
	Estimation of Violin Bow Pressure Using Photo-Reflective Sensors Y.Mizuho, R.Kitamura and Y.Sugiura
	Paying Attention to Wildfire: Using U-Net with Attention Blocks on Multimodal Data for Next Day Prediction J.Fitzgerald, E.Seefried, J.E.Yost, S.Pallickara and N.Blanchard
	ReNeLiB: Real-time Neural Listening Behavior Generation for Socially Interactive Agents D.S.Withanage Don, P.Müller, F.Nunnari, E.André and P.Gebhard
	Large language models in textual analysis for gesture selection L.Birka, N.Yongsatianchot, P.G.Torshizi, E.Minucci and S.Marsella
	Increasing Heart Rate and Anxiety Level with Vibrotactile and Audio Presentation of Fast Heartbeat R.Wang, H.Zhang, S.A.Macdonald, P.Di Campli San Vito
	User Feedback-based Online Learning for Intent Classification K.Gönç, B.Sağlam, O.Dalmaz, T.Çukur, S.Kozat and H.Dibeklioglu
	µGeT: Multimodal eyes-free text selection technique combining touch interaction and microgestures G.R.J.Faisandaz, A.Goguey, C.Jouffrais and L.Nigay
	Deep Breathing Phase Classification with a Social Robot for Mental Health K.Matheus, E.Mamantov, M.Vázquez and B.Scassellati
	ASMRcade: Interactive Audio Triggers for an Autonomous Sensory Meridian Response S.Mertes, M.Strobl, R.Schlagowski and E.André
	Augmented Immersive Viewing and Listening Experience Based on Arbitrarily Angled Interactive Audiovisual Representation T.Horiuchi, S.Okuba and T.Kobayashi
	Out of Sight, … How Asymmetry in Video-Conference Affects Social Interaction C.Sallaberry, G.Englebienne, J.Van Erp and V.Evers
	Demo Session Session Chair: TBA

Thursday, 12 October

09:15-10:15	Keynote 3: Projecting Life Onto Machines *Prof. Simone Natale* Session Chair: Alessandro Vinciarelli
10:15-10:45	Break
10:45-12:05	Oral Session 5: Gestures and Social Interactions Session Chair: Mohammad Soleymani
10:45-11:05	AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis H.Voß and S.Kopp
11:05-11:25	Frame-Level Event Representation Learning for Semantic-Level Generation and Editing of Avatar Motion A.Ideno, T.Kaneko and T.Harada
11:25-11:45	FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning K.I.Haque and Z.Yumak
11:45-12:05	Influence of hand representation on a grasping task in augmented reality L.Lafuma, G.Bouyer, O.Goguel and J.-Y.P.Didier
12:05-14:00	Lunch
14:00-15:00	Keynote 4 – Sustained Achievement Award Prof. Louis-Philippe Morency Session Chair: TBA
15:00-15:30	Break Overlapping with Poster Session 3
15:00-16:45	Poster Session 3 and Late Breaking Results Session Chair: TBA
	Synerg-eye-zing: Decoding Nonlinear Gaze Dynamics Driving Successful Collaborations in Co-located Teams G.S.Rajshekar, L.Eloy, R.Dickler, J.G.Reitman, S.L.Pugh, P.Foltz, J.C.Gorman, J.Harrison and L.Hirshfield
	Exploring Neurophysiological Responses to Cross-Cultural Deepfake Videos M.R.Khan, S.Naeem, U.Tariq, A.Dhall, M.N.A.Khan, F.Al Shargie and H.Al Nashash
	Characterization of collaboration in a virtual environment with gaze and speech signals A.Léchappé, A.Milliat, C.Fleury, M.Chollet and C.Dumas
	HEARD-LE: An Intelligent Conversational Interface for Wordle C.Yang, K.Arredondo, J.I.Koh, P.Taele and T.Hammond
	Assessing Infant and Toddler Behaviors through Wearable Inertial Sensors: A Preliminary Investigation A.Onodera, R.Ishioka, Y.Nishiyama and K.Sezaki
	ASAR Dataset and Computational Model for Affective State Recognition During ARAT Assessment for Upper Extremity Stroke Survivors T.Ahmed, T.Rikakis, A.Kelliher and M.Soleymani
	The Limitations of Current Similarity-Based Objective Metrics In the Context of Human-Agent Interaction Applications A.Deffrennes, L.Vincent, M.Pivette, K.El Haddad, J.D.Bailey, M.Perusquia-Hernandez, S.M.Alarcão and T.Dutoit
	Do Body Expressions Leave Good Impressions? – Predicting Investment Decisions based on Pitcher’s Body Expressions M.M.Jung, M.van Vlierden, W.Liebregts and I.Onal Ertugrul
	Multimodal Entrainment in Bio-Responsive Multi-User VR Interactives M.Song and S.Di Paola
	Multimodal Synchronization in Musical Ensembles: Investigating Audio and Visual Cues S.Chakraborty and J.Timoney
	Insights Into the Importance of Linguistic Textual Features on the Persuasiveness of Public Speaking A.Barkar, M.Chollet, B.Biancardi and C.Clavel
	Detection of contract cheating in pen-and-paper exams through the analysis of handwriting style K.Kuznetsov, M.Barz and D.Sonntag
	Leveraging gaze for potential error prediction in AI-support systems: An exploratory analysis of interaction with a simulated robot B.Severitt, N.J.Castner, O.Lukashova-Sanz and S.Wahl
	Developing a Generic Focus Modality for Multimodal Interactive Environments F.Barros, A.Teixeira and S.Silva
	Multimodal Prediction of User’s Performance in High-Stress Dialogue Interactions S.Nasihati Gilani, K.Pollard and D.Traum
	Understanding the Physiological Arousal of Novice Performance Drivers for the Design of Intelligent Driving Systems E.Kimani, A.L.S.Filipowicz and H.Yasuda
	A Portable Ball with Unity-based Computer Game for Interactive Arm Motor Control Exercise Y.Zhou, Y.An, Q.Niu, Q.Bu, Y.C.Liang, M.Leach and J.Sun
	Virtual Reality Music Instrument Playing Game for Upper Limb Rehabilitation Training M.Sun, Q.Bu, Y.Hou, X.Ju, L.Yu, E.G.Lim and J.Sun
	Towards Objective Evaluation of Socially-Situated Conversational Robots: Assessing Human-Likeness through Multimodal User Behaviors K.Inoue, D.Lala, K.Ochi, T.Kawahara and G.Skantze
	“Am I listening?”, Evaluating the Quality of Generated Data-driven Listening Motion P.Wolfert, G.E.Henter and T.Belpaeme
	LinLED: Low latency and accurate contactless gesture interaction S.Viollet, C.Martin and J.-M.Ingargiola
16:45-17:45	Blue Sky Papers
17:45-18:00	Closing
19:00-22:00	Banquet, Le Grand Salon, La Sorbonne, La Chancellerie des Universités de Paris

Papers Not Presented In-person

This is a list of papers for which no author was able to attend the conference in person. While these papers do not appear in the program above, they are still available in the conference proceedings. Authors were optionally invited to submit a pre-recorded video presentation of their paper as supplementary material accompanying the conference proceedings.

MMASD: A Multimodal Dataset for Autism Intervention Analysis
J.Li, V.Chheang, P.Kullu, Z.Guo, A.Bhat, K.E.Barner and R.L.Barmaki

GCFormer: A Graph Convolutional Transformer for Speech Emotion Recognition
Y.Gao, H.Zhao, Y.Xiao and Z.Zhang

How Noisy is Too Noisy? The Impact of Data Noise on Multimodal Recognition of Confusion and Conflict During Collaborative Learning
Y.Ma, M.Celepkolu, K.E.Boyer, C.Lynch, E.Wiebe and M.Israel

Make Your Brief Stroke Real and Stereoscopic: 3D-Aware Simplified Sketch to Portrait Generation
Y.Sun, Q.Wu, H.Zhou, K.Wang, T.Hu, C.-C.Liao, S.Miyafuji, Z.Liu and H.Koike

Gait Event Prediction of People with Cerebral Palsy using Feature Uncertainty: A Low-Cost Approach
S.Chakraborty, N.Thomas and A.Nandy

ViFi-Loc: Multi-modal Pedestrian Localization using GAN with Camera-Phone Correspondences
H.Liu, H.Lu, K.Dana and M.Gruteser

Multimodal Approach to Investigate the Role of Cognitive Workload and User Interfaces in Human-robot Collaboration
A.Kalatzis, S.Rahman, V.G.Prabhu, L.Stanley and M.Wittie

WiFiTuned: Monitoring Engagement in Online Participation by Harmonizing WiFi and Audio
V.K.Singh, P.Kar, A.M.Sohini, M.Rangaiah, S.Chakraborty and M.Maity
