ICMI 2023 Conference Program
Please note that changes may still occur due to unforeseen circumstances.
Program at a glance
Workshops and Tutorials
Main Conference
Detailed Program
Tuesday, 10 October 2023
Wednesday, 11 October 2023
Thursday, 12 October 2023
Papers not presented in-person
Tuesday, 10 October
All sessions will take place in the Auditorium of the Sorbonne University International Conference Centre, except for the Poster Session, which will be held in the Foyer of the Auditorium.
09:00-09:15 | Welcome ICMI 2023 General Chairs |
09:15-10:15 | Keynote 1: Multimodal information processing in communication: the nature of faces and voices Prof. Sophie Scott Session Chair: Louis-Philippe Morency |
10:15-10:45 | Break |
10:45-12:05 | Oral Session 1: Social and Physiological Signals Session Chair: TBA |
10:45-11:05 | EEG-based Cognitive Load Classification using Feature Masked Autoencoding and Emotion Transfer Learning D.Pulver, P.Angka, P.Hungler and A.Etemad |
11:05-11:25 | Representation Learning for Interpersonal and Multimodal Behavior Dynamics: A Multiview Extension of Latent Change Score Models A.Vail, J.M.Girard, L.Bylsma, J.Fournier, H.Swartz, J.Cohn and L.-P.Morency |
11:25-11:45 | Crucial Clues: Investigating Psychophysiological Behaviors for Measuring Trust in Human-Robot Interaction M.Ahmad and A.Alzahrani |
11:45-12:05 | Understanding the Social Context of Eating with Multimodal Smartphone Sensing: The Role of Country Diversity N.D.Kammoun, L.Meegahapola and D.Gatica-Perez |
12:05-14:00 | Lunch |
14:00-15:20 | Oral Session 2: Bias and Diversity Session Chair: TBA |
14:00-14:20 | Using Explainability for Bias Mitigation: A Case Study for Fair Recruitment Assessment G.Sogancioglu, H.Kaya and A.A.Salah |
14:20-14:40 | Multimodal Bias: Assessing Gender Bias in Computer Vision Models with NLP Techniques A.Mandal, S.Little and S.Leavy |
14:40-15:00 | Recognizing Intent in Collaborative Manipulation Z.Rysbek, K-H.Oh and M.Zefran |
15:00-15:20 | Evaluating Outside the Box: Lessons Learned on eXtended Reality Multi-modal Experiments Beyond the Laboratory B.Marques, S.Silva, R.Maio, J.Alves, C.Ferreira, P.Dias and B.Sousa Santos |
15:20-15:50 | Break |
15:20-17:20 | Poster Session 1 (including Doctoral Consortium posters) Session Chair: TBA |
Analyzing and Recognizing Interlocutors’ Gaze Functions from Multimodal Nonverbal Cues A.Tashiro, M.Imamura, S.Kumano and K.Otsuka
Multimodal Fusion Interactions: A Study of Human and Automatic Quantification P.P.Liang, Y.Cheng, R.Salakhutdinov and L.-P.Morency
HIINT: Historical, Intra- and Inter- personal Dynamics Modeling with Cross-person Memory Transformer Y.Kim, D.W.Lee, P.P.Liang, S.Alghowinem, C.Breazeal and H.W.Park
Deciphering Entrepreneurial Pitches: A Multimodal Deep Learning Approach to Predict Probability of Investment P.van Aken, M.M.Jung, W.Liebregts and I.O.Ertugrul
Identifying Interlocutors’ Behaviors and its Timings Involved with Impression Formation from Head-Movement Features and Linguistic Features S.Otsuchi, K.Ito, Y.Ishii, R.Ishii, S.Eitoku and K.Otsuka
Evaluating the Potential of Caption Activation to Mitigate Confusion Inferred from Facial Gestures in Virtual Meetings M.Heck, J.Jeong and C.Becker
Towards Autonomous Physiological Signal Extraction From Thermal Videos Using Deep Learning K.Das, M.Abouelenien, M.G.Burzo, J.Elson, K.Prakah-Asante and C.Maranville
Exploring Feedback Modality Designs to Improve Young Children’s Collaborative Actions A.Melniczuk and E.Vrapi
Breathing New Life into COPD Assessment: Multisensory Home-monitoring for Predicting Severity Z.Xiao, M.Muszynski, R.Marcinkevičs, L.Zimmerli, A.D.Ivankay, D.Kohlbrenner, M.Kuhn, Y.Nordmann, U.Muehlner, C.Clarenbach, J.E.Vogt and T.Brunschwiler
Analyzing Synergetic Functional Spectrum from Head Movements and Facial Expressions in Conversations M.Imamura, A.Tashiro, S.Kumano and K.Otsuka
Do I Have Your Attention: A Large Scale Engagement Prediction Dataset and Baselines M.Singh, X.Hoque, D.Zeng, Y.Wang, K.Ikeda and A.Dhall
Implicit Search Intent Recognition using EEG and Eye Tracking: Novel Dataset and Cross-User Prediction M.Sharma, S.Chen, P.Müller, M.Rekrut and A.Krüger
Multimodal Analysis and Assessment of Therapist Empathy in Motivational Interviews T.Tran, Y.Yin, L.Tavabi, J.Delacruz, B.Borsari, J.D.Woolley, S.Scherer and M.Soleymani
Multimodal Turn Analysis and Prediction for Multi-party Conversations M-C.Lee, M.Trinh and Z.Deng
Explainable Depression Detection via Head Motion Patterns M.Gahalawat, R.Fernandez Rojas, T.Guha, R.Subramanian and R.Goecke
Early Classifying Multimodal Sequences A.Cao, J.Utke and D.Klabjan
Predicting Player Engagement in Tom Clancy’s The Division 2: A Multimodal Approach via Pixels and Gamepad Actions K.Pinitas, D.Renaudie, M.Thomsen, M.Barthet, K.Makantasis, A.Liapis and G.Yannakakis
On Head Motion for Recognizing Aggression and Negative Affect during Speaking and Listening S.Fitrianie and I.Lefter
SHAP-based Prediction of Mother’s History of Depression to Understand the Influence on Child Behavior M.Bilalpur, S.Hinduja, L.Cariola, L.Sheeber, N.Allen, L.-P.Morency and J.Cohn
Computational analyses of linguistic features with schizophrenic and autistic traits along with formal thought disorders T.Saga, H.Tanaka and S.Nakamura
Acoustic and Visual Knowledge Distillation for Contrastive Audio-Visual Localization E.Yaghoubi, A.P.Kelm, T.Gerkmann and S.Frintrop
Performance Exploration of RNN Variants for Recognizing Daily Life Stress Levels by Using Multimodal Physiological Signals Y.Said Can and E.André
Enhancing Resilience to Missing Data in Audio-Text Emotion Recognition with Multi-Scale Chunk Regularization W-C.Lin, L.Goncalves and C.Busso
Interpreting Sign Language Recognition using Transformers and MediaPipe Landmarks C.Luna-Jiménez, M.Gil-Martín, R.Kleinlein, R.San-Segundo and F.Fernández-Martínez
Expanding the Role of Affective Phenomena in Multimodal Interaction Research L.Mathur, M.Mataric and L.-P.Morency
15:20-17:20 | Doctoral Consortium posters Session Chair: TBA |
Smart Garments for Immersive Home Rehabilitation Using VR L.A.Magre
Crowd Behavior Prediction Using Visual and Location Data in Super-Crowded Scenarios A.B.M.Wijaya
Recording Multimodal Pair-Programming Dialogue for Reference Resolution by Conversational Agents C.Domingo
Modeling Social Cognition and Its Neurologic Deficits with Artificial Neural Networks L.P.Mertens
Come Fl.. Run with me: Understanding the Utilization of Drones to Support Recreational Runner’s Well Being A.Balasubramaniam
Conversational Grounding in Multimodal Dialog Systems B.Mohapatra
Explainable Depression Detection using Multimodal Behavioural Cues M.Gahalawat
Enhancing Surgical Team Collaboration and Situation Awareness Through Multimodal Sensing A.Allemang-Trivalle
Bridging Multimedia Modalities: Enhanced Multimodal AI Understanding and Intelligent Agents S.Gautam
Wednesday, 11 October
All sessions will take place in the Auditorium of the Sorbonne University International Conference Centre, except for the Poster Session (location TBA) and the Demo Session, which will be held in the Foyer of the Auditorium.
09:15-10:15 | Keynote 2: A Robot Just for You: Multimodal Personalized Human-Robot Interaction and the Future of Work and Care Prof. Maja Mataric Session Chair: Tanja Schultz |
10:15-10:45 | Break |
10:45-12:05 | Oral Session 3: Affective Computing Session Chair: TBA |
10:45-11:05 | Neural Mixed Effects for Nonlinear Personalized Predictions T.Wörtwein, N.Allen, L.Sheeber, R.Auerbach, J.Cohn and L.-P.Morency |
11:05-11:25 | Detecting When the Mind Wanders Off Task in Real-time: An Overview and Systematic Review V.Kuvar, J.W.Y.Kam, S. Hutt and C.Mills |
11:25-11:45 | Annotations from speech and heart rate: impact on multimodal emotion recognition K.Sharma and G.Chanel |
11:45-12:05 | Toward Fair Facial Expression Recognition with Improved Distribution Alignment M.Kolahdouzi and A.Etemad |
12:05-14:00 | Lunch |
14:00-15:20 | Oral Session 4: Multimodal Interfaces Session Chair: TBA |
14:00-14:20 | Ether-Mark: An Off-Screen Marking Menu For Mobile Devices H.Rateau, Y.Rekik and E.Lank |
14:20-14:40 | Embracing Contact: Detecting Parent-Infant Interactions M.Doyran, R.Poppe and A.Ali Salah |
14:40-15:00 | Cross-Device Shortcuts: An Interaction Technique that Creates Deep Links between Apps Across Devices for Content Transfer M.Beyeler, Y.F.Cheng and C.Holz |
15:00-15:20 | Component attention network for multimodal dance improvisation recognition J.Fu, J.Tan, W.Yin, S.Pashami and M.Björkman |
15:20-15:40 | Challenge Overview Talks |
15:40-16:10 | Break (overlapping with the poster session) |
15:40-17:40 | Poster Session 2 (and Demo Session) Session Chair: TBA |
TongueTap: Multimodal Tongue Gesture Recognition with Head-Worn Devices T.Gemicioglu, R.Michael Winters, Y-T.Wang, T.Gable and I.J.Tashev
Using Augmented Reality to Assess the Role of Intuitive Physics in the Water-Level Task R.Abadi, L.M.Wilcox and R.Allison
Classification of Alzheimer’s Disease with Deep Learning on Eye-tracking Data H.Sriram, C.Conati and T.Field
Video-based Respiratory Waveform Estimation in Dialogue: A Novel Task and Dataset for Human-Machine Interaction T.Obi and K.Funakoshi
The Role of Audiovisual Feedback Delays and Bimodal Congruency for Visuomotor Performance in Human-Machine Interaction A.Dix, C.Sabrina and A.M.Harkin
Can empathy affect the attribution of mental states to robots? C.Gena, F.Manini, A.Lieto, A.Lillo and F.Vernero
AIUnet: Asymptotic inference with U2-Net for referring image segmentation M.Heck, J.Jeong and C.Becker
Using Speech Patterns to Model the Dimensions of Teamness in Human-Agent Teams E.Doherty, C.Spencer, L.Eloy, N.R.Dickler and L.Hirshfield
Robot Duck Debugging: Can Attentive Listening Improve Problem Solving? M.T.Parreira, S.Gillet and I.Leite
Estimation of Violin Bow Pressure Using Photo-Reflective Sensors Y.Mizuho, R.Kitamura and Y.Sugiura
Paying Attention to Wildfire: Using U-Net with Attention Blocks on Multimodal Data for Next Day Prediction J.Fitzgerald, E.Seefried, J.E.Yost, S.Pallickara and N.Blanchard
ReNeLiB: Real-time Neural Listening Behavior Generation for Socially Interactive Agents D.S.Withanage Don, P.Müller, F.Nunnari, E.André and P.Gebhard
Large language models in textual analysis for gesture selection L.Birka, N.Yongsatianchot, P.G.Torshizi, E.Minucci and S.Marsella
Increasing Heart Rate and Anxiety Level with Vibrotactile and Audio Presentation of Fast Heartbeat R.Wang, H.Zhang, S.A.Macdonald and P.Di Campli San Vito
User Feedback-based Online Learning for Intent Classification K.Gönç, B.Sağlam, O.Dalmaz, T.Çukur, S.Kozat and H.Dibeklioglu
µGeT: Multimodal eyes-free text selection technique combining touch interaction and microgestures G.R.J.Faisandaz, A.Goguey, C.Jouffrais and L.Nigay
Deep Breathing Phase Classification with a Social Robot for Mental Health K.Matheus, E.Mamantov, M.Vázquez and B.Scassellati
ASMRcade: Interactive Audio Triggers for an Autonomous Sensory Meridian Response S.Mertes, M.Strobl, R.Schlagowski and E.André
Augmented Immersive Viewing and Listening Experience Based on Arbitrarily Angled Interactive Audiovisual Representation T.Horiuchi, S.Okuba and T.Kobayashi
Out of Sight, … How Asymmetry in Video-Conference Affects Social Interaction C.Sallaberry, G.Englebienne, J.Van Erp and V.Evers
Demo Session Session Chair: TBA |
Thursday, 12 October
All sessions will take place in the Auditorium of the Sorbonne University International Conference Centre, except for the Poster Session, which will be held in the Foyer of the Auditorium.
09:15-10:15 | Keynote 3: Projecting Life Onto Machines Prof. Simone Natale Session Chair: Alessandro Vinciarelli |
10:15-10:45 | Break |
10:45-12:05 | Oral Session 5: Gestures and Social Interactions Session Chair: TBA |
10:45-11:05 | AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis H.Voß and S.Kopp |
11:05-11:25 | Frame-Level Event Representation Learning for Semantic-Level Generation and Editing of Avatar Motion A.Ideno, T.Kaneko and T.Harada |
11:25-11:45 | FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning K.I.Haque and Z.Yumak |
11:45-12:05 | Influence of hand representation on a grasping task in augmented reality L.Lafuma, G.Bouyer, O.Goguel and J.-Y.P.Didier |
12:05-14:00 | Lunch |
14:00-15:00 | Keynote 4 – Sustained Achievement Award Session Chair: TBA |
15:00-15:30 | Break (overlapping with Poster Session 3) |
15:00-16:45 | Poster Session 3 and Late Breaking Results Session Chair: TBA |
Synerg-eye-zing: Decoding Nonlinear Gaze Dynamics Driving Successful Collaborations in Co-located Teams G.S.Rajshekar, L.Eloy, R.Dickler, J.G.Reitman, S.L.Pugh, P.Foltz, J.C.Gorman, J.Harrison and L.Hirshfield
Exploring Neurophysiological Responses to Cross-Cultural Deepfake Videos M.R.Khan, S.Naeem, U.Tariq, A.Dhall, M.N.A.Khan, F.Al Shargie and H.Al Nashash
Characterization of collaboration in a virtual environment with gaze and speech signals A.Léchappé, A.Milliat, C.Fleury, M.Chollet and C.Dumas
HEARD-LE: An Intelligent Conversational Interface for Wordle C.Yang, K.Arredondo, J.I.Koh, P.Taele and T.Hammond
Assessing Infant and Toddler Behaviors through Wearable Inertial Sensors: A Preliminary Investigation A.Onodera, R.Ishioka, Y.Nishiyama and K.Sezaki
ASAR Dataset and Computational Model for Affective State Recognition During ARAT Assessment for Upper Extremity Stroke Survivors T.Ahmed, T.Rikakis, A.Kelliher and M.Soleymani
The Limitations of Current Similarity-Based Objective Metrics In the Context of Human-Agent Interaction Applications A.Deffrennes, L.Vincent, M.Pivette, K.El Haddad, J.D.Bailey, M.Perusquia-Hernandez, S.M.Alarcão and T.Dutoit
Do Body Expressions Leave Good Impressions? – Predicting Investment Decisions based on Pitcher’s Body Expressions M.M.Jung, M.van Vlierden, W.Liebregts and I.Onal Ertugrul
Multimodal Entrainment in Bio-Responsive Multi-User VR Interactives M.Song and S.Di Paola
Multimodal Synchronization in Musical Ensembles: Investigating Audio and Visual Cues S.Chakraborty and J.Timoney
Insights Into the Importance of Linguistic Textual Features on the Persuasiveness of Public Speaking A.Barkar, M.Chollet, B.Biancardi and C.Clavel
Detection of contract cheating in pen-and-paper exams through the analysis of handwriting style K.Kuznetsov, M.Barz and D.Sonntag
Leveraging gaze for potential error prediction in AI-support systems: An exploratory analysis of interaction with a simulated robot B.Severitt, N.J.Castner, O.Lukashova-Sanz and S.Wahl
Developing a Generic Focus Modality for Multimodal Interactive Environments F.Barros, A.Teixeira and S.Silva
Multimodal Prediction of User’s Performance in High-Stress Dialogue Interactions S.Nasihati Gilani, K.Pollard and D.Traum
Understanding the Physiological Arousal of Novice Performance Drivers for the Design of Intelligent Driving Systems A.L.S.Filipowicz and H.Yasuda
A Portable Ball with Unity-based Computer Game for Interactive Arm Motor Control Exercise Y.Zhou, Y.An, Q.Niu, Q.Bu, Y.C.Liang, M.Leach and J.Sun
Virtual Reality Music Instrument Playing Game for Upper Limb Rehabilitation Training M.Sun, Q.Bu, Y.Hou, X.Ju, L.Yu, E.G.Lim and J.Sun
Towards Objective Evaluation of Socially-Situated Conversational Robots: Assessing Human-Likeness through Multimodal User Behaviors P.Wolfert, G.E.Henter and T.Belpaeme
“Am I listening?”: Evaluating the Quality of Generated Data-driven Listening Motion D.Lala, K.Ochi, T.Kawahara and G.Skantze
LinLED: Low latency and accurate contactless gesture interaction S.Viollet, C.Martin and J.-M.Ingargiola
16:45-17:45 | Blue Sky Papers |
17:45-18:00 | Closing |
19:00-22:00 | Banquet, Le Grand Salon, La Sorbonne, La Chancellerie des Universités de Paris |
Papers Not Presented In-person
This is a list of papers for which no authors were able to attend the conference in person. While these papers do not appear in the program above, they are still available in the conference proceedings. Authors were optionally invited to submit a pre-recorded video presentation of their paper as supplementary material accompanying the conference proceedings.
MMASD: A Multimodal Dataset for Autism Intervention Analysis J.Li, V.Chheang, P.Kullu, Z.Guo, A.Bhat, K.E.Barner and R.L.Barmaki
GCFormer: A Graph Convolutional Transformer for Speech Emotion Recognition Y.Gao, H.Zhao, Y.Xiao and Z.Zhang
How Noisy is Too Noisy? The Impact of Data Noise on Multimodal Recognition of Confusion and Conflict During Collaborative Learning Y.Ma, M.Celepkolu, K.E.Boyer, C.Lynch, E.Wiebe and M.Israel
Make Your Brief Stroke Real and Stereoscopic: 3D-Aware Simplified Sketch to Portrait Generation Y.Sun, Q.Wu, H.Zhou, K.Wang, T.Hu, C.-C.Liao, S.Miyafuji, Z.Liu and H.Koike
Gait Event Prediction of People with Cerebral Palsy using Feature Uncertainty: A Low-Cost Approach S.Chakraborty, N.Thomas and A.Nandy
ViFi-Loc: Multi-modal Pedestrian Localization using GAN with Camera-Phone Correspondences H.Liu, H.Lu, K.Dana and M.Gruteser
Multimodal Approach to Investigate the Role of Cognitive Workload and User Interfaces in Human-robot Collaboration A.Kalatzis, S.Rahman, V.G.Prabhu, L.Stanley and M.Wittie
WiFiTuned: Monitoring Engagement in Online Participation by Harmonizing WiFi and Audio V.K.Singh, P.Kar, A.M.Sohini, M.Rangaiah, S.Chakraborty and M.Maity