Keynote Speakers
What is Multimodal?

Louis-Philippe Morency
Associate Professor in the Language Technologies Institute
Carnegie Mellon University
Abstract: Our experience of the world is multimodal – we see objects, hear sounds, feel texture, smell odors, and taste flavors. In recent years, a broad and impactful body of research has emerged in artificial intelligence under the umbrella of multimodal research, characterized by the use of multiple modalities. As we formalize a long-term research vision for multimodal research, it is important to reflect on its foundational principles and core technical challenges. What is multimodal? Answering this question is complicated by the multi-disciplinary nature of the problem, which is spread across many domains and research fields. This talk is based on a recent review of 700+ research papers that studies the computational and theoretical foundations of multimodal research, with a focus on multimodal machine learning. Two key principles have driven many multimodal innovations: the heterogeneity of modalities and the interconnections between them. Historical and recent progress will be synthesized in a research-oriented taxonomy, centered around 6 core technical challenges: representation, alignment, reasoning, generation, transference, and quantification. The talk will conclude with open questions and unsolved challenges essential for a long-term research vision in multimodal research.
Bio: Louis-Philippe Morency is an Associate Professor in the Language Technologies Institute at Carnegie Mellon University, where he leads the Multimodal Communication and Machine Learning Laboratory (MultiComp Lab). He was formerly research faculty in the Computer Science Department at the University of Southern California and received his Ph.D. degree from the MIT Computer Science and Artificial Intelligence Laboratory. His research focuses on building the computational foundations that enable computers to analyze, recognize, and predict subtle human communicative behaviors during social interactions. He has received diverse awards, including AI’s 10 to Watch by IEEE Intelligent Systems, the NetExplo Award in partnership with UNESCO, and 10 best paper awards at IEEE and ACM conferences. His research has been covered by media outlets such as the Wall Street Journal, The Economist, and NPR.
The Future of the Body in Tomorrow’s Workplace

Justine Cassell
Dean’s Chair in Language Technologies, Carnegie Mellon University and Senior Researcher, Inria Paris
Abstract: Even in the most hectic or time-conscious workplace, employees gather in person to chat. And even in the most networked workplace, employees still make time for face-to-face collaboration. This isn’t surprising if we consider that the ability and desire to engage in face-to-face communication (using eye gaze to manage turn-taking, head nods to indicate listening, and smiles to indicate attention) starts soon after birth – well before infants even learn to talk. We might say that we are built to communicate face-to-face! But what role will embodied interaction play in the future workplace, when we will be interacting with autonomous robots, engaging with other people through presence robots, and working in a virtual world where we and our colleagues are represented by avatars? In this talk I will describe some of the ways that embodied interaction is likely to change in the future, and some of the ways that we need to take those future scenarios into account as we design and implement multimodal interfaces.
Bio: Justine Cassell holds the Dean’s Chair in Language Technologies at Carnegie Mellon University, and is also a Senior Researcher at Inria Paris, where she holds a PRAIRIE chair in Artificial Intelligence. Cassell was named a member of the French governmental council the CNNUM, on the future of digital technology, in 2021, was honored with the Henry and Bryna David award by the National Academies of Science in 2018, received the AAMAS “Influential Paper” award in 2017, and is a fellow of the AAAS, RSE, and ACM. She received tenure as a faculty member at MIT. She has been carrying out research on the multimodal nature of conversation, and multimodal technologies, for a very long time.
Real Talk, Real Listening, Real Change

Deb Roy
Abstract: In an era of rising toxic polarization, political intolerance, and plummeting social trust, the need to listen to one another could not be greater. For all the promise of social media to give everyone a voice, in reality it is the loudest and most polarizing voices that tend to dominate. Our research and development teams at the MIT Center for Constructive Communication and Cortico are developing tools and methods designed to foster authentic conversation and scalable deep listening by merging age-old human practices of facilitated dialogue with modern methods of digital design, speech and language processing, and AI-powered data science. We have partnered with field organizations – ranging from small community organizations to municipalities to global nonprofits – to make sense of the conversations they collect, amplify typically underheard voices, inform public understanding, drive better policy and decisions, and enable unforeseen connections across the ideological spectrum. In this talk I will provide an overview of the approach, highlight some case studies, and sketch open research questions motivated by the work.
Bio: Deb Roy is Professor of Media Arts and Sciences at MIT, where he directs the MIT Center for Constructive Communication (CCC). He is also co-founder and CEO of Cortico, a nonprofit social technology company and deployment partner of CCC. In collaboration with colleagues at CCC, Cortico, and a growing network of collaborators, Roy designs human-machine systems for understanding and navigating social media ecosystems, and for creating new communication spaces for stronger democracy.
Previously, Roy was a Visiting Professor at Harvard Law School in 2021-22, Executive Director of the MIT Media Lab from 2019 to 2021, Twitter’s Chief Media Scientist from 2013 to 2017, and co-founder & CEO of Bluefin Labs, a media analytics company that became Twitter’s largest acquisition to date in 2013. He is the author of over 175 academic papers, including a study of the spread of false news that was featured on the cover of Science magazine in 2018 and was one of the most influential academic publications of the year. Roy’s widely viewed TED talk Birth of a Word presents his research on his son’s language development, which led to new ideas in media analytics, while his 2021 Chautauqua lecture envisions a new kind of social communication platform for a stronger democracy.
Focus on People: Five Questions from Human-Centered Computing

Daniel Gatica-Perez
Abstract: A substantial body of research in multimodal interaction has studied how people naturally interact – face-to-face and through machines – and developed technology to analyze, support, and extend such forms of interaction. The talk will share personal experiences and views on how audio-visual and ubiquitous research on social interaction has evolved over the past two decades. Five recurrent questions, then and now, include how to study interaction in everyday life; how to learn from and collaborate with the humanities and social sciences; how to think about data; how to address the challenges brought by automation; and how to engage and empower individuals and communities to take part in research projects. Today, the limitations of technology-centric solutions are more evident than ever. Future research with a people-first focus will continue to call for reflection, commitment, and action for a long-term alignment with societal needs and nature’s limits.
Bio: Daniel Gatica-Perez directs the Social Computing Group at Idiap, and is an adjunct professor with the EPFL School of Engineering and College of Humanities. His research uses methods from ubiquitous computing, social media, and machine learning to study how people and technology interface in everyday life. He has been a member of the ICMI community for the last two decades, serving in the past as Chair of the Conference Steering Board among other roles. He serves on the Editorial Board of the Proceedings of the ACM on Interactive, Mobile, Wearable, and Ubiquitous Technologies (IMWUT). He also works with cities and organizations on social innovation projects.



