Keynote Speakers

Stefan Kopp

Giving Interaction a Hand – Deep Models of Co-speech Gesture in Multimodal Systems

Prof. Stefan Kopp

Sociable Agents Group, Bielefeld University, Germany

Humans frequently join words and gestures in multimodal communication. Such natural co-speech gesturing goes far beyond what can currently be processed by gesture-based interfaces, and its coordination with speech in particular still poses open challenges for basic research and multimodal interfaces alike. How can we develop computational models for processing and generating natural speech-gesture behavior in a flexible, fast and adaptive manner similar to humans? In this talk I will review approaches and methods applied to this problem. I will argue that those models need to (and can) be based on a deeper understanding of what shapes co-speech gesturing in a particular situation. In particular, I will present work that connects empirical analyses with computational modeling and evaluation to unravel the cognitive, embodied and socio-interactional mechanisms underlying the use of speech-accompanying gestural behavior, and to develop deeper models of these mechanisms for interactive systems such as virtual characters, humanoid robots, or multimodal interfaces.

Stefan Kopp is professor of Computer Science at Bielefeld University, head of the "Sociable Agents" research group at the DFG Center of Excellence "Cognitive Interaction Technology" (CITEC) and deputy coordinator of a DFG collaborative research center on "Alignment in Communication". His research centers on the question of how artificial systems can turn into intuitive, socially adept interaction partners (either virtual or robotic). For this, his group develops empirically grounded, cognitively plausible models of verbal and nonverbal socio-communicative behavior. Current projects focus on versatile multimodality using speech and gesture, the dynamics of interpersonal coordination and adaptation in dialogue, and embodied architectures that ground the learning of behavior perception and production in sensorimotor and social cognitive processing. Stefan is the current president of the German Cognitive Science Society.

Mark Billinghurst

Hands and Speech in Space: Multimodal Interaction with Augmented Reality interfaces

Prof. Mark Billinghurst

HIT Lab, University of Canterbury, New Zealand

Augmented Reality (AR) is technology that allows virtual imagery to be seamlessly integrated into the real world. Although it was first developed in the 1960s, AR has only recently become widely available, through platforms such as the web and mobile phones. However, most AR interfaces have very simple interaction, such as using touch on phone screens or camera tracking from real images. In this presentation I will talk about the opportunities for multimodal input in AR applications. New depth sensing and gesture tracking technologies such as Microsoft Kinect or Leap Motion have made it easier than ever before to track hands in space. Combined with speech recognition and AR tracking and viewing software, it is possible to create interfaces that allow users to manipulate 3D graphics in space through a natural combination of speech and gesture. I will review research in multimodal AR interfaces from the HIT Lab NZ and other leading research groups to show the state of the art in multimodal AR interfaces. The talk will also give an overview of the significant research questions that need to be addressed before speech and gesture interaction with AR applications can become commonplace.

Professor Mark Billinghurst is a researcher developing innovative computer interfaces that explore how virtual and real worlds can be merged. Director of the HIT Lab New Zealand (HIT Lab NZ) at the University of Canterbury in New Zealand, he has produced over 250 technical publications and presented demonstrations and courses at a wide variety of conferences. He has a PhD from the University of Washington and conducts research in Augmented and Virtual Reality, multimodal interaction and mobile interfaces. He has previously worked at ATR Research Labs, British Telecom, Nokia and the MIT Media Laboratory. One of his research projects, the MagicBook, was winner of the 2001 Discover award for best Entertainment application, and his AR Tennis project won the 2005 IMG award for best independent mobile game. In 2001 he co-founded ARToolworks, one of the oldest commercial AR companies.

Jim Rehg

Behavior Imaging and the Study of Autism

Prof. Jim Rehg

School of Interactive Computing, Georgia Institute of Technology, USA

Beginning in infancy, individuals acquire the social and communication skills that are vital for a healthy and productive life. Children with developmental delays face great challenges in acquiring these skills, resulting in substantial lifetime risks. Children with an Autism Spectrum Disorder (ASD) represent a particularly significant risk category, due both to the increasing rate of diagnosis of ASD and its consequences. Since the genetic basis for ASD is unclear, the diagnosis, treatment, and study of the disorder depend fundamentally on the observation of behavior. Unfortunately, current methods for acquiring and analyzing behavioral data are so labor-intensive as to preclude their large-scale application. In this talk, I will describe our research agenda in Behavior Imaging, which targets the capture, modeling, and analysis of social and communicative behaviors between children and their caregivers and peers. We are developing computational methods and statistical models for the analysis of vision, audio, and wearable sensor data. Our goal is to develop a new set of capabilities for the large-scale collection and interpretation of behavioral data. I will describe several research challenges in multi-modal sensor fusion and statistical modeling which arise in this area, and present illustrative results from the analysis of social interactions with children.

James M. Rehg (pronounced "ray") is a Professor in the School of Interactive Computing at the Georgia Institute of Technology, where he is the Director of the Center for Behavior Imaging, co-Director of the Computational Perception Lab, and Associate Director of Research in the Center for Robotics and Intelligent Machines. He received his Ph.D. from CMU in 1995 and worked at the Cambridge Research Lab of DEC (and then Compaq) from 1995 to 2001, where he managed the computer vision research group. He received the National Science Foundation (NSF) CAREER award in 2001, and the Raytheon Faculty Fellowship from Georgia Tech in 2005. He and his students have received a number of best paper awards, including best student paper awards at ICML 2005 and BMVC 2010, and a method of the year award from Nature Methods in 2012. Dr. Rehg is active in the organizing committees of the major conferences in computer vision, most recently serving as the Program co-Chair for ACCV 2012. He has authored more than 100 peer-reviewed scientific papers and holds 23 issued US patents. Dr. Rehg is currently leading a multi-institution effort to develop the science and technology of Behavior Imaging, funded by an NSF Expedition award (see for details).

ICMI 2013 ACM International Conference on Multimodal Interaction. 9-13th December 2013, Sydney, Australia. Copyright © 2010-2023
Photo credits: David Iliff, Enoch Lau (license: CC-BY-SA 3.0). Destination NSW, Don Fuchs, Susan Wright, David Druce.