A multimodal approach to understanding human vocal expressions and beyond
Prof. Shrikanth (Shri) Narayanan
Professor of Electrical Engineering
Signal Analysis and Interpretation Laboratory
University of Southern California, Los Angeles, CA
Abstract: Human verbal and nonverbal expressions carry crucial information not only about intent but also about emotions, individual identity, and the state of health and wellbeing. From a basic science perspective, understanding how such rich information is encoded in these signals can illuminate underlying production mechanisms, including the variability therein, within and across individuals. From a technology perspective, finding ways to automatically process and decode this complex information continues to be of interest across a variety of applications.
The convergence of sensing, communication and computing technologies is allowing access to data, in diverse forms and modalities, in ways that were unimaginable even a few years ago. These include data that afford the multimodal analysis and interpretation of the generation of human expressions. The first part of the talk will highlight advances that allow us to perform investigations on the dynamics of vocal production using real-time imaging and audio modeling to offer insights about how we produce speech and song with the vocal instrument. The second part of the talk will focus on the production of vocal expressions in conjunction with other signals from the face and body, especially in encoding affect. The talk will draw data from various domains, notably health, to illustrate some of the applications.
Bio: Shrikanth (Shri) Narayanan is the Niki & C. L. Max Nikias Chair in Engineering at the University of Southern California, where he is Professor of Electrical Engineering, and jointly in Computer Science, Linguistics, Psychology, Neuroscience and Pediatrics, Director of the Ming Hsieh Institute and Research Director of the Information Sciences Institute. Prior to USC he was with AT&T Bell Labs and AT&T Research. His research focuses on human-centered information processing and communication technologies. He is a Fellow of the Acoustical Society of America, IEEE, ISCA, the American Association for the Advancement of Science (AAAS), the Association for Psychological Science (APS) and the National Academy of Inventors. Shri Narayanan is Editor in Chief for the IEEE Journal of Selected Topics in Signal Processing, an Editor for the Computer Speech and Language journal, and an Associate Editor for the APSIPA Transactions on Signal and Information Processing, having previously served as an Associate Editor for the IEEE Transactions on Speech and Audio Processing (2000-2004), the IEEE Signal Processing Magazine (2005-2008), the IEEE Transactions on Signal and Information Processing over Networks (2014-2015), the IEEE Transactions on Multimedia (2008-2012), the IEEE Transactions on Affective Computing, and the Journal of the Acoustical Society of America. He has received several honors, including the 2015 Engineers Council’s Distinguished Educator Award, a Mellon award for mentoring excellence, and the 2005 and 2009 Best Journal Paper awards from the IEEE Signal Processing Society. He served as the Society's Distinguished Lecturer for 2010-11, as an ISCA Distinguished Lecturer for 2015-16, and as the 2017 Willard R. Zemlin Memorial Lecturer for ASHA. With his students, he has received several best paper awards, including a 2014 Ten-year Technical Impact Award from ACM ICMI, and has been a six-time winner of the Interspeech Challenges. He has published over 750 papers and has been granted 17 U.S. patents.
Using Technology for Health and Wellbeing
Dr. Mary Czerwinski
Principal Researcher, Visualization and Interaction (VIBE) Research Group
Microsoft Research, WA
Abstract: How can we create technologies to help us reflect on and change our behavior, improving our health and overall wellbeing? In this talk, I will briefly describe the last several years of work our research team has been doing in this area. We have developed wearable technology to help families manage tense situations with their children, mobile phone-based applications for handling stress and depression, as well as logging tools that can help you stay focused or recommend good times to take a break at work. The overarching goal in all of this research is to develop tools that adapt to the user so that they can maximize their productivity and improve their health.
Bio: Dr. Mary Czerwinski is a Principal Researcher and Research Manager of the Visualization and Interaction (VIBE) Research Group. Mary's latest research focuses primarily on emotion tracking and intervention design and delivery, information worker task management, and health and wellness for individuals and groups. Her research background is in visual attention and multitasking. She holds a Ph.D. in Cognitive Psychology from Indiana University in Bloomington. Mary was awarded the ACM SIGCHI Lifetime Service Award, was inducted into the CHI Academy, and became an ACM Distinguished Scientist in 2010. She also received the Distinguished Alumni award from Indiana University's Psychological and Brain Sciences department in 2014. Mary became a Fellow of the ACM in 2016.
Reinforcing, reassuring, and roasting: The forms and functions of the human smile
Paula M. Niedenthal
Professor of Psychology
University of Wisconsin-Madison
Abstract: What are facial expressions for? In social-functional accounts, they are efficient adaptations that are used flexibly to address the problems inherent to successful social living. Facial expressions both broadcast emotions and regulate the emotions of perceivers. Research from my laboratory focuses on the human smile and demonstrates how this very nuanced display varies in its physical form in order to solve three basic social challenges: rewarding others, signaling non-threat, and negotiating social hierarchies. We mathematically modeled the dynamic facial-expression patterns of reward, affiliation, and dominance smiles using a data-driven approach that combined a dynamic facial expression generator with methods of reverse correlation. The resulting models were validated using human-perceiver and Bayesian classifiers. Human smile stimuli were also developed and validated in studies in which distinct effects of the smiles on physiological and hormonal processes were observed. The social-function account is extended to the acoustic form of laughter and is used to address questions about cross-cultural differences in emotional expression.
Bio: Paula M. Niedenthal received her Ph.D. from the University of Michigan and was on the faculty of the departments of Psychology at Johns Hopkins University and Indiana University. She was a member of the National Centre for Scientific Research in France for more than a decade and is now the Howard Leventhal WARF Professor of Psychology at the University of Wisconsin at Madison. Her areas of research include emotion-cognition interaction, representational models of emotion, and the processing of facial expression. Dr. Niedenthal is senior author of the textbook The Psychology of Emotion, 2nd Ed. (Routledge) and is incoming president of the Society for Affective Science.
Sustained Contribution Award Keynote
Put That There: 20 Years of Research on Multimodal Interaction
Prof. James L. Crowley
Professor, Grenoble Polytechnique Institut
Pervasive Interaction research group, Univ. Grenoble Alpes, Montbonnot, France
Abstract: Humans interact with the world using five major senses: sight, hearing, touch, smell, and taste. Almost all interaction with the environment is naturally multimodal, as audio, tactile or paralinguistic cues provide confirmation for physical actions and spoken language interaction. Multimodal interaction seeks to fully exploit these parallel channels for perception and action to provide robust, natural interaction.
Richard Bolt's "Put That There" (1980) provided an early paradigm that demonstrated the power of multimodality and helped attract researchers from a variety of disciplines to study a new approach for post-WIMP computing that moves beyond desktop graphical user interfaces (GUI).
In this talk, I will look back to the origins of the scientific community of multimodal interaction and review some of the more salient results that have emerged over the last 20 years, including results in machine perception, system architectures, visualization, and computer-to-human communication.
Recently, a number of game-changing technologies such as deep learning, cloud computing, and planetary scale data collection have emerged to provide robust solutions to historically hard problems. As a result, scientific understanding of multimodal interaction has taken on new relevance as construction of practical systems has become feasible. I will discuss the impact of these new technologies and the opportunities and challenges that they raise. I will conclude with a discussion of the importance of convergence with cognitive science and cognitive systems to provide foundations for intelligent, human-centered interactive systems that learn and fully understand humans and human-to-human social interaction, in order to provide services that surpass the abilities of the most intelligent human servants.
Bio: James L. Crowley is a Professor at the Univ. Grenoble Alpes, where he teaches courses in Computer Vision, Machine Learning and Artificial Intelligence at Grenoble Polytechnique Institut (Grenoble INP). He directs the Pervasive Interaction research group at INRIA Grenoble Rhône-Alpes Research Center in Montbonnot, France.
Over the last 35 years, Professor Crowley has made a number of fundamental contributions to computer vision, robotics and multimodal interaction. These include early innovations in scale-invariant computer vision, localization and mapping for mobile robots, appearance-based techniques for computer vision, and visual perception for human-computer interaction.
His current research concerns context-aware observation of human activity, Ambient Intelligence, and new forms of Human-Computer Interaction based on machine perception.