Tutorials

Multimodal-Multisensor Behavioral Analytics: Going Deeper into Human-Centered Design

Prof. Sharon Oviatt

Professor of HCI and Creative Technologies,
Monash University, Australia

Abstract: In the ICMI community, we know that multimodal-multisensor data are profoundly different from past data sources. They are extremely rich and dense, typically involve multiple time-synchronized data streams, and can be analyzed at multiple levels (signal, activity pattern, representational, transactional, and so on). When multimodal-multisensor data are analyzed at multiple levels, they constitute a vast multi-dimensional space for discovering important new phenomena. In addition, multimodal-multisensor data afford a deeply human-centered foundation for detecting human behavioral states, and then for designing user-centered adaptive systems based on them. For example, analyses of human communication and movement patterns are proving particularly apt for assessing human intention (e.g., deception), mental load and cognition (e.g., attentional focus, domain expertise), motivation and emotion (e.g., task engagement, confusion), and related health and mental health status (e.g., anxiety, neurodegenerative disease, dementia).
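To make the notion of multiple time-synchronized data streams concrete, here is a minimal Python sketch. All stream names, sampling rates, and window sizes are illustrative assumptions rather than material from the tutorial; it simply resamples two modality streams onto a shared clock and then aggregates signal-level samples into a crude activity-level descriptor:

    import numpy as np

    # Hypothetical example: two modality streams recorded at different
    # rates (say, pen pressure and a speech-amplitude envelope) are
    # aligned to one common uniform timeline before multi-level analysis.
    def synchronize(t_a, x_a, t_b, x_b, rate_hz=100.0):
        """Resample both streams onto a shared uniform timeline."""
        t0 = max(t_a[0], t_b[0])              # overlap start
        t1 = min(t_a[-1], t_b[-1])            # overlap end
        t = np.arange(t0, t1, 1.0 / rate_hz)  # common clock
        return t, np.interp(t, t_a, x_a), np.interp(t, t_b, x_b)

    # Signal-level samples can then be aggregated into activity-level
    # features, e.g., windowed means as a crude activity descriptor.
    def windowed_mean(x, win=100):
        n = len(x) // win * win
        return x[:n].reshape(-1, win).mean(axis=1)

    # Usage sketch with fabricated timestamps and signals:
    t_a = np.linspace(0, 10, 1001); x_a = np.sin(t_a)
    t_b = np.linspace(0, 10, 501);  x_b = np.cos(t_b)
    t, a, b = synchronize(t_a, x_a, t_b, x_b)
    activity_a = windowed_mean(a)

Higher analysis levels (representational, transactional) would build on such aligned streams rather than on any single raw signal.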

This tutorial will discuss the emergent international behavioral analytics movement, and how it is transforming what it now means to conduct human-centered design of computer systems. We will discuss the key capabilities that are enabling progress in this fertile area, examples of newly emergent application areas supported by this trend, and progress toward establishing automated assessments whose accuracy matches the "gold standard" of human assessment. To clarify the many concrete issues involved in behavioral analytics research, we will walk through specific case histories. For balance, we will also critically analyze the state of recent behavioral analytics work, discuss major factors that currently hold the field back, and suggest how future work could overcome these limitations to yield more reliable and higher-impact applications. For example, we will describe the need for higher-quality modeling, which requires leveraging multidisciplinary expertise and partnerships.

Bio: Professor Sharon Oviatt is internationally known for her work on human-centered interfaces, multimodal-multisensor interfaces, mobile interfaces, educational interfaces, the cognitive impact of computer input tools, and behavioral analytics. She received her PhD in Experimental Psychology from the University of Toronto, and has been a professor of Computer Science, Information Technology, Psychology, and Linguistics. Her research is known for its pioneering and multidisciplinary style at the intersection of Computer Science, Psychology, Linguistics, and the Learning Sciences. Sharon has received the inaugural ACM-ICMI Sustained Accomplishment Award, a National Science Foundation Special Creativity Award, the ACM-SIGCHI CHI Academy Award, and an ACM Fellow Award, conferred on the top 1% of the international computing community, for "contributions to the empirical and theoretical foundations of multimodal systems, and to human-centered computer interfaces". She has published a large volume of high-impact papers in a wide range of multidisciplinary venues (Google Scholar citations >11,000; h-index 47) and is an Associate Editor of the main journals and edited book collections in the field of human-centered interfaces. Many of Sharon's publications are considered classics and are required reading in HCI, mobile, and multimodal interface courses, including at top universities such as MIT and Stanford. Her recent books include The Design of Future Educational Interfaces (2013, Routledge), The Paradigm Shift to Multimodality in Contemporary Computer Interfaces (2015, Morgan & Claypool), and the multi-volume Handbook of Multimodal-Multisensor Interfaces (co-edited with Björn Schuller, Philip Cohen, Antonio Krüger, Gerasimos Potamianos and Daniel Sonntag, 2017-2018, ACM Books). She has delivered over 100 keynotes, invited talks and tutorials worldwide at conferences, universities and corporate events. Of direct relevance to her ICMI 2018 tutorial, Sharon is one of the founders of the field of multimodal learning analytics.

Deep Learning for Multimodal and Multisensorial Interaction

Prof. Björn W. Schuller

Head of the Group on Language Audio & Music,
Imperial College London, UK

Abstract: Intelligent interaction, such as speech or handwriting recognition, has long benefited from deep neural network approaches, and has in fact contributed largely to the latter's success. The aim of this tutorial is to highlight such methods in the context of multimodal and multisensorial information processing for interaction. This setting imposes a number of specific requirements, such as optimal information fusion and, usually, optimal temporal alignment of the information streams. The tutorial also deals with the challenges of learning from either little labeled user data or large amounts of unlabeled user data. The methods introduced begin with end-to-end learning from raw user data via a combination of convolutional layers and memory-enhanced recurrent layers. As such topologies often require architectural fine-tuning, the tutorial further discusses automatic machine learning approaches that yield self-optimizing deep neural networks. Generative adversarial topologies will then be introduced for data synthesis within the learning process, including conditional variants and methods to prevent the frequently occurring mode collapse of synthesizing overly self-similar data. In addition, transfer learning methods at the feature and model levels, across modalities and sensor data, will be discussed. For the time-aligned fusion of heterogeneous data streams, deep canonical time warping is demonstrated. Application examples will largely stem from the domains of Affective Computing and Intelligent Interaction, including several challenging real-world use cases. Suitable open-source and free-for-research software packages will be introduced throughout the tutorial, together with freely available databases for testing and experimenting with the methods presented.
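As one concrete illustration, below is a minimal Python/PyTorch sketch of the kind of end-to-end topology the abstract describes: convolutional layers operating directly on a raw 1-D signal, followed by a memory-enhanced recurrent (LSTM) layer and a classification head. All layer sizes, the class count, and the input format are illustrative assumptions, not the tutorial's reference implementation.

    import torch
    import torch.nn as nn

    # Sketch of an end-to-end net for raw 1-D sensor/audio input.
    # Hypothetical hyperparameters throughout; not from the tutorial.
    class EndToEndNet(nn.Module):
        def __init__(self, n_classes=4):
            super().__init__()
            # Convolutional front end: learns filter banks on raw samples.
            self.conv = nn.Sequential(
                nn.Conv1d(1, 32, kernel_size=64, stride=16), nn.ReLU(),
                nn.Conv1d(32, 64, kernel_size=8, stride=4), nn.ReLU(),
            )
            # Memory-enhanced recurrent layer for temporal context.
            self.rnn = nn.LSTM(input_size=64, hidden_size=128,
                               batch_first=True)
            self.head = nn.Linear(128, n_classes)

        def forward(self, x):            # x: (batch, 1, samples) raw signal
            h = self.conv(x)             # -> (batch, 64, frames)
            h = h.transpose(1, 2)        # -> (batch, frames, 64) for LSTM
            _, (hn, _) = self.rnn(h)     # hn: (1, batch, 128)
            return self.head(hn[-1])     # sequence-level class logits

    # Usage sketch: one second of 16 kHz raw audio, batch of 8.
    logits = EndToEndNet()(torch.randn(8, 1, 16000))

The division of labor shown here, with a convolutional front end replacing hand-crafted features and a recurrent layer modeling longer-range temporal structure, is the basic pattern that the tutorial's end-to-end methods build on.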

Bio: Björn W. Schuller received his diploma, doctoral degree, habilitation, and the title of Adjunct Teaching Professor, all in EE/IT, from TUM in Munich/Germany. He is Head of GLAM (the Group on Language Audio & Music) at Imperial College London/UK, Full Professor and ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg/Germany, co-founding CEO and current CSO of audEERING, permanent Visiting Professor at HIT/China, and an Associate of the University of Geneva/Switzerland. Previously, he was Full Professor at the University of Passau/Germany, and worked with Joanneum Research in Graz/Austria and CNRS-LIMSI in Orsay/France. He is a Fellow of the IEEE, President-Emeritus of the AAAC, and a Senior Member of the ACM, and was honoured by the WEF as one of 40 extraordinary scientists under the age of 40. He has (co-)authored 700+ publications (18,000+ citations; h-index 66), is Editor-in-Chief of the IEEE Transactions on Affective Computing, General Chair of ACM ICMI 2014, ACII 2019, and ACII Asia 2018, and a Program Chair of ACM ICMI 2019/2013, Interspeech 2019, ACII 2015/2011, and IEEE SocialCom 2012. He has given more than a dozen tutorials at prime venues such as ACII, ACM Multimedia, EMBC, ICASSP, IJCAI, Interspeech, and IUI. Björn has served as Coordinator/PI in 10+ European projects, is an ERC Starting Grantee, and is a consultant for companies such as GN, Huawei, and Samsung.
