Spoken Dialogue Processing for Multimodal Human-Robot Interaction

Prof. Tatsuya Kawahara

Kyoto University, Japan

Tutorial Date: 14th October, 2019

Abstract: Following the success of spoken dialogue systems (SDSs) in smartphone assistants and smart speakers, a number of communicative robots are being developed and commercialized. Since robots have a face and a body, the interaction is essentially multimodal. This tutorial will focus on spoken dialogue with robots in the context of multimodal interaction. Compared with conventional SDSs, people tend to talk to a robot much as they would talk to a human (or a pet?) because of its anthropomorphism and physical presence. This calls for fundamental changes in the design and methodology of dialogue and interaction, since conventional SDSs are designed as a human-machine interface. For example, you don’t need a robot just to ask for weather information or news. And a robot should detect when you speak even if you do not press a button or say a magic word.

The tutorial first addresses the desirable tasks and interactions conducted by humanoid robots engaged in spoken dialogue. These obviously depend on the character design of the robots, and I will focus on long and deep interactions such as counseling and interviews, which have a definite task but do not have observable goals. Such interactions will expand the potential of communicative robots. The second part of the tutorial will focus on the methodology and technical aspects of spoken dialogue processing, including speech recognition and synthesis for human-robot interaction. A comprehensive review of spoken language understanding and dialogue management is provided. Non-verbal processing is then addressed. In particular, smooth turn-taking and real-time feedback, including backchannels, are critically important for keeping the user engaged in the dialogue, so the interaction will be duplex, consisting of not only speaking but also attentive listening.

Bio: Tatsuya Kawahara received the B.E. in 1987, the M.E. in 1989, and the Ph.D. in 1995, all in information science, from Kyoto University, Kyoto, Japan. From 1995 to 1996, he was a Visiting Researcher at Bell Laboratories, Murray Hill, NJ, USA. Currently, he is a Professor in the School of Informatics, Kyoto University. He has also been an Invited Researcher at ATR and NICT.

He has published more than 300 technical papers on speech recognition, spoken language processing, and spoken dialogue systems. He has been conducting several projects, including the speech recognition software Julius and the automatic transcription system for the Japanese Parliament (Diet). Dr. Kawahara received the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (MEXT) in 2012.

From 2003 to 2006, he was a member of the IEEE SPS Speech Technical Committee. He was a General Chair of the IEEE Automatic Speech Recognition and Understanding workshop (ASRU 2007). He also served as a Tutorial Chair of INTERSPEECH 2010 and a Local Arrangement Chair of ICASSP 2012. He has been an editorial board member of the Elsevier journal Computer Speech and Language and of IEEE/ACM Transactions on Audio, Speech, and Language Processing. He is the Editor-in-Chief of APSIPA Transactions on Signal and Information Processing. Dr. Kawahara is a board member of APSIPA and ISCA, and a Fellow of IEEE.

Getting Virtually Personal: Power Conversational AI to Fulfill Tasks and Personalize Chitchat for Real-World Applications

Dr. Michelle Zhou

Co-Founder and CEO of Juji, Inc., USA

Tutorial Date: 14th October, 2019

Abstract: In the past few years, we have witnessed the rapid, real-world adoption of conversational AI. For example, 47 million U.S. adults now own a smart voice device, such as Amazon Alexa or Google Home, and over 300,000 chatbots are deployed on Facebook Messenger. Despite recent advances in machine learning, it is still non-trivial to power conversational AI that can effectively fulfill tasks and chitchat for real-world applications.

This tutorial will first review the state-of-the-art approaches to conversational AI and the challenges of building conversational AI for real-world applications. It will then introduce a model-based framework that combines symbolic approaches and deep learning to support conversational AI for fulfilling tasks and personalizing chitchat. The tutorial will also teach conversational AI design patterns and discuss auto-evaluation of conversational AI. This will be a fun, hands-on tutorial during which participants will design, build, test, and deploy their own conversational AI to accomplish specific tasks and engage in chitchat. This tutorial is especially suitable for those who are interested in building conversational AI for real-world applications or exploring complex applications of conversational AI (e.g., AI companions, AI caretakers, and AI interviewers).

Bio: Dr. Michelle Zhou is a Co-Founder and CEO of Juji, Inc., an Artificial Intelligence (AI) startup located in Silicon Valley, specializing in building responsible and empathetic AI agents that can deeply understand users and guide their behavior based on their psychological characteristics. Prior to starting Juji, Michelle led the User Systems and Experience Research (USER) group at IBM Research – Almaden and then the IBM Watson Group. Michelle’s expertise is in the interdisciplinary area of intelligent user interaction (IUI), including conversational AI and personality analytics. She has published over 100 peer-reviewed articles and filed over 40 patents. Michelle is currently the Editor-in-Chief of ACM Transactions on Interactive Intelligent Systems (TiiS) and an Associate Editor of ACM Transactions on Intelligent Systems and Technology (TIST). She received a Ph.D. in Computer Science from Columbia University and is an ACM Distinguished Scientist. https://www.linkedin.com/in/mxzhou/

ICMI 2019 ACM International Conference on Multimodal Interaction. Copyright © 2018-2019