24th ACM International Conference on Multimodal Interaction
(7-11 Nov 2022)




7th November 2022, 2 PM IST


Open-Task Multimodal Conversational Assistants


Duration: 3 hours


Recently, conversational systems have seen a significant rise in popularity due to commercial applications such as Amazon’s Alexa, Apple’s Siri, Microsoft’s Cortana, and Google Assistant. However, there is untapped potential in the study of multimodal chatbots, which let users and dialogue agents converse using both natural language and visual information.

As companies push this technology, multimodal agents are becoming ubiquitous. This growing use poses many challenges in achieving more natural, human-like, and engaging interactions. Several of the resulting research questions are currently very active in the community: How can visual and text data be combined? How should user intent be interpreted? How can an agent engage in a conversation about multimodal content?

We will start the tutorial with an introduction to the concept of Conversational Task Assistants, agents that can assist users in completing tasks. The second part of the tutorial will focus on the introduction of multimodality to conversational systems, and we will address some of the challenges of assistant embodiment and user understanding. In the third part, we will discuss other components necessary to support multimodal conversations, including a dialogue policy, search components, and response generation methods. In the final part of the tutorial, we will present case studies that apply these methods: NOVA Wiki Wizard, a conversational search platform; iFetch, an online fashion shopping assistant; and TWIZ, the award-winning bot from the Alexa Prize TaskBot Challenge.

  • Part 1: Introduction (30 mins)
    • What is a Conversational Agent?
    • Task Assistants
    • Open-task Assistants
    • Dialogue Systems Concepts
  • Part 2: Multimodal Conversational Agents (1 hour)
    • Multimodal Conversations
    • Virtual Assistant Embodiment and Personality
    • Simple Dialogue Managers / Dialogue State Tracking (DST)
  • Part 3: Conversational Agent Components (1 hour)
    • Dialogue Policy
    • Answering User Needs
    • Response Generation
  • Part 4: Case studies (30 mins)
    • Case Study A – NOVA Wiki Wizard
    • Case Study B – iFetch: Online fashion shopping assistant
    • Case Study C – TWIZ: The Multimodal Task-Assistant


João Magalhães, Associate Professor, Universidade Nova de Lisboa (FCT NOVA), Portugal
Bio: João Magalhães is an Associate Professor at the Department of Computer Science, Universidade Nova de Lisboa (FCT NOVA). He holds a Ph.D. degree in Computer Science from Imperial College London, UK. His research interests cover different problems of Vision and Language Mining and Search, in particular: multimedia retrieval, social media information analysis, and machine learning. He is regularly involved in international program committees and EU ICT project review panels. He has coordinated and participated in several research projects (national, EU‐FP7, and H2020), including two Portugal-USA projects with Carnegie Mellon University and the University of Texas at Austin. He is the General co-Chair of ACM Multimedia 2022.


Rafael Ferreira, Ph.D. Student, Universidade Nova de Lisboa (FCT NOVA), Portugal
Bio: Rafael Ferreira is a Researcher at NOVA Laboratory for Computer Science and Informatics. He holds an M.Sc. degree in Computer Science from NOVA University and is currently pursuing a Ph.D. degree in the area of multimodal conversational systems. His interests include the development of conversational agents, natural language processing, and multimodal AI. He has experience in conversational search and task-guiding agents and was the team leader of TWIZ, the award-winning bot from the Alexa Prize TaskBot Challenge.

Tutorial Website: