Keynote Speakers

Prof. Charles Spence

Head of the Crossmodal Research Laboratory, The University of Oxford, UK

Title: Gastrophysics: Using technology to enhance the experience of food and drink

Abstract: Currently, technology mostly distracts us from what we are eating and drinking. As hand-held devices continue to evolve, food and drink will increasingly have to fight with our smartphones, rather than the TV, for our attention. Some chefs have already responded to the challenge by trying to ban technology at the dinner table. However, I am optimistic that, in the years to come, technology will instead become integral to our food and drink experiences: everything from using your tablet as 21st-century plateware (now that they are dishwasher safe), through to using hand-held technologies to provide a dash of sonic (digital) seasoning – that is, providing the right sonic backdrop (be it music or soundscapes), matched to bring out the best in whatever we happen to be eating or drinking. In this talk, I will highlight the ways in which technology will, and will not, change our experience of food and drink in the years to come. I will give examples from modernist chefs and molecular mixologists who are already starting to transform our mainstream experience – be it in the air or in the home environment.

Bio: Professor Charles Spence is a world-famous experimental psychologist with a specialization in neuroscience-inspired multisensory design. He has worked with many of the world’s largest companies since establishing the Crossmodal Research Laboratory (CRL) at the Department of Experimental Psychology, Oxford University, in 1997. Prof. Spence has published over 750 articles and edited or authored 10 academic volumes, including the prize-winning “The Perfect Meal” (2014) and “Gastrophysics: The New Science of Eating” (2017). Much of Prof. Spence’s work focuses on the design of enhanced multisensory food and drink experiences, through collaborations with chefs, baristas, mixologists, perfumiers, and the food and beverage, and flavour and fragrance industries. Prof. Spence has also worked extensively on the question of how technology will transform our dining experiences in the future.

Prof. Lawrence Barsalou

The University of Glasgow, UK

Title: Situated Conceptualization: A Framework for Multimodal Interaction

Abstract: One way of construing brain organization is as a collection of neural systems that process the components of a situation in parallel, including its setting, agents, objects, self-relevance, internal states, actions, outcomes, etc. In a given situation, each situational component is conceptualized individually, as when the components of eating in a kitchen are conceptualized as kitchen (setting), diner (agent), food (object), hunger (internal state), and chewing (action). In turn, global concepts integrate these individual conceptualizations into larger structures that conceptualize the situation as a whole, such as eating and meal. From this perspective, a situated conceptualization is a distributed record of conceptual processing in a given situation, across all the relevant component systems, each distributed throughout the brain. On later occasions, when cued by something in the external or internal environment, a situated conceptualization becomes active to simulate the respective situation in its absence, producing multimodal pattern-completion inferences that guide situated action (e.g., activating a situated conceptualization to simulate and control eating). From this perspective, the concept that represents a category, such as kitchen or eating, is the collection of situated conceptualizations that has accumulated from processing the category across situations, similar to exemplar theories. The utility of situated conceptualization as a general theoretical construct is illustrated for situated action, social priming, social mirroring, emotion, and appetitive behaviors, as well as for habits and individual differences.
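
The exemplar-style retrieval and pattern completion described above can be made concrete with a small toy example. The following Python sketch is purely illustrative and not taken from the talk: each stored situated conceptualization is a record of situational components, and a partial cue from the current situation retrieves the best-matching record to fill in the components that were not observed.

from dataclasses import dataclass

@dataclass
class SituatedConceptualization:
    components: dict  # e.g. {"setting": "kitchen", "agent": "diner", ...}

def pattern_complete(cue: dict, memory: list) -> dict:
    """Retrieve the stored conceptualization that best matches the partial cue
    and merge it with the cue, so unobserved components are inferred."""
    def overlap(sc):
        return sum(1 for k, v in cue.items() if sc.components.get(k) == v)
    best = max(memory, key=overlap)
    return {**best.components, **cue}

memory = [
    SituatedConceptualization({"setting": "kitchen", "agent": "diner", "object": "food",
                               "internal_state": "hunger", "action": "chewing"}),
    SituatedConceptualization({"setting": "gym", "agent": "runner", "object": "treadmill",
                               "internal_state": "fatigue", "action": "running"}),
]

# A cue such as "in a kitchen and hungry" activates the eating conceptualization and
# infers the unobserved components (food as the object, chewing as the action).
print(pattern_complete({"setting": "kitchen", "internal_state": "hunger"}, memory))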

Bio: Lawrence Barsalou received a Bachelor's degree in Psychology from the University of California, San Diego, in 1977, and a Ph.D. in Psychology from Stanford University in 1981. Before coming to the University of Glasgow, he held faculty positions at Emory University, the Georgia Institute of Technology, and the University of Chicago. Barsalou's research addresses the nature of the human conceptual system from the perspective of grounded cognition, using methods from cognitive science and neuroscience.

Prof. Danica Kragic

Royal Institute of Technology (KTH), Centre for Autonomous Systems, Stockholm, Sweden

Title: Collaborative robots: from action and interaction to collaboration

Abstract: An integral ability of any robot is to act in the environment and to interact and collaborate with people and other robots. Interaction between two agents builds on the ability to engage in mutual prediction and signaling. Thus, human-robot interaction requires a system that can interpret and make use of human signaling strategies in a social context. Our work in this area focuses on developing a framework for human motion prediction in the context of joint action in HRI. We base this framework on the idea that social interaction is highly influenced by sensorimotor contingencies (SMCs). Instead of constructing explicit cognitive models, we rely on the interaction between actions and the perceptual changes they induce in both the human and the robot. This approach allows us to employ a single model for motion prediction and goal inference, and to seamlessly integrate human actions into the environment and task context.

The current trend in computer vision is the development of data-driven approaches, in which large amounts of data are used to compensate for the complexity of the world captured by cameras. Are these approaches also viable solutions in robotics? Apart from 'seeing', a robot is capable of acting, and can thus purposively change what and how it sees the world around it. There is a need for an interplay between processes such as attention, segmentation, object detection, recognition, and categorization in order to interact with the environment. In addition, the parameterization of these processes is inevitably guided by the task or goal a robot is supposed to achieve. In this talk, I will present the current state of the art in robot vision and discuss open problems in the area. I will also show how visual input can be integrated with proprioception, tactile, and force-torque feedback in order to plan, guide, and assess a robot's action and interaction with the environment.

We employ a deep generative model that makes inferences over future human motion trajectories given the human's intention, the motion history, and the task setting of the interaction. With the help of predictions drawn from the model, we can determine the most likely future motion trajectory and make inferences over intentions and objects of interest.
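
The abstract does not specify the model, so the sketch below is only a hypothetical illustration of a conditional deep generative model of this kind (a conditional VAE, assuming PyTorch is available; all layer sizes and dimensions are invented): it conditions on the observed motion history and an intention encoding, samples a latent variable, and decodes candidate future trajectories, which could then be scored to pick the most likely motion or to compare candidate intentions.

import torch
import torch.nn as nn

class TrajectoryCVAE(nn.Module):
    def __init__(self, traj_dim=3, hist_len=20, fut_len=20, intent_dim=8, latent_dim=16):
        super().__init__()
        self.fut_len, self.traj_dim = fut_len, traj_dim
        cond_dim = hist_len * traj_dim + intent_dim
        # Encoder: conditioning information + observed future -> latent Gaussian parameters
        self.enc = nn.Sequential(nn.Linear(cond_dim + fut_len * traj_dim, 128), nn.ReLU())
        self.mu, self.logvar = nn.Linear(128, latent_dim), nn.Linear(128, latent_dim)
        # Decoder: conditioning information + latent sample -> future trajectory
        self.dec = nn.Sequential(nn.Linear(cond_dim + latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, fut_len * traj_dim))

    def forward(self, history, intent, future):
        # history: (B, hist_len, traj_dim), intent: (B, intent_dim), future: (B, fut_len, traj_dim)
        cond = torch.cat([history.flatten(1), intent], dim=1)
        h = self.enc(torch.cat([cond, future.flatten(1)], dim=1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        pred = self.dec(torch.cat([cond, z], dim=1)).view(-1, self.fut_len, self.traj_dim)
        return pred, mu, logvar  # train with reconstruction loss + KL divergence

    @torch.no_grad()
    def sample(self, history, intent, n=10):
        """Draw n candidate future trajectories for a given history and intention;
        comparing how well samples explain the observed motion across candidate
        intentions supports goal inference."""
        cond = torch.cat([history.flatten(1), intent], dim=1)
        z = torch.randn(n, cond.shape[0], self.mu.out_features)
        conds = cond.unsqueeze(0).expand(n, -1, -1)
        out = self.dec(torch.cat([conds, z], dim=-1))
        return out.view(n, -1, self.fut_len, self.traj_dim)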

Bio: Danica Kragic is a Professor at the School of Computer Science and Communication at the Royal Institute of Technology, KTH. She received an MSc in Mechanical Engineering from the Technical University of Rijeka, Croatia, in 1995 and a PhD in Computer Science from KTH in 2001. She has been a visiting researcher at Columbia University, Johns Hopkins University, and INRIA Rennes. She is the Director of the Centre for Autonomous Systems. Danica received the 2007 IEEE Robotics and Automation Society Early Academic Career Award. She is a member of the Royal Swedish Academy of Sciences, the Royal Swedish Academy of Engineering Sciences, and the Young Academy of Sweden. She holds an Honorary Doctorate from Lappeenranta University of Technology. She chaired the IEEE RAS Technical Committee on Computer and Robot Vision and served as an IEEE RAS AdCom member. Her research is in the area of robotics, computer vision, and machine learning. In 2012, she received an ERC Starting Grant. Her research is supported by the EU, the Knut and Alice Wallenberg Foundation, the Swedish Foundation for Strategic Research, and the Swedish Research Council. She is an IEEE Fellow.

Dr. Phil Cohen

Chief Scientist for Artificial Intelligence and Senior Vice President for Advanced Technology at VoiceBox Technologies

Title: Steps Towards Collaborative Multimodal Dialogue

Abstract: This talk will discuss progress in building collaborative multimodal systems, both systems that offer a collaborative interface that augments human performance, and autonomous systems with which one can collaborate. To begin, I discuss what we mean by collaboration, which revolves around plan-recognition skills learned in childhood. Then, I present a collaborative multimodal operations planning system, Sketch-Thru-Plan, that enables users to interact multimodally with speech and pen as it attempts to infer their plans. The system offers suggested actions and allows the user to confirm or disconfirm those suggestions. I show how the collaborative multimodal interface enables more rapid task performance and higher user satisfaction than existing deployed GUIs built for the same task.

In the second part of the talk, I discuss the differences for system design between building such a collaborative multimodal interface and building an autonomous agent with which one can collaborate through multimodal dialogue. I argue that interacting with an autonomous agent (e.g., a robot or virtual assistant) may require a more declarative approach to supporting collaborative communication. People’s deeply ingrained collaboration strategies will be seen to be at the foundation of dialogue and are expected by human interlocutors. The approach I will advocate for implementing such a strategy is to build a belief-desire-intention (BDI) architecture that attempts to recognize the collaborator’s plans and determine obstacles to their success. The system then plans and executes a response to overcome those obstacles, which results in the system’s planning appropriate actions (including speech acts). I will illustrate and demonstrate a system that embodies this type of collaboration, engaging users in dialogue about travel planning. Finally, I will compare this approach with current academic and research approaches to dialogue.
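
The plan-recognition, obstacle-detection, and response-planning loop sketched in this paragraph can be illustrated with a deliberately simple toy example. The Python sketch below is hypothetical (an illustration, not VoiceBox's system): plan recognition is a stub, obstacles are simply unsatisfied preconditions, and the planned response is a clarification question, i.e., a speech act chosen to remove the first obstacle.

from dataclasses import dataclass

@dataclass
class Plan:
    goal: str
    steps: list
    missing: list  # preconditions the system believes are unsatisfied

def recognize_plan(utterance: str) -> Plan:
    # Placeholder plan recognition: a real system would infer this from the dialogue context.
    if "flight" in utterance:
        return Plan(goal="book_flight",
                    steps=["choose_flight", "pay"],
                    missing=["destination", "travel_date"])
    return Plan(goal="unknown", steps=[], missing=[])

def find_obstacles(plan: Plan) -> list:
    return plan.missing

def respond(obstacles: list) -> str:
    if obstacles:
        # Plan a speech act (a question) that overcomes the first obstacle.
        return f"Sure - what is your {obstacles[0].replace('_', ' ')}?"
    return "I can take care of that for you."

user_utterance = "I'd like to book a flight"
plan = recognize_plan(user_utterance)
print(respond(find_obstacles(plan)))  # -> "Sure - what is your destination?"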

Bio: Dr. Philip Cohen has long been engaged in the AI subfields of human-computer dialogue, multimodal interaction, and multiagent systems. He is a Fellow of the Association for the Advancement of Artificial Intelligence and a past President of the Association for Computational Linguistics. Currently Chief Scientist, AI and Senior Vice President for Advanced Technology at VoiceBox Technologies, he has also held positions at Adapx Inc. (founder), the Oregon Graduate Institute, the Artificial Intelligence Center of SRI International, the Fairchild Laboratory for Artificial Intelligence, and Bolt Beranek and Newman. His accomplishments include co-developing influential theories of intention, collaboration, and speech acts, co-developing and deploying multimodal systems, and conceiving and leading the project at SRI International that developed the Open Agent Architecture, which eventually became Siri. Cohen has published more than 150 refereed papers, with more than 16,000 citations, and has received 7 patents. His paper with Prof. Hector Levesque, “Intention is Choice with Commitment”, was awarded the inaugural Influential Paper Award from the International Foundation for Autonomous Agents and Multi-Agent Systems. Cohen currently leads a team engaged in semantic parsing and human-computer dialogue.

