ICMI '20 Companion: Companion Publication of the 2020 International Conference on Multimodal Interaction
SESSION: ICMI 2020 Late Breaking Results
- Sahar Mahdie Klim Al Zaidawi
- Martin H.U. Prinzler
- Christoph Schröder
- Gabriel Zachmann
- Sebastian Maneth
We present a new study of gender prediction using eye movements of prepubescent children
aged 9--10. Despite previous research indicating that gender differences in eye movements
are observed only in adults, we are able to predict gender with accuracies of up to
64%. Our method segments gaze point trajectories into saccades and fixations. It then
computes a small number of features and classifies saccades and fixations separately
using statistical methods. The dataset used contains both non-dyslexic and dyslexic children.
In mixed groups, the accuracy of our classifiers drops dramatically. To address this
challenge, we construct a hierarchical classifier that makes use of dyslexia prediction
to significantly improve the accuracy of gender prediction in mixed groups.
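As an illustration only, a minimal scikit-learn sketch of such a hierarchical scheme; the classifier choice, feature matrices and label encodings below are assumptions, not the authors' implementation:

```python
# Hypothetical sketch: first predict dyslexia, then apply a group-specific
# gender model. Extraction of fixation/saccade features is assumed elsewhere,
# and random forests stand in for the statistical classifiers of the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class HierarchicalGenderClassifier:
    def __init__(self):
        self.dyslexia_clf = RandomForestClassifier(n_estimators=200, random_state=0)
        self.gender_clf = {0: RandomForestClassifier(n_estimators=200, random_state=0),
                           1: RandomForestClassifier(n_estimators=200, random_state=0)}

    def fit(self, X, y_gender, y_dyslexia):
        self.dyslexia_clf.fit(X, y_dyslexia)
        for group in (0, 1):                      # 0 = non-dyslexic, 1 = dyslexic
            mask = y_dyslexia == group
            self.gender_clf[group].fit(X[mask], y_gender[mask])
        return self

    def predict(self, X):
        groups = self.dyslexia_clf.predict(X)     # route each sample by predicted dyslexia
        out = np.empty(len(X), dtype=int)
        for group in (0, 1):
            mask = groups == group
            if mask.any():
                out[mask] = self.gender_clf[group].predict(X[mask])
        return out

# placeholder data, just to show the call pattern
X = np.random.rand(60, 14)
y_gender, y_dyslexia = np.random.randint(0, 2, 60), np.random.randint(0, 2, 60)
preds = HierarchicalGenderClassifier().fit(X, y_gender, y_dyslexia).predict(X)
```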
- Mengjiong Bai
- Roland Goecke
This study investigates the utility of Long Short-Term Memory (LSTM) networks for
modelling spatial-temporal patterns for micro-expression recognition (MER). Micro-expressions
are involuntary, short facial expressions, often of low intensity. RNNs have attracted
a lot of attention in recent years for modelling temporal sequences. The RNN-LSTM
combination has proven highly effective in many application areas. The proposed
method combines the recent VGGFace2 model, essentially a ResNet-50 CNN trained on the
VGGFace2 dataset, with uni-directional and bi-directional LSTMs to explore different
ways of modelling spatial-temporal facial patterns for MER. The Grad-CAM heat map visualisation
is used in the training stages to determine the most appropriate layer of the VGGFace2
model for retraining. Experiments are conducted with pure VGGFace2, VGGFace2 + uni-directional
LSTM, and VGGFace2 + Bi-directional LSTM on the SMIC database using 5-fold cross-validation.
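For illustration, a rough PyTorch sketch of the CNN-plus-LSTM pipeline described above; a torchvision ResNet-50 stands in for the VGGFace2-trained backbone, and all shapes and hyperparameters are placeholders:

```python
# Frame-wise CNN features fed into a bi-directional LSTM for clip-level MER.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class CnnBiLstmMER(nn.Module):
    def __init__(self, num_classes=3, hidden=256):
        super().__init__()
        backbone = resnet50(weights=None)           # load VGGFace2 weights in practice
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # 2048-d per frame
        self.lstm = nn.LSTM(2048, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, clips):                       # clips: (batch, frames, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).flatten(1)   # (b*t, 2048)
        seq, _ = self.lstm(feats.view(b, t, -1))
        return self.head(seq[:, -1])                # classify from the last time step

logits = CnnBiLstmMER()(torch.randn(2, 16, 3, 224, 224))
```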
- George Boateng
- Tobias Kowatsch
Recognizing the emotions of the elderly is important as it could give an insight into
their mental health. Emotion recognition systems that work well on the elderly could
be used to assess their emotions in places such as nursing homes and could inform
the development of various activities and interventions to improve their mental health.
However, most emotion recognition systems are developed using data from younger
adults. In this work, we train machine learning models to recognize the emotions of
elderly individuals by performing a 3-class classification of valence and arousal
as part of the INTERSPEECH 2020 Computational Paralinguistics Challenge (COMPARE).
We used speech data from 87 participants who gave spontaneous personal narratives.
We leveraged a transfer learning approach in which we used pretrained CNN and BERT
models to extract acoustic and linguistic features respectively and fed them into
separate machine learning models. Also, we fused these two modalities in a multimodal
approach. Our best model used a linguistic approach and outperformed the official
competition baseline in unweighted average recall (UAR) for valence by 8.8% and for the
mean of valence and arousal by 3.2%. We also showed that feature engineering is not
necessary as transfer learning without fine-tuning performs as well or better and
could be leveraged for the task of recognizing the emotions of elderly individuals.
This work is a step towards better recognition of the emotions of the elderly which
could eventually inform the development of interventions to manage their mental health.
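A hedged sketch of the linguistic branch of such a pipeline, i.e., frozen BERT embeddings without fine-tuning feeding a classical classifier; the model checkpoint, pooling and classifier below are assumptions, not the authors' exact setup:

```python
# Mean-pooled, frozen BERT embeddings as linguistic features for a 3-class task.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.svm import SVC

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(texts):
    """One embedding vector per narrative, averaged over non-padding tokens."""
    with torch.no_grad():
        enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = bert(**enc).last_hidden_state             # (n, tokens, 768)
        mask = enc["attention_mask"].unsqueeze(-1)
        return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

texts = ["example narrative one ...", "example narrative two ..."]  # placeholders
valence_labels = [0, 2]                                    # 3-class valence (low/mid/high)
clf = SVC().fit(embed(texts), valence_labels)
```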
- George Boateng
- Laura Sels
- Peter Kuppens
- Peter Hilpert
- Tobias Kowatsch
Extensive couples' literature shows that how couples feel after a conflict is predicted
by certain emotional aspects of that conversation. Understanding the emotions of couples
leads to a better understanding of partners' mental well-being and consequently their
relationships. Hence, automatic emotion recognition among couples could potentially
guide interventions to help couples improve their emotional well-being and their relationships.
It has been shown that people's global emotional judgment after an experience is strongly
influenced by the emotional extremes and ending of that experience, known as the peak-end
rule. In this work, we leveraged this theory and used machine learning to investigate,
which audio segments can be used to best predict the end-of-conversation emotions
of couples. We used speech data collected from 101 Dutch-speaking couples in Belgium
who engaged in 10-minute long conversations in the lab. We extracted acoustic features
from (1) the audio segments with the most extreme positive and negative ratings, and
(2) the ending of the audio. We used transfer learning in which we extracted these
acoustic features with a pre-trained convolutional neural network (YAMNet). We then
used these features to train machine learning models - support vector machines - to
predict the end-of-conversation valence ratings (positive vs negative) of each partner.
The results of this work could inform how to best recognize the emotions of couples
after conversation sessions and, eventually, lead to a better understanding of couples'
relationships either in therapy or in everyday life.
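A minimal sketch of how peak-end acoustic features could be assembled with YAMNet embeddings and fed to an SVM; segment selection, sampling rate and labels are placeholders rather than the study's pipeline:

```python
# Average YAMNet frame embeddings over the extreme-rated segments and the ending,
# then classify end-of-conversation valence with a linear SVM.
import numpy as np
import tensorflow_hub as hub
from sklearn.svm import SVC

yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

def segment_embedding(waveform_16k):
    """Mean 1024-d YAMNet embedding for one 16 kHz mono audio segment."""
    _, embeddings, _ = yamnet(np.asarray(waveform_16k, dtype=np.float32))
    return embeddings.numpy().mean(axis=0)

def peak_end_features(peak_pos_seg, peak_neg_seg, end_seg):
    # Concatenate embeddings of the two extreme-rated segments and the ending.
    return np.concatenate([segment_embedding(peak_pos_seg),
                           segment_embedding(peak_neg_seg),
                           segment_embedding(end_seg)])

# X: one feature row per partner, y: end-of-conversation valence (0 = negative, 1 = positive)
X = np.stack([peak_end_features(np.random.randn(16000),
                                np.random.randn(16000),
                                np.random.randn(16000)) for _ in range(4)])
y = [0, 1, 0, 1]
model = SVC(kernel="linear").fit(X, y)
```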
- Alysha Bogaers
- Zerrin Yumak
- Anja Volk
While audio-driven face and gesture motion synthesis has been studied before, to our
knowledge no research has been done yet for automatic generation of musical gestures
for virtual humans. Existing work either focuses on precise 3D finger movement generation
required to play an instrument or expressive musical gestures based on 2D video data.
In this paper, we propose a music-driven piano performance generation method using
3D motion capture data and recurrent neural networks. Our results show that it is
feasible to automatically generate expressive musical gestures for piano playing using
various audio and musical features. However, it is not yet clear which features work
best for which type of music. Our future work aims to further test with other datasets,
deep learning methods and musical instruments using both objective and subjective
evaluations.
- Harshit Chauhan
- Anmol Prasad
- Jainendra Shukla
In this paper, we focus on finding the correlation between visual attention and engagement
of ADHD students in one-on-one sessions with specialized educators using visual cues
and eye-tracking data. Our goal is to investigate the extent to which observations
of eye-gaze, posture, emotion and other physiological signals can be used to model
the cognitive state of subjects and to explore the integration of multiple sensor
modalities to improve the reliability of detection of human displays of awareness
and emotion in the context of ADHD affected children. This is a novel problem since
no previous studies have aimed to identify markers of attentiveness in the context
of students affected with ADHD. The experiment has been designed to collect data in
a controlled environment, which can later be used to generate Machine Learning models
to assist real-world educators. Additionally, we propose a novel approach for AOI
(Area of Interest) detection for eye-tracking analysis in dynamic scenarios using
existing deep learning-based saliency prediction and fixation prediction models. We
aim to use the processed data to extract the features from a subject's eye-movement
patterns and use Machine Learning models to classify the attention levels.
- Roberto Daza
- Aythami Morales
- Julian Fierrez
- Ruben Tolosana
This work presents mEBAL, a multimodal database for eye blink detection and attention
level estimation. Eye blink frequency is related to cognitive activity, and
automatic detectors of eye blinks have been proposed for many tasks including attention
level estimation, analysis of neuro-degenerative diseases, deception recognition,
driver fatigue detection, and face anti-spoofing. However, most existing databases and
algorithms in this area are limited to experiments involving only a few hundred samples
and individual sensors like face cameras. The proposed mEBAL improves previous databases
in terms of acquisition sensors and samples. In particular, three different sensors
are simultaneously considered: Near Infrared (NIR) and RGB cameras to capture the
face gestures and an Electroencephalography (EEG) band to capture the cognitive activity
of the user and blinking events. Regarding the size of mEBAL, it comprises 6,000 samples
and the corresponding attention level from 38 different students while conducting
a number of e-learning tasks of varying difficulty. In addition to presenting mEBAL,
we also include preliminary experiments on: i) eye blink detection using Convolutional
Neural Networks (CNN) with the facial images, and ii) attention level estimation of
the students based on their eye blink frequency.
- Karen Fucinato
- Elena Lavinia Leustean
- Lilla Fekecs
- Tünde Tárnoková
- Rosalyn M. Langedijk
- Kerstin Fischer
Social robots are increasingly entering our households, being able to interact with
humans in various ways. One functionality of social robots may be to connect to a
user's mobile phone and to read text messages out loud. Such a technology and communication
platform should therefore be able to support emojis properly. We therefore address
emoji usage in computer-mediated communication in order to develop appropriate emoji
conveyance in social robot behavior. Our research explores how participants feel about
the behavior of a tabletop robot prototype named Nina that reads text messages to
the user, and to what extent different renderings correspond to user expectations and preferences
of how text messages and emoji combinations should be delivered. Based on online animated
videos and questionnaires, respondents evaluated the behavior of Nina based on different
renderings of text messages with emojis in them. The experimental results and data analysis
show that respondents liked the social robot displaying emojis with or without sound
effects and "acting out" emojis in text messages almost equally well, but rated
replacing the emojis with words as less useful, less fun and more confusing.
- Amogh Gulati
- Brihi Joshi
- Chirag Jain
- Jainendra Shukla
Music is an efficient medium to elicit and convey emotions. The comparison between
perceived and induced emotions from western music has been widely studied. However,
this relationship has not been studied from the perspective of Hindustani classical
music. In this work, we explore the relationship between perceived and induced emotions
with Hindustani classical music as our stimuli. We observe that there is little to
no correlation between them; however, audio features help in distinguishing the increase
or decrease in induced emotion quality. We also introduce a novel dataset which contains
induced valence and arousal annotations for 18 Hindustani classical music songs. Furthermore,
we propose a latent space representation based approach, that leads to a relative
increase in F1 Score of 32.2% for arousal and 34.5% for valence classification, as
compared to feature-based approaches for Hindustani classical music.
- Seiichi Harata
- Takuto Sakuma
- Shohei Kato
To emulate human emotions in agents, the mathematical representation of emotion (an
emotional space) is essential for each component, such as emotion recognition, generation,
and expression. In this study, we aim to acquire a modality-independent emotional
space by extracting shared emotional information from different modalities. We propose
a method of acquiring an emotional space by integrating multimodalities on a DNN and
combining the emotion recognition task and the unification task. The emotion recognition
task learns the representation of emotions, and the unification task learns an identical
emotional space from each modality. Through experiments with audio-visual data,
we confirmed that emotional spaces acquired from individual modalities differ and
that the proposed method can acquire a joint emotional space. We also showed that
the proposed method can adequately represent emotions in a low-dimensional emotional
space, such as five or six dimensions, under this paper's experimental conditions.
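A conceptual PyTorch sketch of combining an emotion recognition loss with a unification loss over modality-specific encoders; dimensions, losses and weighting are assumptions, not the paper's exact architecture:

```python
# Two encoders map audio and visual features into a shared low-dimensional
# emotional space; a shared classifier handles recognition, and an MSE term
# pulls the two embeddings of the same sample together (the unification task).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEmotionSpace(nn.Module):
    def __init__(self, audio_dim=88, visual_dim=136, space_dim=6, num_emotions=4):
        super().__init__()
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, 64), nn.ReLU(), nn.Linear(64, space_dim))
        self.visual_enc = nn.Sequential(nn.Linear(visual_dim, 64), nn.ReLU(), nn.Linear(64, space_dim))
        self.classifier = nn.Linear(space_dim, num_emotions)   # shared across modalities

    def forward(self, audio, visual):
        za, zv = self.audio_enc(audio), self.visual_enc(visual)
        return za, zv, self.classifier(za), self.classifier(zv)

def joint_loss(za, zv, logits_a, logits_v, labels, alpha=1.0):
    recognition = F.cross_entropy(logits_a, labels) + F.cross_entropy(logits_v, labels)
    unification = F.mse_loss(za, zv)                # same sample -> same point in the space
    return recognition + alpha * unification

model = SharedEmotionSpace()
za, zv, la, lv = model(torch.randn(8, 88), torch.randn(8, 136))
loss = joint_loss(za, zv, la, lv, torch.randint(0, 4, (8,)))
```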
- Youssef Hmamouche
- Magalie Ochs
- Laurent Prévot
- Thierry Chaminade
To what extent do human-robot interactions (HRI) rely on social processes similar
to human-human interactions (HHI)? To address this question objectively, we use a
unique corpus. Brain activity and behaviors were recorded synchronously while participants
were discussing with a human (confederate of the experimenter) or a robotic device
(controlled by the confederate). Here, we focus on two main regions of interest (ROIs)
that form the core of "the social brain": the right temporoparietal junction [rTPJ]
and the right medial prefrontal cortex [rMPFC]. A new analysis approach derived from
multivariate time-series forecasting is used. A prediction score describes the ability
to predict brain activity for each ROI, and results identify which behavioral features,
built from raw recordings of the conversations, are used for this prediction. The results
reveal differences between HHI and HRI in the behavioral features predicting
activity in these ROIs of the social brain, which could explain significant differences
in their level of activity.
- Daisuke Kamisaka
- Yuichi Ishikawa
Personality is an essential human attribute, and plays an important role for personalization
of face-to-face services that improves sales and customer satisfaction in a variety
of different business domains. Most studies addressing personality prediction to date
built a prediction model with user data gathered via online services (e.g., SNS and
mobile phone services). On the other hand, predicting the personality traits of customers
without an online service log (e.g., first-time visitors and customers who only use
a physical store) is challenging. In this paper, we present a multimodal approach
to predict the personality of customers about whom only visual data collected using
in-store surveillance cameras is available. Our approach extracts simple gait
features and projects them into pseudo online service log features to gain predictive
power. Through an evaluation using the data collected by real mobile retailers, our
approach improved prediction accuracy compared with a method that directly predicts
personality from the visual data.
- Divesh Lala
- Koji Inoue
- Tatsuya Kawahara
Shared laughter is a phenomenon in face-to-face human dialogue which increases engagement
and rapport, and so should be considered for conversation robots and agents. Our aim
is to create a model of shared laughter generation for conversational robots. As part
of this system, we train models which predict if shared laughter will occur, given
that the user has laughed. Models trained using combinations of acoustic features, prosodic
features and laughter type were compared with online versions to better
quantify their performance in a real system. We find that these models perform better
than random chance, with the multimodal combination of acoustic and prosodic features
performing the best.
- Casper Laugs
- Hendrik Vincent Koops
- Daan Odijk
- Heysem Kaya
- Anja Volk
While both speech emotion recognition and music emotion recognition have been studied
extensively in different communities, little research has gone into the recognition of
emotion from mixed audio sources, i.e., when both speech and music are present. However,
many application scenarios require models that are able to extract emotions from mixed
audio sources, such as television content. This paper studies how mixed audio affects
both speech and music emotion recognition using a random forest and deep neural network
model, and investigates if blind source separation of the mixed signal beforehand
is beneficial. We created a mixed audio dataset, with 25% speech-music overlap without
contextual relationship between the two. We show that specialized models for speech-only
or music-only audio were able to achieve merely 'chance-level' performance on mixed
audio. For speech, above chance-level performance was achieved when trained on raw
mixed audio, but optimal performance was achieved with audio blind source separated
beforehand. Music emotion recognition models on mixed audio achieve performance approaching
or even surpassing performance on music-only audio, with and without blind source
separation. Our results are important for estimating emotion from real-world data,
where individual speech and music tracks are often not available.
- Heera Lee
- Varun Mandalapu
- Jiaqi Gong
- Andrea Kleinsmith
- Ravi Kuber
Deaf and hard of hearing English language learners encounter a range of challenges
when learning spoken/written English, many of which are not faced by their hearing
counterparts. In this paper, we examine the feasibility of utilizing physiological
data, including arousal and eye gaze behaviors, as a method of identifying instances
of anxiety and frustration experienced when delivering presentations. Initial findings
demonstrate the potential of using this approach, which in turn could aid English
language instructors who could either provide emotional support or personalized instructions
to assist deaf and hard of hearing English language learners in the classroom.
- Yi Liu
- Shuang Yang
- Hongying Meng
- Mohammad Rafiq Swash
- Shiguang Shan
Recently, video-based micro-gesture recognition with the data captured by holoscopic
3D (H3D) sensors has been attracting increasing attention, mainly because of their particular
advantage of using a single-aperture camera to embed 3D information in 2D images.
However, it is not easy to use the embedded 3D information in an efficient manner
due to the special imaging principles of H3D sensors. In this paper, an efficient
Pseudo View Points (PVP) based method is proposed to introduce the embedded 3D information
in H3D images into a new micro-gesture recognition framework. Specifically, we obtain
several pseudo view points based frames by composing all the pixels at the same position
in each elemental image (EI) in the original H3D frames. This is a very efficient and
robust step, and could mimic the real view points so as to represent the 3D information
in the frames. Then, a new recognition framework based on 3D DenseNet and Bi-GRU networks
is proposed to learn the dynamic patterns of different micro-gestures based on the
representation of the pseudo view points. Finally, we perform a thorough comparison
on the related benchmark, which demonstrates the effectiveness of our method and also
reports a new state-of-the-art performance.
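The pseudo-view-point composition described above can be illustrated with a few lines of NumPy; the elemental-image size is an assumption:

```python
# Collecting the pixel at the same (i, j) offset inside every elemental image
# yields one pseudo view per offset.
import numpy as np

def pseudo_view_points(h3d_frame, ei_size=4):
    """h3d_frame: (H, W, C) with H, W divisible by ei_size.
    Returns (ei_size*ei_size, H//ei_size, W//ei_size, C) pseudo-view images."""
    H, W, C = h3d_frame.shape
    grid = h3d_frame.reshape(H // ei_size, ei_size, W // ei_size, ei_size, C)
    # axes: (EI row, offset row, EI col, offset col, channel)
    views = grid.transpose(1, 3, 0, 2, 4)            # group pixels by their offset
    return views.reshape(ei_size * ei_size, H // ei_size, W // ei_size, C)

frame = np.random.rand(400, 600, 3)
views = pseudo_view_points(frame)                    # 16 pseudo view-point images
```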
- Vasundhara Misal
- Surely Akiri
- Sanaz Taherzadeh
- Hannah McGowan
- Gary Williams
- J. Lee Jenkins
- Helena Mentis
- Andrea Kleinsmith
Paramedics play a critical role in society and face many high stress situations in
their day-to-day work. Long-term unmanaged stress can result in mental health issues
such as depression, anxiety, and post-traumatic stress disorder. Physiological synchrony
- the unconscious, dynamic linking of physiological responses such as electrodermal
activity (EDA) - has been linked to stress and team coordination. In this preliminary
analysis, we examined the relationship between EDA synchrony, perceived stress and
communication between paramedic trainee pairs during in-situ simulation training.
Our initial results indicated a correlation between high physiological synchrony and
social coordination and group processes. Moreover, communication between paramedic
dyads was inversely related to physiological synchrony, i.e., communication increased
during low synchrony segments of the interaction and decreased during high synchrony
segments.
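One common way to quantify such physiological synchrony is a windowed correlation between the two EDA signals; the sketch below uses placeholder parameters and is not the study's exact measure:

```python
# Windowed Pearson correlation between two EDA time series as a synchrony index.
import numpy as np

def windowed_eda_synchrony(eda_a, eda_b, fs=4, window_s=30):
    """Return a per-window correlation between two EDA signals sampled at fs Hz."""
    win = int(fs * window_s)
    n_windows = min(len(eda_a), len(eda_b)) // win
    sync = []
    for k in range(n_windows):
        a = eda_a[k * win:(k + 1) * win]
        b = eda_b[k * win:(k + 1) * win]
        sync.append(np.corrcoef(a, b)[0, 1])
    return np.array(sync)

sync = windowed_eda_synchrony(np.random.rand(2400), np.random.rand(2400))
high_sync_segments = sync > 0.5                      # candidate "high synchrony" windows
```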
- Gerard Pons
- Abdallah El Ali
- Pablo Cesar
Facial thermal imaging has in recent years been shown to be an efficient modality for facial
emotion recognition. However, the use of deep learning in this field is still not
fully exploited given the small number and size of the current datasets. The goal
of this work is to improve the performance of the existing deep networks in thermal
facial emotion recognition by generating new synthesized thermal images from images
in the visual spectrum (RGB). To address this challenging problem, we propose an emotion-guided
thermal CycleGAN (ET-CycleGAN). This Generative Adversarial Network (GAN) regularizes
the training with facial and emotion priors by extracting features from Convolutional
Neural Networks (CNNs) trained for face recognition and facial emotion recognition,
respectively. To assess this approach, we generated synthesized images from the training
set of the USTC-NVIE dataset, and added the new data to the training set as a data
augmentation strategy. By including images generated using the ET-CycleGAN, the accuracy
for emotion recognition increased by 10.9%. Our initial findings highlight the importance
of adding priors related to training set image attributes (in our case face and emotion
priors), to ensure such attributes are maintained in the generated images.
- Eduardo B. Sandoval
- Binghao Li
- Abdoulaye Diakite
- Kai Zhao
- Nicholas Oliver
- Tomasz Bednarz
- Sisi Zlatanova
We developed a 3D-Enhanced Facility Management System for Indoors Navigation (3D-EFMS-IN)
to assist visually impaired users (VIU). Additionally, the system aims to facilitate
the management of estate property and provide support for future scenarios related
to emergencies, security, and robotics devices. The system combines four main subsystems:
Mapping, Navigation Paths, Indoor Localisation and Navigation, and a Visualisation.
The subsystems have been integrated, and a pretest with one VIU was performed
to obtain feedback and tune the critical characteristics of our development. We observed
that the system offers an acceptable preliminary user experience for VIU, and that future
tests need to improve the system's latency and usability. In the near future, we aim
to obtain qualitative and quantitative measurements with a significant pool of users
once the COVID lockdown ends.
- Maia Stiber
- Chien-Ming Huang
Robot errors occurring during situated interactions with humans are inevitable and
elicit social responses. While prior research has suggested how social signals may
indicate errors produced by anthropomorphic robots, most have not explored Programming
by Demonstration (PbD) scenarios or non-humanoid robots. Additionally, how human social
signals may help characterize error severity, which is important for determining appropriate
strategies for error mitigation, has received limited exploration. We report
an exploratory study that investigates how people may react to technical errors with
varying severity produced by a non-humanoid robotic arm in a PbD scenario. Our results
indicate that more severe robot errors may prompt faster, more intense human responses
and that multimodal responses tend to escalate as the error unfolds. This provides
initial evidence suggesting temporal modeling of multimodal social signals may enable
early detection and classification of robot errors, thereby minimizing unwanted consequences.
- Sandrine Tornay
- Necati Cihan Camgoz
- Richard Bowden
- Mathew Magimai Doss
Interactive learning platforms are among the top choices for acquiring new languages. Such
applications or platforms are more easily available for spoken languages, but rarely
for sign languages. Assessment of the production of signs is a challenging problem
because of the multichannel aspect (e.g., hand shape, hand movement, mouthing, facial
expression) inherent in sign languages. In this paper, we propose an automatic sign
language production assessment approach which allows assessment of two linguistic
aspects: (i) the produced lexeme and (ii) the produced forms. On a linguistically
annotated Swiss German Sign Language dataset, SMILE DSGS corpus, we demonstrate that
the proposed approach can effectively assess the two linguistic aspects in an integrated
manner.
- Pim Verhagen
- Irene Kuling
- Kaj Gijsbertse
- Ivo V. Stuldreher
- Krista Overvliet
- Sara Falcone
- Jan Van Erp
- Anne-Marie Brouwer
Remote control of robots generally requires a high level of expertise and may impose
a considerable cognitive burden on operators. A sense of embodiment over a remote-controlled
robot might enhance operators' task performance and reduce cognitive workload. We
want to study the extent to which different factors affect embodiment. As a first
step, we aimed to validate the cross-modal congruency effect (CCE) as a potential
objective measure of embodiment under four conditions with different, a priori expected
levels of embodiment, and by comparing CCE scores with subjective reports. The conditions
were (1) a real hand condition (real condition), (2) a real hand seen through a telepresence
unit (mediated condition), (3) a robotic hand seen through a telepresence unit (robot
condition), and (4) a human-looking virtual hand seen through VR glasses (VR condition).
We found no unambiguous evidence that the magnitude of the CCE was affected by the
degree of visual realism in each of the four conditions. Nor did we find evidence
to support the hypothesis that the CCE and the embodiment score assessed by the subjective
reports are correlated. These findings raise serious concerns about the use of the
CCE as an objective measure of embodiment.
SESSION: DVU'20 Workshop
- Valeriya Karpova
- Polina Popenova
- Nadezda Glebko
- Vladimir Lyashenko
- Olga Perepelkina
Automatic deception detection is a challenging issue since human behaviors are too
complex to establish any standard behavioral signs that would explicitly indicate
that a person is lying. Furthermore, it is difficult to collect naturalistic datasets
for supervised learning as both external and self-annotation may be unreliable for
deception annotation. For these purposes, we collected the TRuLie dataset that consists
of synchronously recorded videos (34 hours in total) and data received from contact
photoplethysmography (PPG) and a hardware eye-tracker for ninety-three subjects who tried
to feign innocence during interrogation after committing mock crimes. This yielded
multimodal fragments labeled as lie (n=3380) or truth (n=6444). We trained an end-to-end
convolutional neural network (CNN) on this dataset to predict lie and truth from audio
and video, and also built classifiers on combined features extracted from video, audio,
PPG, eye-tracker, and predictions from CNN. The best classifier (LightGBM) showed
a mean balanced accuracy of 0.64 and an F1-score of 0.76 on a 5-fold cross-validation.
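For illustration, evaluating a LightGBM classifier on combined features with 5-fold cross-validation (as reported above) might look like this; the data here are random placeholders:

```python
# 5-fold cross-validation of a LightGBM classifier, scoring balanced accuracy and F1.
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.model_selection import cross_validate

X = np.random.rand(500, 64)            # combined video/audio/PPG/eye-tracking/CNN features
y = np.random.randint(0, 2, 500)       # 1 = lie, 0 = truth (imbalanced in the real data)

scores = cross_validate(LGBMClassifier(), X, y, cv=5,
                        scoring=["balanced_accuracy", "f1"])
print(scores["test_balanced_accuracy"].mean(), scores["test_f1"].mean())
```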
- Yang Lu
- Asri Rizki Yuliani
- Keisuke Ishikawa
- Ronaldo Prata Amorim
- Roland Hartanto
- Nakamasa Inoue
- Kuniaki Uto
- Koichi Shinoda
Humans can easily understand storylines and character relationships in movies. However,
the automatic relationship analysis from videos is challenging. In this paper, we
introduce a deep video understanding system to infer relationships between movie characters
from multimodal features. The proposed system first extracts visual and text features
from full-length movies. With these multimodal features, we then utilize graph-based
relationship reasoning models to infer the characters' relationships. We evaluate
our proposed system on the High-Level Video Understanding (HLVU) dataset. We achieve
53% accuracy on question answering tests.
- Shravan Nayak
- Timo Baumann
- Supratik Bhattacharya
- Alina Karakanta
- Matteo Negri
- Marco Turchi
Dubbing is the art of finding a translation from a source into a target language that
can be lip-synchronously revoiced, i. e., that makes the target language speech appear
as if it was spoken by the very actors all along. Lip synchrony is essential for the
full-fledged reception of foreign audiovisual media, such as movies and series, as
violated constraints of synchrony between video (lips) and audio (speech) lead to
cognitive dissonance and reduce the perceptual quality. Of course, synchrony constraints
only apply to the translation when the speaker's lips are visible on screen. Therefore,
deciding whether to apply synchrony constraints requires an automatic method for detecting
whether an actor's lips are visible on screen for a given stretch of speech or not.
In this paper, we attempt, for the first time, to classify on- from off-screen speech
based on a corpus of real-world television material that has been annotated word-by-word
for the visibility of talking lips on screen. We present classification experiments
in which we classify on-screen versus off-screen speech.
- Raksha Ramesh
- Vishal Anand
- Ziyin Wang
- Tianle Zhu
- Wenfeng Lyu
- Serena Yuan
- Ching-Yung Lin
We create multi-modal fusion models to predict relational classes within entities
in free-form inputs such as unseen movies. Our approach identifies information rich
features within individual sources -- emotion, text-attention, age, gender, and contextual
background object tracking. This information is absorbed and contrasted against baseline
fusion architectures. These five models then showcase future research areas on this
challenging problem of relational knowledge extraction from movies and free-form multi-modal
input sources. We find that, generally, the Kinetics model augmented with Attributes and
Objects beats the baseline models.
SESSION: FGAHI'20 Workshop
- Batuhan Sayis
- Narcis Pares
- Hatice Gunes
This study is part of a larger project that showed the potential of our mixed reality
(MR) system in fostering social initiation behaviors in children with Autism Spectrum
Condition (ASC). We compared it to a typical social intervention strategy based on
construction tools, where both mediated a face-to-face dyadic play session between
an ASC child and a non-ASC child. In this study, our first goal is to show that an
MR platform can be utilized to alter the nonverbal body behavior between ASC and non-ASC
during social interaction as much as a traditional therapy setting (LEGO). A second
goal is to show how these body cues differ between ASC and non-ASC children during
social initiation in these two platforms. We present our first analysis of the body
cues generated under two conditions in a repeated-measures design. Body cue measurements
were obtained through skeleton information and characterized in the form of spatio-temporal
features from both subjects individually (e.g. distances between joints and velocities
of joints), and interpersonally (e.g. proximity and visual focus of attention). We
used machine learning techniques to analyze the visual data of eighteen trials of
ASC and non-ASC dyads. Our experiments showed that: (i) there were differences between
ASC and non-ASC bodily expressions, both at individual and interpersonal level, in
LEGO and in the MR system during social initiation; (ii) the number of features indicating
differences between ASC and non-ASC in terms of nonverbal behavior during initiation
were higher in the MR system as compared to LEGO; and (iii) computational models evaluated
with combinations of these different features enabled the recognition of social initiation
type (ASC or non-ASC) from body features in LEGO and in MR settings. We did not observe
significant differences between the evaluated models in terms of performance for LEGO
and MR environments. This might be interpreted as the MR system encouraging similar
nonverbal behaviors in children, perhaps more similar than the LEGO environment, as
the performance scores in the MR setting are lower as compared to the LEGO setting.
These results demonstrate the potential benefits of full body interaction and MR settings
for children with ASC.
- Katelyn Morrison
- Daniel Yates
- Maya Roman
- William W. Clark
Different measuring instruments, such as a goniometer, have been used by clinicians
to measure a patient's ability to rotate their thoracic spine. Despite the simplicity
of goniometers, this instrument requires the user to decipher the resulting measurement
properly. The correctness of these measurements is imperative for clinicians to properly
identify and evaluate injuries or help athletes enhance their overall performance.
This paper introduces a goniometer-free, noninvasive measuring technique using a Raspberry
Pi, a Pi Camera module, and software for clinicians to measure a subject's thoracic
rotation range of motion (ROM) when administering the seated rotation technique with
immediate measurement feedback. Determining this measurement is achieved by applying
computer vision object tracking techniques on a live video feed from the Pi Camera
that is secured on the ceiling above the subject. Preliminary results using rudimentary
techniques reveal that our system is very accurate in static environments.
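A hypothetical OpenCV sketch of the overhead-camera measurement: tracking two colour markers and reporting the angle of the line between them. Marker colours, HSV thresholds and the camera index are assumptions, not the authors' implementation:

```python
# Track two colour markers in the live feed and print the rotation angle of the
# line joining them, giving immediate measurement feedback.
import cv2
import numpy as np

LOW1, HIGH1 = np.array([40, 80, 80]), np.array([80, 255, 255])    # green marker (HSV)
LOW2, HIGH2 = np.array([100, 80, 80]), np.array([130, 255, 255])  # blue marker (HSV)

def marker_centroid(hsv, low, high):
    m = cv2.moments(cv2.inRange(hsv, low, high))
    if m["m00"] == 0:
        return None
    return m["m10"] / m["m00"], m["m01"] / m["m00"]

cap = cv2.VideoCapture(0)                        # Pi Camera exposed as a video device
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    p1, p2 = marker_centroid(hsv, LOW1, HIGH1), marker_centroid(hsv, LOW2, HIGH2)
    if p1 and p2:
        angle = np.degrees(np.arctan2(p2[1] - p1[1], p2[0] - p1[0]))
        print(f"thoracic rotation: {angle:.1f} deg")
    if cv2.waitKey(1) == 27:                     # Esc to quit
        break
cap.release()
```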
- Diyala Erekat
- Zakia Hammal
- Maimoon Siddiqui
- Hamdi Dibeklioğlu
The standard clinical assessment of pain is limited primarily to self-reported pain
or clinician impression. While the self-reported measurement of pain is useful, in
some circumstances it cannot be obtained. Automatic facial expression analysis has
emerged as a potential solution for an objective, reliable, and valid measurement of
pain. In this study, we propose a video-based approach for the automatic measurement
of self-reported pain and the observer pain intensity, respectively. To this end,
we explore the added value of three self-reported pain scales, i.e., the Visual Analog
Scale (VAS), the Sensory Scale (SEN), and the Affective Motivational Scale (AFF), as
well as the Observer Pain Intensity (OPI) rating for a reliable assessment of pain
intensity from facial expression. Using a spatio-temporal Convolutional Neural Network
- Recurrent Neural Network (CNN-RNN) architecture, we propose to jointly minimize
the mean absolute error of pain score estimation for each of these scales while maximizing
the consistency between them. The reliability of the proposed method is evaluated
on the benchmark database for pain measurement from videos, namely, the UNBC-McMaster
Pain Archive. Our results show that enforcing the consistency between different self-reported
pain intensity scores collected using different pain scales enhances the quality of
predictions and improves the state of the art in automatic self-reported pain estimation. The
obtained results suggest that automatic assessment of self-reported pain intensity
from videos is feasible, and could be used as a complementary instrument to unburden
caregivers, especially for vulnerable populations that need constant monitoring.
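A compact sketch of how the joint objective could be written, i.e., per-scale mean absolute error plus a consistency term; the pairwise formulation, normalisation and weighting are assumptions, not the paper's exact loss:

```python
# Per-scale MAE plus a penalty on inconsistency between predicted scales.
import torch

def pain_multi_scale_loss(pred, target, consistency_weight=0.5):
    """pred, target: (batch, 4) scores ordered as [VAS, SEN, AFF, OPI]."""
    mae = torch.mean(torch.abs(pred - target))               # estimation error per scale
    # predictions for the same video should not diverge across scales more
    # than the ground-truth scores do
    pred_gap = pred.unsqueeze(2) - pred.unsqueeze(1)          # pairwise scale differences
    target_gap = target.unsqueeze(2) - target.unsqueeze(1)
    consistency = torch.mean(torch.abs(pred_gap - target_gap))
    return mae + consistency_weight * consistency

loss = pain_multi_scale_loss(torch.rand(8, 4) * 10, torch.rand(8, 4) * 10)
```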
- Yujin Wu
- Mohamed Daoudi
- Ali Amad
- Laurent Sparrow
- Fabien D'Hondt
So far, stress detection technology usually uses supervised learning methods combined
with a series of physiological, physical, or behavioral signals and has achieved promising
results. However, problems of label collection, such as the latency of the stress response
and the subjective uncertainty introduced by questionnaires, have not been effectively
solved. This paper proposes an unsupervised learning method with K-means clustering
for exploring students' autonomic responses to medical simulation training in an ambulant
environment. With the use of wearable sensors, features of electrodermal activity
and heart rate variability of subjects are extracted to train the K-means model. A
silhouette score of 0.49 with two clusters was reached, indicating a difference in
students' mental stress between the baseline stage and the simulation stage. In addition, with
the aid of external ground truth associating samples with either the baseline
phase or the simulation phase, four evaluation metrics were calculated and provided comparable
results for supervised and unsupervised learning methods. The highest classification
performance of 70% was reached with the measure of precision. In the future, we will
integrate context information or facial images to provide more accurate stress detection.
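A minimal scikit-learn sketch of the unsupervised pipeline (standardised features, two-cluster K-means, silhouette score); feature values are placeholders:

```python
# Standardise EDA/HRV features, fit two-cluster K-means, and report the silhouette.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X = np.random.rand(200, 10)                      # EDA + HRV features per time window
X = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("silhouette:", silhouette_score(X, kmeans.labels_))
# the cluster labels can then be compared against baseline vs. simulation phases
```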
SESSION: IGTD'20 Workshop
- Anna-Sophie Ulfert
- Eleni Georganta
Trust is a central element for effective teamwork and successful human-technology
collaboration. Although technologies, such as agents, are increasingly becoming autonomous
team members operating alongside humans, research on team trust in human-agent teams
is missing. Thus far, empirical and theoretical work has focused on aspects of trust
only towards the agent as a technology, neglecting how team trust - with regard to
the human-agent team as a whole - develops. In this paper, we present a model of team
trust in human-agent teams combining two streams of research: (1) theories of trust
in human teams and (2) theories of human-computer interaction (HCI). We propose different
antecedents (integrity, ability, benevolence) that influence team trust in human-agent
teams as well as individual, team, system, and temporal factors that impact this relationship.
The goal of the present article is to advance our understanding of team trust in human-agent
teams and encourage an integration between HCI and team research when planning future
research. This will also help in designing trustworthy human-agent teams and thereby
support organizational functioning when such teams are introduced.
- Angelika Kasparova
- Oya Celiktutan
- Mutlu Cukurova
Automatic analysis of students' collaborative interactions in physical settings is
an emerging problem with a wide range of applications in education. However, this
problem has been proven to be challenging due to the complex, interdependent and dynamic
nature of student interactions in real-world contexts. In this paper, we propose a
novel framework for the classification of student engagement in open-ended, face-to-face
collaborative problem-solving (CPS) tasks purely from video data. Our framework i)
estimates body pose from the recordings of student interactions; ii) combines face
recognition with a Bayesian model to identify and track students with a high accuracy;
and iii) classifies student engagement leveraging a Team Long Short-Term Memory (Team
LSTM) neural network model. This novel approach allows the LSTMs to capture dependencies
among individual students in their collaborative interactions. Our results show that
the Team LSTM significantly improves the performance as compared to the baseline method
that takes individual student trajectories into account independently.
- Fabian Walocha
- Lucien Maman
- Mohamed Chetouani
- Giovanna Varni
Group cohesion is a multidimensional emergent state that manifests during group interaction.
It has been extensively studied in several disciplines such as Social Sciences and
Computer Science and it has been investigated through both verbal and nonverbal communication.
This work investigates the dynamics of task and social dimensions of cohesion through
nonverbal motion-capture-based features. We modeled dynamics as either decreasing
or stable/increasing relative to the previous measurement of cohesion. We design
and develop a set of features related to space and body movement from motion capture
data as it offers reliable and accurate measurements of body motions. Then, we use
a random forest model to binary classify (decrease or no decrease) the dynamics of
cohesion, for the task and social dimensions. Our model adopts labels from self-assessments
of group cohesion, providing a different perspective of study with respect to the
previous work relying on third-party labelling. The analysis reveals that, in a multilabel
setting, our model is able to predict changes in task and social cohesion with an
average accuracy of 64%(±3%) and 67%(±3%), respectively, outperforming random guessing
(50%). In a multiclass setting comprised of four classes (i.e., decrease/decrease,
decrease/no decrease, no decrease/decrease and no decrease/no decrease), our model
also outperforms chance level (25%) for each class (i.e., 54%, 44%, 33%, 50%, respectively).
Furthermore, this work provides a method based on notions from cooperative game theory
(i.e., SHAP values) to assess features' impact and importance. We identify that the
most important features for predicting cohesion dynamics relate to spatial distance,
the amount of movement while walking, the overall posture expansion as well as the
amount of inter-personal facing in the group.
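For illustration, the classification-plus-attribution step could be sketched with scikit-learn and the shap package as below; data, feature count and indexing details are assumptions:

```python
# A random forest predicts whether task cohesion decreases; SHAP values then
# estimate each motion feature's contribution to that prediction.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(300, 12)                      # motion-capture features per segment
y = np.random.randint(0, 2, 300)                 # 1 = decrease in task cohesion

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
shap_values = shap.TreeExplainer(forest).shap_values(X)   # per-class attributions
# mean |SHAP| per feature for the "decrease" class
# (the return format differs slightly across shap versions)
decrease_attr = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]
importance = np.abs(decrease_attr).mean(axis=0)
```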
- Uliyana Kubasova
- Gabriel Murray
Automated prediction of group task performance normally proceeds by extracting linguistic,
acoustic, or multimodal features from an entire conversation in order to predict an
objective task measure. In this work, we investigate whether we can maintain robust
prediction performance when using only limited context from the beginning of the meeting.
Graph-based conversation features as well as more traditional linguistic features
are extracted from the first minute of the meeting and from the entire meeting. We
find that models trained only on the first minute are competitive with models trained
on the full conversation. In particular, deriving features from graph-based models
of conversational interaction in the first minute of discussion is particularly effective
for predicting group performance, and outperforms models using more traditional linguistic
features. This work also uses a much larger amount of data than previous work, by
combining three similar survival task datasets.
- Navin Raj Prabhu
- Chirag Raman
- Hayley Hung
Social interactions in general are multifaceted and there exists a wide set of factors
and events that influence them. In this paper, we quantify social interactions with
a holistic viewpoint on individual experiences, particularly focusing on non-task-directed
spontaneous interactions. To achieve this, we design a novel perceived measure, the
perceived Conversation Quality, which intends to quantify spontaneous interactions
by accounting for several socio-dimensional aspects of individual experiences.
To further quantitatively study spontaneous interactions, we devise a questionnaire
which measures the perceived Conversation Quality at both the individual and
the group level. Using the questionnaire, we collected perceived annotations for
conversation quality in a publicly available dataset using naive annotators. The results
of the analysis performed on the distribution and the inter-annotator agreement
show that naive annotators tend to agree less in cases of low conversation quality
samples, especially while annotating for group-level conversation quality.
SESSION: MAAE'20 Workshop
Emotions are subjective experiences involving perceptual and contextual factors [4].
There is no objective tool for the precise measurement of emotions. However, we can anticipate
an emotion's emergence through knowledge of common responses to events in similar
situations. We can also measure proxies of emotions by recognizing emotional expressions
[3]. Studying emotional responses to multimedia allows identifying expected emotions
in users consuming the content. For example, abrupt loud voices are novel and unsettling,
which results in surprise and a higher experience of arousal [2,6]. For a particular
type of content such as music, mid-level attributes such as rhythmic stability or
melodiousness have a strong association with expected emotions [1]. Given that such mid-level
attributes are more related to the content, their machine perception is more straightforward.
Moreover, their perception in combination with user models enables building
person-specific emotion anticipation models. In addition to studying expected emotions, we can also
observe users' emotional reactions to understand emotion in multimedia. Typical methods
of emotion recognition include recognizing emotions from facial or vocal expressions.
Recognition of emotional expressions requires large amounts of labeled data, which are expensive
to produce. Hence, the most recent advances in machine-based emotion perception include
methods that can leverage unlabeled data through self-supervised and semi-supervised
learning [3, 5]. In this talk, I review the field and showcase methods for the automatic
modeling and recognition of emotions and sentiment in different contexts [3,8]. I show
how we can identify underlying factors contributing to the construction of the subjective
experience of emotions [1,7]. Identification of these factors allows us to use them
as mid-level attributes to build machine learning models for emotion and sentiment
understanding. I also show how emotions and sentiment can be recognized from expressions
with the goal of building empathetic autonomous agents [8].
The negative environmental stimuli (e.g., poorly maintained sidewalks, blighted properties,
graffiti, trash on the ground, unsafe traffic conditions) in the urban built environment
are linked to stress symptomatology in a significant portion of the urban populations.
They are a significant contributor to the increase of urban-associated diseases such
as depression, allergies, asthma, diabetes, and cardiovascular diseases. A few studies
presented the potential to identify pedestrians' environmental distress caused by
the negative environmental stimuli using bio-signals (e.g., gait patterns, blood volume
pulse, and electrodermal activity) beyond the subjectivity concerns of traditional
approaches such as neighborhood surveys and field observation. However, there remain
several unanswered questions regarding whether the effect of the negative environmental
stimuli can be identified from bio-signals captured in naturalistic ambulatory settings,
which include various uncontrollable confounding factors (e.g., movement artifacts,
physiology reactivity due to non-intended stimuli, and individual variability). In
this context, this talk discusses the challenges and opportunities of leveraging bio-signals
in capturing environmental distress. We examine empirical associations between bio-signals
and environmental stimuli commonly observed in the neighborhood built environment. Then
we present a novel method that identifies group-level environmental distress by capturing
and aggregating prominent local patterns of bio-signals from multiple pedestrians.
In addition, the potential benefits of multimodal data are illustrated through the
experimental results that predict environmental distress by using both bio-signals
and image-based data (e.g., visual features captured from built environment image,
such as land use, sidewalk connectivity, and road speeds).
- Michal Gnacek
- David Garrido-Leal
- Ruben Nieto Lopez
- Ellen Seiss
- Theodoros Kostoulas
- Emili Balaguer-Ballester
- Ifigeneia Mavridou
- Charles Nduka
An increasing amount of virtual reality (VR) research is carried out to support the
vast number of applications across mental health, exercise and entertainment fields.
Often, this research involves the recording of physiological measures such as heart
rate recordings with an electrocardiogram (ECG). One challenge is to enable remote,
reliable and unobtrusive VR and heart rate data collection which would allow a wider
application of VR research and practice in the field in future. To address the challenge,
this work assessed the viability of replacing standard ECG devices with a photoplethysmography
(PPG) sensor that is directly integrated into a VR headset over the branches of the
supratrochlear vessels. The objective of this study was to investigate the reliability
of the PPG sensor for heart-rate detection. A total of 21 participants were recruited.
They were asked to wear an ECG belt as ground truth and a VR headset with the embedded
PPG sensor. Signals from both sensors were captured in free standing and sitting positions.
Results showed that a VR headset with an integrated PPG sensor is a viable alternative
to an ECG for heart rate measurements in optimal conditions with limited movement.
Future research will extend this finding by testing the sensor in more interactive VR settings.
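A rough sketch of one way to compare headset PPG against the ECG reference: beat detection on the PPG waveform with a simple peak picker; sampling rate and peak-picking thresholds are assumptions:

```python
# Estimate heart rate from a raw PPG signal and compare with the ECG-derived value.
import numpy as np
from scipy.signal import find_peaks

def ppg_heart_rate(ppg, fs=100):
    """Beats per minute estimated from inter-beat intervals of detected peaks."""
    ppg = (ppg - np.mean(ppg)) / (np.std(ppg) + 1e-9)
    peaks, _ = find_peaks(ppg, distance=int(0.4 * fs), prominence=0.5)
    if len(peaks) < 2:
        return np.nan
    ibi = np.diff(peaks) / fs                      # inter-beat intervals in seconds
    return 60.0 / np.mean(ibi)

ppg_hr = ppg_heart_rate(np.random.randn(6000))     # 60 s of (placeholder) PPG at 100 Hz
ecg_hr = 72.0                                      # reference value from the ECG belt
print("absolute error (bpm):", abs(ppg_hr - ecg_hr))
```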
- Julien Venni
- Mireille Bétrancourt
According to recent perspectives on human-computer interactions, subjective aspects
(emotion or visual attractiveness) have to be considered to provide optimal multimedia
material. However, the research investigating the impact of aesthetics or emotional
design has yielded varying conclusions regarding the use of interfaces and the resulting
learning outcomes. Possible reasons include the implementation of the aesthetics variable,
which varies from one study to another. On this basis, an experimental study was conducted
to assess the influence of a specific feature of aesthetics, colour harmony, on the
use and subjective evaluation of a website. The study involved 34 participants browsing
on two versions of the same website about science-fiction movies, with harmonious
vs. disharmonious colours as the between-subject factor. After conducting six information
search tasks, participants answered questionnaires assessing usability, user experience,
non-instrumental and instrumental qualities. Measures of actual usability of the website,
navigation, eye movements and implicit memory performance were collected. Results
showed that disharmonious colours caused lower subjective ratings for pragmatic qualities,
appeared to distract visual attention but, surprisingly, led to higher memory performance.
On the other hand, colour harmony did not impact the navigation and perceived usability
of the system, the perception of the aesthetics (apart from colour), hedonic qualities
as well as the experience of use. These findings support the hypothesis that aesthetic
features affect users' behavior and perception, but not all dimensions of user
experience. Based on the findings, a model for future research in the field is suggested.
- Konstantinos Moustakas
- Emmanouel Rovithis
- Konstantinos Vogklis
- Andreas Floros
In this work we present an adaptive audio mixing technique to be implemented in the
design of Augmented Reality Audio (ARA) systems. The content of such systems is delivered
entirely through the acoustic channel: the real acoustic environment is mixed with
a virtual soundscape and returns to the listener as a "pseudoacoustic" environment.
We argue that the proposed adaptive mixing technique enhances user immersion in the
augmented space in terms of the localization of sound objects. The need to optimise
our ARA mixing engine emerged from our previous research, and more specifically from
the analysis of the experimental results regarding the development of the Augmented
Reality Audio Game (ARAG) "Audio Legends" that was tested on the field. The purpose
of our new design was to aid sound localization, which is a crucial and demanding
factor for delivering an immersive acoustic experience. We describe in depth the adaptive
mixing along with the experimental test-bed. The results for the sound localization
scenario indicate a substantial increase of 55 percent in accuracy compared to the
legacy ARA mix model.
SESSION: MAISTR'20 Workshop
- Fahim A. Salim
- Fasih Haider
- Maite Frutos-Pascual
- Dennis Reidsma
- Saturnino Luz
- Bert-Jan van Beijnum
This paper briefly overviews the first workshop on Action Modelling for Interaction
and Analysis in Smart Sports and Physical Education (MAIStroPE). It focuses on the
main aspects intended to be discussed in the workshop, reflecting the main scope of
the papers presented during the meeting. The MAIStroPE 2020 workshop is held in conjunction
with the 22nd ACM International Conference on Multimodal Interaction (ICMI 2020) taking
place in Utrecht, the Netherlands, in October 2020.
- Håvard D. Johansen
- Dag Johansen
- Tomas Kupka
- Michael A. Riegler
- Pål Halvorsen
Recent technological advances are being adopted in sports to improve performance, avoid
injuries, and make advantageous decisions. In this paper, we describe our ongoing
efforts to develop and deploy PMSys, our smartphone-based athlete monitoring and reporting
system. We describe our first attempts to gain insight into some of the data we have
collected. Experiences so far are promising, both on the technical side and for athlete
performance development. Our initial application of artificial-intelligence methods
for prediction is encouraging and indicative.
- Emanuele Antonioni
- Vincenzo Suriani
- Nicoletta Massa
- Daniele Nardi
The world population currently counts more than 617 million people over 65 years old.
COVID-19 has exposed this population group to new restrictions, leading to new difficulties
in care and assistance by family members. New technologies can reduce the degree of
isolation of these people, helping them in the execution of healthy activities such
as performing periodic sports routines. NAO robots find in this a possible application;
being able to alternate voice commands and execution of movements, they can guide
elderly people in performing gymnastic exercises. Additional encouragement could come
through demonstrations of the exercises and verbal interactions using the voice of
loved ones (for example, grandchildren). These are transmitted in real time to the
NAO which streams the video of older people exercising, bringing the two parties involved
closer together. This proposal, realized with the NAO V6 robot, provides help at home,
ready to motivate, teach the exercises, and train elderly people living alone.
- Cristian Militaru
- Maria-Denisa Militaru
- Kuderna-Iulian Benta
Monitoring and correcting the posture during physical exercises can be a challenging
task, especially for beginners who do not have a personal trainer. Recently, successful
mobile applications in this domain were launched on the market, but we are unable
to find prior studies that are general-purpose and able to run on commodity hardware
(smartphones). Our work focuses on static exercises (e.g. Plank and Holding Squat).
We create a dataset of 2400 images. The main technical challenge is achieving high
accuracy for as many circumstances as possible. We propose a solution that relies
on Convolutional Neural Networks to classify images into three classes: correct, hips too low, or
hips too high. The neural network is used in a mobile application that provides live
feedback for posture correction. We discuss limitations of the solution and ways to
overcome them.
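An illustrative Keras model for the three posture classes, small enough for on-device use; the architecture and input size are assumptions, not the authors' network:

```python
# Small 3-class posture CNN, exported to TensorFlow Lite for the mobile app.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),   # correct / hips too low / hips too high
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# after training, convert for on-device inference:
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()
```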
- Iustina Ivanova
- Marina Andric
- Andrea Janes
- Francesco Ricci
- Floriano Zini
The automatic detection of climbers' activities can be the basis of software systems
that support trainers in assessing climbers' performance and defining more effective
training programs. We propose an initial building block of such a system, for the
unobtrusive identification of the activity of someone pulling a rope after finishing
the ascent. We use a novel type of quickdraw, augmented with a tri-axial accelerometer
sensor. The acceleration data generated by the quickdraw during the climbs are used
by a Machine Learning classifier for detecting the rope pulling activity. The obtained
results show that this activity can be detected automatically with high accuracy,
particularly by a Random Forest classifier. Moreover, we show that data acquired by
the quickdraw sensor, as well as the detected rope pulling, can also be used to benchmark
climbers.
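A sketch of the kind of windowed-feature pipeline that could sit on the quickdraw's accelerometer stream; window length, features and data are illustrative, not the authors' configuration:

```python
# Summary statistics per accelerometer window feed a Random Forest that flags
# rope-pulling windows.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(acc, fs=50, window_s=2.0):
    """acc: (n_samples, 3) tri-axial stream -> one feature row per window."""
    win = int(fs * window_s)
    rows = []
    for k in range(len(acc) // win):
        w = acc[k * win:(k + 1) * win]
        mag = np.linalg.norm(w, axis=1)               # overall motion intensity
        rows.append(np.concatenate([w.mean(0), w.std(0),
                                    [mag.mean(), mag.std(), mag.max()]]))
    return np.array(rows)

acc = np.random.randn(3000, 3)                        # one minute at 50 Hz (placeholder)
X = window_features(acc)
y = np.random.randint(0, 2, len(X))                   # 1 = rope pulling in this window
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
```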
SESSION: MeC'20 Workshop
Having a clear understanding of people's behaviour is essential to characterise patient
progress, make treatment decisions and elicit effective and relevant coaching actions.
Hence, a great deal of research has been devoted in recent years to the automatic
sensing and analysis of human behaviour. Sensing options are currently unparalleled
due to the number of smart, ubiquitous sensor systems developed and deployed globally.
Instrumented devices such as smartphones or wearables enable unobtrusive observation
and detection of a wide variety of behaviours as we go about our physical and virtual
interactions with the world. The vast amount of data generated by such sensing infrastructures
can be then analysed by powerful machine-learning algorithms, which map the raw data
into predictive trajectories of behaviour. The processed data is combined with computerised
behaviour change frameworks and domain knowledge to dynamically generate tailored
recommendations and guidelines through advanced reasoning. This talk explores the
recent advances in the automatic sensing and analysis of human behaviour to inform
e-coaching actions. The H2020 research and innovation project "Council of Coaches"
is particularly used to illustrate the main concepts underpinning this novel area
as well as to provide some guidelines and directions for the development of human
behaviour measurement technologies to support the future generation of e-coaching
systems.
- Andoni Beristain Iraola
- Roberto Álvarez Sánchez
- Despoina Petsani
- Santiago Hors-Fraile
- Panagiotis Bamidis
- Evdokimos Konstantinidis
This paper presents a virtual coach for older adults at home to support active and
healthy aging, and independent living. It aids users in their behavior change process for improving in the cognitive, physical, social interaction and nutrition areas using SMART goals. To achieve an effective behavior change in the user, the coach relies on the I-Change behavioral change model. Through a combination of projectors, cameras, microphones and support sensors, the older adult's home becomes an augmented reality environment in which common objects are used as projection surfaces and are sensed. Older adults interact with this virtual coach in their home in a natural way using speech and body gestures (including touch on certain objects).
- Irina Paraschivoiu
- Jakub Sypniewski
- Artur Lupp
- Magdalena Gärtner
- Nadejda Miteva
- Zlatka Gospodinova
In this work, we present our approach to designing a multimodal, persuasive system
for coaching older adults in four domains of daily living: activity, mobility, sleep, and social interaction. Our design choices were informed by considerations related to
the deployment of the system in four pilot sites and three countries: Austria, Bulgaria
and Slovenia. In particular, we needed to keep the system affordable, and design across
divides such as urban-rural and high-low technological affinity. We present these
considerations, together with our approach to coaching through text, audio, light
and color, and with the participation of the users' social circles and caregivers.
We conducted two workshops and found a preference for voice and text. Participants in
Bulgaria also showed a preference for music-based rendering of coaching actions.
- Johannes Kropf
- Niklas Aron Hungerländer
- Kai Gand
- Hannes Schlieter
vCare is designing personalized rehabilitation programs that will lead to better continuity
of care and a better quality of life for patients with stroke, heart failure, Parkinson's
disease or ischemic heart disease. Its goal is to provide a holistic approach for transferring rehabilitation pathways from stationary rehabilitation to the patient's home. vCare pursues two novel approaches in the field. First, it combines persuasive system design (PSD) with a health psychological model (IMB), both implemented in a software system to motivate the user. Second, it integrates personalized rehabilitation
paths and a virtual coach with graphical representation to support the rehabilitation
process. The coach is based on patients' personalized care pathways. It engages with
patients so that they meet their individual care plans. This encourages compliance
with the patients' rehabilitation programs.
- Mohamed Ehab
- Hossam Mohamed
- Mohamed Ahmed
- Mostafa Hammad
- Noha ElMasry
- Ayman Atia
In sports, coaching remains an essential aspect of the efficiency of the athlete's
performance. This paper proposes a wrist-wearable assistant for swimmers called iSwimCoach. The key aim of the system is to detect and analyze incorrect swimming patterns in the front crawl stroke using an accelerometer sensor. iSwimCoach collects the swimmer's motion data stream, which enables it to detect the strokes to be analyzed in real time, thereby introducing a quick and efficient self-coaching feature that helps mid-level athletes improve their swimming style. In our research, we were able to monitor athletes' strokes underwater and hence assist swimming coaches. The proposed system was able to classify four types of strokes performed by mid-level swimmers (correct strokes, wrong recovery, wrong hand entry and wrong high elbow). The system informs both the swimmer and the coach when an incorrect movement is detected. iSwimCoach achieved 91% accuracy for the detection and classification of incorrect strokes using a fast, computationally inexpensive dynamic time warping algorithm. These readings are analyzed in real time to automatically generate reports for the swimmer and the coach.
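A minimal sketch of nearest-neighbour stroke classification with dynamic time warping is shown below; iSwimCoach reportedly uses a fast DTW variant, whereas this naive O(nm) version with synthetic templates only illustrates the idea.

```python
# Hedged sketch: nearest-neighbour classification of accelerometer stroke
# windows with a plain DTW distance (not the authors' optimized algorithm).
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def classify(stroke: np.ndarray, templates: dict[str, np.ndarray]) -> str:
    # templates: one reference accelerometer sequence per stroke class.
    return min(templates, key=lambda label: dtw_distance(stroke, templates[label]))

rng = np.random.default_rng(1)
templates = {c: rng.normal(size=(60, 3)) for c in
             ["correct", "wrong_recovery", "wrong_hand_entry", "wrong_high_elbow"]}
print(classify(rng.normal(size=(55, 3)), templates))
```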
- Mira El Kamali
- Leonardo Angelini
- Denis Lalanne
- Omar Abou Khaled
- Elena Mugellini
Recently, research in the area of multimodal interaction has developed rapidly, and users have embraced multimodal technologies. A multimodal system allows the user to choose one or more communication channels to access the system, whereas a unimodal system is restricted to a single channel of communication. Conversational agents in particular are increasing rapidly in number, and older adults are becoming more and more exposed to such agents. NESTORE is a virtual coach that aims to follow older adults in their wellbeing journey. It comes with two interfaces: (i) a chatbot, which is a text-based messaging application, and (ii) a tangible coach, which is a vocal assistant. This virtual coach is multimodal because it can present or send information, and receive the user's input, through these two different interfaces. Our aim is to explore which modality users prefer in terms of user experience. With older adults, we tested five different scenarios in which the virtual coach interacted with a user through different modalities, used individually or combined. The virtual coach asked a set of questions derived from a behavioral change model called HAPA. We measured the perceived user experience for each scenario with the UEQ-S questionnaire and asked participants to rank the scenarios according to their preferences.
- Zoraida Callejas
- David Griol
- Kawtar Benghazi
- Manuel Noguera
- Gerard Chollet
- María Inés Torres
- Anna Esposito
Mental health e-coaches and technology-delivered services are showing considerable
benefits to foster mental health literacy, monitor symptoms, favour self-management
of different mental health conditions and scaffold positive behaviours. However, adherence
to these systems is usually low and generally declines over time. A recent body of work addresses engagement with mental health technology, with the aim of understanding the factors that influence sustained use and of informing the design of systems that are able to generate sufficient engagement to attain their expected results. This paper
explores the different facets of engagement in mental health e-coaches, including
aspects related to the estimation of system use from log data, effective engagement,
user experience, motivation, incentives, user expectations, peer support and the specific
challenges of technologies addressed to mental health.
- Jacky Casas
- Marc-Olivier Tricot
- Omar Abou Khaled
- Elena Mugellini
- Philippe Cudré-Mauroux
Chatbots are computer programs aiming to replicate human conversational abilities
through voice exchanges, textual dialogues, or both. They are becoming increasingly
pervasive in many domains like customer support, e-coaching or entertainment. Yet,
there is no standardised way of measuring the quality of such virtual agents. Instead,
multiple individuals and groups have established their own standards either specifically
for their chatbot project or have taken some inspiration from other groups. In this
paper, we review current techniques and trends in chatbot evaluation. We examine chatbot evaluation methodologies and assess them according to the ISO 9241 concepts of usability: Effectiveness, Efficiency and Satisfaction. We then analyse
the methods used in the literature from 2016 to 2020 and compare their results. We
identify a clear trend towards evaluating the efficiency of chatbots in many recent
papers, which we link to the growing popularity of task-based chatbots that are currently
being deployed in many business contexts.
SESSION: MHFI'20 Workshop
- Tom Gayler
- Corina Sas
- Vaiva Kalnikaite
This initial study explores the design of flavor-based cues with older adults for
their self-defining memories. It proposes using food to leverage the connections between
odor and memory to develop new multisensory memory cues. Working with 4 older adults,
we identified 6 self-defining autobiographical memories for each participant, 3 related
to food, 3 unrelated to food. Flavor-based cues were then created for each memory
through a co-design process. Findings indicate the dominance of relationship themes in the identified self-defining memories and that flavor-based cues related mostly to multiple-ingredient dishes. We discuss how these findings can support further research
and design into flavor-based memory cues through 3D food printing.
- Yifan Chen
- Zhuoni Jie
- Hatice Gunes
This paper focuses on: (i) Automatic recognition of taste-liking from facial videos
by comparatively training and evaluating models with engineered features and state-of-the-art
deep learning architectures, and (ii) analysing the classification results along the
aspects of facilitator type, and the gender, ethnicity, and personality of the participants.
To this aim, a new beverage tasting dataset acquired under different conditions (human
vs. robot facilitator and priming vs. non-priming facilitation) is utilised. The experimental
results show that: (i) The deep spatiotemporal architectures provide better classification
results than the engineered feature models; (ii) the classification results for all
three classes of liking, neutral and disliking reach F1 scores in the range of 71%
- 91%; (iii) the personality-aware network that fuses participants' personality information
with that of facial reaction features provides improved classification performance;
and (iv) classification results vary across participant gender, but not across facilitator
type and participant ethnicity.
- Abilash Nivedhan
- Line Ahm Mielby
- Qian Janice Wang
Eating is a process that involves all senses. Recent research has shown that both
food-intrinsic and extrinsic sensory factors play a role in the taste of the food
we consume. Moreover, many studies have explored the relationship between emotional
state and taste perception, where certain emotional states have been shown to alter
the perception of basic tastes. This opens up a whole new world of possibilities for
the design of eating environments which take into account both sensory attributes
as well as their emotional associations. Here, we used virtual reality to study the
effect of colours and music, with specific emotional associations, on the evaluation
of cold brew coffee. Based on an online study (N=76), two colours and two pieces of
music with similar emotional arousal but opposing valence ratings were chosen to produce
a total of eight virtual coloured environments. Forty participants were recruited
for the on-site experiment, which consisted of three blocks. First, a blind tasting
of four coffee samples (0%, 2.5%, 5%, 7.5% sucrose) was carried out. Next, participants
experienced the eight environments via an HTC Vive Pro headset and evaluated their
expected liking, sweetness and bitterness of a mug of coffee presented in VR. Finally,
they tasted identical 5% coffee samples in the same eight environments. One of the key findings of this study is that, when only one factor (colour or music) was manipulated, background colour significantly influenced coffee liking. When colour
and music were used in combination, however, we found an overall effect of music valence
on coffee sweetness, as well as an interaction effect of colour and music on liking.
These results reinforce the importance of extrinsic sensory and emotional factors for food expectations and liking. Overall, these results are in line with previous research, in which positive emotions led to increased food liking and higher perceived sweetness compared to negative emotions.
- Jasper J. van Beers
- Daisuke Kaneko
- Ivo V. Stuldreher
- Hilmar G. Zech
- Anne-Marie Brouwer
Implicit approach-avoidance tendencies can be measured by the approach-avoidance task
(AAT). The emergence of mobile variants of the AAT enables its use for both in-the-lab and in-the-field experiments. Within the food domain, use of the AAT is concentrated
in research on eating disorders or healthy eating and is seldom used as an implicit
measure of food experience. Given the prevalence of explicit measures in this field,
the AAT may provide additional valuable insights into food experience. To facilitate
the use of the AAT as an implicit measure, a processing tool and accompanying graphical
user interface (GUI) have been developed for a mobile smartphone variant of the AAT.
This tool improves upon the existing processing framework of this mobile AAT by applying
more robust filtering and introduces additional sensor correction algorithms to improve
the quality of sensor data. Along with refined estimates of reaction time (RT) and
reaction force (RF), these processing improvements introduce a new metric: the response
distance (RD). The capabilities of the tool, along with the potential added value
of calculating RF and RD, are explained in this paper through the processing of pilot
data on molded and unmolded food. In particular, the RF and RD may be indicative of
participants' arousal. The tool developed in this paper is open source and compatible
with other experiments making use of the mobile AAT within, and beyond, the domain
of food experience.
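As a rough, hypothetical illustration of such metrics (not the released tool's algorithms), the sketch below derives a reaction time, a peak-acceleration proxy for reaction force, and a response-distance-like quantity from a single-axis acceleration trace.

```python
# Rough illustration only: RT, an RF proxy, and an RD-like quantity from a
# synthetic single-axis phone acceleration trace of one AAT response.
import numpy as np

FS = 100.0                      # assumed sampling rate in Hz
t = np.arange(0, 1.0, 1 / FS)   # one second after stimulus onset
acc = np.zeros_like(t)
acc[30:50] = 4.0                # synthetic push starting at ~0.30 s

def aat_metrics(acc: np.ndarray, fs: float, thresh: float = 1.0):
    # Simple moving-average smoothing stands in for the tool's robust filtering.
    kernel = np.ones(5) / 5
    smooth = np.convolve(acc, kernel, mode="same")
    onset = np.argmax(np.abs(smooth) > thresh)         # first supra-threshold sample
    rt = onset / fs                                     # reaction time (s)
    rf = float(np.max(np.abs(smooth)))                  # peak acceleration as RF proxy
    vel = np.cumsum(smooth) / fs                        # integrate to velocity
    rd = float(np.abs(np.cumsum(vel) / fs)[-1])         # integrate again: distance proxy
    return rt, rf, rd

print(aat_metrics(acc, FS))
```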
- Conor Patrick Gallagher
- Radoslaw Niewiadomski
- Merijn Bruijnes
- Gijs Huisman
- Maurizio Mancini
Commensality is defined as "a social group that eats together", and eating in a commensality
setting has a number of positive effects on humans. The purpose of this paper is to
investigate the effects of technology on commensality by presenting an experiment in which a toy robot showing non-verbal social behaviours tries to influence a participant's food choice and food taste perception. We conducted both a qualitative and a quantitative study with 10 participants. Results show that the robot made a favourable impression on participants. It also emerged that the robot may be able to influence food choices using its non-verbal behaviours only. However, these results are not
statistically significant, perhaps due to the small sample size. In the future, we
plan to collect more data using the same experimental protocol, and to verify these
preliminary results.
- Eleonora Ceccaldi
- Gijs Huisman
- Gualtiero Volpe
- Maurizio Mancini
Eating together is one of the most treasured human activities. Its benefits range
from improving the taste of food to mitigating the feelings of loneliness. In 2020,
many countries have adopted lock-down and social distancing policies, forcing people
to stay home, often alone and away from families and friends. Although technology can help connect those who are physically distant, it is not clear whether eating together at the same moment via video call is effective in creating the sense of connectedness that comes with sharing a meal with a friend or a family member in person. In this work, we report the results of an online survey on remote eating practices during the Covid-19 lock-down, exploring the psychological motivations behind remote eating and behind deciding not to eat remotely. Moreover, we sketch how future technologies could help create digital commensality experiences.
- Heikki Aisala
- Jussi Rantala
- Saara Vanhatalo
- Markus Nikinmaa
- Kyösti Pennanen
- Roope Raisamo
- Nesli Sözer
Multisensory augmented reality systems have demonstrated the potential of olfactory
cues in the augmentation of flavor perception. Earlier studies have mainly used commercially
available sample products. In this study, custom rye-based cakes with reduced sugar
content were used to study the influence of different odorants on the perceived sweetness.
A custom olfactory display was developed for presenting the odorants. The results
showed that augmentation of a reduced sugar rye-based cake with localized maltol,
vanilla, and strawberry odor increased the perceived sweetness of the cake-odor pair
compared to a cake with deodorized airflow.
- Naoya Zushi
- Monica Perusquia-Hernandez
- Saho Ayabe-Kanamura
The emotions we experience shape our perception, and our emotion is shaped by our
perceptions. Taste perception is also influenced by emotions. Positive and negative
emotions alter sweetness, sourness, and bitterness perception. However, most previous
studies mainly explored valence changes. The effect of arousal on taste perception
is less studied. In this study, we asked volunteers to watch positive-affect-inducing videos with high or low arousal. Our results showed a successful induction of high and low arousal levels, as confirmed by self-report and electrophysiological signals. Moreover, self-reported affective ratings did not show a significant effect
on self-reported taste ratings. However, we found a negative correlation between smile
occurrence and sweetness ratings. In addition, EDA scores were positively correlated
with saltiness. This suggests that even if the self-reported affective state is not
granular enough, looking at more fine-grained affective cues can inform ratings of
taste.
- Roelof A. J. de Vries
- Gijs H. J. Keizers
- Sterre R. van Arum
- Juliet A. M. Haarman
- Randy Klaassen
- Robby W. van Delden
- Bert-Jan F. van Beijnum
- Janet H. W. van den Boer
This paper presents two use cases for a new multimodal interactive instrument: the
Sensory Interactive Table. The Sensory Interactive Table is an instrumented, interactive dining table that measures eating behavior - through the use of embedded load cells - and interacts with diners - through the use of embedded LEDs. The table opens up
new ways of exploring the social dynamics of eating. The two use cases describe explorations
of the design space of the Sensory Interactive Table in the context of the social
space of eating. The first use case details the process of co-designing and evaluating
applications to stimulate children to eat more vegetables. The second use case presents
the process of designing and evaluating applications to stimulate young adults to
reduce their eating speed in a social setting. The results show the broad potential
of the design space of the table across user groups, types of interactions, as well
as the social space of eating.
- Chi Thanh Vi
- Asier Marzo
- Dmitrijs Dmitrenko
- Martin Yeomans
- Marianna Obrist
How food is presented and eaten influences the eating experience. Novel gustatory
interfaces have opened up new ways for eating at the dining table. For example, recent
developments in acoustic technology have enabled the transportation of food and drink
in mid-air, directly onto the user's tongue. Basic tastes such as sweet, bitter and umami have higher perceived intensity when delivered with acoustic levitation, and are perceived as more pleasant despite the small droplet size (approx. 20 µL, or 4 mm in diameter). However, it remains unclear if users are ready to accept this delivery
method at the dining table. Sixty-nine children aged 14 to 16 years did a taste test
of 7 types of foods and beverages, using two delivery methods: acoustic levitation,
and knife and fork (traditional way). Children were divided into two groups: one group
was shown a video demonstrating how levitating food can be eaten before the main experiment
whereas the other group was shown the videos after. Our results showed no significant
differences in liking of the foods and beverages between the two delivery methods.
However, playing the video prior to the test significantly increased the liking and
willingness to eat vegetables in the levitation method. Evaluative feedback suggested
that a bigger portion size of levitating foods could be the game-changer to integrate
this novel technology into real-life eating experiences.
- Jeannette Shijie Ma
- Marcello A. Gómez Maureira
- Jan N. van Rijn
Food identification technology potentially benefits both food and media industries,
and can enrich human-computer interaction. We assembled a food classification dataset
consisting of 11,141 clips, based on YouTube videos of 20 food types. This dataset
is freely available on Kaggle. We suggest the grouped holdout evaluation protocol
as evaluation method to assess model performance. As a first approach, we applied
Convolutional Neural Networks on this dataset. When applying an evaluation protocol
based on grouped holdout, the model obtained an accuracy of 18.5%, whereas when applying
an evaluation protocol based on uniform holdout, the model obtained an accuracy of
37.58%. When approaching this as a binary classification task, the model performed
well for most pairs. In both settings, the method clearly outperformed reasonable baselines. We found that, besides texture properties, differences in eating actions are an important consideration for data-driven eating sound research. Protocols based on biting sound are limited to textural classification and are less informative for capturing differences between foods.
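The contrast between the two evaluation protocols can be sketched with standard scikit-learn splitters, as below; the Random Forest stands in for the paper's CNN, the data are placeholders, and the source video serves as the grouping variable.

```python
# Sketch of grouped vs. uniform holdout: "groups" would be the source YouTube
# video, so that clips from one video never appear in both train and test.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 40))          # e.g. audio features per clip
y = rng.integers(0, 20, size=600)       # 20 food types
groups = rng.integers(0, 60, size=600)  # source video id per clip

# Grouped holdout: entire videos are held out.
tr, te = next(GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
              .split(X, y, groups))
clf = RandomForestClassifier(random_state=0).fit(X[tr], y[tr])
print("grouped holdout:", accuracy_score(y[te], clf.predict(X[te])))

# Uniform holdout: clips split at random, so clips from the same video can
# leak across the split, which tends to inflate accuracy.
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(Xtr, ytr)
print("uniform holdout:", accuracy_score(yte, clf.predict(Xte)))
```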
SESSION: MIP'20 Workshop
- Siavash Rezaei
- Abhishek Moturu
- Shun Zhao
- Kenneth M. Prkachin
- Thomas Hadjistavropoulos
- Babak Taati
Painful conditions are prevalent in older adults, yet may go untreated, especially
in people with severe dementia who often cannot verbally communicate their pain. Not
addressing the pain can lead to the worsening of underlying conditions or lead to
frustration and agitation. For older adults living in long-term care (LTC) facilities,
timely assessment of pain remains a challenge. The main reasons for this are staff
shortages at these facilities and/or insufficient expertise in cutting-edge pain assessment
methods reliant on non-verbal cues. Ambient monitoring of non-verbal cues of pain,
e.g. facial expressions, body movements, or vocalizations, is a promising avenue to
improve pain management in LTC. Despite extensive existing research in computer vision
algorithms for pain detection, the currently available techniques and models are not
ready or directly applicable for use in LTC settings. Publicly available video datasets
used for training and validating pain detection algorithms (e.g. the UNBC-McMaster
Shoulder Pain Expression Archive Database and the BioVid Heat Pain Database) do not
include older adults with dementia. Facial analysis models that are trained and validated
on data from healthy and primarily young adults are known to under-perform, sometimes
drastically, when tested on faces of older adults with dementia. As such, the performance
of existing pain detection models on the dementia population remains to be validated.
Furthermore, in existing datasets, participants are well-lit and face the camera;
so the developed algorithm's performance may not transfer to a realistic ambient monitoring
setting. In this work, we make three main contributions. First, we develop a fully-automated
pain monitoring system (based on a convolutional neural network architecture) especially
designed for and validated on a new dataset of over 162,000 video frames recorded
unobtrusively from 95 older adults, of whom 47 were community-dwelling and cognitively healthy (age: 75.5 ± 6.1), and 48 (age: 82.5 ± 9.2) were individuals with severe dementia
residing in LTC. Second, we introduce a data efficient pairwise training and inference
method that calibrates to each individual face. Third, we introduce a contrastive
training method and show that it significantly improves cross-dataset performance
across UNBC-McMaster, cognitively healthy older adults, and older adults with dementia.
We perform 5-fold (leave-subjects-out) cross-validation. Our algorithm achieves a
Pearson correlation coefficient (PCC) of 0.48 for per-frame predictions of pain intensity
and a PCC of 0.82 for predictions aggregated over 20 second windows for participants
with dementia.
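The two evaluation granularities reported above can be illustrated as follows; the frame rate, aggregation by averaging over 20-second windows, and the synthetic data are assumptions.

```python
# Minimal sketch of per-frame vs. 20-second-window Pearson correlation (PCC)
# between ground-truth and predicted pain intensity (synthetic data).
import numpy as np
from scipy.stats import pearsonr

FPS = 30            # assumed frame rate
WIN = 20 * FPS      # 20-second windows

rng = np.random.default_rng(0)
truth = rng.random(18000)                                # per-frame ground truth
pred = truth + rng.normal(scale=0.5, size=truth.size)    # noisy model output

pcc_frame, _ = pearsonr(truth, pred)

n_win = truth.size // WIN
truth_w = truth[: n_win * WIN].reshape(n_win, WIN).mean(axis=1)
pred_w = pred[: n_win * WIN].reshape(n_win, WIN).mean(axis=1)
pcc_window, _ = pearsonr(truth_w, pred_w)

print(f"per-frame PCC {pcc_frame:.2f}, 20-s window PCC {pcc_window:.2f}")
```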
- Yaohan Ding
- Itir Onal Ertugrul
- Ali Darzi
- Nicole Provenza
- László A. Jeni
- David Borton
- Wayne Goodman
- Jeffrey Cohn
Continuous deep brain stimulation (DBS) of the ventral striatum (VS) is an effective
treatment for severe, treatment-refractory obsessive-compulsive disorder (OCD). Optimal
parameter settings are signaled by a mirth response of intense positive affect, which
is subjectively identified by clinicians. Subjective judgments are idiosyncratic and
difficult to standardize. To objectively measure mirth responses, we used Automatic
Facial Affect Recognition (AFAR) in a series of longitudinal assessments of a patient
treated with DBS. Pre- and post-adjustment DBS were compared using both statistical
and machine learning approaches. Positive affect was significantly higher after DBS
adjustment. Using XGBoost and SVM, the participant's pre- and post-adjustment responses were differentiated with accuracy values of 0.76 and 0.75, respectively, which suggests the feasibility of objectively measuring the mirth response.
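A hypothetical sketch of this kind of pre-/post-adjustment classification from segment-level affect features is shown below, using an SVM (the study also reports XGBoost); the features are synthetic placeholders.

```python
# Illustrative sketch: classifying pre- vs. post-adjustment segments from
# AFAR-style positive-affect features with an SVM (placeholder data).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
pre = rng.normal(loc=0.0, size=(120, 12))   # AU intensities per pre-adjustment segment
post = rng.normal(loc=0.4, size=(120, 12))  # slightly higher positive affect afterwards
X = np.vstack([pre, post])
y = np.array([0] * len(pre) + [1] * len(post))

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=5)
print("mean accuracy:", scores.mean())
```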
- Daniel S. Messinger
- Lynn Perry
- Chaoming Song
- Yudong Tao
- Samantha Mitsven
- Regina Fasano
- Chitra Banarjee
- Yi Zhang
- Mei-Ling Shyu
The educational inclusion of children with communication disorders together with typically
developing (TD) peers is a national standard. However, we have little mechanistic
understanding of how interactions with peers and teachers contribute to the language
development of these children. To build that understanding, we combine objective measurement
of the quantity and quality of child and teacher speech with radio frequency identification
of their physical movement and orientation. Longitudinal observations of two different
sets of classrooms are analyzed. One set of classrooms contains children who require
hearing aids and cochlear implants. Another set of classrooms contains children with
autism spectrum disorder (ASD). Computational modeling of pair-wise movement/orientation
is used to derive periods of social contact when speech may occur. Results suggest
that children with ASD are isolated from peers but approach teachers relatively quickly.
Overall, talk with peers in social contact (and speech heard from teachers) promotes
children's own talk which, in turn, is associated with assessed language abilities.
- Yeojin Amy Ahn
- Jacquelyn Moffitt
- Yudong Tao
- Stephanie Custode
- Mei-Ling Shyu
- Lynn Perry
- Daniel S. Messinger
Autism spectrum disorder (ASD) is defined by persistent disturbances of social communication,
as well as repetitive patterns of behavior. ASD is identified on the basis of expert,
but subjective, clinician judgment during assessments such as the Autism Diagnostic
Observation Schedule-2 (ADOS-2). Quantification of key social behavioral features
of ASD using objective measurements would enrich scientific understanding of the disorder.
The current pilot study leveraged computer vision and audio signal processing to identify
a key set of objective measures of children's social communication behaviors during
the ADOS-2 (e.g., social gaze, social smile, vocal interaction) that were captured
with adult-worn camera-embedded eyeglasses. Objective measurements of children's social
communicative behaviors during the ADOS-2 showed relatively low levels of association
with the examiner-adjudicated ADOS-2 scores. Future directions and implications for
the use of objective measurements in diagnostic and treatment monitoring are discussed.
During mother-infant face-to-face communication, many modalities are always at play:
gazing at and away from the partner, facial expression, vocalization, orientation
and touch. Multi-modal information in social communication typically conveys congruent
information, which facilitates attention, learning, and interpersonal relatedness.
However, when different modalities convey discrepant information, social communication
can be disturbed. This paper illustrates forms of discrepant mother-infant communication
drawn from our prior studies in three risk contexts: maternal depression, maternal
anxiety, and the origins of disorganized attachment. Because many examples of discrepancies emerged in the course of our studies, we consider inter-modal discrepancies to be important markers of disturbed mother-infant communication.
- Yan Li
- Xiaohan Xia
- Dongmei Jiang
- Hichem Sahli
- Ramesh Jain
Mental health applications are increasingly interested in using audio-visual and physiological measurements to detect the emotional state of a person, and a significant body of research aims to detect episodic emotional states. The availability of wearable devices and advanced
signals is attracting researchers to explore the detection of a continuous sequence
of emotion categories, referred to as emotion stream, for understanding mental health.
Currently, there are no established databases for experimenting with emotion streams.
In this paper, we make two contributions. First, we collect a Multi-modal EMOtion
Stream (MEMOS) database in the scenario of social games. Audio-video recordings of
the players are made via mobile phones and aligned Electrocardiogram (ECG) signals
are collected by wearable sensors. In total, 40 multi-modal sessions have been recorded, each lasting between 25 and 70 minutes. Emotional states with time boundaries are self-reported
and annotated by the participants while watching the video recordings. Secondly, we
propose a two-step emotional state detection framework to automatically determine
the emotion categories with their time boundaries along the video recordings. Experiments
on the MEMOS database provide a baseline result for temporal emotional state detection research, with an average mean-average-precision (mAP) score of 8.109% for detecting the five emotions (happiness, sadness, anger, surprise, other negative emotions) in videos. This is higher than the 5.47% obtained when the emotions are detected by averaging the frame-level confidence scores (obtained with the Face++ emotion recognition API) over segments from a sliding window. We expect that this paper will introduce a novel research problem
and provide a database for related research.
SESSION: MSECP'20 Workshop
- Elif Gümüslü
- Duygun Erol Barkana
- Hatice Köse
Robot-assisted rehabilitation systems are developed to monitor the performance of
the patients and adapt the rehabilitation task intensity and difficulty level accordingly
to meet the needs of the patients. The robot-assisted rehabilitation systems can be
more prosperous if they are able to recognize the emotions of patients, and modify
the difficulty level of task considering these emotions to increase patient's engagement.
We aim to develop an emotion recognition model using electroencephalography (EEG)
and physiological signals (blood volume pulse (BVP), skin temperature (ST) and skin
conductance (SC)) for a robot-assisted rehabilitation system. The emotions are grouped
into three categories, which are positive (pleasant), negative (unpleasant) or neutral.
A machine-learning algorithm called Gradient Boosting Machines (GBM) and a deep learning
algorithm called Convolutional Neural Networks (CNN) are used to classify pleasant,
unpleasant and neutral emotions from the recorded EEG and physiological signals. We
ask the subjects to look at pleasant, unpleasant and neutral images from the IAPS database and collect EEG and physiological signals during the experiments. The classification accuracies of the GBM and CNN methods are compared when only one sensor modality (EEG, BVP, SC or ST) is used, and when EEG and the physiological signals are combined.
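A minimal sketch of the GBM comparison between a single modality and the combined feature set might look as follows; feature dimensions and data are placeholders, and the authors' exact setup may differ.

```python
# Sketch only: gradient boosting over EEG features alone vs. EEG concatenated
# with peripheral physiological features, for three emotion classes.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300
eeg = rng.normal(size=(n, 32))           # e.g. band-power features per trial
physio = rng.normal(size=(n, 6))         # BVP, ST, SC summary features
y = rng.integers(0, 3, size=n)           # pleasant / neutral / unpleasant

for name, X in [("EEG only", eeg), ("EEG + physio", np.hstack([eeg, physio]))]:
    acc = cross_val_score(GradientBoostingClassifier(random_state=0), X, y, cv=5).mean()
    print(name, f"{acc:.2f}")
```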
- Jauwairia Nasir
- Barbara Bruno
- Pierre Dillenbourg
Intelligent Tutoring Systems (ITS) are required to intervene in a learning activity
while it is unfolding, to support the learner. To do so, they often rely on a learner's performance as an approximation of engagement in the learning process. However,
in learning tasks that are exploratory by design, such as constructivist learning
activities, performance in the task can be misleading and may not always hint at an
engagement that is conducive to learning. Using data from a robot-mediated collaborative learning task in an out-of-lab setting, tested with around 70 children, we show that data-driven clustering approaches, applied to behavioral features including interaction
with the activity, speech, emotional and gaze patterns, not only are capable of discriminating
between high and low learners, but can do so better than classical approaches that
rely on performance alone. First experiments reveal the existence of at least two
distinct multi-modal behavioral patterns that are indicative of high learning in constructivist,
collaborative activities.
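As an illustration of this kind of analysis, the hedged sketch below clusters multimodal behavioural features and inspects learning gains per cluster; the specific algorithm, features and data used by the authors are not reproduced here.

```python
# Hedged sketch: cluster behavioural features, then compare learning gains
# across the resulting clusters (synthetic placeholder data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_children = 70
# Placeholder features: interaction events, speech amount, emotion valence, gaze entropy.
features = rng.normal(size=(n_children, 4))
learning_gain = rng.normal(size=n_children)   # post-test minus pre-test, synthetic

X = StandardScaler().fit_transform(features)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for c in np.unique(clusters):
    print(f"cluster {c}: mean learning gain {learning_gain[clusters == c].mean():.2f}")
```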
- Daniel C. Tozadore
- Roseli A. F. Romero
Social robots' contributions to education are well recognized but, at times, limited by the difficulty regular teachers face in programming them. Our framework, named R-CASTLE, aims to overcome this problem by providing teachers with an easy way to program their content and the robot's behavior through a graphical interface. However, the robot's behavior adaptation algorithm may still not be the most intuitive method for teachers to understand. Fuzzy systems have the advantage of being modeled in a more human-like way than other methods, owing to their implementation based on linguistic variables and terms. Thus, fuzzy modeling of robot behavior adaptation in educational child-robot interactions is proposed for this framework. The modeling resulted in an adaptation algorithm that considers a multimodal and autonomous assessment of the students' skills: attention, communication, and learning. Furthermore, preliminary experiments were performed using videos of the robot in a school environment. The adaptation was set to change the difficulty of the content approach to produce suitably challenging behavior according to each student's reactions. Results were compared to a rule-based adaptive method. The fuzzy modeling showed similar accuracy to the rule-based method while suggesting a more intuitive interpretation of the process.
- Srinivas Parthasarathy
- Shiva Sundaram
Automatic audio-visual expression recognition can play an important role in communication
services such as tele-health, VOIP calls and human-machine interaction. Accuracy of
audio-visual expression recognition could benefit from the interplay between the two
modalities. However, most audio-visual expression recognition systems, trained in
ideal conditions, fail to generalize in real world scenarios where either the audio
or visual modality could be missing for a number of reasons, such as limited bandwidth, the interactors' orientation, or caller-initiated muting. This paper studies the performance of a state-of-the-art transformer when one of the modalities is missing. We conduct
ablation studies to evaluate the model in the absence of either modality. Further,
we propose a strategy to randomly ablate visual inputs during training at the clip
or frame level to mimic real-world scenarios. Results on in-the-wild data indicate significant generalization in the proposed models trained with missing cues, with gains of up to 17% for frame-level ablations, showing that these training strategies cope better with the loss of input modalities.
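The training-time ablation strategy can be sketched as random zeroing of the visual stream at the clip or frame level, as below; tensor shapes, drop probabilities and the late fusion by concatenation are assumptions.

```python
# Minimal sketch of random visual ablation during training, so the model
# learns to cope with a missing modality (shapes are assumptions).
import torch

def ablate_visual(visual: torch.Tensor, p_clip=0.2, p_frame=0.2) -> torch.Tensor:
    """visual: (batch, frames, feat_dim) visual features."""
    out = visual.clone()
    b, f, _ = out.shape
    drop_clip = torch.rand(b) < p_clip            # drop the whole clip's visual stream
    out[drop_clip] = 0.0
    drop_frame = torch.rand(b, f) < p_frame       # drop individual frames elsewhere
    out[drop_frame & ~drop_clip.unsqueeze(1)] = 0.0
    return out

visual = torch.randn(8, 16, 512)
audio = torch.randn(8, 16, 128)
visual = ablate_visual(visual)
fused = torch.cat([audio, visual], dim=-1)        # fed to the audio-visual model
print(fused.shape)
```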
- Nerea Urrestilla
- David St-Onge
Cognitive load is a wide field of study that has attracted the interest of many disciplines, such as neuroscience, psychology and computer science, for decades. With the growing importance of human factors in robotics, many more researchers are diving into the topic, looking in particular for a way to adapt the control of an autonomous system to the cognitive load of its operator. Theoretically, this can be achieved from heart-rate variability measurements, brain wave monitoring, pupillometry or even skin conductivity. This work introduces some recent algorithms for analyzing the data from the first two and assesses some of their limitations.
- Aurelien Lechappe
- Mathieu Chollet
- Jérôme Rigaud
- Caroline G.L. Cao
The use of robotic surgical systems disrupts existing team dynamics inside operating
rooms and constitutes a major challenge for the development of crucial non-technical
skills such as situation awareness (SA). Techniques for assessing SA mostly rely on
subjective assessments and questionnaires; few leverage multimodal measures combining
physiological, behavioural, and subjective indicators. We propose a conceptual model
relating SA with mental workload, stress and communication, supported by measurable
behaviours and physiological signals. To validate this model, we collect subjective,
behavioural, and physiological data from surgical teams performing radical prostatectomy
using robotic surgical systems. Statistical analyses will be performed to establish
relationships between SA, subjective assessment of stress and mental workload, communication
processes, and the surgeons' physiological signals.
- Felix Putze
- Merlin Burri
- Lisa-Marie Vortmann
- Tanja Schultz
Human attention determines to a large degree how users interact with technical devices
and how technical artifacts can support them optimally during their tasks. Attention
shifts between different targets, triggered through changing requirements of an ongoing
task or through salient distractions in the environment. Such shifts mark important
transition points which an intelligent system needs to predict and attribute to an
endogenous or exogenous cause for an appropriate reaction. In this paper, we describe
a model which performs this task through a combination of bottom-up and top-down modeling components. We evaluate the model in a scenario with a dynamic task in a rich environment and show that the model is able to predict future attention switches with robust classification performance.
- Sromona Chatterjee
- Kevin Scheck
- Dennis Küster
- Felix Putze
- Harish Moturu
- Johannes Schering
- Jorge Marx Gómez
- Tanja Schultz
Driving and biking are complex and attention-demanding tasks for which distractions
are a major safety hazard.
Modeling driving-related attention with regard to audio-visual distractions and assessing
the attentional overload could help drivers to reduce stress and increase safety.
In this work, we present a multimodal recording architecture using dry EEG-electrodes
and the eye-tracking capability of the HoloLens 2 for an outdoor Augmented Reality
(AR) scenario. The AR street scene contains visual distractions and moving cars and
is shown to the subject in a simulation through the HoloLens 2. The system records
EEG and eye-tracking data to predict changes in the driver's attention. A preliminary
case study is presented here to detail the data acquisition setup and to detect the
occurrences of visual distractions in the simulation. Our first results suggest that
this approach may overall be viable. However, further research is still required to
refine our setup and models, as well as to evaluate the ability of the system to capture
meaningful changes of attention in the field.
SESSION: MSMT'20 Workshop
- Hiroyuki Ishihara
- Shiro Kumano
To analyze human interaction behavior in a group or crowd, identification and device time synchronization are essential but time-consuming when performed manually. To automate the two processes jointly, without any calibration steps or auxiliary sensors, this paper presents an acceleration-correlation-based method for multi-person interaction scenarios in which each target person wears an accelerometer and a camera is stationed in the scene. A critical issue is how to remove the time-varying gravity direction component from the wearable device acceleration, which degrades the correlation of body acceleration between the device and the video, yet is hard to estimate accurately. Our basic idea is to estimate the gravity direction component in the camera coordinate system, where it can be obtained analytically, and to add it to the vision-based data to compensate for the degraded correlation. We obtained high-accuracy results for matching 4 persons to devices with only 40 to 60 frames (4 to 6 seconds). The average timing offset estimation error is about 5 frames (0.5 seconds). Experimental results suggest it is useful for analyzing
individual trajectories and group dynamics at low frequencies.
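A simplified sketch of correlation-based person-device matching is given below; it omits the gravity compensation and time-offset search that are central to the actual method, and uses synthetic traces.

```python
# Simplified sketch: correlate each device's acceleration trace with each video
# track's acceleration and choose the assignment maximising total correlation.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_people, n_frames = 4, 60
video_acc = rng.normal(size=(n_people, n_frames))        # per-track body acceleration
device_acc = video_acc[[2, 0, 3, 1]] + 0.3 * rng.normal(size=(n_people, n_frames))

corr = np.array([[np.corrcoef(d, v)[0, 1] for v in video_acc] for d in device_acc])
rows, cols = linear_sum_assignment(-corr)                 # maximise total correlation
print(dict(zip(rows.tolist(), cols.tolist())))            # device index -> track index
```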
- Temitayo Olugbade
- Nicolas Gold
- Amanda C de C Williams
- Nadia Bianchi-Berthouze
The use of multiple clocks has been a favoured approach to modelling the multiple
timescales of sequential data. Previous work based on clocks, and multi-timescale studies in general, has not clearly accounted for the multidimensionality of data in which each dimension has its own timescale(s). Focusing on body movement data, which has independent yet coordinating degrees of freedom, we propose a Movement in Multiple Time (MiMT) neural network. Our MiMT models multiple timescales by learning different levels of movement interpretation (i.e. labels) and further allows for separate timescales across movement dimensions. We obtain 0.75 and 0.58 average F1 scores respectively for binary
frame-level and three-class window-level classification of pain behaviour based on
the MiMT. Findings in ablation studies suggest that these two elements of the MiMT
are valuable to modelling multiple timescales of multidimensional sequential data.
This paper describes two approaches to developing simple systems for expressive bodily interaction with music, without prior musical knowledge on the user's part. It discusses two almost oppositional models: 1. modifying a preexisting recording through spatial articulation, and 2. rule-based ad-hoc composition of a musical piece of indefinite length, based on precomposed chord progression(s). The approaches differ both in their interaction models and in their musical feedback.
- Olga Matthiopoulou
- Benoit Bardy
- Giorgio Gnecco
- Denis Mottet
- Marcello Sanguineti
- Antonio Camurri
This work reports ongoing research on a computational method, based on cooperative games on graphs, aimed at detecting the perceived origin of full-body human movement
and its propagation. Compared with previous works, a larger set of movement features
is considered, and a ground truth is produced, able to assess and compare the effectiveness
of each such feature. This is done through the use of the Shapley Value as a centrality
index. An Origin of Movement Continuum is also defined, as the basis for creating
a repository of movement qualities.
SESSION: OHT'20 Workshop
- Arjan van Hessen
- Silvia Calamai
- Henk van den Heuvel
- Stefania Scagliola
- Norah Karrouche
- Jeannine Beeken
- Louise Corti
- Christoph Draxler
Interview data is multimodal data: it consists of speech sound, facial expressions and gestures, captured in a particular situation, and contains textual information and emotion. This workshop shows how a multidisciplinary approach may exploit the
full potential of interview data. The workshop first gives a systematic overview of
the research fields working with interview data. It then presents the speech technology
currently available to support transcribing and annotating interview data, such as
automatic speech recognition, speaker diarization, and emotion detection. Finally,
scholars who work with interview data and tools may present their work and discover
how to make use of existing technology.
SESSION: SAMIH'20 Workshop
- Claire Dussard
- Anahita Basirat
- Nacim Betrouni
- Caroline Moreau
- David Devos
- François Cabestaing
- José Rouillard
In the context of Parkinson's disease, this preliminary work aims to study the recognition
profiles of emotional faces, dynamically expressed by virtual agents in a Healthy
Control (HC) population. In this online experiment, users had to watch 56 trials of two-second animations, each showing an emotion progressively expressed by an avatar, and then indicate the recognized emotion by clicking a button. A total of 211 participants completed this experiment online as HC. Of the demographic variables, only age negatively influenced recognition accuracy in HC. The intensity of the expression influenced accuracy as
well. Interaction effects between gender, emotion, intensity, and avatar gender are
also discussed. The results of four patients with Parkinson's Disease are presented
as well. Patients tended to have lower recognition accuracy than age-matched HC (59%
for age-matched HC; 45.1% for patients). Joy, sadness and fear seemed less recognized
by patients.
- Hangyu Zhou
- Yuichiro Fujimoto
- Masayuki Kanbara
- Hirokazu Kato
In this paper, factors with positive effects on the playback of virtual reality (VR) presentations in training are discussed. To date, the effectiveness of VR public speaking training in both anxiety reduction and skills improvement has been reported. Although videotape playback is an effective component of conventional public speaking training, very few researchers have focused on the effectiveness and potential of VR playback. In this research, a VR playback system for public speaking training is proposed, and a pilot experiment is carried out to investigate the effects of the virtual agent, immersion and public speaking anxiety level in VR playback.
- Takeshi Saga
- Hiroki Tanaka
- Hidemi Iwasaka
- Satoshi Nakamura
Although Social Skills Training is a well-known effective method to obtain appropriate
social skills during daily communication, getting such training is difficult due to
a shortage of therapists. Therefore, automatic training systems are required to ameliorate
this situation. To fairly evaluate social skills, we need an objective evaluation
method. In this paper, we utilized the second edition of the Social Responsiveness
Scale (SRS-2) as an objective evaluation metric and developed an automatic evaluation
system using linear regression with multi-modal features. We newly adopted features including 28 audio features and a BERT-based sequential similarity (seq-similarity), which indicates how consistently the meaning of a user's utterances is maintained across their speech.
We achieved a 0.35 Pearson correlation coefficient for the SRS-2's overall score prediction
and 0.60 for the social communication score prediction, which is a treatment sub-scale
score of SRS-2. This experiment shows that our system can objectively predict the
levels of social skills. Please note that we only evaluated the system on healthy
subjects since this study is still at the feasibility phase. Therefore, further evaluation
of real patients is needed in future work.
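One plausible way to compute such a seq-similarity feature is the average cosine similarity between embeddings of consecutive utterances, as sketched below; the sentence-transformers package and the model name are our assumptions, not necessarily the authors' BERT pipeline.

```python
# Hedged sketch of a seq-similarity feature: average cosine similarity between
# embeddings of consecutive utterances in a session.
import numpy as np
from sentence_transformers import SentenceTransformer

def seq_similarity(utterances: list[str]) -> float:
    model = SentenceTransformer("all-MiniLM-L6-v2")       # assumed embedding model
    emb = model.encode(utterances, normalize_embeddings=True)
    sims = [float(np.dot(emb[i], emb[i + 1])) for i in range(len(emb) - 1)]
    return float(np.mean(sims))

print(seq_similarity([
    "I enjoy talking with my colleagues at lunch.",
    "We usually chat about our weekend plans.",
    "Yesterday I forgot to water the plants.",
]))
```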
- Tanja Schneeberger
- Naomi Sauerwein
- Manuel S. Anglet
- Patrick Gebhard
Mental stress is the psychological and physiological response to a high frequency
of or continuous stressors. If prolonged and not regulated successfully, it has a
negative impact on health. Developing stress coping techniques, as an emotion regulation
strategy, is a crucial part of most therapeutic interventions. Interactive biofeedback
agents can be employed as a digital health tool for therapists to let patients train
and develop stress-coping strategies. This paper presents an interactive stress management
training system using biofeedback derived from the heart rate variability (HRV), with
an Interactive Social Agent as an autonomous biofeedback trainer. First evaluations
have shown promising results.
- Kazuhiro Shidara
- Hiroki Tanaka
- Hiroyoshi Adachi
- Daisuke Kanayama
- Yukako Sakagami
- Takashi Kudo
- Satoshi Nakamura
In cognitive behavior therapy (CBT) with a virtual agent, facial expression processing is expected to be useful for dialogue response selection and empathic dialogue. Unfortunately, its use in current works remains limited. One reason for this situation is the lack of research on the relationship between mood changes and facial expressions during CBT-oriented interaction. This study confirms the improvement of negative moods through interaction with a virtual agent and identifies facial expressions that correlate with mood changes.
Based on the cognitive restructuring of CBT, we created a fixed dialogue scenario
and implemented it in a virtual agent. We recorded facial expressions during dialogues
with 23 undergraduate and graduate students, calculated 17 types of action units (AUs),
which are the units of facial movements, and performed a correlation analysis using
the change rate of mood scores and the amount of the changes in the AUs. The mean
mood improvement rate was 35%, and the mood improvements showed correlations with
AU5 (r = -0.51), AU17 (r = 0.45), AU25 (r = -0.43), and AU45 (r = 0.45). These results
imply that mood changes are reflected in facial expressions. The AUs identified in
this study have the potential to be used for agent-interaction modeling.
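The correlation analysis described above can be sketched as follows, with synthetic placeholders for the per-participant AU changes and mood-score change rates.

```python
# Sketch of the AU-mood correlation analysis: Pearson correlation between each
# AU's change and the mood-score change rate across participants (synthetic data).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_participants = 23
aus = {f"AU{k}": rng.normal(size=n_participants) for k in (5, 17, 25, 45)}
mood_change_rate = rng.normal(size=n_participants)

for name, delta in aus.items():
    r, p = pearsonr(mood_change_rate, delta)
    print(f"{name}: r = {r:+.2f} (p = {p:.2f})")
```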
- Hugues Ali Mehenni
- Sofiya Kobylyanskaya
- Ioana Vasilescu
- Laurence Devillers
In this research, nudges, indirect suggestions that can affect behaviour and decision making, are considered in the context of conversational machines. A first long-term goal of this work is to build an automatic dialog system able to nudge. A second goal is to measure the influence of nudges exerted by conversational agents and robots on humans, in order to raise awareness of their use or misuse and open an ethical reflection on their consequences. The study involved primary school children, who are potentially more vulnerable to conversational machines. The children
verbally interacted in three different setups: with a social robot, a conversational
agent and a human. Each setup includes a Dictator Game adapted to children from which
we can infer a nudge metric. First results from the Dictator Game highlight that the
conversational agent and the robot seem more influential in nudging children than
an adult. In this paper, we seek to measure whether the propensity of the children
to be nudged can be predicted from personal and overall dialog features (e.g. age,
interlocutor, etc.) and expressive behaviour located at speaker turn level (e.g. emotions,
etc.). Features are integrated into vectors, with one vector per speaker turn, which
are fed to machine learning models. The speakers' characteristics, the type of interlocutor,
objective measures at speaker turn level (latency, duration) and also measures built
to quantify the reactions to two influencing questions (open-ended and incongruous)
correlate best with the reaction to the nudging strategies.
- Kana Miyamoto
- Hiroki Tanaka
- Satoshi Nakamura
Although emotion induction using music has been studied, the emotions felt by listening
to it vary among individuals. In order to provide personalized emotion induction,
it is necessary to predict an individual's emotions and select appropriate music.
Therefore, we propose a feedback system that generates music from the continuous value
of emotion estimated from electroencephalogram (EEG). In this paper, we describe a
music generator and a method of emotion estimation from EEG to construct a feedback
system. First, we generated music by calculating parameters from the valence and arousal
values of the desired emotion. Our generated music was evaluated by crowdworkers. The medians of the correlation coefficients between the input of the music generator and the emotions felt by the crowdworkers were r=0.60 for valence and r=0.76 for arousal. Next, we recorded EEG while participants listened to music and estimated emotions from the recordings. We compared three regression models: linear regression and convolutional neural networks (with and without transfer learning). We obtained the lowest RMSE (valence: 0.1807, arousal:
0.1945) between the actual and estimated emotional values with a convolutional neural
network with transfer learning.
- Enora Gabory
- Mathieu Chollet
Virtual reality has demonstrated successful outcomes for treating social anxiety disorders,
or helping to improve social skills. Some studies showed that various factors can
impact the level of participants' anxiety during public speaking. However, the influence
of sound design on this anxiety has been less investigated, and it is necessary to
study the possible impacts that it can have. In this paper, we propose a model relating
sound design concepts to presence and anxiety during virtual reality interactions,
and present a protocol of a future experimental study aimed at investigating how sound
design and in particular sound distractions can influence anxiety during public speaking
simulations in virtual environments.
- Zixiu Wu
- Rim Helaoui
- Vivek Kumar
- Diego Reforgiato Recupero
- Daniele Riboni
Empathetic response from the therapist is key to the success of clinical psychotherapy,
especially motivational interviewing. Previous work on computational modelling of
empathy in motivational interviewing has focused on offline, session-level assessment
of therapist empathy, where empathy captures all efforts that the therapist makes to understand the client's perspective and convey that understanding to the client. In
this position paper, we propose a novel task of turn-level detection of client need
for empathy. Concretely, we propose to leverage pre-trained language models and empathy-related
general conversation corpora in a unique labeller-detector framework, where the labeller
automatically annotates a motivational interviewing conversation corpus with empathy
labels to train the detector that determines the need for therapist empathy. We also
lay out our strategies of extending the detector with additional-input and multi-task
setups to improve its detection and explainability.
SESSION: WoCBU'20 Workshop
Therapists, psychologists, family counselors and coaches in youth care show a clear
need for social technology support, e.g. for education, motivation and guidance of
the children. For example, the Dutch Child and Family Center explores the possibilities
of social robot assistance in their regular care pathways. This robot should address
the affective processes in the communication appropriately. Whereas there is an enormous amount of emotion research in human-robot interaction, there is not yet a proven set of models and methods that can be put into this practice directly. Our research aims at a model for the robot's emotion recognition and expression that is effective in Dutch youth care. Consequently, it has to take account of personal differences (e.g.,
child's developmental phase and mental problems) and the context (e.g., family circumstances
and therapy approach). Our study distinguishes different phases that may partially
run in parallel. First, possible solutions for affective computing by social robots
are identified to set the general design space and understand the constraints. Second,
in an exploration phase, focus group sessions are conducted to identify core features
of emotional expressions that the robot should or could process, including the context-dependencies.
Third, in the testing phase, via scenario-based design and child-robot interaction
experiments a practical model of affect processing by a social robot in youth care
is derived. This short paper provides an overview of the general approach of this
research and some preliminary results of the design space and focus group.
- Gijs A. Holleman
- Ignace T. C. Hooge
- Jorg Huijding
- Maja Deković
- Chantal Kemner
- Roy S. Hessels
Face-to-face interaction is a primary mode of human social behavior which includes
verbal and non-verbal expressions, e.g. speech, gazing, eye contact, facial displays,
and gestures (Holler & Levinson, 2019). In this study, we investigated the relation
between speech and gaze behavior during 'face-to-face' dialogues between parents and
their preadolescent children (9-11 years). 79 child-parent dyads engaged in two semi-structured
conversations about family-related topics. We used a state-of-the-art dual-eye tracking
setup (Hessels et al. 2019) that is capable of concurrently recording eye movements,
frontal video recordings, and audio from two conversational partners. Crucially, the
setup is designed in such a way that eye contact can be maintained using half-silvered
mirrors, as opposed to e.g. Skype where the camera is located above the screen. Parents
and children conversed about two different topics for five minutes each, one 'conflict'
(e.g. bedtime, homework) and one 'cooperation' (e.g. organize a party) topic. Preliminary
analyses of speech behavior (Figure 1) show that children talked more in the cooperative
task and talked less when discussing a topic of disagreement with their parents. Conversely,
parents talked more during the conflict-task and less during the cooperative-task.
The next step is to combine measures of speech and gaze to investigate the interplay
and temporal characteristics of verbal and non-verbal behavior during face-to-face
interactions.
- Sofie Vettori
- Jannes Nys
- Bart Boets
Scanning faces is important for social interactions, and maintaining good eye contact
carries significant social value. Difficulty with the social use of eye contact constitutes
one of the clinical symptoms of autism spectrum disorder (ASD). It has been suggested
that individuals with ASD look less at the eyes and more at the mouth than typically
developing individuals, possibly due to gaze aversion (Tanaka & Sung, 2016) or gaze
indifference (Chevallier et al., 2012). Eye tracking evidence for this hypothesis
is mixed (e.g. Falck-Ytter & von Hofsten, 2011; Frazier et al., 2017). Face exploration
dynamics (rather than the overall looking time to facial parts) might be altered in
ASD. Recent studies have proposed a method for scanpath modeling and classification
to capture systematic patterns diagnostic of a given class of observers and/or stimuli
(Coutrot et al., 2018). We adopted this method, combining Markov Models and classification
analyses, to understand face exploration dynamics in boys with ASD and typically developing
school-aged boys (N = 42). Eye-tracking data were recorded while participants viewed
static faces. Faces were divided into areas of interest (AOIs) by means of limited-radius
Voronoi tessellation (LRVT) (Hessels et al., 2016). Proportional looking time analyses
show that both groups looked longer at the eyes than at the mouth, and we did not observe group
differences in fixation duration to these features. TD boys looked significantly longer
at the nose, while the ASD boys looked more outside the face. We modeled the temporal
dynamics of the gaze behavior using Markov Models (MMs). To determine the individual
separability of the resulting transition matrices we constructed a classification
model using linear discriminant analysis (LDA). We found that the ASD group displays
more exploratory dynamic gaze behavior as compared to the TD group, as indicated by
higher transition probabilities of moving gaze between AOIs. Based on a leave-one-out
cross-validation analysis, we find an accuracy of 72%, implying a 72% chance of correctly
predicting group membership based on face exploration dynamics.
These results indicate that atypical eye contact in ASD might be manifested through
more frequent gaze shifting, even when total looking time to the eyes is the same.
Although individual classification accuracy is modest in this experiment, we hypothesize
that, when used in more realistic paradigms (e.g. real-life interaction), this method
could achieve high accuracy in individual separability.
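As a rough illustration of this approach, the sketch below builds per-participant transition matrices over AOIs and classifies them with LDA under leave-one-out cross-validation; the AOI set, the toy fixation sequences, and the scikit-learn-based implementation are assumptions for illustration, not the authors' code.

```python
# Minimal sketch (not the authors' code): per-participant AOI transition matrices
# from fixation sequences, classified with LDA under leave-one-out cross-validation.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

AOIS = ["eyes", "nose", "mouth", "outside"]          # assumed AOI set
AOI_INDEX = {aoi: i for i, aoi in enumerate(AOIS)}

def transition_matrix(fixations):
    """Row-normalized first-order Markov transition matrix over AOIs."""
    counts = np.zeros((len(AOIS), len(AOIS)))
    for src, dst in zip(fixations[:-1], fixations[1:]):
        counts[AOI_INDEX[src], AOI_INDEX[dst]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Made-up per-participant fixation sequences and group labels (0 = TD, 1 = ASD).
sequences = [
    ["eyes", "eyes", "nose", "eyes", "mouth", "eyes"],
    ["eyes", "nose", "eyes", "eyes", "nose", "eyes"],
    ["nose", "eyes", "eyes", "mouth", "eyes", "nose"],
    ["eyes", "outside", "mouth", "eyes", "outside", "nose"],
    ["mouth", "eyes", "outside", "nose", "eyes", "outside"],
    ["outside", "eyes", "mouth", "outside", "nose", "eyes"],
]
labels = np.array([0, 0, 0, 1, 1, 1])

# Flattened transition matrices serve as feature vectors for the classifier.
X = np.array([transition_matrix(s).ravel() for s in sequences])
accuracy = cross_val_score(LinearDiscriminantAnalysis(), X, labels, cv=LeaveOneOut()).mean()
print("Leave-one-out accuracy:", accuracy)
```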
- Niilo V. Valtakari
- Ignace T. C. Hooge
- Charlotte Viktorsson
- Pär Nyström
- Terje Falck-Ytter
- Roy S. Hessels
There is a long history of interest in looking behavior during human interaction.
With the advance of (wearable) video-based eye trackers, it has become possible to
measure gaze during many different interactions, even in challenging situations, such
as during interactions between young children and their caregivers. We outline the
different types of eye-tracking setups that currently exist to investigate gaze during
interaction. The setups differ mainly with regard to the nature of the eye-tracking
signal (head- or world-centered) and the freedom of movement allowed for the participants
(see Figure 1). These crucial, yet often overlooked features place constraints on
the research questions that can be answered about human interaction. Furthermore,
recent developments in machine learning have made it possible to estimate gaze
directly from video recordings, without the need for specialized eye-tracking hardware,
widening the spectrum of possible eye-tracking setups. We discuss the link between
type of eye-tracking setup and the research question being investigated, and end with
a decision tree to help researchers judge the appropriateness of specific setups (see
Figure 2).
- Heysem Kaya
- Oxana Verkholyak
- Maxim Markitantov
- Alexey Karpov
This paper investigates different fusion strategies and provides insights into
their effectiveness compared with standalone classifiers in the framework of paralinguistic
analysis of infant vocalizations. Combinations of Support Vector Machine (SVM) and
Extreme Learning Machine (ELM) based classifiers, as well as the weighted kernel
version of the latter, are explored by training the systems on different acoustic feature
representations and applying weighted score-level fusion to their predictions. The
proposed framework is tested on the INTERSPEECH ComParE-2019 Baby Sounds corpus, which
is a collection of HomeBank infant vocalization corpora annotated for five classes.
Adhering to the challenge protocol, with a single test set submission we outperform
the challenge baseline Unweighted Average Recall (UAR) score and achieve a result
comparable to the state of the art.
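As an illustration of weighted score-level fusion, the following sketch combines the class posteriors of two classifiers trained on different feature representations and evaluates the result with UAR (macro-averaged recall); the synthetic data, the SVM-only classifiers, and the fixed fusion weights are assumptions, not the paper's exact pipeline.

```python
# Minimal sketch (illustrative assumptions, not the paper's system): weighted
# score-level fusion of two classifiers trained on different feature sets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# One synthetic 5-class dataset stands in for the utterances; two random
# projections stand in for two different acoustic feature representations.
X, y = make_classification(n_samples=300, n_features=60, n_informative=30,
                           n_classes=5, random_state=0)
rng = np.random.default_rng(0)
X_a = X @ rng.normal(size=(60, 40))   # "feature set A" (placeholder)
X_b = X @ rng.normal(size=(60, 50))   # "feature set B" (placeholder)

idx_train, idx_test = train_test_split(np.arange(len(y)), test_size=0.3,
                                       stratify=y, random_state=0)
clf_a = SVC(probability=True, random_state=0).fit(X_a[idx_train], y[idx_train])
clf_b = SVC(probability=True, random_state=0).fit(X_b[idx_train], y[idx_train])

w_a, w_b = 0.6, 0.4   # fusion weights (assumed; in practice tuned on a dev set)
fused = w_a * clf_a.predict_proba(X_a[idx_test]) + w_b * clf_b.predict_proba(X_b[idx_test])
pred = clf_a.classes_[fused.argmax(axis=1)]
print("Fused UAR (macro recall):", recall_score(y[idx_test], pred, average="macro"))
```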
- Elena E. Lyakso
- Olga V. Frolova
The goal of the study is to reveal the relationship between speech features and
different aspects of development in children with autism spectrum disorders (ASD). The participants
were 28 children with ASD aged 4-11 years and 64 adults who listened to the children's
speech samples. The children with ASD were divided into two groups: in ASD-1, ASD is
the leading symptom (F84, n=17); children assigned to ASD-2 (n=11) had other disorders
accompanied by ASD symptomatology (F83 + F84). The children's speech and behavior were
recorded in comparable situations: a dialogue with the experimenter, viewing pictures
and retelling a story about them or answering questions, and book reading. The child's
psychophysiological characteristics were estimated with a method that includes determining
the hemisphere dominant for speech (dichotic listening test, DLT), phonemic hearing, and
the profile of lateral functional asymmetry (PLFA). All tasks and the duration of the
study were adapted to the child's capacities. The study analyzed the level of speech
formation in 4-11-year-old children with ASD and identified direct and indirect
relationships between features of early development, the children's psychophysiological
indicators, and the level of speech development at the time of the study. The ability
of adults to recognize the psychoneurological state of children from their speech was
also determined. The results support the need for greater focus on and understanding
of the language strengths and weaknesses of children with ASD and for an individual
approach to teaching them.
- Anika van der Klis
- Frans Adriaans
- Mengru Han
- René Kager
This study assesses the performance of a state-of-the-art automatic speech recognition
(ASR) system at extracting target words in two different speech registers: infant-directed
speech (IDS) and adult-directed speech (ADS). We used the Kaldi-NL ASR service, developed
by the Dutch Foundation of Open Speech Technology. The results indicate that the accuracy
of the tool is much lower in IDS than in ADS. There are differences between IDS and
ADS which negatively affect the performance of the existing ASR system. Therefore,
new tools need to be developed for the automatic annotation of IDS. Nevertheless,
the ASR system can already find more than half of the target words, which is promising.
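The following sketch illustrates one way such target-word accuracy could be quantified (an assumption for illustration, not the authors' evaluation script): the share of target-word tokens that are recovered in an ASR transcript.

```python
# Minimal sketch (assumed metric, not the authors' evaluation script): fraction
# of target words that appear in an ASR transcript, compared across registers.
from collections import Counter

def target_word_recall(target_words, asr_transcript):
    """Share of target-word tokens found in the ASR output."""
    hyp_counts = Counter(asr_transcript.lower().split())
    ref_counts = Counter(w.lower() for w in target_words)
    found = sum(min(count, hyp_counts[word]) for word, count in ref_counts.items())
    return found / max(len(target_words), 1)

# Illustrative example with made-up Dutch target words and ASR outputs.
targets = ["bal", "hond", "boek"]
ids_output = "kijk de bal en de hond"
ads_output = "de bal de hond en het boek"
print("IDS recall:", target_word_recall(targets, ids_output))  # 2/3
print("ADS recall:", target_word_recall(targets, ads_output))  # 3/3
```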
- Laura Boeschoten
- Irene I. van Driel
- Daniel L. Oberski
- Loes J. Pouwels
Since the introduction of social media platforms, researchers have investigated how
the use of such media affects adolescents' well-being. Thus far, findings have been
inconsistent [1, 2, 3]. The aim of our interdisciplinary project is to provide a more
thorough understanding of these inconsistencies by investigating who benefits from
social media use, who does not and why it is beneficial for one yet harmful for another
[1]. In this presentation, we explain our approach to combining social scientific
self-report data with the use of deep learning to analyze personal Instagram archives.
The implementation of the GDPR in 2018 opened up new possibilities for social media
research. Each platform is legally mandated to provide its European users with their
social media archive in a digitally readable format upon request, a requirement with
which all large platforms currently comply. These data download packages (DDPs) help
resolve three main challenges in current research. First, self-reports of social media
use suffer from recall bias, particularly among teens [2]. Instagram DDPs provide objective,
timestamped insights into Instagram use. Second, previous research has demonstrated
that time spent on social media has no or only a small relationship with well-being [3].
The Instagram DDPs answer recent calls for knowledge on adolescents' specific
activities (posting, messaging, commenting) on social media [3]. Third, DDPs resolve
selectivity issues in research that relies on APIs and analyzes public content
only, whereas adolescents' use has an important private component [1].
In a longitudinal study, we invited 388 adolescents (8th and 9th graders
of a Dutch high school, mean age = 14.11, 54% girls) to participate in a panel survey
and an experience sampling (ESM) study, and to share their Instagram DDP at the end of
both studies. Of this group, 104 Instagram users (mean age = 14.05, 66% girls) agreed
to share their DDP. As DDPs contain private and third-party content, data managers,
ethics committee members, and privacy officers have been closely monitoring the research
process. We developed a Python script that anonymizes parts of the DDP by removing
identifiers from images, videos, and text. Other parts of the DDP are pseudonymized,
allowing us, for example, to connect befriended users within the study. During the presentation,
we report on this preparation process and the validation of the anonymization and
pseudonymization script.
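To illustrate the kind of pseudonymization described here, the sketch below replaces @mentions in DDP text with keyed-hash pseudonyms so that the same account maps to the same token across files; the regex, key handling, and mention format are assumptions rather than the project's actual script.

```python
# Minimal sketch (not the project's script): pseudonymize Instagram usernames in
# DDP text with a keyed hash, so befriended users can still be linked across
# files without exposing their identities.
import hashlib
import hmac
import re

SECRET_KEY = b"project-specific-secret"   # assumed: stored separately from the data

def pseudonym(username: str) -> str:
    """Deterministic pseudonym so the same account always maps to the same token."""
    digest = hmac.new(SECRET_KEY, username.lower().encode(), hashlib.sha256).hexdigest()
    return f"user_{digest[:10]}"

def pseudonymize_mentions(text: str) -> str:
    """Replace @mentions (assumed format) with their pseudonyms."""
    return re.sub(r"@([A-Za-z0-9._]+)", lambda m: pseudonym(m.group(1)), text)

print(pseudonymize_mentions("Great pic @anna.jansen! cc @anna.jansen @piet_99"))
# Both mentions of the same account map to the same pseudonym.
```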
With the combined dataset containing panel survey results, ESM data, and Instagram
DDPs, we plan to perform a number of analyses intended both to examine the possibilities
of using DDPs for scientific research and to study the well-being of adolescents.
First, we plan to investigate the representativeness of the subsample that agreed
to share their DDPs. Second, we plan to generate emotional classifications and
classifications of contextual factors of the images and text found in the DDPs using
Microsoft Azure Cognitive Services and relate this to self-reported trait levels of
well-being, derived from both the survey and the ESM. Third, we plan to develop natural
language processing and computer vision algorithms using the DDP content as data and
self-reported state levels of well-being as labels. By combining deep learning and
social science, we aim to understand differences in Instagram use between adolescents
who feel happy and those who feel less happy.
The keynote will present comparative experimental data on the formation of speech
and communication skills of typically developing children and children with atypical
development, namely Autism Spectrum Disorders, Down syndrome, and intellectual disabilities.
The specifics of analyzing children's speech will be noted, and databases of children's
speech and their uses will be presented. The main emphasis will be placed on how the
pathological states of infants and children are reflected in voice characteristics and
on identifying biomarkers of diseases from features of children's speech and voice.