Challenges
Developing systems that can robustly understand human-human
communication or respond to human input requires identifying the
best algorithms and their failure modes. In fields such as computer
vision, speech recognition, and computational linguistics, the
availability of datasets and common tasks has led to great
progress. This year we invited the ICMI community to collectively
define and tackle scientific Grand Challenges in multimodal
interaction for the next 5 years. We received a good response to
the call, and we are hosting four Challenge events at the ICMI 2013
conference. Multimodal Grand Challenges are driven by ideas that are
bold, innovative, and inclusive. We hope they will inspire new ideas
in the ICMI community and create momentum for future collaborative
work.
Multimodal Grand Challenge Chairs
Jean-Marc Odobez
IDIAP Research Institute, Switzerland
Vidhyasaharan Sethu
The University of New South Wales, Australia
1st International Challenge for Multimodal Mid-Air Gesture Recognition for Close HCI: ChAirGest 2013
The ChAirGest challenge is a research-oriented competition designed to compare multimodal gesture recognizers. The data provided in the challenge have been recorded from multiple sensors so that methods for gesture spotting and recognition can be optimized. A common benchmark tool will be used to quantitatively compare the submitted algorithms under strictly comparable conditions. The proposed algorithms can use any combination of the data types available in the dataset.
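The challenge text above does not spell out the benchmark metric, but a common way to evaluate gesture spotting against annotated recordings is segment-level precision, recall, and F1 under a temporal-overlap criterion. The sketch below only illustrates that idea; the Segment type, the greedy matching, and the 0.5 IoU threshold are assumptions, not the official ChAirGest benchmark tool.

```python
# Illustrative only: segment-level spotting evaluation under a temporal-overlap
# criterion. NOT the official ChAirGest benchmark tool; the names and the
# 0.5 overlap threshold are assumptions made for this sketch.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds
    end: float     # seconds
    label: str     # gesture class

def overlap_ratio(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = max(a.end, b.end) - min(a.start, b.start)
    return inter / union if union > 0 else 0.0

def spotting_scores(predicted, ground_truth, min_overlap=0.5):
    """Greedy one-to-one matching of predicted to ground-truth segments.
    A prediction counts as correct if it overlaps an unmatched ground-truth
    segment of the same class by at least `min_overlap` IoU."""
    unmatched = list(ground_truth)
    tp = 0
    for p in predicted:
        for g in unmatched:
            if g.label == p.label and overlap_ratio(p, g) >= min_overlap:
                unmatched.remove(g)
                tp += 1
                break
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```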
The provided data come from one Kinect camera and four Inertial Motion Units (IMUs) attached to the subject's right arm and neck. The dataset contains 10 different gestures, performed from 3 different resting postures and recorded under two different lighting conditions by 10 different subjects. In total, the dataset contains 1200 annotated gestures, split into continuous video sequences that each contain a variable number of gestures. The goal of the challenge and the related workshop is to promote research on methods that use multimodal data to spot and recognize gestures in the context of close human-computer interaction. Many side goals and research paths may also be explored with the provided dataset, notably sensor fusion and the enhancement of single-sensor recognition using multi-sensory information. Many other research challenges remain to be tackled in this domain. Participants are given the opportunity to submit short papers and share innovative findings from their research during the one-day workshop; these papers will be published alongside the main ICMI proceedings in the ACM Digital Library.
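One of the research paths mentioned above, sensor fusion, can be illustrated with a minimal score-level (late) fusion sketch: a classifier per sensor outputs class probabilities for a gesture segment, and a weighted average combines them. The sensor names, weights, and probability values below are hypothetical and are not part of the challenge protocol.

```python
# Minimal sketch of score-level (late) fusion across sensors. The sensor keys,
# weights, and probability vectors are hypothetical placeholders.
import numpy as np

def fuse_scores(per_sensor_probs: dict, weights: dict) -> int:
    """Weighted average of per-sensor class-probability vectors.
    `per_sensor_probs` maps a sensor name to an array of shape (n_classes,)."""
    total_weight = sum(weights[s] for s in per_sensor_probs)
    fused = sum(weights[s] * np.asarray(p) for s, p in per_sensor_probs.items())
    fused = fused / total_weight
    return int(np.argmax(fused))  # index of the predicted gesture class

# Example with made-up numbers: a Kinect-based classifier and an IMU-based
# classifier disagree; fusion favours the more confident, higher-weighted one.
probs = {
    "kinect": np.array([0.10, 0.70, 0.20]),
    "imu_wrist": np.array([0.40, 0.35, 0.25]),
}
weights = {"kinect": 0.6, "imu_wrist": 0.4}
print(fuse_scores(probs, weights))  # -> 1
```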
Please visit our website for more information: https://project.eia-fr.ch/chairgest/
Please do not hesitate to use our discussion group for questions and remarks: https://groups.google.com/forum/?fromgroups=#!forum/chairgest
Organizers
- Simon Ruffieux
- Department of Information and Telecommunication Technology, University of Applied Sciences of Western Switzerland.
- Denis Lalanne
- Department of Informatics, University of Fribourg, Switzerland.
- Elena Mugellini
- Department of Information and Telecommunication Technology, University of Applied Sciences of Western Switzerland.
- Daniel Roggen
- Department of Information Technology and Electrical Engineering, Swiss Federal Institute of Technology, Switzerland.
- Stefano Carrino
- Department of Information and Telecommunication Technology, University of Applied Sciences of Western Switzerland.
Important dates
- Release of development data: May 1st, 2013
- Program executable & short-paper submission: August 21st, 2013
- Notification of acceptance: September 15th, 2013
- Camera-ready paper: October 1st, 2013
Multimodal Conversational Analytics
The MCA challenge concerns the multimodal analysis of primary cues and qualities of conversations. It aims to establish a basis for the comparison, analysis, and further improvement of multimodal data annotations and multimodal interactive systems, which are important building blocks for a wide range of multimodal applications. Machine-learning-based challenges of this kind do not yet exist in the multimodal interaction community; by focusing the development of algorithms and techniques on shared datasets, we aim to foster the research and development of multimodal interactive systems.
Please visit our website for more information: http://mca.webconf.inrialpes.fr/
Organizers
- Xavier Alameda-Pineda
- INRIA Grenoble Rhône-Alpes, University of Grenoble, France
- Roman Bednarik
- University of Eastern Finland, Finland
- Kristiina Jokinen
- University of Helsinki, Finland
- Michal Hradis
- Brno University of Technology, Czech Republic
Important dates
- Paper deadline: July 15th, 2013
- Author notification: September 1st, 2013
- Camera-ready paper: October 1st, 2013
ChaLearn Challenge and Workshop on Multi-modal Gesture Recognition
In 2013, ChaLearn organizes a challenge and workshop on multi-modal gesture recognition from 2D and 3D video data captured with Kinect, in conjunction with ICMI 2013, December 9-13, Sydney, Australia. Kinect is revolutionizing the field of gesture recognition given the set of input data modalities it provides, including RGB images, depth images (from an infrared sensor), and audio. Gesture recognition is genuinely important in many multi-modal interaction and computer vision applications, including image/video indexing, video surveillance, computer interfaces, and gaming, and it also provides excellent benchmarks for algorithms. The recognition of continuous, natural signing is very challenging due to the multimodal nature of the visual cues (e.g., movements of fingers and lips, facial expressions, body pose), as well as technical limitations such as limited spatial and temporal resolution and unreliable depth cues. The workshop is devoted to the presentation of the most recent and challenging techniques for multi-modal gesture recognition. The committee encourages paper submissions on the following topics (among others):
- Multi-modal descriptors for gesture recognition
- Fusion strategies for gesture recognition
- Multi-modal learning for gesture recognition
- Data sets and evaluation protocols for multi-modal gesture recognition
- Applications of multi-modal gesture recognition
The results of the challenge will be discussed at the workshop. The challenge features a quantitative evaluation of automatic gesture recognition from a multi-modal dataset recorded with Kinect (providing RGB images of face and body, depth images of face and body, skeleton information, joint orientations, and audio), including about 20,000 gestures from several users. The gestures are drawn from different gesture vocabularies covering very diverse domains. The emphasis of the competition is on multi-modal automatic learning of gesture vocabularies performed by several different users, with the aim of performing user-independent continuous gesture recognition.
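As an illustration of how user-independent continuous gesture recognition can be scored, the sketch below compares a predicted label sequence for one video against the ground-truth sequence using a normalized edit (Levenshtein) distance. This is only a sketch of that general idea and is not claimed to be the official challenge metric; the example sequences are made up.

```python
# Illustration of scoring continuous gesture recognition with a normalized
# edit (Levenshtein) distance between the predicted and ground-truth label
# sequences of one video. Not claimed to be the official challenge metric.
def edit_distance(pred, truth):
    """Classic dynamic-programming Levenshtein distance over label sequences."""
    m, n = len(pred), len(truth)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == truth[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

def normalized_score(pred, truth):
    """Edit distance divided by the ground-truth length (lower is better)."""
    return edit_distance(pred, truth) / max(len(truth), 1)

# Hypothetical example: one missed gesture and one substitution against a
# ground-truth sequence of five gestures -> score 2/5 = 0.4.
print(normalized_score([3, 7, 7, 12], [3, 7, 9, 12, 5]))
```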
Additionally, the challenge includes a live competition of
demos/systems of applications based on multi-modal gesture
recognition techniques. Demos using data from different modalities
and different kinds of devices are welcome. The demos will be
evaluated in terms of multi-modality, technical quality, and
applicability.
The best workshop papers and the top three ranked participants in the quantitative evaluation will be invited to present their work at ICMI 2013, and their papers will be published in the proceedings. Additionally, there will be travel grants (based on availability) and the possibility of being invited to present extended versions of their work in a special issue of a high-impact-factor journal. Moreover, the three top-ranking participants in the quantitative challenge will be awarded a ChaLearn winner certificate and a monetary prize (based on availability). We will also announce best paper and best student paper awards among the workshop contributions.
The ChaLearn Challenge organisers have negotiated a Special Topic
on Gesture Recognition call for papers with the Journal of Machine
Learning Research. More details can be found in
this downloadable call for
papers.
Please visit our website for more information: https://sites.google.com/a/chalearn.org/gesturechallenge/
Organizers
- Sergio Escalera
- Computer Vision Center (UAB) and University of Barcelona, Spain
- Jordi Gonzàlez
- Universitat Autònoma de Barcelona & Computer Vision Center, Spain
- Isabelle Guyon
- Clopinet, Berkeley, California, USA
- Thomas B. Moeslund
- Aalborg University, Denmark
- Oscar Lopes
- Computer Vision Center (UAB), Spain
- Miguel Reyes
- Computer Vision Center (UAB) and University of Barcelona, Spain
- Xavier Baró
- Computer Vision Center and Universitat Oberta de Catalunya, Spain
- Vassilis Athitsos
- University of Texas, USA
- Pat Jangyodsuk
- University of Texas, USA
- Hugo Jair Escalante
- INAOE, Puebla, Mexico
- Aaron Negrín
- University of Barcelona, Spain
Important dates
Quantitative Challenge
- Beginning of the quantitative competition, release of the first data examples: April 30th, 2013
- Full release of development and validation data: May 25th, 2013
- Release of validation data: June 3rd, 2013
- Release of final evaluation data: August 1st, 2013
- Release of final evaluation data decryption key: August 10th, 2013
- End of the quantitative competition; deadline for code submission and prediction results on the final evaluation data (the organizers then begin code verification by running it on the final evaluation data): August 20th, 2013
- Deadline for submitting the fact sheets summarizing the proposed methods: August 25th, 2013
- Release of the verification results to the participants for review: September 1st, 2013
Workshop
- Workshop paper submission deadline (the top three ranked quantitative and qualitative participants will be invited to submit their contributions as papers to the workshop): September 15th, 2013
- Notification of workshop paper acceptance: September 30th, 2013
- Camera-ready workshop papers: October 7th, 2013
Emotion Recognition In The Wild Challenge and Workshop (EmotiW)
The Emotion Recognition In The Wild Challenge and Workshop (EmotiW) 2013 Grand Challenge consists of an audio-video based emotion classification challenge that mimics real-world conditions. Traditionally, emotion recognition has been performed on laboratory-controlled data. While undoubtedly worthwhile at the time, such lab-controlled data poorly represent the environment and conditions faced in real-world situations. With the increase in the number of video clips available online, it is worthwhile to explore the performance of emotion recognition methods that work 'in the wild'. The goal of this Grand Challenge is to define a common platform for evaluating emotion recognition methods in real-world conditions.
The database used in the 2013 challenge is Acted Facial Expressions in the Wild (AFEW), which has been collected from movies showing close-to-real-world conditions. Three sets, for training, validation, and testing, will be made available.
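With separate training, validation, and test sets, a typical participant workflow is to fit on the training set, tune on the validation set, and only then label the test set. The sketch below illustrates that protocol on pre-extracted audio-video features; the feature matrices, the SVM classifier, and the small parameter grid are assumptions made for illustration, not the organizers' baseline.

```python
# Minimal sketch of a train / validate / test protocol for an audio-video
# emotion classifier, assuming pre-extracted feature matrices. The feature
# arrays, classifier choice, and parameter grid are hypothetical.
import numpy as np
from sklearn.svm import SVC

def select_and_fit(X_train, y_train, X_val, y_val, c_grid=(0.1, 1.0, 10.0)):
    """Pick the SVM regularization value that does best on the validation
    split, then refit on train + validation before labeling the test set."""
    best_c, best_acc = None, -1.0
    for c in c_grid:
        clf = SVC(C=c, kernel="rbf").fit(X_train, y_train)
        acc = np.mean(clf.predict(X_val) == y_val)
        if acc > best_acc:
            best_c, best_acc = c, acc
    X_all = np.vstack([X_train, X_val])
    y_all = np.concatenate([y_train, y_val])
    return SVC(C=best_c, kernel="rbf").fit(X_all, y_all)

# Usage with hypothetical arrays:
#   final = select_and_fit(Xtr, ytr, Xva, yva)
#   test_predictions = final.predict(Xte)
```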
Please visit our website for more information: http://cs.anu.edu.au/few
Organizers
- Abhinav Dhall
- Australian National University
- Roland Goecke
- University of Canberra / Australian National University
- Jyoti Joshi
- University of Canberra
- Michael Wagner
- University of Canberra / Australian National University
- Tom Gedeon
- Australian National University
Important dates
- Training and validation data available: March 20th, 2013
- Testing data available: June 30th, 2013
- Paper submission deadline: July 25th, 2013
- Notification of acceptance: August 30th, 2013
- Camera-ready paper: September 15th, 2013
Multimodal Learning Analytics (MMLA)
Multimodal learning analytics, learning analytics, and educational data mining are emerging disciplines concerned with developing techniques to more deeply explore the unique data that arise in learning settings, and with using the results of these analyses to understand how students learn. Among other things, this includes how students communicate, collaborate, and use digital and non-digital tools during learning activities, and the impact of these interactions on developing new skills and constructing knowledge. Advances in learning analytics are expected to contribute new empirical findings, theories, methods, and metrics for understanding how students learn. They can also contribute to improving pedagogical support for students' learning through new digital tools, teaching strategies, and curricula. The most recent direction within this area is multimodal learning analytics, which emphasizes the analysis of natural, rich modalities of communication during situated interpersonal and computer-mediated learning activities. This includes students' speech, writing, and nonverbal interaction (e.g., gestures, facial expressions, gaze, and sentiment). The First International Conference on Multimodal Learning Analytics (http://tltl.stanford.edu/mla2012) represented the first intellectual gathering of multidisciplinary scientists interested in this new topic.
Please visit our website for more information: http://tltl.stanford.edu/mla2013
Important dates
- Distribution of workshop announcement to email lists: May 15th, 2013
- MMLA database available for grand challenge participants: June 15th, 2013
- Paper submission deadline (extended): August 30th, 2013
- Notification of acceptance: September 15th, 2013
- Camera-ready papers due: October 8th, 2013
- Workshop event: December 9th, 2013
Grand Challenge Workshop and Participation Levels
The Second International Workshop on Multimodal Learning Analytics
will bring together researchers in multimodal interaction and
systems, cognitive and learning sciences, educational technologies,
and related areas to advance research on multimodal learning
analytics. Following the First International Workshop on Multimodal
Learning Analytics in Santa Monica in 2012, this second workshop
will be organized as a data-driven "Grand Challenge" event, to be held at ICMI 2013 in Sydney, Australia, on December 9th, 2013. There will be three levels of workshop participation, for attendees who wish to:
- Participate in the grand challenge dataset competition and report results (using your own dataset or the Math Data Corpus described below, which is available for access)
- Submit an independent research paper on MMLA, including
learning-oriented behaviors related to the development of domain
expertise, prediction techniques, data resources, and other topics
- Observe and discuss new topics and challenges in MMLA with other
attendees, for which a position paper should be submitted
Those wishing to participate in the competition using the Math Data Corpus will be asked to contact the workshop organizers and sign a "collaborator agreement" for IRB purposes in order to access the dataset (see the data corpus section). The dataset used for the competition is well structured to support investigating different aspects of multimodal learning analytics. It involves high school students collaborating while solving mathematics problems.
The dataset will be available for a six-month period so that researchers can participate in the competition. The competition will involve identifying one or more factors and demonstrating that they can predict domain expertise (1) with high reliability, and (2) as early in a session as possible. Using their predictor(s), researchers will be asked to accurately identify (1) which of the three students in each session is the dominant domain expert, and (2) which of the 16 problems in each session are solved correctly versus incorrectly.
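To make the two prediction targets concrete, the sketch below shows one way a participant might score their own predictions over the sessions: per-session accuracy on identifying the dominant domain expert, and per-problem accuracy on solution correctness. The dictionary layout and field names are assumptions, not an official submission format.

```python
# Hypothetical scoring sketch for the two Math Data Corpus prediction tasks:
# (1) which of the three students in a session is the dominant domain expert,
# and (2) which of the 16 problems were solved correctly. The dictionary
# layout and field names are assumptions, not the official submission format.
def score_session(prediction, ground_truth):
    """`prediction` and `ground_truth` each hold an 'expert' id (0-2) and a
    'correct' list of 16 booleans for one session."""
    expert_hit = prediction["expert"] == ground_truth["expert"]
    problem_acc = sum(
        p == g for p, g in zip(prediction["correct"], ground_truth["correct"])
    ) / len(ground_truth["correct"])
    return expert_hit, problem_acc

def score_corpus(predictions, ground_truths):
    """Average both measures over all sessions (12 in the corpus)."""
    results = [score_session(p, g) for p, g in zip(predictions, ground_truths)]
    expert_acc = sum(hit for hit, _ in results) / len(results)
    problem_acc = sum(acc for _, acc in results) / len(results)
    return expert_acc, problem_acc
```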
Available Data Corpus and Multimodal Analysis Tools
Existing Dataset: A data corpus is available for analysis during
the multimodal learning analytics competition. It involves 12
sessions, with small groups of three students collaborating while
solving mathematics problems (i.e., geometry, algebra). Data were
collected on their natural multimodal communication and activity
patterns during these problem-solving and peer tutoring sessions,
including students' speech, digital pen input, facial expressions,
and physical movements. In total, approximately 15-18 hours of multimodal data from these situated problem-solving sessions are available.
Participants were 18 high-school students, in 3-person male and female groups. Each group of three students met for two sessions. These student groups varied in performance characteristics, with some low-to-moderate performers and other high-performing students. During the sessions, students were engaged in authentic problem solving and peer tutoring as they worked on 16 mathematics problems, four apiece representing easy, moderate, hard, and very hard difficulty levels. Each problem had a canonical correct answer. Students were motivated to solve the problems correctly, because one student was randomly called upon to explain the answer after solving each one. During each session, natural multimodal data were captured from 12 independent audio, visual, and pen signal streams. These included: (1) high-fidelity close-up camera views of each student while working, showing the face and hand movements at the table (waist-up view), as well as a wide-angle view for context and a top-down view of students' writing and artifacts on the table; (2) close-talking microphone capture of each student's speech, plus a room microphone for recording group discussion; and (3) digital pen input for each student, who used an Anoto-based digital pen and a large sheet of digital paper for streaming written input. Software was developed for accurate time synchronization of all twelve of these media streams during collection and playback. The data have been segmented by the start and end time of each problem, scored for solution correctness, and also scored for which student solved the problem correctly. The data available for analysis include students':
- Speech signals
- Digital pen signals
- Video signals showing activity patterns (e.g., gestures, facial
expressions)
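Because the twelve streams are time-synchronized and the data are segmented by each problem's start and end times, a natural first processing step is to slice every stream to a problem's time window. The sketch below assumes a generic timestamped-sample representation; it is not tied to the corpus's actual file formats.

```python
# Sketch of slicing time-synchronized streams by a problem's start/end times.
# The (timestamps, samples) representation is an assumption about how a
# participant might load the streams, not the corpus's actual file format.
from bisect import bisect_left, bisect_right

def slice_stream(timestamps, samples, t_start, t_end):
    """Return the samples whose timestamps fall inside [t_start, t_end].
    `timestamps` must be sorted and aligned one-to-one with `samples`."""
    lo = bisect_left(timestamps, t_start)
    hi = bisect_right(timestamps, t_end)
    return samples[lo:hi]

def slice_problem(streams, t_start, t_end):
    """Apply the same time window to every stream (speech, pen, video, ...)."""
    return {
        name: slice_stream(ts, xs, t_start, t_end)
        for name, (ts, xs) in streams.items()
    }
```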
In addition, for each student group one session of digital pen data has been coded for written representations, including (1) the type of written representation (e.g., linguistic, symbolic, numeric, diagrammatic, marking), (2) the meaning of the representation, (3) the start/end time of each representation, and (4) the presence of written disfluencies. Note that lexical transcriptions of speech will not be available with the dataset, but participants are free to complete transcriptions themselves if they want to analyze the spoken content.
Organizers
- Dr. Stefan Scherer
- USC Institute for Creative Technologies
- Dr. Nadir Weibel
- Department of Computer Science and Engineering
- Marcelo Worsley
- Transformative Learning Technologies Lab
- Dr. Louis-Philippe Morency
- USC Institute for Creative Technologies
- Dr. Sharon Oviatt
- President & Research Director, Incaa Designs Nonprofit