Still Looking at People
There is a great need for programs that can describe what people are doing from video. Among other applications, such programs could be used to search for scenes in consumer video; in surveillance applications; to support the design of buildings and of public places; to screen humans for diseases; and to build enhanced human-computer interfaces.
Building such programs is difficult, because it is hard to identify and track people in video sequences, because we have no canonical vocabulary for describing what people are doing, and because phenomena such as aspect and individual variation greatly affect the appearance of what people are doing. Recent work in kinematic tracking has produced methods that can report the kinematic configuration of the body automatically, and with moderate accuracy. While it is possible to build methods that use kinematic tracks to reason about the 3D configuration of the body, and from this about activities, such methods remain relatively inaccurate. However, they have the attraction that one can build models that are generative, and that allow activities to be assembled from a set of distinct spatial and temporal components. The models themselves are learned from labelled motion capture data and are assembled in a way that makes it possible to learn very complex finite automata without estimating large numbers of parameters. The advantage of such a model is that one can search videos for examples of activities specified with a simple query language, without possessing any example of the activity sought. In this case, aspect is dealt with by explicit 3D reasoning.
An alternative approach is to model the whole problem as k-way classification into a set of known classes. This approach is much more accurate at present, but has the difficulty that we do not really know what the classes should be in general, because we do not know how to describe activities. Recent work in object recognition on describing unfamiliar objects suggests that activities might be described in terms of attributes: properties that many activities share, that are easy to spot, and that are individually somewhat discriminative. Such a description would allow a useful response to an unfamiliar activity. I will sketch current progress on this agenda.
David Forsyth is a full professor at the University of Illinois at Urbana-Champaign; he moved there from U.C. Berkeley, where he was also a full professor.
He has published over 130 papers on computer vision, computer graphics and machine learning.
He has served as program co-chair for IEEE Computer Vision and Pattern Recognition in 2000,
general co-chair for CVPR 2006, program co-chair for the European Conference on Computer Vision 2008,
and is a regular member of the program committee of all major international conferences on computer vision.
He has served four years on the SIGGRAPH program committee, and is a regular reviewer for that conference.
He has received best paper awards at the International Conference on Computer Vision and at the European Conference on Computer Vision.
He received an IEEE technical achievement award for 2005 for his research and became an IEEE fellow in 2009.
His recent textbook, "Computer Vision: A Modern Approach" (joint with J. Ponce and published by Prentice Hall)
is now widely adopted as a course text (adoptions include MIT, U. Wisconsin-Madison, UIUC, Georgia Tech and U.C. Berkeley).
Learning in and from humans: Recalibration makes (the) perfect sense
The brain receives information about the environment from all the sensory modalities, including vision, touch and audition. To efficiently interact with the environment, this information must eventually converge in the brain in order to form a reliable and accurate multimodal percept. This process is often complicated by the existence of noise at every level of signal processing, which makes the sensory information derived from the world imprecise and potentially inaccurate. There are several ways in which the nervous system may minimize the negative consequences of noise in terms of precision and accuracy. Two key strategies are to combine redundant sensory estimates and to utilize acquired knowledge about the statistical regularities of different sensory signals. In this talk, I elaborate on how these strategies may be used by the nervous system in order to obtain the best possible estimates from noisy sensory signals, such that we are able to interact efficiently with the environment. In particular, I will focus on the learning aspects and on how our perceptions are tuned to the statistical regularities of an ever-changing environment.
Marc Ernst is chair of the Cognitive Neuroscience Department and a member of the CITEC cluster of Excellence at Bielefeld University, Germany. He received his Ph.D. from the Max Planck Institute for Biological Cybernetics for investigations on human visuomotor behavior. For this work he was awarded the Attempto-Prize (2000) from the University of Tübingen and the Otto-Hahn-Medaille (2001) from the Max Planck Society. After his Ph.D., he spent two years as a research associate at the University of California, Berkeley, USA, working with Prof. Martin Banks on psychophysical experiments and computational models investigating the integration of visual-haptic information. In 2001, Marc Ernst returned to the Max Planck Institute and became principal investigator of the Sensorimotor Lab in the Department of Prof. Heinrich Bülthoff. In 2007 he became leader of the Max Planck Research Group on Human Multisensory Perception and Action. In 2011 he moved to Bielefeld.
The scientific interests of Marc Ernst are in human multisensory perception, sensorimotor integration and human-machine interaction. Marc Ernst has published over 50 papers and conference proceedings in high-profile journals including Nature, Science and Nature Neuroscience. He has been involved in several international collaborative grants, including a number of European Projects. Furthermore, Marc Ernst coordinated the FP6 IST European Project CyberWalk, which developed an omnidirectional treadmill to enable natural free walking in Virtual Environments.
The Sounds of Social Life: Observing Humans in their Natural Habitat
This talk presents a novel methodology called the Electronically Activated Recorder or EAR. The EAR is a portable audio recorder that periodically records snippets of ambient sounds from participants' momentary environments. In tracking moment-to-moment ambient sounds, it yields acoustic logs of people's days as they naturally unfold. In sampling only a fraction of the time, it protects participants' privacy. As a naturalistic observation method, it provides an observer's account of daily life and is optimized for the assessment of audible aspects of social environments, behaviors, and interactions. The talk discusses the EAR method conceptually and methodologically and identifies three ways in which it can enrich research in the social and behavioral sciences. Specifically, it can (1) provide ecological, behavioral criteria that are independent of self-report, (2) calibrate psychological effects against frequencies of real-world behavior, and (3) help with the assessment of subtle and habitual behaviors that evade self-report.
Matthias Mehl is Associate Professor of Psychology and an Adjunct Associate Professor of Communication at the University of Arizona. He received his doctorate in social and personality psychology from the University of Texas at Austin. Over the last decade, he developed the Electronically Activated Recorder (EAR) as a novel methodology for the unobtrusive naturalistic observation of daily life. He has given workshops and published numerous articles on novel methods for studying daily life. Dr. Mehl is a founding member and the current Vice President of the Society for Ambulatory Assessment and co-editor of the Handbook of Research Methods for Studying Daily Life. His research has been published in various high-impact journals (incl. Science, Psychological Science, Journal of Personality and Social Psychology, Psychological Assessment, and Health Psychology) and has been funded, among other sources, by the American Cancer Society and the NIH (NCI, NCCAM).