Recognition in Video
by Dr. Dmitry O. Gorodnichy
National Research Council Canada
Canadian Conference on Electrical and Computer Engineering (CCECE'06)
May 7 to 10, 2006
Ottawa Congress Centre, Ottawa, Canada.
When: May 7, 8:30 AM - 12:00 PM
Where: Room SITE E0126, University of Ottawa
For registration information, visit http://www.ieee.ca/ccece06/registration.html
(NB: Early registration deadline - April 7, 2006)
This page is intended for those who plan to attend
the tutorial and would like to prepare for it in advance.
Below you will find the detailed outline of the
tutorial and related additional information.
The tutorial is designed to benefit both
beginners and researchers already familiar with video processing.
Each tutorial attendee will receive a printed
hand-out of tutorial materials.
If you have a webcam and Visual Studio C++
installed on your laptop, you may bring it to the class. You will need
to download free software (the Intel OpenCV library introduced below)
to be able to write a video processing application.
Video processing is no longer the prerogative of
a few. With video cameras now affordable, computers powerful enough to process
video in real time, and a large pool of Open Source video libraries available
to everybody, practically anybody with basic
engineering skills can now create a video processing system. The variety of
applications for video processing is just as impressive: from surveillance,
video coding and annotation, to computer-human interaction and collaborative
environments, to multimedia, video conferencing and computer games, to name a
few. Building a vision system for a specific application, however, is a big
challenge because of the complexity and versatility of the video
recognition problem inherent to all video processing tasks, regardless of
whether it is tracking of objects, recognition of events or person
identification. While it is very easy for humans to give a meaning to observed
visual stimuli, for computers it is not, and engineering skills alone may not
be sufficient to solve the problem. This tutorial aims to provide a
comprehensive background to the problem of recognition in video and to present
an overview of the available solutions to this problem. Both live
demonstrations and simple video coding techniques will be shown. By the end of
the tutorial, the audience will be able to create their own perceptual vision
system using a webcam in a Windows environment.
Keywords: video processing, video
analysis, content extraction, intelligent video.
Outline of the tutorial:
Part I. Getting the background. How we (human
visual system) do it.
What makes video processing special:
a) In high demand, many open problems - Overview of various applications:
from security, to multi-media, to computer-human interaction, to video
annotation and video-based search.
b) No longer the prerogative of a few - Anybody can do it now. Introduction to the Open
Source Intel OpenCV library.
c) Critical differences from image processing: low resolution and low image quality.
d) Real-time nature of video - constraint or advantage?
e) Another critical difference and advantage: cues from biological vision.
From video input to recognition results
a) Video recognition as intersection of Computer Vision, Machine
Learning/Pattern Recognition and Neurobiology.
b) Four fundamental video recognition problems: Detection, Tracking,
Memorization & Association
c) Hierarchy of video processing tasks.
d) Four modalities of video information: motion, colour, luminance (orientational
and spatial frequency) and disparity. Their suitability for video recognition.
Live hands-on examples: Examining the tasks,
modalities and the issues.
a) "Hello world - I see you" example using Intel OpenCV video
capture and processing library.
b) Processing TV recordings.
c) On video formats, sizes & quality: webcams vs. FireWire cameras, VCD vs.
DVD, NTSC vs. PAL.
Main results from Neurobiology. (Video
processing and recognition by biological vision systems)
1. Dealing with low resolution/quality; video processing mechanisms:
a) Visual pathway in the brain (from retina, to visual cortex, to hippocampus)
b) Attention-based vision, salience detection mechanisms, accumulation over time
c) Separation into channels, "where" vs. "what" processing, top-down vs. bottom-up processing
d) Visual illusions
2. Recognition in brain. Associative recall and thinking: from neurons via
synapses to memories
Main results from Associative Learning and Neural Networks.
a) Coding memories using neural networks,
b) Learning Rules for tuning inter-neuronal synapses.
c) Open source High-Performance Associative Neural Network library.
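The associative recall idea above can be sketched in a few lines: patterns are stored with the Hebbian learning rule and recalled by iterating the network to a fixed point. This is a minimal, language-agnostic Hopfield-style illustration only, not the High-Performance Associative Neural Network library mentioned in the outline.

```python
# Minimal Hopfield-style associative memory: store bipolar (+1/-1) patterns
# with the Hebbian rule, then recall by iterating neuron updates to a
# fixed point. Illustrative sketch only.

def train(patterns):
    """Build the weight matrix with the Hebbian rule: w_ij = sum_p x_i * x_j."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, x, steps=10):
    """Synchronously update neurons until the state stops changing."""
    x = list(x)
    for _ in range(steps):
        new = [1 if sum(w[i][j] * x[j] for j in range(len(x))) >= 0 else -1
               for i in range(len(x))]
        if new == x:
            break
        x = new
    return x
```

Even with one bit of the stored pattern corrupted, the recall dynamics converge back to the memorized pattern, which is the essence of associative ("content-addressable") memory.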
Part II. How to make computers do it. Building your own Perceptual Vision System.
Using Motion: Main results and applications.
a) Statistics-based Background computation (inc. Mixtures of Gaussians)
b) Change vs. motion detection. Illumination invariant change detection
(inc. non-linear change detection)
c) Motion history and patterns
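The statistics-based background computation in (a) can be sketched with a single Gaussian per pixel (a simplification of the full Mixture of Gaussians): each pixel keeps a running mean and variance, and is flagged as changed when it deviates by more than k standard deviations. A 1-D pixel list stands in for a 2-D frame here.

```python
# Simplified statistics-based background model: one Gaussian per pixel
# (running mean + variance) instead of a full mixture. A pixel is "changed"
# when it lies more than k standard deviations from the background mean.

def make_model(frame):
    """Initialise per-pixel mean and variance from the first frame."""
    return {"mean": [float(v) for v in frame],
            "var": [25.0] * len(frame)}  # assumed initial variance

def update_and_detect(model, frame, alpha=0.05, k=2.5):
    """Update the background model and return a per-pixel change mask."""
    mask = []
    for i, v in enumerate(frame):
        m, s2 = model["mean"][i], model["var"][i]
        d = v - m
        changed = d * d > (k * k) * s2
        mask.append(1 if changed else 0)
        if not changed:  # only absorb pixels that look like background
            model["mean"][i] = m + alpha * d
            model["var"][i] = s2 + alpha * (d * d - s2)
    return mask
```

The learning rate `alpha` trades off adaptation speed against sensitivity: a large `alpha` quickly absorbs illumination drift but also absorbs slow-moving foreground objects.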
Using Colour: Main results and applications.
a) Segmentation (inc. hysteresis- and statistics-based)
b) Histogram-based tracking (inc. MeanShift)
c) Special consideration: Skin detection
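As an illustration of the skin-detection topic in (c), one commonly quoted explicit RGB rule classifies a pixel as skin under uniform daylight illumination. The thresholds below are the widely circulated ones; real systems usually combine such rules with colour-space histograms.

```python
# One commonly quoted explicit RGB rule for skin-colour detection under
# uniform daylight illumination. Thresholds are illustrative; practical
# systems combine such rules with histogram-based colour models.

def is_skin(r, g, b):
    """Return True if an RGB pixel passes the explicit skin-colour test."""
    return (r > 95 and g > 40 and b > 20 and
            max(r, g, b) - min(r, g, b) > 15 and  # enough colour spread
            abs(r - g) > 15 and r > g and r > b)  # red dominates
```

The appeal of such explicit rules is speed (a handful of comparisons per pixel), which matters for the real-time constraint discussed in Part I; their weakness is sensitivity to illumination colour.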
Using Intensity: Main results and applications.
a) Detection of salient features: edges, corners (inc. hysteresis-based)
b) Detection of orientational and frequency saliencies (inc. Gabor filters
simulating visual cortex processing)
c) Using shape-from-shading information
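The hysteresis-based detection mentioned in (a) can be sketched in 1-D: responses above a high threshold become seeds, and responses above a low threshold are kept only when connected (here: adjacent) to a seed. This is the thresholding scheme used in Canny-style edge detection, reduced to one dimension for brevity.

```python
# Hysteresis thresholding, 1-D sketch: strong responses (>= high) seed the
# result, and weak responses (>= low) survive only if connected to a seed.

def hysteresis(values, low, high):
    keep = [v >= high for v in values]
    changed = True
    while changed:  # grow seeds into connected weak responses
        changed = False
        for i, v in enumerate(values):
            if not keep[i] and v >= low:
                if (i > 0 and keep[i - 1]) or (i + 1 < len(values) and keep[i + 1]):
                    keep[i] = True
                    changed = True
    return [1 if k else 0 for k in keep]
```

Note how the isolated weak response is suppressed while weak responses touching a strong one survive; this is what makes hysteresis more robust than a single threshold on noisy edge maps.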
Using depth/stereo-vision (briefly).
Image processing techniques.
a) Manipulations with blobs: classification, grouping
b) Deformable templates
Accumulation of information over time:
a) Histograms and evidence accumulation (inc. Dempster-Shafer evidence theory)
b) Probabilistic framework
c) Correction learning framework (inc. associative neural network)
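The Dempster-Shafer evidence accumulation in (a) can be sketched with Dempster's rule of combination: two basic mass assignments over a small frame of discernment are fused, and the conflicting mass is renormalised away. The "face"/"nonface" hypotheses below are illustrative.

```python
# Dempster's rule of combination for accumulating evidence over time.
# A mass function maps frozenset hypotheses to belief mass; fusing two of
# them multiplies masses of intersecting hypotheses and renormalises away
# the mass assigned to conflicting (disjoint) pairs.

def combine(m1, m2):
    out = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                out[inter] = out.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb  # disjoint hypotheses cannot both hold
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    return {h: m / (1.0 - conflict) for h, m in out.items()}
```

Fusing two frames that each give 0.6 mass to "face" (and 0.4 to "don't know") concentrates 0.84 of the mass on "face", showing how weak per-frame evidence accumulates into a confident decision.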
Special consideration: faces in video
a) Four basic face processing tasks: detection, tracking, classification and recognition
b) Face recognition from video vs. face recognition from photos; nominal face resolution.
c) Overview of solutions
Putting it all together: Building Perceptual
Vision Systems (computer vision systems with recognition capability)
tailored to your needs.
a) General framework: detection->tracking->classification,
b) Spatio-temporal decisions (accumulation over time and space)
c) Doing it with Intel OpenCV library
d) Overview of other Open Source Video Processing libraries and datasets
e) On testing and benchmarks, using your films collection.
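The general framework in (a)-(b) can be sketched as a pipeline skeleton: per-frame detection and classification are noisy, so the final decision accumulates votes over time. The `detect` and `classify` callables below are hypothetical stubs standing in for real video-processing components.

```python
# Skeleton of the detection -> tracking -> classification framework with a
# spatio-temporal decision: per-frame labels are unreliable, so the final
# label is accepted only after enough accumulated votes. `detect` and
# `classify` are hypothetical stand-ins for real components.

from collections import Counter

def run_pipeline(frames, detect, classify, min_votes=3):
    votes = Counter()
    track = None
    for frame in frames:
        box = detect(frame, near=track)  # detector, optionally seeded by the track
        if box is None:
            continue  # lost the object this frame; keep accumulated evidence
        track = box                      # trivial "tracking": remember last box
        votes[classify(frame, box)] += 1
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count >= min_votes else None
```

The point of the sketch is the control flow, not the components: detection seeds tracking, tracking stabilises classification, and accumulation over frames turns unreliable per-frame labels into a robust decision.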
Examples: NRC-developed technologies:
a) Perceptual Vision Interface Nouse
b) (Multicamera, multi-person, robust, back-) tracking of faces
c) Face annotation, recognition of faces in TV recordings
d) Pianist hands/fingers recognition
Areas for future development
- Acknowledgement of authorship is required when using the
slides from this tutorial.
- For full set of slides, contact the author.
Dmitry Gorodnichy, Copyright © 2006
Last updated: 2007-III-6