
Recognition in Video
by Dr. Dmitry O. Gorodnichy 
National Research Council Canada

Canadian Conference on Electrical and Computer Engineering (CCECE'06)
 May 7 to 10, 2006
Ottawa Congress Centre, Ottawa, Canada. 

When: Sunday, May 7, 8:30 AM-12:00 PM
Where: Room SITE E0126, University of Ottawa

For the official CCECE '06 tutorial site, visit
For registration information, visit 
(NB: Early registration deadline - April 7, 2006)

This page is intended for those who plan to attend the tutorial and 
would like to prepare for it in advance.

Below you will find the detailed outline of the tutorial and the related additional information.



Video processing is no longer the prerogative of a few. With video cameras now affordable, computers powerful enough to process video in real time, and a large pool of Open Source video libraries available to everybody, practically anybody with basic engineering skills can now create a video processing system. The variety of applications for video processing is just as impressive: from surveillance, video coding and annotation, to computer-human interaction and collaborative environments, to multimedia, video conferencing and computer games, to name a few. Building a vision system for a specific application, however, is a big challenge because of the complexity and versatility of the video recognition problem inherent to all video processing tasks, whether it is tracking of objects, recognition of events or person identification. While it is very easy for humans to give meaning to observed visual stimuli, it is not for computers, and engineering skills alone may not be sufficient to solve the problem. This tutorial aims to provide a comprehensive background to the problem of recognition in video and to present an overview of the available solutions. Both live demonstrations and simple video coding techniques will be shown. By the end of the tutorial, the audience will be able to create their own perceptual vision system using a web-cam in a Windows environment.

Keywords: video processing, video analysis, content extraction, intelligent video.

Outline of the tutorial: 

Part I. Getting the background: how we (the human visual system) do it.

  1. What makes video processing special:
    a) In high demand, with many open problems - overview of various applications: from security, to multi-media, to computer-human interaction, to video annotation and video-based search. 
    b) No longer the prerogative of a few - anybody can do it now. Introduction to the Open Source Intel OpenCV library.
    c) Critical differences from image processing: low resolution and low quality.
    d) Real-time nature of video - constraint or advantage?
    e) Another critical difference and advantage: cues from biological vision. 

  2. From video input to recognition results output
    a) Video recognition as intersection of Computer Vision, Machine Learning/Pattern Recognition and Neurobiology.
    b) Four fundamental video recognition problems: Detection, Tracking, Memorization & Association 
    c) Hierarchy of video processing tasks.
    d) Four modalities of video information: motion, colour, luminance (orientational and spatial frequency) and disparity. Their suitability for video recognition problems.

  3. Live hands-on examples: Examining the tasks, modalities and the issues.
    a) "Hello world - I see you" example using Intel OpenCV video capture and processing library.
    b) Processing TV recordings. 
    c) On video formats, sizes & quality: webcams vs. firewire cam, VCD vs. DVD, NTSC vs. PAL.

  4. Main results from Neurobiology (video processing and recognition by biological vision systems):
    1. Dealing with low resolution/quality - video processing mechanisms:
      a) Visual pathway in the brain (from retina, to visual cortex, to hippocampus)
      b) Attention-based vision, salience detection mechanisms, accumulation over time
      c) Separation into channels; "where" vs. "what" processing; top-down vs. bottom-up processing
      d) Visual illusions
    2. Recognition in the brain. Associative recall and thinking: from neurons via synapses to memories

  5. Main results from Associative Learning and Recognition
    a) Coding memories using neural networks. 
    b) Learning Rules for tuning inter-neuronal synapses. 
    c) Open source High-Performance Associative Neural Network library.
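
The associative recall idea above can be sketched with a tiny Hopfield-style network trained by a Hebbian learning rule. This is an illustrative toy, not the API of the High-Performance Associative Neural Network library: a pattern is stored by tuning the inter-neuronal synapses, and a corrupted version of it is recovered by iterated recall.

```python
# Toy Hopfield-style associative memory with a Hebbian learning rule.
# Illustrative sketch only; not the HPANN library interface.

def train(patterns, n):
    # Hebbian rule: strengthen synapse w[i][j] by x[i]*x[j] per stored pattern
    w = [[0.0] * n for _ in range(n)]
    for x in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += x[i] * x[j]
    return w

def recall(w, x, steps=5):
    # Iteratively update neurons until the network settles on a stored memory
    x = list(x)
    n = len(x)
    for _ in range(steps):
        for i in range(n):
            s = sum(w[i][j] * x[j] for j in range(n))
            x[i] = 1 if s >= 0 else -1
    return x

stored = [1, -1, 1, -1, 1, -1]
w = train([stored], len(stored))
noisy = [1, -1, -1, -1, 1, -1]   # one bit flipped
print(recall(w, noisy))          # recovers the stored pattern
```

Even with one bit corrupted, the network's dynamics pull the state back to the memorized pattern, which is the mechanism behind associative memorization and recall discussed in the tutorial.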

Part II. How to make computers do it. Building your own Perceptual Vision System.

  1. Using Motion: Main results and applications.
    a) Statistics-based Background computation (inc. Mixtures of Gaussians)
    b) Change vs. motion detection. Illumination invariant change detection (inc. non-linear change detection)
    c) Motion history and patterns
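
The statistics-based background idea above can be illustrated with the simplest per-pixel model: an exponential running average of past frames serves as the background, and pixels deviating from it by more than a threshold are flagged as foreground. This is a minimal sketch (a Mixture-of-Gaussians model refines it with several modes per pixel); the values and threshold are illustrative.

```python
# Minimal background-subtraction sketch: per-pixel running average
# as the background model, thresholded difference as the motion mask.

def update_background(bg, frame, alpha=0.1):
    # Exponential running average: bg <- (1-alpha)*bg + alpha*frame
    return [(1 - alpha) * b + alpha * f for b, f in zip(bg, frame)]

def foreground_mask(bg, frame, thresh=30):
    # 1 where the frame deviates strongly from the background model
    return [1 if abs(f - b) > thresh else 0 for b, f in zip(bg, frame)]

bg = [100.0, 100.0, 100.0, 100.0]      # flat background estimate
frame = [100.0, 102.0, 200.0, 100.0]   # one pixel changed by a moving object
print(foreground_mask(bg, frame))      # -> [0, 0, 1, 0]
bg = update_background(bg, frame)      # slowly absorb the new observation
```

The slow update rate (alpha) is what distinguishes change detection from true motion detection: a stopped object is gradually absorbed into the background.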

  2. Using Colour: Main results and applications.
    a) Segmentation (inc. hysteresis-, statistics- based) 
    b) Histogram-based tracking (inc. MeanShift)
    c) Special consideration: Skin detection 
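
Skin detection can be sketched with a fixed RGB rule of thumb; this is one of several published heuristics, shown here only to make the idea concrete. Practical systems typically use histogram- or statistics-based colour models in a chromaticity space instead of hard thresholds.

```python
# Illustrative colour-based skin classifier using a fixed RGB heuristic.
# The thresholds are a common rule of thumb, not a tuned model.

def is_skin(r, g, b):
    return (r > 95 and g > 40 and b > 20          # bright enough in each channel
            and max(r, g, b) - min(r, g, b) > 15  # not grey
            and abs(r - g) > 15 and r > g and r > b)  # reddish dominance

print(is_skin(220, 170, 140))  # a typical skin tone -> True
print(is_skin(60, 120, 200))   # sky blue -> False
```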

  3. Using Intensity: Main results and applications.
    a) Detection of salient features: edges, corners (inc. hysteresis-based)
    b) Detection of orientational and frequency saliencies (inc. Gabor filters simulating visual cortex processing)
    c) Using shape-from-shading information
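
The hysteresis-based detection mentioned above (as used in Canny edge detection) can be sketched in one dimension: strong responses above a high threshold seed edges, and weaker responses above a low threshold are kept only if connected to a strong one. The magnitudes and thresholds below are illustrative.

```python
# Sketch of hysteresis thresholding on a 1-D row of gradient magnitudes.

def hysteresis_1d(mag, t_low, t_high):
    keep = [m >= t_high for m in mag]      # strong responses seed the edges
    changed = True
    while changed:
        changed = False
        for i, m in enumerate(mag):
            if not keep[i] and m >= t_low:
                # weak response survives only if adjacent to a kept one
                if (i > 0 and keep[i - 1]) or (i + 1 < len(mag) and keep[i + 1]):
                    keep[i] = True
                    changed = True
    return [1 if k else 0 for k in keep]

mag = [5, 40, 90, 45, 5, 45, 5]
print(hysteresis_1d(mag, t_low=30, t_high=80))  # -> [0, 1, 1, 1, 0, 0, 0]
```

Note that the isolated weak response (45 at index 5) is rejected while the two weak responses flanking the strong peak survive; this is what suppresses noise without fragmenting real edges.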

  4. Using depth/stereo-vision (briefly: references only)

  5. Image processing techniques.
    a) Manipulations with blobs: classification, grouping
    b) Deformable templates 

  6. Accumulation of information over time:
    a) Histograms and Evidence accumulation (inc. Dempster-Shafer evidence theory)
    b) Probabilistic framework
    c) Correction learning framework (inc. associative neural network)
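
Dempster-Shafer evidence accumulation, mentioned in (a), can be sketched with Dempster's rule of combination: belief masses from two independent cues over the same hypotheses are fused, with the conflicting mass renormalised away. The hypothesis names and mass values below are illustrative.

```python
# Sketch of Dempster's rule of combination for fusing two evidence sources.
# Masses are dicts keyed by frozensets of hypotheses (incl. the full frame).

def combine(m1, m2):
    fused, conflict = {}, 0.0
    for a, p in m1.items():
        for b, q in m2.items():
            inter = a & b
            if inter:
                fused[inter] = fused.get(inter, 0.0) + p * q
            else:
                conflict += p * q          # mass assigned to incompatible sets
    # Renormalise by the non-conflicting mass
    return {k: v / (1.0 - conflict) for k, v in fused.items()}

frame = frozenset({"face", "hand"})
m1 = {frozenset({"face"}): 0.6, frame: 0.4}   # evidence from the colour cue
m2 = {frozenset({"face"}): 0.7, frame: 0.3}   # evidence from the motion cue
fused = combine(m1, m2)
print(round(fused[frozenset({"face"})], 2))   # belief in "face" grows to 0.88
```

Two individually uncertain cues (0.6 and 0.7) combine into a much stronger belief, which is exactly the point of accumulating evidence across modalities and frames.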

  7. Special consideration: faces in video
    a) Four basic face processing tasks: detection, tracking, classification & identification. 
    b) FaceRec from video vs FaceRec from photos, Nominal face resolution.
    c) Overview of solutions 

  8. Putting it all together: Building Perceptual Vision Systems (computer vision systems with recognition capability) tailored to your needs. 
    a) General framework: detection->tracking->classification, motion->colour->intensity
    b) Spatio-temporal decisions (accumulation over time and space)
    c) Doing it with Intel OpenCV library
    d) Overview of other Open Source Video Processing libraries and datasets
    e) On testing and benchmarks, using your own film collection.
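
The spatio-temporal decision idea in (b) can be sketched as a simple vote accumulator: per-frame classifier outputs are noisy, so the system commits to a label only once it has gathered enough evidence over time. The class names and vote threshold are illustrative.

```python
# Sketch of temporal evidence accumulation over per-frame decisions.
from collections import Counter

class TemporalAccumulator:
    def __init__(self, min_votes=3):
        self.votes = Counter()
        self.min_votes = min_votes

    def update(self, label):
        # Record this frame's (possibly wrong) decision
        self.votes[label] += 1
        top, n = self.votes.most_common(1)[0]
        # Stay undecided until one label has accumulated enough votes
        return top if n >= self.min_votes else None

acc = TemporalAccumulator()
stream = ["A", "B", "A", "A"]            # noisy per-frame classifications
decisions = [acc.update(l) for l in stream]
print(decisions)                         # -> [None, None, None, 'A']
```

A single misclassified frame ("B") does not derail the system; the committed decision emerges only after evidence dominates, which is the advantage the real-time nature of video offers over single-image recognition.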

  9. Examples: NRC-developed technologies:
    a) Perceptual Vision Interface Nouse 
    b) (Multicamera, multi-person, robust, back-) tracking of faces
    c) Face annotation, recognition of faces in TV recordings
    d) Pianist hands/fingers recognition

  10. Concluding remarks: 
    a) Areas for future development
    b) References


  • Acknowledgement of authorship is required when using the slides from this tutorial.
  • For the full set of slides, contact the author. 
Dmitry Gorodnichy, Copyright © 2006
Last updated: 2007-III-6