NRC-IIT - Institute for Information Technology NRC-IIT Facial Video Database


Location: /db/video/faces/cvglab 

User agreement:
  The data from this database is freely granted for research use, provided that one of the following papers is referenced:

  • Dmitry O. Gorodnichy, "Video-based framework for face recognition in video".
    Second Workshop on Face Processing in Video (FPiV'05), in Proceedings of the Second Canadian Conference on Computer and Robot Vision (CRV'05), pp. 330-338, Victoria, BC, Canada, 9-11 May 2005. ISBN 0-7695-2319-6. NRC 48216.

  • Dmitry O. Gorodnichy, "Associative neural networks as means for low-resolution video-based recognition".
    International Joint Conference on Neural Networks (IJCNN'05), Montreal, Quebec, Canada, July 31 - August 4, 2005. NRC 48217.

More about this research: FRiV Technical Website (publications).

Preface & Goal: This video-based face database was created to provide performance evaluation criteria for techniques, existing and future, for face recognition in video (FRiV), and to study how the many factors and parameters in the long chain from capturing the video to naming the person in it affect recognition performance.

The Driving Premise: With the help of this and other databases we are collecting, we aim to show that, like humans, computers can recognize faces in video quite adequately as long as the faces have a resolution of at least 12 pixels between the eyes, which we call the nominal face resolution.

Description: This database contains pairs of short video clips, each showing the face of a computer user sitting in front of the monitor and exhibiting a wide range of facial expressions and orientations, as captured by an Intel webcam mounted on the monitor.
The video capture resolution is kept at 160 x 120. With the face occupying 1/4 to 1/8 of the image (measured by width), this corresponds to the situation, commonly observed on a TV screen, in which the face of a TV-show participant occupies 1/8 to 1/16 of the screen.
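A quick back-of-the-envelope check of these numbers (assuming, for illustration only, that the distance between the eyes is roughly half the face width):

```python
# Rough pixel geometry at the 160x120 capture resolution.
# Assumption (not from the source): inter-eye distance ~ half the face width.
FRAME_WIDTH = 160

for denom in (4, 8):
    face_width = FRAME_WIDTH / denom       # face is 1/4 to 1/8 of image width
    eye_distance = face_width / 2          # assumed face proportions
    print(f"face = 1/{denom} of frame: {face_width:.0f} px wide, "
          f"~{eye_distance:.0f} px between the eyes")
```

At 1/4 of the frame width this gives roughly 20 px between the eyes, and at 1/8 roughly 10 px, which brackets the 12-pixel nominal face resolution mentioned above.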

Setup: For this database, two video clips of each person were taken one after another, under approximately the same illumination conditions (no sunlight, only ceiling light evenly distributed over the room), the same setup, and almost the same background for all persons in the database.
This database is thus best suited for testing recognition performance with respect to factors inherent to video-based recognition, such as:
  • low resolution,
  • motion blur,
  • out-of-focus capture,
  • facial expression variation,
  • facial orientation variation,
  • occlusions.

We are also in the process of creating other databases with a) the same individuals recorded in a different office (with sunlight) a few months later with the same type of webcam, and b) the same individuals captured by a hand-held VHS video camera; these will make it possible to test recognition performance with respect to illumination and hairstyle variation.


  • Resolution: 160 x 120.
  • Average file size: 1.5 MB (video: 900 KB; audio: 600 KB, not used).
  • Average duration: 10-20 secs. Average total number of frames in a clip: 300.
  • Video type and compression: colour AVI, 20.0 fps (unless indicated otherwise), compressed at 481 Kbps with the codec provided by the Intel webcam.
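As a quick consistency check of the stated figures (assuming 1 Kbps = 1000 bits per second), the bitrate combined with a mid-range 15-second duration reproduces the stated average video size:

```python
# Does the 481 Kbps bitrate match the ~900 KB average video size?
bitrate_bps = 481 * 1000   # assumption: Kbps = 1000 bits per second
duration_s = 15            # midpoint of the 10-20 s duration range
video_kb = bitrate_bps * duration_s / 8 / 1000
print(f"~{video_kb:.0f} KB")  # prints "~902 KB", close to the stated 900 KB
```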

Number of registered individuals: 11

Available for download:

  • All 11*2=22 video sequences can be downloaded as one 28MB zip file or individually from /avi directory or by clicking on snapshots below.
  • Executable files developed for the project are available from our FRiV Technical Website.
  • Log files showing our recognition results are also available. See also the Readme for Results and Logs.

Snapshots and representative single-frame, association-based recognition results

To give an idea of the quality, amount, and uniqueness of the facial data in each video pair (and to serve as reference data for other face recognition techniques to be tested on this database), representative recognition results for each person, as obtained by our technology, are given in the results columns. For an explanation of the numbers, see the Remarks below.


[Snapshots of the clips, which link to the videos, are omitted in this text version.]

Each row lists the clip used to memorize the person and the clip used to test recognition, each given as N. Y/Z (see Remarks), followed by the most frequent single-frame neural outcomes 10 | 11 | 01 (02) | 00 (see Remark ****):

 ID | memorize clip (N. Y/Z) | test clip (N. Y/Z) |  10 |  11 | 01 (02) |  00
----+------------------------+--------------------+-----+-----+---------+-----
  0 | 228. 77/121 *          | 249. 131/63 *      |   0 |   0 | 73 (15) | 112
  1 | 237. 96/1              | 329. 54/0          |  49 |   4 |  0      |   1
  2 | 257. 176/3             | 339. 184/2         | 175 |   0 |  3      |   8
  3 | 286. 232/2             | 357. 308/2         | 288 |   1 |  2      |  19
  4 | 353. 172/1             | 404. 276/3         | 163 |   1 | 11      |  98
  5 |  78. 78/0              | 128. 121/1         |  84 |   2 |  3      |  36
  6 | 324. 231/10            | 353. 237/1         | 202 |   2 |  3      |  15
  7 | 258. 206/5             | 328. 239/1         | 208 |   3 | 12      |  17
  8 | 346. 351/12            | 426. 401/1         | 353 |   3 |  8      |  38
  9 | 318. 300/1             | 388. 273/26        | 191 |   8 | 30      |  62
 10 | 338. 184/81            | 378. 274/36        | 259 |   0 | 10 (17) |  24


Remarks:

The numbers below the images (N. Y/Z) indicate the total number of frames in a clip (N) and the number of those in which one (Y) or more than one (Z) face-looking region was detected, using the Haar-like feature filters available from Intel OpenCV. Note that a face (or face-looking region) is not detected in every frame. Also note that sometimes more than one face is detected, i.e. part of the scene is erroneously detected as a face. For the results presented we made sure, using video processing techniques based on motion and colour tracking, that these false "faces" were not taken into account.
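A minimal sketch of how such per-clip N. Y/Z counts could be produced with OpenCV's Haar cascade detector. The cascade file name, clip path, and detection parameters are illustrative assumptions, as is reading Y as frames with exactly one detection and Z as frames with more than one:

```python
def tally(faces_per_frame):
    """Summarize a list of per-frame detection counts as (N, Y, Z)."""
    n = len(faces_per_frame)                         # total frames
    y = sum(1 for k in faces_per_frame if k == 1)    # exactly one face-looking region
    z = sum(1 for k in faces_per_frame if k > 1)     # more than one region
    return n, y, z

def count_faces(video_path):
    """Run an OpenCV Haar cascade on every frame of a clip (parameters illustrative)."""
    import cv2  # OpenCV, as referenced in the text
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    counts = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
        counts.append(len(faces))
    cap.release()
    return counts

# n, y, z = tally(count_faces("clip.avi"))  # hypothetical clip path
# print(f"{n}. {y}/{z}")
```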

* There are many falsely detected "faces" (in the shelf area) in the clips of ID=0. These clips are therefore used to test performance on an unknown face and on false "faces".
** In some video clips (e.g. ID 10), "false" faces are not completely eliminated and/or the faces show orientations that are extremely challenging for recognition, which lowers the individual-frame recognition rate.
*** The final recognition decision is based not on an individual frame but on a sequence of frames. Thus the actual recognition rate on an entire video clip is much higher than that based on any individual frame of the clip (see e.g. this log file and ReadMe-logs).
**** The numbers in the results columns show the totals of single-frame (i.e. image-based) results, obtained on each individual video frame without taking the time domain into account. For each video pair, the following five statistics, denoted S10, S11, S01, S02 and S00 and derived from the neuro-biological associative-memory treatment of the recognition process, are computed:

  • 10 - the number of frames in the 2nd video clip of the pair in which the face is associated with the correct person (i.e. the one seen in the 1st video clip of the pair), without any association with other seen persons - the best (non-hesitant) case;
  • 11 - ... in which the face is associated not with one individual but with several, one of which is the correct one - a good (hesitating) case;
  • 01 - ... in which the face is associated with someone else - the worst case;
  • 02 - ... in which the face is associated with several individuals, none of which is correct - a wrong but hesitating case;
  • 00 - ... in which the face is not associated with any of the seen faces - a not-bad case.
In terms of the name-tag neuron outputs:

S10: The neuron corresponding to the correct person's ID fired (+1), while all other neurons remained at rest (-1). This is the best-case performance: no hesitation in saying the person's name from a single video frame.

S11: The neuron corresponding to the correct person's ID fired (+1), but at least one other neuron also fired. This "hesitating" performance can also be considered good, as it can be taken into account when making the final decision based on the average (or majority) of decisions over several video frames. This result can also be used to disregard the frame as "confusing".

S01: The neuron corresponding to the correct person's ID did not fire (-1), while a name-tag neuron corresponding to a different person fired (+1). This is the worst-case result. It is, however, not always bad either. First, when this happens, there are often other neurons that fire too, indicating an inconsistent decision - this is denoted as the S02 result. Second, unless this result persists over several consecutive frames (which in most cases it does not - see the log file), it can be identified as invalid and discarded.

S00: None of the name-tag neurons fired. This result can also be considered good, as it indicates that the network does not recognize anybody. This is, in fact, what we want the network to produce when it examines a face it has not previously seen, or when it examines a part of the video image that has been erroneously classified as a face by the video processing modules, which, as our experiments show, also happens.
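The five outcomes above can be summarized in a small sketch (a hypothetical helper, assuming each frame yields the set of name-tag neurons that fired):

```python
from collections import Counter

def frame_outcome(fired, correct_id):
    """Classify one frame's neuron outputs (+1 fired / -1 at rest)
    into the S10/S11/S01/S02/S00 statistics described above.
    `fired` is the set of name-tag neuron IDs that fired (+1)."""
    fired = set(fired)
    if not fired:
        return "S00"              # nobody recognized - not bad
    if fired == {correct_id}:
        return "S10"              # correct, non-hesitant - best case
    if correct_id in fired:
        return "S11"              # correct among several - good (hesitating)
    if len(fired) == 1:
        return "S01"              # wrong, confident - worst case
    return "S02"                  # wrong but hesitating

# Tallying per-frame outcomes over a clip (hypothetical firing sets):
outcomes = [frame_outcome(f, correct_id=3)
            for f in ([3], [3, 7], [5], [], [3])]
print(Counter(outcomes))  # Counter({'S10': 2, 'S11': 1, 'S01': 1, 'S00': 1})
```

Tallies like these support the sequence-level decision described in Remark ***: a majority (or average) over many frames is far more reliable than any single-frame result.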
Created: 10.XII.2004. Last updated: 2005-XII-6
Copyright © 20
Computational Video Group
 Project Leader: Dmitry Gorodnichy