Associative Memory 
for Perceptual Vision Technology


www.perceptual-vision.com/memory
(moved from www.cv.iit.nrc.ca/research/Nouse)
This page presents an example of how to use the Pseudo-inverse Associative Neural Network, the description and the CPP source code of which are given at http://www.cv.iit.nrc.ca/~dmitry/pinn, for real-time (on-the-fly) memorization and recognition from video, as described in the paper:

Using associative memory principles to enhance perceptual ability of vision systems (by Dmitry O. Gorodnichy et al)
Presented at the First IEEE CVPR Workshop on Face Processing in Video (FPIV'04), Washington DC, June 28, 2004

Abstract: The so-called associative thinking, which humans are known to perform on an everyday basis, is attributed to the fact that the human brain memorizes information using a dynamical system made of interconnected neurons. Retrieval of information in such a system is accomplished in an associative sense: starting from an arbitrary state, which might be an encoded representation of a visual image, the brain activity converges to another state, which is stable and which is what the brain remembers. In this paper we explore the possibility of using an associative memory for the purpose of enhancing the interactive capability of perceptual vision systems. By following these biological memory principles, we show how vision systems can be designed to recognize faces, facial gestures and orientations using low-end video cameras and little computational power. In doing so, we use the public-domain associative memory code.

Paper: pdf, Talk slides: 2.7Mb

  • The gist of this approach: The binary representations of faces are stored as global attractors of a binary fully connected neural network. Starting from an arbitrary (unseen) state, such as a new video image, the network converges to one of those attractors. That is, the attractors represent the memories; there are as many attractors as faces shown at the training stage. NB: for a network of size N, there are only about 0.5N good attractors. 
    Another, more biologically justified approach, which does not limit the number of presented training images, is to have a few extra neurons whose states are used to encode the different faces (or classes). See Further Developments below.
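    To make the attractor dynamics above concrete, here is a minimal C++ sketch of a pseudo-inverse (projection) associative network: the incremental projection learning rule W += r r^T / (v^T r) with r = v - W v, followed by synchronous sign-update retrieval. This is a simplified stand-in for the PINN code linked above, not that code itself; the network size N and the numeric threshold are illustrative choices.

```cpp
const int N = 16;            // network size (24x24 = 576 in the paper; small here)
double W[N][N] = {};         // synaptic weight matrix, starts empty

// Store one bipolar pattern v in {-1,+1}^N by the incremental projection rule.
// After storing, W is the projection matrix onto the span of all stored patterns,
// so each stored pattern satisfies W v = v exactly (a fixed point / attractor).
bool store(const int *v) {
    double r[N];             // residual r = v - W v
    double denom = 0;        // v^T r  (= ||r||^2, since W is a projection)
    for (int i = 0; i < N; ++i) {
        double s = 0;
        for (int j = 0; j < N; ++j) s += W[i][j] * v[j];
        r[i] = v[i] - s;
        denom += v[i] * r[i];
    }
    if (denom < 1e-9) return false;   // pattern already in the stored span
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            W[i][j] += r[i] * r[j] / denom;
    return true;
}

// Synchronous retrieval: iterate s <- sign(W s) until a fixed point (attractor)
// is reached; returns the number of iterations used.
int retrieve(int *s, int maxIter = 50) {
    for (int it = 0; it < maxIter; ++it) {
        int ns[N];
        bool changed = false;
        for (int i = 0; i < N; ++i) {
            double h = 0;
            for (int j = 0; j < N; ++j) h += W[i][j] * s[j];
            ns[i] = (h >= 0) ? 1 : -1;
            if (ns[i] != s[i]) changed = true;
        }
        for (int i = 0; i < N; ++i) s[i] = ns[i];
        if (!changed) return it + 1;  // converged to an attractor
    }
    return maxIter;
}
```

    Because W is a projection matrix, every stored pattern is an exact fixed point of the update, which is what makes it an attractor of the retrieval dynamics; a noisy starting state (e.g. a new video frame) is pulled toward the nearest such attractor.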


  • The videos of the experiments described in the paper (also downloadable from this AVI directory):

Demo 1: Memorizing/recognizing user's face orientation: demo-fr-rot-diff-lighting-3fps.avi
Demo 2: Memorizing/recognizing user's facial expressions: memorizing-expressions-2fps.avi
Demo 3: Memorizing/recognizing user identities: demo-fr-m-d-2fps.avi

A few other recorded videos:

Demo 4: Shows how to memorize a new face (expressions):
memorizing-A-10rot.avi 

Demo 5: Several runs with 30 faces shown at left (stored in this directory and loaded from this face-names.txt file). The last four pictures are taken from a photograph:
 rot-exp-id-30.avi,   rot-exp-id-31.avi

  • A simple program which you can use to test the technology yourself (also downloadable from this BIN directory):

      video-memory-may04.exe  

    In order to run this program on your PC, you only need a web-cam and the following .dll files: CV, cvaux, ilp, plpx, Msvcrtd (downloadable from here), placed in the directory from which you run the program. 

    Description of the program:

  1. It runs in either 0) Memorize or 1) Recognize mode. What is memorized is determined by the Video channels selected. In the current version, only the Luminance channel is used: the program memorizes faces detected by Haar-like wavelets using the OpenCV library, after transforming them to the canonical 24x24 representation described in the paper. The Colour and Motion channels are included for completeness only.

  2. Faces can be taken either from 0) video (by selecting the Video Source Device as shown in the figure) or 1) the hard drive (from the location specified in the face-names.txt file, which is read by default at program start, or from a file selected through the menu). After a face is memorized from video, the program automatically switches to recognition mode, so that the memory does not get saturated.

  3. The black-and-white image at right shows the contents of the memory (as described in the paper). To view the entire 574x574 synaptic weight matrix, select View -> Video Stream -> Extended Memory Contents.
    At any point, you can clear the memory by checking "Clear Memory!"

  4. To store faces from video on the hard drive (so that you can make your own list of faces), select View -> Video Stream -> Trace Mode.

  5. The result of recognition is shown as follows:
    - The face (out of all stored) closest to the attractor into which the network converged is shown in red. 
    - The result that is consistent over time is also shown as the Response, where the number of video frames used in the consistency verification is set by the Temporal filter slider.
    - The Hamming distances from the Response image to the original (Stimulus) image and to the attractor-converged image are shown as the Correction ratio. The number of network iterations before reaching an attractor is shown in brackets.
    - 30(#2) refers to the description of the response image as given in the face-names.txt file, such as a person's name or a code for a facial expression/orientation; (#2) means the result is consistent over 2 frames.


 Copyright © 2004  IIT, NRC
 Project Leader: Dmitry O. Gorodnichy. Email for sending comments: memory@perceptual-vision.com