Video Annotation of Piano Playing



In partnership with the Piano Pedagogy Lab of the Music Department of the University of Ottawa

Introductory Video:

The video is also available on YouTube.

Three music pieces of increasing complexity are played.
A camera observes the keyboard from above, and the C-MIDI program performs the following:

  • The piano and the piano keys are automatically detected.
  • The pianist's hands are detected and tracked.
  • The edges of the fingers are detected.
  • The fingers and hands that pressed the keys are marked: in colour (hands), by number (fingers).
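The hand detection step can be illustrated with a small sketch. It assumes that hands are found as foreground blobs against a stored image of the empty keyboard, which is how the program's interface is described further down this page. The function names, the threshold and the minimum blob area are illustrative assumptions, not values taken from C-MIDI.

```python
def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose grey level differs from the background by more than threshold."""
    return [
        [abs(p - b) > threshold for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def label_blobs(mask):
    """Group foreground pixels into 4-connected blobs; each blob is a list of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            stack, blob = [(r, c)], []
            seen[r][c] = True
            while stack:  # iterative flood fill
                y, x = stack.pop()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            blobs.append(blob)
    return blobs


def detect_hands(frame, background, threshold=30, min_area=3):
    """Return candidate hand blobs, ordered from left to right in the image.

    threshold and min_area are assumed values for this sketch only.
    """
    mask = foreground_mask(frame, background, threshold)
    blobs = [b for b in label_blobs(mask) if len(b) >= min_area]  # drop small noise
    blobs.sort(key=lambda b: sum(x for _, x in b) / len(b))       # sort by mean column
    return blobs
```

Images are plain lists of grey-level rows here, so the sketch runs without any imaging library. A real system would work on camera frames and would also need to update the background as lighting changes.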


Video recognition for piano playing: New application

Video-conferencing (VC) for distant piano learning. A conventional session transmits a video image only. Video recognition technology makes it possible to transmit the annotated video image as well.

From [1]: "Current music recording and transmitting technology allows teachers to teach piano remotely. This is in many cases the only way to teach music, especially in rural or distant areas where the ratio of piano teachers to piano students is extremely low. MIDI recording technology allows a teacher to play a piano at one place and to see a piano played by itself, as by an 'invisible teacher', at another place: the piano keys are pressed exactly at the same place, velocity and duration on a remote piano. However, to know how these keys were played by a teacher remains unknown. This includes the knowledge of which hand played a key, which finger was used, and who (in case of a four hand musical piece) was playing. With this technology this knowledge can now also be transmitted."

Piano playing for video recognition: New test-bed 

From [1]: "Recognition of hands and fingers using video, which is a very challenging video recognition problem, has been considered so far in the context of such applications as computer-human interaction, automatic sign language recognition, robotic hand posture learning, and multimedia.  In all of these applications, the motion of the hand and fingers is limited to a predefined number of states, which often constitute a hand/finger gesture vocabulary that a computer vision system attempts to identify.  Furthermore, in all of these applications hands and fingers are manipulated by humans in order to be detected, i.e. they are used to send visual commands or signs to either a computer or a human. Because of that the set of possible hand and finger configurations is such that it makes them easier to be  visually distinguished from one another. In the case of detecting pianist fingers playing piano, the situation is very different. Pianists use hands to play music and therefore put all their attention on the acoustic quality that the motion of their hands produce, rather than on how they visually appear to a viewer. Therefore, pianist hand/finger motion can be considered as an example of  non-collaborative and unbiased visual data, which can be used as a unique test-bed for hand/finger recognition algorithms."


Video recognition problems tackled:

  • Piano detection
  • Piano keys recognition
  • Piano-MIDI calibration
  • Hand detection
  • Hand tracking
  • Finger detection

Piano playing applications benefited:

  • Distant and offline learning
  • Storing music pieces in searchable databases
  • Producing music sheets
  • Synthetic (score-driven) hand/finger motion generation

Promising directions for future work: 

  • In the domain of music pedagogy: annotation of guitar and violin playing.
  • In the domain of video recognition: better finger detection by imposing additional constraints on the relationships between fingers and on their temporal coherency.
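As a purely hypothetical illustration of what a temporal-coherency constraint could look like, the sketch below smooths per-frame finger labels with a sliding-window majority vote. The function name and the window size are assumptions made for this example; it does not describe anything implemented in C-MIDI.

```python
from collections import Counter, deque


def smooth_labels(labels, window=5):
    """Replace each per-frame finger label by the majority label of recent frames.

    labels: one entry per frame, a finger number or None when nothing was detected.
    window: number of most recent frames that vote (assumed value).
    """
    recent = deque(maxlen=window)
    smoothed = []
    for label in labels:
        recent.append(label)
        votes = Counter(l for l in recent if l is not None)
        # Keep None only when no recent frame produced a detection.
        smoothed.append(votes.most_common(1)[0][0] if votes else None)
    return smoothed
```

A vote of this kind suppresses single-frame label flips and fills short detection gaps, at the cost of reacting a few frames late to a genuine finger change.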

Technology description and results:

Setups developed:

A video camera observes the pianist's hands from above during playing and sends the video data to a computer, where they are processed in real time and displayed back annotated: a) in a home environment with a Yamaha MIDI keyboard, and b, c) in a professional piano studio environment with a MIDI-equipped grand piano.

Closer look at hand and finger detection and annotation: snapshots

When a MIDI signal is received, meaning that a piano key was pressed, the hand and finger believed to have pressed the key are shown: the hand is highlighted in red, and the finger number is shown at the top of the image.
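The association between a MIDI event and a finger can be sketched as a nearest-neighbour lookup. The sketch assumes that calibration has already produced a table of key centres in image coordinates and that fingertip positions are available for the current frame. The `Fingertip` type, the `key_centres` table and the `max_dist` cut-off are hypothetical names and values introduced for illustration; the actual decision rule used by C-MIDI is not described on this page.

```python
from typing import NamedTuple, Optional, Tuple


class Fingertip(NamedTuple):
    hand: int    # 1 or 2
    finger: int  # 1 to 5
    x: float     # horizontal position in the image, in pixels


def annotate_note(note, key_centres, fingertips, max_dist=12.0) -> Optional[Tuple[int, int]]:
    """Return (hand, finger) for the fingertip closest to the pressed key, or None.

    key_centres maps MIDI note numbers to the x-centre of the key (assumed input).
    max_dist is an assumed tolerance; beyond it no annotation is produced.
    """
    key_x = key_centres.get(note)
    if key_x is None or not fingertips:
        return None
    best = min(fingertips, key=lambda f: abs(f.x - key_x))
    if abs(best.x - key_x) > max_dist:
        return None  # finger cannot be determined
    return (best.hand, best.finger)
```

For example, with `key_centres = {60: 100.0}` and a single fingertip `Fingertip(1, 2, 98.0)`, a note-on event for note 60 is labelled `(1, 2)`, while a note that is missing from the table is left without a label.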

Final output (Graphical User Interface) of C-MIDI video annotation program:

Shown are (in clockwise order from the top):

  1) the image captured by the camera;
  2) the computed background image of the keyboard, which is used to detect hands as the foreground;
  3) the binarized image used to detect the black keys of the piano keyboard, which is also used for video-MIDI calibration;
  4) the automatically detected piano keys (highlighted as white rectangles in the bottom right image);
  5) the segmented blobs in the foreground image (coloured by the number of blobs detected; bottom left image);
  6) the final finger and hand detection results, shown upside down, as the camera sees them (top left), and vertically flipped for convenient viewing by a pianist (bottom middle), where the label of the finger that played a key is shown at the top of the image; and
  7) the result of the vision-based MIDI annotation (in a separate window at the bottom right): each received MIDI event receives a visual label for the hand (either 1 or 2, i.e. left or right) and the finger (1, 2, 3, 4, or 5, counted from right to left) that played it. When the finger cannot be determined, the annotation is omitted.
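The use of the binarized image for black-key detection and video-MIDI calibration can be sketched on a single image row. The sketch assumes that the row crosses the black keys, that dark pixels are marked 1, and that the MIDI note number of the leftmost visible black key is supplied by the caller. That last input, along with the function names and `min_width`, is an assumption of this example rather than a documented part of C-MIDI. The only fixed fact used is that black keys have MIDI pitch classes 1, 3, 6, 8 and 10 (C#, D#, F#, G#, A#).

```python
BLACK_PITCH_CLASSES = {1, 3, 6, 8, 10}  # C#, D#, F#, G#, A# (MIDI note mod 12)


def dark_runs(row, min_width=2):
    """Return (start, end) spans, end exclusive, of consecutive dark pixels in a row."""
    runs, start = [], None
    for i, v in enumerate(list(row) + [0]):  # trailing 0 closes a run at the edge
        if v and start is None:
            start = i
        elif not v and start is not None:
            if i - start >= min_width:       # ignore runs narrower than min_width
                runs.append((start, i))
            start = None
    return runs


def calibrate(row, first_black_note, min_width=2):
    """Map the MIDI note number of each detected black key to its x-centre.

    first_black_note: note number of the leftmost visible black key (assumed known).
    """
    if first_black_note % 12 not in BLACK_PITCH_CLASSES:
        raise ValueError("first_black_note must be a black key")
    mapping, note = {}, first_black_note
    for start, end in dark_runs(row, min_width):
        mapping[note] = (start + end - 1) / 2
        note += 1
        while note % 12 not in BLACK_PITCH_CLASSES:  # advance to the next black key
            note += 1
    return mapping
```

White-key positions could then be interpolated between neighbouring black keys. That step is omitted here to keep the sketch short.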

Publications & Presentations

[1] Talk for the grand opening of the University of Ottawa's Piano Pedagogy Lab, October 5, 2005.

[2] Dmitry Gorodnichy, Arjun Yogeswaran and Gilles Comeau. C-MIDI stands for MIDI that "sees" the notes. Video recognition of pianist hands and fingers. Toronto-Montreal Computer Vision Workshop, Ottawa, Canada, May 29-30, 2006. [Poster]

[3] Dmitry O. Gorodnichy and Arjun Yogeswaran. Detection and tracking of pianist hands and fingers. In Proc. of the Canadian conference Computer & Robot Vision (CRV'06), Quebec, Canada, June 7-9, 2006. [PDF]

Abstract: Current midi recording and transmitting technology allows teachers to teach piano playing remotely (or off-line): a teacher plays a midi-keyboard at one place and a student observes the played piano keys on another midi-keyboard at another place. What this technology does not allow is to see how the piano keys are played, namely: which hand and finger was used to play a key.  In this paper we present a video recognition tool that makes it possible to provide this information. A video-camera is mounted on top of the piano keyboard and video recognition techniques are then used to calibrate piano image with midi sound, then to detect and track pianist hands and then to annotate the fingers that play the piano. The result of the obtained video annotation of piano playing can then be shown on a computer screen for further perusal by a piano teacher or a student.

The original idea of the above paper appeared in the following CVPR'06 submission, which has been published as an internal technical report.

[4] Dmitry O. Gorodnichy and Arjun Yogeswaran. Pianist finger detection using the crevice detection operator. IIT-NRC Technical Report, March 2006.

Abstract: Recognition of fingers using video is a very challenging computer vision problem which does not have yet a good solution for a general hand motion. This paper addresses a specific case of hand motion - that of a piano player playing a piano as observed by a camera mounted on top of the piano, and proposes a technique which significantly facilitates detection and tracking of fingers for this type of hand motion. The technique is based on the assumption of the convexity of the finger shapes and the knowledge of the hand positions. In order to retrieve the hands positions, the deformable hand template tracking technique is developed which allows one to track the moving hands as they change their appearances and get occluded, while self-adjusting piano background detection and skin colour segmentation are used to detect hands prior to their tracking. The described approaches are incorporated into a program which can be used by piano teachers as a visual aid  and which is also believed to be able to assist in automatic piano score writing.
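The idea of separating convex finger shapes at the gaps between them can be illustrated with a deliberately simplified one-dimensional stand-in. This is not the crevice detection operator of the report, whose details are not given on this page. The sketch assumes a hypothetical `profile` list that gives, for each image column, how far the hand silhouette extends towards the keys, and it treats sufficiently deep local minima of that profile as gaps between fingers.

```python
def find_crevices(profile, min_depth=2):
    """Indices of local minima lying at least min_depth below the highest value on each side.

    min_depth is an assumed tolerance that rejects shallow dips caused by noise.
    """
    crevices = []
    for i in range(1, len(profile) - 1):
        if profile[i] <= profile[i - 1] and profile[i] < profile[i + 1]:
            left_peak = max(profile[:i])
            right_peak = max(profile[i + 1:])
            if min(left_peak, right_peak) - profile[i] >= min_depth:
                crevices.append(i)
    return crevices


def split_fingers(profile, min_depth=2):
    """Cut the profile at the crevices and return the tip column of each segment."""
    cuts = [0] + find_crevices(profile, min_depth) + [len(profile)]
    tips = []
    for a, b in zip(cuts, cuts[1:]):
        segment = profile[a:b]
        tips.append(a + segment.index(max(segment)))  # first maximum in the segment
    return tips
```

Comparing each minimum with the highest value on either side is a crude global test. A practical detector would work on the two-dimensional image and use the tracked hand position, as the abstract indicates.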

Demonstrations:
(video recordings of live annotation of pianist hands and fingers playing piano)

  • From grand opening of the Piano Pedagogy Lab (Oct. 5, 2005): 
    Gilles Comeau plays three pieces of increasing complexity. NB: Also shown replayed in slow motion for better comprehension of the video recognition results.

    - finger-det-demo-Oct2005-320x240.wmv (19 MB)
    - SeePlayingPlay-more.wmv (8 MB), excerpt (with slow replay)
  • From final project report (Sept. 9, 2005): 
    Arjun improvises around Beethoven 
    (NB: Detection of piano keys and piano image rectification are most clearly visible here.)

    - demo-Sept9-2005-MS-320x240.avi (31 MB)


  • Try the C-MIDI program yourself (sorry, no longer available):
    • Can be tested with provided video sequences or your own webcam and keyboard.
    • Read License Agreement first
    • Then download C-MIDI-v1.zip, unzip it, and run

  • Video sequences captured from above, used for testing:
    • On Yamaha keyboard (in the office): by Gilles Comeau
    • On Disklavier (in Piano Pedagogy Lab): by Gilles Comeau

      Naming convention: *.avi - uncompressed; *-MS.avi - compressed with the Microsoft MPEG4 Codec V2 (highest quality). You can also use AVIcodec, a free video analyzer program, to find out the compression rate and codec of each video clip.

  • When artistic interests work with professional ones: Music of Dmitry Gorodnichy -  also used for testing.



 www.perceptual-vision.com (synapse.vit.iit.nrc.ca)
Last Updated: 2006-03-25
Copyright © 2005 
Computational Video Group
 Project Leader: Dmitry Gorodnichy