Digitization Plan

Approximately 1,200 hours of video oral histories were digitized to MPEG-1 and automatically indexed to establish a collection in the formats defined by the Carnegie Mellon University Informedia Digital Video Library (1, 2, 3). Video data and derived metadata were stored on an online video server and a database server, respectively. The data was backed up to digital tape, and a single DVD copy was created for each video oral history.

The Video Processing System Architecture

The architecture of the integrated Informedia video analysis system has three components: a set of video processing modules, a database representation for the extracted data, and a user interface for searching and analyzing the extracted data in the database. A conceptual view of the processing architecture is given in Figure 1. The processing modules pass data in the following sequence:

  1. A video is digitized into MPEG-1. MPEG-2 and/or other video formats may be added at a later time.
  2. From the MPEG stream, the audio channel is extracted and processed by the CMU Sphinx speech recognition system (or another suitable speech recognition component). If corresponding transcripts are available, they are used in conjunction with the speech recognition. Silence information is also extracted. The timing information of the speech system allows tight synchronization of the transcript with the image data extracted in step 3.
  3. The video channel is also separated and analyzed by the following processes:
    • Shotbreak detection to determine where camera cuts occur in produced video or image content changes in unedited video.
    • Face detection to find human faces in the video.
    • Text detection to locate overlaid or scene text in the video.
    • Motion detection to determine which shots contain object motion.
    • Other image processing modules for object detection as these modules become available.
  4. Using the transcripts, audio, silence information and shotbreak information, the video is segmented into video paragraphs, intended to be semantically coherent units averaging between 30 seconds and 3 minutes in length. For interview video, a video paragraph generally contains a single question and response.
  5. The information associated with a segment is used to
    • derive titles for each segment,
    • classify each segment into up to five topic areas,
    • extract places, names and locations mentioned in the transcript or the overlay text,
    • associate faces with names.
  6. All extracted data is stored in a database, where the timing information for each datum is retained. A pointer to the externally archived video source is also stored in the database.
  7. User annotations can be manually added to the extracted metadata in the database and searched by request.
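
Step 4 can be illustrated with a minimal sketch. The rule used here is an assumption made for illustration, not the Informedia segmentation algorithm: a shot break becomes a paragraph boundary when it falls near a silence and the current paragraph is at least 30 seconds long, and a cut is forced at the first shot break at or beyond 3 minutes. The function name, its parameters, the tolerance value, and the use of the 30-second and 3-minute figures as hard limits are all hypothetical.

```python
from typing import List, Tuple

MIN_LEN = 30.0    # seconds; assumed lower limit, taken from the range in step 4
MAX_LEN = 180.0   # seconds; assumed upper limit, taken from the range in step 4


def segment_paragraphs(
    duration: float,
    shot_breaks: List[float],
    silences: List[Tuple[float, float]],
    tolerance: float = 0.5,
) -> List[Tuple[float, float]]:
    """Split [0, duration] into (start, end) video paragraphs.

    Hypothetical rule: cut at a shot break that lies within `tolerance`
    seconds of a silence interval once the paragraph is at least MIN_LEN
    long; force a cut at the first shot break at or beyond MAX_LEN.
    The final paragraph may be shorter than MIN_LEN.
    """
    def near_silence(t: float) -> bool:
        return any(s - tolerance <= t <= e + tolerance for s, e in silences)

    paragraphs: List[Tuple[float, float]] = []
    start = 0.0
    for t in sorted(shot_breaks):
        if not 0.0 < t < duration:
            continue  # ignore breaks outside the video
        length = t - start
        if length < MIN_LEN:
            continue  # paragraph still too short to close
        if near_silence(t) or length >= MAX_LEN:
            paragraphs.append((start, t))
            start = t
    paragraphs.append((start, duration))  # close the last paragraph
    return paragraphs
```

In an interview recording, a pause between a response and the next question would typically supply the silence, so a rule of this kind tends to produce one question and response per paragraph.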

Inside the database, videos are grouped into collections based on content. Collections are grouped into libraries, which are indexed and can be searched for text, images, or other multimedia objects. Once the extracted data is in the database, indexing routines can compile a search index for each field of textual information. Figure 2 shows a subset of the entity relations in the database tables. Images in the database can be indexed with routines that use SR-trees, or M-trees if vector information is not available. Similarly, vehicles or other objects that have been detected can be indexed and searched in this framework. The image indexing results can also be stored outside the database, if necessary.
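
The grouping and per-field text indexing described above might look like the following sketch. It is not the schema shown in Figure 2: the table names, column names, the `tape://0001` archive pointer, and the sample rows are all hypothetical, chosen only to show libraries containing collections, collections containing videos, timed data attached to a video, and a simple inverted index compiled per text field.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema: library -> collection -> video -> timed datum.
SCHEMA = """
CREATE TABLE library    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE collection (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                         library_id INTEGER NOT NULL REFERENCES library(id));
CREATE TABLE video      (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                         archive_uri TEXT NOT NULL,
                         collection_id INTEGER NOT NULL REFERENCES collection(id));
CREATE TABLE datum      (id INTEGER PRIMARY KEY,
                         video_id INTEGER NOT NULL REFERENCES video(id),
                         field TEXT NOT NULL,
                         start_s REAL NOT NULL,
                         end_s REAL NOT NULL,
                         value TEXT NOT NULL);
"""


def build_text_index(conn):
    """Return {field: {token: set of (video_id, start_s)}} for all text data."""
    index = defaultdict(lambda: defaultdict(set))
    rows = conn.execute("SELECT video_id, field, start_s, value FROM datum")
    for video_id, field, start_s, value in rows:
        for raw in value.lower().split():
            token = raw.strip('.,;:!?"')
            if token:
                # Keep the start time so a hit can be played back in context.
                index[field][token].add((video_id, start_s))
    return index


def demo():
    """Build an in-memory database with made-up rows and index it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO library VALUES (1, 'Oral histories')")
    conn.execute("INSERT INTO collection VALUES (1, 'Interviews', 1)")
    conn.execute(
        "INSERT INTO video VALUES (1, 'Sample interview', 'tape://0001', 1)")
    conn.executemany(
        "INSERT INTO datum (video_id, field, start_s, end_s, value) "
        "VALUES (?, ?, ?, ?, ?)",
        [(1, "transcript", 12.0, 15.5, "I grew up in Chicago."),
         (1, "overlay_text", 12.0, 14.0, "Chicago, 1965")],
    )
    return build_text_index(conn)
```

Calling `demo()["transcript"]["chicago"]` returns the video and start time at which the word was spoken, which is the kind of timed hit a search interface would need. Image indexing with SR-trees or M-trees is outside the scope of this sketch.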

This architecture, which was initially developed and refined with NSF funding under the Digital Library Initiative, allows simple integration of new or improved video processing modules. It already uses other multimedia information such as speech and closed captioning. It provides an efficient database representation as well as a flexible user interface for searching and accessing the extracted video metadata.
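
One way to picture how a new processing module could be integrated is a small registry, sketched below. This is an illustrative assumption and not the Informedia interface: the `register` decorator, the `MODULES` table, the record shape, and the placeholder detector are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical shapes: a module takes a video path and returns timed records.
Record = Dict[str, object]
Module = Callable[[str], List[Record]]

MODULES: Dict[str, Module] = {}


def register(name: str) -> Callable[[Module], Module]:
    """Decorator that adds a processing module to the registry under `name`."""
    def wrap(fn: Module) -> Module:
        MODULES[name] = fn
        return fn
    return wrap


@register("shotbreak")
def detect_shot_breaks(video_path: str) -> List[Record]:
    # Placeholder output; a real module would analyze the video frames.
    return [{"start_s": 0.0, "end_s": 4.2, "value": "cut"}]


def process(video_path: str) -> Dict[str, List[Record]]:
    """Run every registered module and collect its timed records by name."""
    return {name: fn(video_path) for name, fn in MODULES.items()}
```

Under this arrangement, adding a detector means writing one function and decorating it; `process` picks it up without further changes.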


(1) Wactlar, H., Stevens, S., Smith, M., Kanade, T., "Intelligent Access to Digital Video: The Informedia Project," IEEE Computer, 29(5), Digital Library Initiative Special Issue, May 1996.

(2) Wactlar, H., Christel, M., "Digital Video Archives: Managing through Metadata," in "Building a National Strategy for Digital Preservation: Issues in Digital Media Archiving," commissioned for and sponsored by the National Digital Information Infrastructure and Preservation Program, Library of Congress, pp. 80-95, April 2002.

(3) http://www.informedia.cs.cmu.edu

1900 South Michigan Avenue   Chicago, IL 60616   312-674-1900   312-674-1915 (fax)
All content herein Copyright 2008© of The HistoryMakers® | webmaster@thehistorymakers.com