Generation of Multimedia TV News Contents for WWW

Hsin Chia Fu

Department of Computer Science
National Chiao-Tung University
Hsinchu, Taiwan, 300
3 571-2121
Yeong Y. Xu

Department of Computer Science
National Chiao-Tung University
Hsinchu, Taiwan, 300 R
(886) 3 573-1930
C.L. Tseng

Department of Computer Science
National Chiao-Tung University
Hsinchu, Taiwan, 300 R
(886) 3 573-1930


In this paper, we present a system we have developed for automatic TV news video indexing. It successfully combines results from the fields of speaker verification, acoustic analysis, very large vocabulary video OCR, content-based sampling of video, information retrieval, dialogue systems, and ASF media delivery over IP. The prototype of the TV news content processing Web site was completed in July 2003, and the system has been running continuously since then. As of this writing (March 27, 2006), the system has recorded and analyzed the prime-time evening news program in Taiwan every day, except for a few shutdowns caused by power failures. The TV news web is at http://

Categories and Subject Descriptors

H.3.5 [INFORMATION STORAGE AND RETRIEVAL]: Online Information Services - Commercial services, Data sharing, Web-based services

General Terms

Documentation, Design, Experimentation, Human Factors


TV news, video OCR, information retrieval, content analysis.


Fierce competition among TV news programs has made news content richer and more varied. However, viewers may not be patient enough to wait for the stories they care about while a long series of uninteresting news items is on the air. News on demand has therefore become an attractive service [1]. Since some TV news programs are broadcast around the clock, manually indexing news video into Web content is tedious work, and automatic indexing of news stories has become a pressing issue in multimedia information processing. Increasing computing power and gradually maturing multimedia technologies provide a powerful working environment for automatically segmenting news video into semantically meaningful units, such as stories and summaries, and then organizing them into hierarchical content.


In general, an automated hierarchical TV-news web system needs to have at least the following features:

Segmenting a TV news program into story based units

Generating keywords and titles for each news story

Interactively displaying hierarchical TV news contents

Letting users search for related news stories


Copyright is held by IW3C2.

WWW 2006, May 22-26, 2006, Edinburgh, UK.

The proposed hierarchical news content processing system consists of three modules: (1) TV news acquisition, (2) news content analysis, and (3) a user interface for news query and search. The acquisition module records TV news programs in a suitable video format and fetches related news documents from the Web. The content analysis module receives the recorded news video, segments it into news story units, and then extracts keywords and a news title from each story. The user interface module provides a friendly environment for searching and browsing news stories of interest. Figure 1 shows the architecture and data flow of the proposed TV news indexing system.

Figure 1. The architecture and data flow of the proposed TV news indexing and browsing system.

The most technology-intensive part of the proposed web system is the TV news index generator [2]. Figure 2 depicts the processing flow of the index generator. First, a TV news program is captured and encoded into streaming video. Meanwhile, a shot detector segments the streaming video into shots for key-frame generation. Within each shot, speaker identification techniques [3] are applied to detect anchor frames. The closed caption on each anchor frame is then extracted and recognized by video OCR techniques [4]. By matching the characters from the closed captions with news documents retrieved from the Internet, the system constructs links between TV news stories and Internet news stories.
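The final linking step can be illustrated with a minimal sketch. The paper does not specify how caption characters are matched against Internet news documents, so the character-bigram overlap below, along with the names `char_bigrams`, `overlap_score`, `best_match` and the 0.5 threshold, are assumptions made only for illustration. Working at the character level keeps the idea applicable to captions with no word boundaries and tolerates a few OCR errors.

```python
from collections import Counter


def char_bigrams(text):
    """Return a multiset of character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return Counter(a + b for a, b in zip(chars, chars[1:]))


def overlap_score(caption, document):
    """Fraction of the caption's bigrams that also occur in the document."""
    cap = char_bigrams(caption)
    doc = char_bigrams(document)
    total = sum(cap.values())
    if total == 0:
        return 0.0
    shared = sum(min(n, doc[g]) for g, n in cap.items())
    return shared / total


def best_match(caption, documents, threshold=0.5):
    """Return the id of the best-scoring document, or None below threshold.

    `documents` maps a (hypothetical) document id to its text.
    """
    best_id, best = None, 0.0
    for doc_id, text in documents.items():
        score = overlap_score(caption, text)
        if score > best:
            best_id, best = doc_id, score
    return best_id if best >= threshold else None
```

Under these assumptions, a caption recognized with one wrong character still shares most of its bigrams with the correct article, while an unrelated article falls below the threshold and produces no link.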


A prototype of the multimedia TV news WWW has been implemented on 1+N personal computers, where N is the number of TV news channels. The database and web server are installed on one machine, called WebDB; a database server on WebDB acts as the news data manager. The remaining N PCs, called Indexers, generate the news index, key frames, and so on. Each PC has a Pentium III or higher-grade CPU and at least 256 MB of RAM. Each Indexer automatically records prime-time TV news programs every day and, in parallel, segments and indexes the news stories. The Indexer then produces hierarchical news contents, including the news video, keywords, titles, and time code of each story.
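As an illustration of what such hierarchical news contents might look like, the sketch below models one recorded program and its stories. The class names, field names, and the `outline` helper are hypothetical; the paper does not describe the actual database schema used on WebDB.

```python
from dataclasses import dataclass, field
from typing import List


def to_timecode(seconds: int) -> str:
    """Format an offset in seconds as HH:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


@dataclass
class NewsStory:
    title: str
    keywords: List[str]
    start_sec: int  # offset of the story within the recorded program
    end_sec: int
    key_frames: List[str] = field(default_factory=list)  # image file names


@dataclass
class NewsProgram:
    channel: str
    date: str        # assumed ISO format, e.g. "2006-03-27"
    video_file: str  # assumed name of the recorded stream file
    stories: List[NewsStory] = field(default_factory=list)

    def outline(self) -> List[str]:
        """One line per story: start time code followed by the title."""
        return [f"{to_timecode(s.start_sec)} {s.title}" for s in self.stories]
```

A program holding one story that starts 75 seconds into the recording would yield the outline line `00:01:15` followed by the story title, which is the kind of entry a title panel could display.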

Figure 2. The flow of TV news indexing generation.


A snapshot of the TV news browsing results is shown in Figure 3. A user selects a channel and a date from the pull-down menus and then clicks the browse button to start browsing. The lower-left panel shows the news titles of the selected TV program, and the lower-right panel shows the key frames of each news story at the same time. The user can select a news story of interest by clicking on its title or on a key frame, which plays the corresponding video clip and displays the key frames of the selected story. In addition, keyword query is available in the user-interface window.
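The keyword query could be served by a simple inverted index over the keywords extracted for each story. The following is only a sketch under stated assumptions: the paper does not describe the query mechanism, the function names `build_index` and `search` are hypothetical, and splitting the query on whitespace is a simplification that would need a different tokenizer for Chinese text.

```python
from collections import defaultdict


def build_index(stories):
    """Map each lowercase keyword to the set of story ids that carry it.

    `stories` maps a (hypothetical) story id to its list of keywords.
    """
    index = defaultdict(set)
    for story_id, keywords in stories.items():
        for word in keywords:
            index[word.lower()].add(story_id)
    return index


def search(index, query):
    """Return sorted ids of stories that match every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

Requiring every query word to match keeps result lists short; a ranked union of partial matches would be an equally plausible design.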


The prototype of the TV news WWW was completed in July 2003, and the system has been running continuously since then. As of this writing (March 27, 2006), the system has recorded and analyzed the CTS evening news program daily, except for a few short shutdowns due to power failures. Recently, we have set up a new WWW site for PDA browsing.

Figure 3. A sample of the TV news web-page.


This research was supported in part by the National Science Council under Grant NSC 93-2213-E009-060. The authors sincerely thank the faculty and students associated with the Neural Network Multimedia Laboratory of National Chiao-Tung University for their suggestions and contributions.


[1] Merialdo, B., Lee, K.T., Luparello, D., and Roudaire, J. Automatic construction of personalized TV news programs. In: Proceedings of the Seventh ACM International Conference on Multimedia (Part 1), ACM Press, 1999, 323-331.

[2] Chen, Y.H., Tseng, C.L., Cheng, S.S., Fang, T.M., Chan, H.Y., and Fu, H.C. On the scene classification for the automated generation of hierarchical contents from broadcasting TV news. In: Proceedings of KES02, Milan, Italy, 2002.

[3] Cheng, S., Chen, Y., Tseng, C., Fu, H.C., and Pao, H.T. A self-growing probabilistic decision-based neural network with applications to anchor/speaker identification. In: Proceedings of HIS02, Santiago, Chile, 2002.

[4] Sato, T., Kanade, T., Hughes, E.K., and Smith, M.A. Video optical character recognition for digital news archive. In: Proc. Workshop on Content-Based Access of Image and Video Databases, Los Alamitos, CA, 1998, 52-60.