Article ID: | iaor2006491 |
Country: | Japan |
Volume: | 87 |
Issue: | 11 |
Start Page Number: | 73 |
End Page Number: | 82 |
Publication Date: | Jan 2004 |
Journal: | Electronics and Communications in Japan, Part III Fundamentals |
Authors: | Maruyama I., Abe Y., Ehara T., Shirai K. |
Keywords: | programming: dynamic |
This paper considers a technique for prerecorded TV programs in which captions for the hearing-impaired are automatically superimposed on the basis of the program VTR and the electronic script prepared in advance. It describes a method of determining when each caption should be presented by detecting the point at which the speech and the caption text are synchronized. For broadcast speech on which background sound is superimposed, it is difficult to achieve high detection accuracy when timing detection relies only on a phoneme HMM word spotter. Consequently, this paper proposes the following method. For each sentence in the program script, multiple timing candidates are detected by word spotting. The optimal timing for the whole program is then selected by dynamic programming based on three scores (the time order in the script, the ratio of the utterance times estimated from the script, and the likelihood of being speech) in addition to the acoustic likelihood. An evaluation experiment was performed on 10 sessions of a documentary program. With tolerable detection errors of 1 and 3 seconds, detection rates of 99.0% and 99.7% were obtained, respectively, indicating that the method is of practical value.
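The candidate-selection step can be illustrated with a small Viterbi-style dynamic program. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not give the score definitions, so the function name `select_timings`, the weights `w_ratio` and `w_speech`, the log-ratio penalty on the gap between consecutive sentences, and the hard rejection of out-of-order candidates are all hypothetical choices made for illustration. Each sentence is assumed to have at least one candidate of the form (start time in seconds, acoustic log-likelihood, speech-likeness score).

```python
import math
from typing import List, Tuple

# (start time in seconds, acoustic log-likelihood, speech-likeness score)
Candidate = Tuple[float, float, float]


def select_timings(
    candidates: List[List[Candidate]],
    expected_durations: List[float],
    w_ratio: float = 1.0,   # assumed weight for the utterance-time ratio score
    w_speech: float = 1.0,  # assumed weight for the speech-likeness score
) -> List[float]:
    """Pick one start time per sentence that maximizes the total score.

    expected_durations[i] is the utterance time of sentence i estimated
    from the script; it is compared with the gap to sentence i + 1.
    """
    n = len(candidates)
    if n == 0:
        return []
    neg_inf = float("-inf")

    def node_score(c: Candidate) -> float:
        # Acoustic likelihood plus weighted speech-likeness (assumed form).
        return c[1] + w_speech * c[2]

    def transition_score(prev_t: float, cur_t: float, expected: float) -> float:
        gap = cur_t - prev_t
        if gap <= 0:
            return neg_inf  # time-order constraint: sentences must not go backwards
        # Penalize gaps that deviate from the script-based estimate (assumed form).
        return -w_ratio * abs(math.log(gap / expected))

    best = [[node_score(c) for c in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, n):
        row, ptr = [], []
        for c in candidates[i]:
            scores = [
                best[i - 1][k] + transition_score(p[0], c[0], expected_durations[i - 1])
                for k, p in enumerate(candidates[i - 1])
            ]
            k_best = max(range(len(scores)), key=scores.__getitem__)
            row.append(scores[k_best] + node_score(c))
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)

    # Backtrack from the best final candidate.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for i in range(n - 1, -1, -1):
        path.append(candidates[i][j][0])
        j = back[i][j]
    path.reverse()
    return path


# Made-up candidates for three sentences (illustrative values only).
example = [
    [(10.0, -1.0, 0.9), (50.0, -0.5, 0.2)],
    [(5.0, -0.2, 0.8), (14.0, -1.0, 0.9)],
    [(20.0, -0.8, 0.9), (12.0, -0.1, 0.1)],
]
print(select_timings(example, [4.0, 6.0]))  # [10.0, 14.0, 20.0]
```

In this toy input, the candidates at 5.0 s and 12.0 s have the highest acoustic scores for their sentences but are rejected because they would break the time order, which is the kind of error a word spotter alone cannot rule out.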