Evaluation of Various Parameter Sets in Spoken Digits Recognition

Akira Ichikawa, Yasuaki Nakano and Kazuo Nakata

Trans. IEEE on Audio, AU-21 [3], pp.202-209 (1973)

Abstract

Various parameter sets--including a spectrum envelope, cepstrum, auto-correlation function, linear predictive coefficients, and partial auto-correlation coefficients (PAC's)--are evaluated experimentally to determine which set performs best in spoken digit recognition.

The principle of recognition is simple pattern matching in the parameter space with nonlinear adjustment of the time axis.
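As a rough illustration of pattern matching with a nonlinearly adjusted time axis, the sketch below uses dynamic time warping. This is an assumption: the abstract does not specify the alignment procedure, so DTW is swapped in as one well-known technique of this kind. The names `frame_distance`, `dtw_distance`, and `recognize`, the Euclidean frame distance, and the length normalization are all hypothetical choices made for the example.

```python
import math

def frame_distance(a, b):
    # Euclidean distance between two parameter vectors (an assumed metric).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b):
    # Accumulated distance along the best nonlinear alignment of two
    # sequences of parameter vectors (one vector per analysis frame).
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Allow a frame to be stretched, compressed, or matched one-to-one.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Normalize by total length so long and short words are comparable.
    return cost[n][m] / (n + m)

def recognize(utterance, templates):
    # Return the label of the stored template closest to the utterance.
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))
```

With one reference template per digit, `recognize(utterance, templates)` would return the label whose template lies nearest to the input after time alignment.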

The spectrum envelope and cepstrum attain the best recognition score of 100% for ten spoken digits of a single male speaker. PAC's seem to be preferable because of their ease of extraction and theoretical orthogonality; however, they tend to suffer from computation errors when computed in fixed-point arithmetic with a short accumulator length. We find two effective means of reducing these errors: one is variable use of the PAC dimensions controlled by computation accuracy, and the other is smoothing along the time axis. With these improvements the PAC's offer almost 100% recognition.
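For readers unfamiliar with PAC's, the following sketch shows one common way to obtain them: as the reflection coefficients of the Levinson-Durbin recursion applied to a frame's auto-correlation values. It uses floating-point arithmetic, not the fixed-point arithmetic discussed above, and sign conventions for these coefficients vary between texts. The function names, the moving-average smoother, and its default `width` are assumptions made for illustration, not the authors' procedure.

```python
def autocorrelation(frame, order):
    # Short-time auto-correlation r[0..order] of one frame of samples.
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k))
            for k in range(order + 1)]

def parcor(r):
    # Levinson-Durbin recursion; returns the reflection coefficients
    # k[1..order], taken here as the partial auto-correlation coefficients.
    order = len(r) - 1
    a = [0.0] * (order + 1)  # predictor coefficients a[1..m]; a[0] is unused
    err = r[0]               # prediction error energy
    ks = []
    for m in range(1, order + 1):
        if err <= 0.0:
            # Numerical guard only (an assumption, not the paper's rule).
            break
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err
        ks.append(k)
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= 1.0 - k * k
    return ks

def smooth_over_time(frames, width=3):
    # Moving average of each coefficient across neighboring frames.
    # Assumes every frame holds the same number of coefficients.
    half = width // 2
    out = []
    for t in range(len(frames)):
        window = frames[max(0, t - half):min(len(frames), t + half + 1)]
        out.append([sum(f[d] for f in window) / len(window)
                    for d in range(len(frames[0]))])
    return out
```

A per-frame feature sequence could then be built by calling `parcor(autocorrelation(frame, order))` on each frame and passing the resulting list to `smooth_over_time`.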


First Written Before June 16, 1998
Transplanted to KSU Before May 14, 2003
Transplanted to So-net May 3, 2005
Last Update April 8, 2007

© Yasuaki Nakano 1998-2007