Articulation Entropy: An Unsupervised Measure of Articulatory Precision


Abstract—Articulatory precision is a critical factor that influences speaker intelligibility. In this letter, we propose a new measure, which we call “articulation entropy,” that serves as a proxy for the number of distinct phonemes a person produces when speaking. The method is based on the observation that the ability of a speaker to achieve an articulatory target, and hence to clearly produce distinct phonemes, is related to the variation of the distribution of speech features that capture articulation: the larger the variation, the larger the number of distinct phonemes produced. In contrast to previous work, the proposed method is completely unsupervised, does not require phonetic segmentation or formant estimation, and can be estimated directly from continuous speech.

We evaluate this measure in several experiments on two data sets: a database of English speakers with various neurological disorders and a database of Mandarin speakers with Parkinson’s disease. The results show that our measure correlates with subjective evaluations of articulatory precision and reveals differences between healthy individuals and individuals with neurological impairment.
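The idea that a wider spread of articulatory features corresponds to higher entropy can be illustrated with a small sketch. The abstract does not say which entropy estimator the letter uses, so the code below assumes a simple stand-in: feature vectors are quantized onto a regular grid and the Shannon entropy of the cell occupancy is computed. The function name `histogram_entropy`, the `bin_width` parameter, and the toy feature values are all hypothetical.

```python
import math
from collections import Counter


def histogram_entropy(features, bin_width):
    """Shannon entropy (in bits) of feature vectors quantized onto a regular grid.

    Assumption: a grid histogram stands in for whatever estimator the letter uses.
    """
    if not features:
        raise ValueError("need at least one feature vector")
    # Map each vector to the grid cell that contains it, then count occupancy.
    cells = Counter(
        tuple(math.floor(value / bin_width) for value in vector)
        for vector in features
    )
    total = len(features)
    return -sum((n / total) * math.log2(n / total) for n in cells.values())


# Hypothetical 2-D feature vectors, not taken from either data set.
# The first speaker reaches four well-separated targets; the second
# speaker's productions collapse into one region of feature space.
precise_speaker = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)] * 5
imprecise_speaker = [(2.1, 2.1), (2.2, 2.3), (2.4, 2.2), (2.3, 2.4)] * 5

precise_entropy = histogram_entropy(precise_speaker, bin_width=1.0)
imprecise_entropy = histogram_entropy(imprecise_speaker, bin_width=1.0)
```

In this toy case the speaker with separated targets gets the higher entropy. No labels or phonetic segmentation are involved, which matches the unsupervised setting described in the abstract. A grid histogram scales poorly with feature dimension, so a practical estimator would likely differ.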

Convex Weighting Criteria for Speaking Rate Estimation


Abstract—Speaking rate estimation directly from the speech waveform is a long-standing problem in speech signal processing. In this paper, we pose the speaking rate estimation problem as that of estimating a temporal density function whose integral over a given interval yields the speaking rate within that interval. In contrast to many existing methods, we avoid the more difficult task of detecting individual phonemes within the speech signal and we avoid heuristics such as thresholding the temporal envelope to estimate the number of vowels. Rather, the proposed method aims to learn an optimal weighting function that can be directly applied to time-frequency features in a speech signal to yield a temporal density function. We propose two convex cost functions for learning the weighting function and an adaptation strategy to customize the approach to a particular speaker using minimal training. The algorithms are evaluated on the TIMIT corpus, on a dysarthric speech corpus, and on the ICSI Switchboard spontaneous speech corpus. Results show that the proposed methods outperform three competing methods on both healthy and dysarthric speech. In addition, for spontaneous speech rate estimation, the results show a high correlation between the estimated speaking rate and ground truth values.
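The density formulation above can be sketched in a few lines. The abstract does not give the two convex cost functions, so this example assumes an ordinary ridge-regularized squared error as a stand-in, minimized by plain gradient descent. Each frame is a short feature vector, the density at a frame is a weighted sum of its features, and summing the density over an interval approximates the unit count (for example, syllables). All names (`fit_weights`, `estimated_count`, `speaking_rate`) and the toy data are hypothetical.

```python
def density(weights, frame):
    """Temporal density at one frame: a linear weighting of its features."""
    return sum(w * x for w, x in zip(weights, frame))


def estimated_count(weights, frames):
    """Discrete integral of the density over the frames of an interval."""
    return sum(density(weights, frame) for frame in frames)


def speaking_rate(weights, frames, frame_seconds):
    """Estimated units per second over the interval covered by the frames."""
    return estimated_count(weights, frames) / (len(frames) * frame_seconds)


def fit_weights(utterances, counts, n_features, lr=0.01, ridge=1e-3, steps=2000):
    """Gradient descent on a ridge-regularized squared count error.

    Assumption: this convex cost is a stand-in, not one of the paper's two.
    """
    # The count estimate is linear in the weights, so per-utterance
    # feature sums are all the optimizer needs.
    sums = [
        [sum(frame[j] for frame in frames) for j in range(n_features)]
        for frames in utterances
    ]
    weights = [0.0] * n_features
    for _ in range(steps):
        grad = [ridge * w for w in weights]
        for feature_sum, count in zip(sums, counts):
            err = sum(w * s for w, s in zip(weights, feature_sum)) - count
            for j in range(n_features):
                grad[j] += err * feature_sum[j] / len(sums)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical training data: three utterances of 2-D frames with known counts.
train_utterances = [
    [(1, 0), (1, 1), (2, 1)],
    [(0, 2), (1, 2), (1, 2)],
    [(2, 1), (2, 1), (2, 2)],
]
train_counts = [3, 4, 5]

weights = fit_weights(train_utterances, train_counts, n_features=2)
held_out = [(1, 1), (1, 1)]
held_out_count = estimated_count(weights, held_out)
held_out_rate = speaking_rate(weights, held_out, frame_seconds=0.5)
```

Because the cost is convex in the weights, gradient descent with a small enough step size reaches the global minimum. No phoneme detection or envelope thresholding appears anywhere in the pipeline.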

Automatic Assessment of Vowel Space Area


Abstract: Vowel space area (VSA) is an attractive metric for the study of speech production deficits and reductions in intelligibility, in addition to the traditional study of vowel distinctiveness. Traditional VSA estimates are currently not sensitive enough to map to production deficits. The present report describes an automated algorithm that uses healthy, connected speech rather than single syllables and estimates the entire vowel working space rather than only the corner vowels. Analyses reveal a strong correlation between the traditional VSA and the automated estimates. When the two methods diverge, the automated method seems to provide a more accurate area since it accounts for all vowels.
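The contrast between the two area estimates can be sketched as follows. The traditional VSA is taken here as the area of the polygon spanned by corner-vowel formants, computed with the shoelace formula. The abstract does not say how the automated method delineates the working space, so the sketch assumes a convex hull over all vowel tokens as a stand-in. The formant values (F1, F2 in Hz) and the function names are hypothetical.

```python
def polygon_area(vertices):
    """Shoelace formula; vertices must be listed in order around the polygon."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_area(points):
    """Area of the convex hull (assumed stand-in for the vowel working space)."""
    return polygon_area(convex_hull(points))


# Hypothetical (F1, F2) values in Hz for the four corner vowels.
corner = {"i": (300, 2300), "ae": (700, 1800), "a": (750, 1100), "u": (350, 900)}
corner_area = polygon_area([corner["i"], corner["ae"], corner["a"], corner["u"]])

# Hypothetical tokens from connected speech: the corner vowels plus other vowels.
all_tokens = list(corner.values()) + [(250, 2400), (500, 1500), (800, 1300)]
working_space_area = hull_area(all_tokens)
```

When the token set includes the corner vowels, the hull area can only match or exceed the corner-vowel area, which gives one simple way the two estimates might diverge.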