[1]
Michael A. Berger, Gregor Hofer, and Hiroshi Shimodaira.
Carnival: combining speech technology and computer animation.
IEEE Computer Graphics and Applications, 31:80-89, 2011.
[ bib | DOI ]

[2]
Daniel Felps, Christian Geng, Michael Berger, Korin Richmond, and Ricardo Gutierrez-Osuna.
Relying on critical articulators to estimate vocal tract spectra in an articulatory-acoustic database.
In Proc. Interspeech, pages 1990-1993, September 2010.
[ bib | .pdf ]
We present a new phone-dependent feature weighting scheme that can be used to map articulatory configurations (e.g., EMA) onto vocal tract spectra (e.g., MFCC) through table lookup. The approach consists of assigning feature weights according to a feature's ability to predict the acoustic distance between frames. Since an articulator's predictive accuracy is phone-dependent (e.g., lip location is a better predictor for bilabial sounds than for palatal sounds), a unique weight vector is found for each phone. Inspection of the weights reveals a correspondence with the expected critical articulators for many phones. The proposed method reduces overall cepstral error by 6% when compared to a uniform weighting scheme. Vowels show the greatest benefit, though improvements occur for 80% of the tested phones.
Keywords: speech production, speech synthesis
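
The scheme above amounts to weighted nearest-neighbour table lookup with a separate weight vector per phone. A minimal Python sketch of that idea follows, assuming numpy arrays of EMA and MFCC frames; the concrete weighting rule here (correlating per-feature articulatory differences with cepstral distance over sampled frame pairs) and all names (phone_weights, lookup_mfcc) are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def phone_weights(ema, mfcc, n_pairs=2000, seed=0):
        # Illustrative stand-in for the paper's estimator: weight each
        # articulatory feature by how well its per-frame differences
        # predict cepstral distance, over randomly sampled frame pairs
        # drawn from a single phone class.
        rng = np.random.default_rng(seed)
        n = len(ema)
        i = rng.integers(0, n, n_pairs)
        j = rng.integers(0, n, n_pairs)
        fd = np.abs(ema[i] - ema[j])                    # (n_pairs, n_feats)
        ad = np.linalg.norm(mfcc[i] - mfcc[j], axis=1)  # acoustic distance
        fd = fd - fd.mean(axis=0)
        ad = ad - ad.mean()
        corr = (fd * ad[:, None]).sum(axis=0) / (
            np.sqrt((fd ** 2).sum(axis=0) * (ad ** 2).sum()) + 1e-12)
        w = np.clip(corr, 0.0, None)                    # keep positive predictors
        return w / (w.sum() + 1e-12)

    def lookup_mfcc(query_ema, phone, db_ema, db_mfcc, db_phones, weights):
        # Weighted nearest-neighbour lookup restricted to the query's
        # phone class, using that phone's weight vector.
        mask = db_phones == phone
        dist = ((db_ema[mask] - query_ema) ** 2 * weights[phone]).sum(axis=1)
        return db_mfcc[mask][np.argmin(dist)]

Here weights would be a dict mapping each phone label to the output of phone_weights for that phone's frames; the uniform-weighting baseline mentioned in the abstract corresponds to replacing weights[phone] with a constant vector.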

[3]
Michael Berger, Gregor Hofer, and Hiroshi Shimodaira.
Carnival: a modular framework for automated facial animation.
Poster at SIGGRAPH 2010, 2010.
Bronze award winner, ACM Student Research Competition.
[ bib | .pdf ]

[4]
Gregor Hofer, Korin Richmond, and Michael Berger.
Lip synchronization by acoustic inversion.
Poster at SIGGRAPH 2010, 2010.
[ bib | .pdf ]

[5]
Richard S. McGowan and Michael A. Berger.
Acoustic-articulatory mapping in vowels by locally weighted regression.
Journal of the Acoustical Society of America, 126(4):2011-2032, 2009.
[ bib | .pdf ]
A method for mapping between simultaneously measured articulatory and acoustic data is proposed. The method uses principal components analysis on the articulatory and acoustic variables, and mapping between the domains by locally weighted linear regression, or loess [Cleveland, W. S. (1979) J. Am. Stat. Assoc. 74, 829-836]. The latter method permits local variation in the slopes of the linear regression, assuming that the function being approximated is smooth. The methodology is applied to vowels of four speakers in the Wisconsin X-ray Microbeam Speech Production Database, with formant analysis. Results are examined in terms of (1) examples of forward (articulation-to-acoustics) mappings and inverse mappings, (2) distributions of local slopes and constants, (3) examples of correlations among slopes and constants, (4) root-mean-square error, and (5) sensitivity of formant frequencies to articulatory change. It is shown that the results are qualitatively correct and that loess performs better than global regression. The forward mappings show different root-mean-square error properties than the inverse mappings, indicating that this method is better suited for the forward mappings than the inverse mappings, at least for the data chosen for the current study. Some preliminary results on sensitivity of the first two formant frequencies to the two most important articulatory principal components are presented.
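
The core of this method is locally weighted linear regression on principal-component scores. The sketch below, in Python with numpy, shows a tricube-weighted local least-squares fit in the style of Cleveland (1979); the helper names (pca_basis, loess_map), the neighbourhood fraction, and the data layout are assumptions for illustration, not the authors' code.

    import numpy as np

    def pca_basis(data, n_comp):
        # Centre the data and take the top principal directions via SVD.
        mu = data.mean(axis=0)
        _, _, Vt = np.linalg.svd(data - mu, full_matrices=False)
        return mu, Vt[:n_comp]

    def loess_map(X, Y, x_query, frac=0.3):
        # Locally weighted linear regression: fit a linear map from
        # predictor scores X (n, p) to targets Y (n, m) in a
        # tricube-weighted neighbourhood of x_query.
        n, p = X.shape
        k = max(int(frac * n), p + 1)               # neighbourhood size
        d = np.linalg.norm(X - x_query, axis=1)
        idx = np.argsort(d)[:k]                     # k nearest frames
        u = d[idx] / (d[idx].max() + 1e-12)
        w = (1.0 - u ** 3) ** 3                     # tricube weights
        A = np.hstack([np.ones((k, 1)), X[idx]])    # design matrix + intercept
        sw = np.sqrt(w)[:, None]
        beta, *_ = np.linalg.lstsq(A * sw, Y[idx] * sw, rcond=None)
        return np.concatenate(([1.0], x_query)) @ beta

For the forward mapping, X would hold articulatory principal-component scores and Y the formant frequencies for the same frames; swapping the roles of X and Y gives the inverse mapping. The local slopes discussed in the abstract correspond to the non-intercept rows of beta at each query point.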