[1]
Fergus R. McInnes and Sharon J. Goldwater.
Unsupervised extraction of recurring words from infant-directed
speech.
In Proceedings of CogSci 2011, Boston, Massachusetts, July
2011.
To date, most computational models of infant word
segmentation have worked from phonemic or phonetic
input, or have used toy datasets. In this paper, we
present an algorithm for word extraction that works
directly from naturalistic acoustic input:
infant-directed speech from the CHILDES corpus. The
algorithm identifies recurring acoustic patterns that
are candidates for identification as words or phrases,
and then clusters together the most similar patterns.
The recurring patterns are found in a single pass
through the corpus using an incremental method, where
only a small number of utterances are considered at
once. Despite this limitation, we show that the
algorithm is able to extract a number of recurring
words, including some that infants learn earliest, such
as "Mommy" and the child’s name. We also introduce a
novel information-theoretic evaluation measure.
[2]
S. Renals, D. McKelvie, and F. McInnes.
A comparative study of continuous speech recognition using neural
networks and hidden Markov models.
In Proceedings of IEEE ICASSP, pages 369-372, Toronto, 1991.
[3]
Briony J. Williams, S. M. Hiller, F. McInnes, and J. Dalby.
A knowledge-based nasal classifier for use in continuous speech
recognition.
In Proceedings of the European Conference on Speech
Communication and Technology, Paris, France, 1989.