[1]
Theresa Wilson and Gregor Hofer.
Using linguistic and vocal expressiveness in social role recognition.
In Proc. Int. Conf. on Intelligent User Interfaces, IUI 2011,
Palo Alto, USA, 2011. ACM.
[ bib | .pdf ]
In this paper, we investigate two types of
expressiveness, linguistic and vocal, and whether they
are useful for recognising the social roles of
participants in meetings. Our experiments show that
combining expressiveness features with speech activity
does improve social role recognition over speech
activity features alone.

[2]
Michael A. Berger, Gregor Hofer, and Hiroshi Shimodaira.
Carnival: combining speech technology and computer animation.
IEEE Computer Graphics and Applications, 31:80-89, 2011.
[ bib | DOI ]

[3]
Gregor Hofer and Korin Richmond.
Comparison of HMM and TMDN methods for lip synchronisation.
In Proc. Interspeech, pages 454-457, Makuhari, Japan,
September 2010.
[ bib | .pdf ]
This paper presents a comparison between a hidden
Markov model (HMM) based method and a novel artificial
neural network (ANN) based method for lip
synchronisation. Both model types were trained on
motion tracking data, and a perceptual evaluation was
carried out comparing the output of the models, both to
each other and to the original tracked data. It was
found that the ANN-based method was judged
significantly better than the HMM-based method.
Furthermore, the original data was not judged
significantly better than the output of the ANN method.
Keywords: hidden Markov model (HMM), mixture density network,
lip synchronisation, inversion mapping

[4]
Michael Berger, Gregor Hofer, and Hiroshi Shimodaira.
Carnival: a modular framework for automated facial animation.
Poster at SIGGRAPH 2010, 2010.
Bronze award winner, ACM Student Research Competition.
[ bib | .pdf ]

[5]
Michael Pucher, Friedrich Neubarth, Volker Strom, Sylvia Moosmüller, Gregor
Hofer, Christian Kranzler, Gudrun Schuchmann, and Dietmar Schabus.
Resources for speech synthesis of Viennese varieties.
In Proc. Int. Conf. on Language Resources and Evaluation,
LREC'10, Malta, 2010. European Language Resources Association (ELRA).
[ bib | .ps | .pdf ]
This paper describes our work on developing corpora of
three varieties of Viennese for unit selection speech
synthesis. The synthetic voices for Viennese varieties,
implemented with the open-domain unit selection speech
synthesis engine Multisyn of Festival, will also be
released within Festival. The paper especially focuses
on two questions: how we selected the appropriate
speakers and how we obtained the text sources needed
for the recording of these non-standard varieties.
Regarding the first question, it turned out that
working with a ‘prototypical’ professional speaker was
preferable to striving for authenticity. In addition,
we give a brief outline of the differences between the
Austrian standard and its dialectal varieties and how
we solved certain technical problems related to these
differences. In particular, the specific set of phones
applicable to each variety had to be determined by
applying various constraints. Since such a set serves
no descriptive purpose but rather influences the
quality of speech synthesis, the careful design of such
a set (in most cases a reduced one) was an important
task.

[6]
Gregor Hofer, Korin Richmond, and Michael Berger.
Lip synchronization by acoustic inversion.
Poster at SIGGRAPH 2010, 2010.
[ bib | .pdf ]

[7]
Michal Dziemianko, Gregor Hofer, and Hiroshi Shimodaira.
HMM-based automatic eye-blink synthesis from speech.
In Proc. Interspeech, pages 1799-1802, Brighton, UK, September
2009.
[ bib | .pdf ]
In this paper we present a novel technique to
automatically synthesise eye blinking from a speech
signal. Animating the eyes of a talking head is
important as they are a major focus of attention during
interaction. The developed system predicts eye blinks
from the speech signal and generates animation
trajectories automatically employing a "Trajectory
Hidden Markov Model". The evaluation of the
recognition performance showed that the timing of
blinking can be predicted from speech with an F-score
value upwards of 52%, which is well above chance.
Additionally, a preliminary perceptual evaluation was
conducted, which confirmed that adding eye blinking
significantly improves the perception of the character.
Finally, it showed that speech-synchronised
synthesised blinks outperform random blinking in
naturalness ratings.

[8]
Gregor Hofer, Junichi Yamagishi, and Hiroshi Shimodaira.
Speech-driven lip motion generation with a trajectory HMM.
In Proc. Interspeech 2008, pages 2314-2317, Brisbane,
Australia, September 2008.
[ bib | .pdf ]
Automatic speech animation remains a challenging
problem that can be described as finding the optimal
sequence of animation parameter configurations given
some speech. In this paper we present a novel technique
to automatically synthesise lip motion trajectories
from a speech signal. The developed system predicts lip
motion units from the speech signal and generates
animation trajectories automatically employing a
"Trajectory Hidden Markov Model". Using the MLE
criterion, its parameter generation algorithm produces
the optimal smooth motion trajectories that are used to
drive control points on the lips directly.
Additionally, experiments were carried out to find a
suitable model unit that produces the most accurate
results. Finally, a perceptual evaluation was
conducted, which showed that the developed motion units
perform better than phonemes.

[9]
Gregor Hofer and Hiroshi Shimodaira.
Automatic head motion prediction from speech data.
In Proc. Interspeech 2007, Antwerp, Belgium, August 2007.
[ bib | .pdf ]
In this paper we present a novel approach to generate
a sequence of head motion units given some speech. The
modelling approach is based on the notion that head
motion can be divided into a number of short
homogeneous units that can each be modelled
individually. The system is based on hidden Markov
models (HMMs), which are trained on motion units, act
as a sequence generator, and can be evaluated with an
accuracy measure. A database of motion capture data was
collected, manually annotated for head motion, and used
to train the models. It was found that the model is
good at distinguishing high-activity regions from
regions with less activity, with accuracies around 75
percent. Furthermore, the model distinguishes different
head motion patterns based on speech features somewhat
reliably, with accuracies reaching almost 70 percent.

[10]
Gregor Hofer, Hiroshi Shimodaira, and Junichi Yamagishi.
Speech-driven head motion synthesis based on a trajectory model.
Poster at SIGGRAPH 2007, 2007.
[ bib | .pdf ]

[11]
Gregor Hofer, Hiroshi Shimodaira, and Junichi Yamagishi.
Lip motion synthesis using a context-dependent trajectory hidden
Markov model.
Poster at SCA 2007, 2007.
[ bib | .pdf ]

[12]
Gregor Hofer, Korin Richmond, and Robert Clark.
Informed blending of databases for emotional speech synthesis.
In Proc. Interspeech, September 2005.
[ bib | .ps | .pdf ]
The goal of this project was to build a unit selection
voice that could portray emotions with varying
intensities. A suitable definition of an emotion was
developed along with a descriptive framework that
supported the work carried out. A single speaker was
recorded portraying happy and angry speaking styles.
Additionally, a neutral database was recorded. A
target cost function was implemented that chose units
according to emotion mark-up in the database. The
Dictionary of Affect supported the emotional target
cost function by providing an emotion rating for words
in the target utterance. If a word was particularly
'emotional', units from that emotion were favoured. In
addition, intensity could be varied, which biased
selection towards a greater number of emotional units.
A perceptual evaluation was carried out, and subjects
were able to reliably recognise emotions with varying
numbers of emotional units present in the target
utterance.

[13]
Hiroshi Shimodaira, Keisuke Uematsu, Shin'ichi Kawamoto, Gregor Hofer, and
Mitsuru Nakai.
Analysis and synthesis of head motion for lifelike conversational
agents.
In Proc. MLMI 2005, July 2005.
[ bib | .pdf ]