|
[1]
|
Gabriel Murray, Thomas Kleinbauer, Peter Poller, Tilman Becker, Steve Renals,
and Jonathan Kilgour.
Extrinsic summarization evaluation: A decision audit task.
ACM Transactions on Speech and Language Processing, 6(2):1-29,
2009.
In this work we describe a large-scale extrinsic
evaluation of automatic speech summarization
technologies for meeting speech. The particular task is
a decision audit, wherein a user must satisfy a complex
information need, navigating several meetings in order
to gain an understanding of how and why a given
decision was made. We compare the usefulness of
extractive and abstractive technologies in satisfying
this information need, and assess the impact of
automatic speech recognition (ASR) errors on user
performance. We employ several evaluation methods for
participant performance, including post-questionnaire
data, human subjective and objective judgments, and a
detailed analysis of participant browsing behavior. We
find that while ASR errors affect user satisfaction on
an information retrieval task, users can adapt their
browsing behavior to complete the task satisfactorily.
Results also indicate that users consider extractive
summaries to be intuitive and useful tools for browsing
multimodal meeting data. We discuss areas in which
automatic summarization techniques can be improved in
comparison with gold-standard meeting abstracts.
|
|
[2]
|
Gabriel Murray, Thomas Kleinbauer, Peter Poller, Steve Renals, and Jonathan
Kilgour.
Extrinsic summarization evaluation: A decision audit task.
In Machine Learning for Multimodal Interaction (Proc. MLMI
'08), number 5237 in Lecture Notes in Computer Science, pages 349-361.
Springer, 2008.
In this work we describe a large-scale extrinsic
evaluation of automatic speech summarization
technologies for meeting speech. The particular task is
a decision audit, wherein a user must satisfy a complex
information need, navigating several meetings in order
to gain an understanding of how and why a given
decision was made. We compare the usefulness of
extractive and abstractive technologies in satisfying
this information need, and assess the impact of
automatic speech recognition (ASR) errors on user
performance. We employ several evaluation methods for
participant performance, including post-questionnaire
data, human subjective and objective judgments, and an
analysis of participant browsing behaviour.
|
|
[3]
|
Gabriel Murray and Steve Renals.
Detecting action items in meetings.
In Machine Learning for Multimodal Interaction (Proc. MLMI
'08), number 5237 in Lecture Notes in Computer Science, pages 208-213.
Springer, 2008.
We present a method for detecting action items in
spontaneous meeting speech. Using a supervised approach
incorporating prosodic, lexical and structural
features, we can classify such items with a high degree
of accuracy. We also examine how well various feature
subclasses can perform this task on their own.
|
|
[4]
|
Gabriel Murray and Steve Renals.
Meta comments for summarizing meeting speech.
In Machine Learning for Multimodal Interaction (Proc. MLMI
'08), number 5237 in Lecture Notes in Computer Science, pages 236-247.
Springer, 2008.
This paper is about the extractive summarization of
meeting speech, using the ICSI and AMI corpora. In the
first set of experiments we use prosodic, lexical,
structural and speaker-related features to select the
most informative dialogue acts from each meeting, with
the hypothesis being that such a rich mixture of
features will yield the best results. In the second
part, we present an approach in which the
identification of “meta-comments” is used to create
more informative summaries that provide an increased
level of abstraction. We find that the inclusion of
these meta comments improves summarization performance
according to several evaluation metrics.
|
|
[5]
|
Gabriel Murray and Steve Renals.
Towards online speech summarization.
In Proc. Interspeech '07, 2007.
The majority of speech summarization research has
focused on extracting the most informative dialogue
acts from recorded, archived data. However, a
potential use case for speech summarization in the
meetings domain is to facilitate a meeting in progress
by providing the participants - whether they are
attending in-person or remotely - with an indication of
the most important parts of the discussion so far.
This requires being able to determine whether a
dialogue act is extract-worthy before the global
meeting context is available. This paper introduces a
novel method for weighting dialogue acts using only
very limited local context, and shows that high
summary precision is possible even when information
about the meeting as a whole is lacking. A new
evaluation framework consisting of weighted precision,
recall and f-score is detailed, and the novel online
summarization method is shown to significantly increase
recall and f-score compared with a method using no
contextual information.
|
|
[6]
|
Gabriel Murray and Steve Renals.
Term-weighting for summarization of multi-party spoken dialogues.
In A. Popescu-Belis, S. Renals, and H. Bourlard, editors,
Machine Learning for Multimodal Interaction IV, volume 4892 of Lecture
Notes in Computer Science, pages 155-166. Springer, 2007.
This paper explores the issue of term-weighting in the
genre of spontaneous, multi-party spoken dialogues,
with the intent of using such term-weights in the
creation of extractive meeting summaries. The field of
text information retrieval has yielded many
term-weighting techniques to import for our purposes;
this paper implements and compares several of these,
namely tf.idf, Residual IDF and Gain. We propose that
term-weighting for multi-party dialogues can exploit
patterns in word usage among participant speakers,
and introduce the su.idf metric as one attempt to do
so. Results for all metrics are reported on both manual
and automatic speech recognition (ASR) transcripts, and
on both the ICSI and AMI meeting corpora.
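As an illustration of the baseline named in this abstract, here is a minimal sketch of plain tf.idf in its textbook form (raw term frequency times log(N / df)). It is not the paper's implementation, and su.idf, Residual IDF and Gain are not shown. The function name `tf_idf` and the toy meeting transcripts are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Textbook variant (an assumption, not the paper's exact formula):
    weight = raw term frequency * log(N / document frequency).
    """
    n_docs = len(documents)
    # Document frequency: the number of documents that contain each term.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy data: each "meeting" is a list of word tokens.
meetings = [
    ["remote", "control", "budget", "remote"],
    ["budget", "design", "control"],
    ["design", "battery", "budget"],
]
scores = tf_idf(meetings)
```

A term that occurs in every document, such as "budget" here, receives a weight of zero, while a term concentrated in one document, such as "remote", is weighted highest.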
|
|
[7]
|
G. Murray and S. Renals.
Dialogue act compression via pitch contour preservation.
In Proceedings of the 9th International Conference on Spoken
Language Processing, Pittsburgh, USA, September 2006.
This paper explores the usefulness of prosody in
automatically compressing dialogue acts from meeting
speech. Specifically, this work attempts to compress
utterances by preserving the pitch contour of the
original whole utterance. Two methods of doing this are
described in detail and are evaluated subjectively
using human annotators and objectively using edit
distance with a human-authored gold-standard. Both
metrics show that
such a prosodic approach is much better than the random
baseline approach and significantly better than a
simple text compression method.
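The objective evaluation mentioned in this abstract relies on edit distance against a gold-standard compression. The abstract does not say which variant was used; the sketch below assumes word-level Levenshtein distance (insertions, deletions and substitutions each cost 1). The function name `edit_distance` and the example sentences are hypothetical.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the hyp prefix seen so far
    # and the first j tokens of ref.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical example: a gold-standard compression and an automatic one.
gold = "we should lower the budget".split()
auto = "we lower the total budget".split()
distance = edit_distance(auto, gold)
```

A lower distance means the automatic compression is closer to the human-authored one.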
|
|
[8]
|
G. Murray, S. Renals, J. Moore, and J. Carletta.
Incorporating speaker and discourse features into speech
summarization.
In Proceedings of the Human Language Technology Conference -
North American Chapter of the Association for Computational Linguistics
Meeting (HLT-NAACL) 2006, New York City, USA, June 2006.
The research presented herein explores the usefulness
of incorporating speaker and discourse features in an
automatic speech summarization system applied to
meeting recordings from the ICSI Meetings corpus. By
analyzing speaker activity, turn-taking and discourse
cues, it is hypothesized that a system can outperform
solely text-based methods inherited from the field of
text summarization. The summarization methods are
described, two evaluation methods are applied and
compared, and the results clearly show that utilizing
such features is advantageous and efficient. Even
simple methods relying on discourse cues and speaker
activity can outperform text summarization approaches.
|
|
[9]
|
B. Hachey, G. Murray, and D. Reitter.
Dimensionality reduction aids term co-occurrence based multi-document
summarization.
In Proceedings of ACL Summarization Workshop 2006, Sydney,
Australia, June 2006.
A key task in an extraction system for query-oriented
multi-document summarisation, necessary for computing
relevance and redundancy, is modelling text semantics.
In the Embra system, we use a representation derived
from the singular value decomposition of a term
co-occurrence matrix. We present methods to show the
reliability of performance improvements. We find that
Embra performs better with dimensionality reduction.
|
|
[10]
|
G. Murray, S. Renals, and M. Taboada.
Prosodic correlates of rhetorical relations.
In Proceedings of HLT/NAACL ACTS Workshop, 2006, New York City,
USA, June 2006.
This paper investigates the usefulness of prosodic
features in classifying rhetorical relations between
utterances in meeting recordings. Five rhetorical
relations of contrast, elaboration, summary, question
and cause
are explored. Three training methods - supervised,
unsupervised, and combined - are compared, and
classification is carried out using support vector
machines. The results of this pilot study are
encouraging but mixed, with pairwise classification
achieving an average of 68% accuracy in discerning
between relation pairs using only prosodic features,
but multi-class classification performing only slightly
better than chance.
|
|
[11]
|
B. Hachey, G. Murray, and D. Reitter.
The Embra system at DUC 2005: Query-oriented multi-document
summarization with a very large latent semantic space.
In Proceedings of the Document Understanding Conference (DUC)
2005, Vancouver, BC, Canada, October 2005.
Our summarization system submitted to DUC 2005, Embra
(or Edinburgh), is novel in that it relies on building
a very large semantic space for the purposes of
determining relevance and redundancy in an MMR-style
framework. We address specificity by detecting the
presence or absence of Named Entities in our extract
candidates, and we implemented a sentence-ordering
algorithm to maximize sentence cohesion in our final
summaries.
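For readers unfamiliar with the MMR-style framework mentioned in this abstract, the following is a minimal sketch of greedy maximal marginal relevance selection. It assumes sentences and the query are already represented as vectors and uses cosine similarity for both relevance and redundancy. It is not the Embra implementation; `mmr_select`, the parameter `lam` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(query, candidates, k, lam=0.7):
    """Greedily pick k candidate indices, trading relevance against redundancy.

    lam = 1.0 ranks by relevance to the query only; lower values penalise
    similarity to sentences that have already been selected.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(candidates[i], query)
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy vectors in a three-dimensional semantic space.
query = [1.0, 1.0, 0.2]
sentences = [
    [1.0, 1.0, 0.0],  # on topic
    [1.0, 0.9, 0.0],  # near-duplicate of the first sentence
    [0.0, 1.0, 1.0],  # less relevant, but adds new information
]
picked = mmr_select(query, sentences, k=2, lam=0.5)
```

With `lam=0.5` the near-duplicate is skipped in favour of the sentence that adds new information; with `lam=1.0` the selection reduces to a plain relevance ranking.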
|
|
[12]
|
G. Murray, S. Renals, and J. Carletta.
Extractive summarization of meeting recordings.
In Proc. Interspeech, September 2005.
Several approaches to automatic speech summarization
are discussed below, using the ICSI Meetings corpus. We
contrast feature-based approaches using prosodic and
lexical features with maximal marginal relevance and
latent semantic analysis approaches to summarization.
While the latter two techniques are borrowed directly
from the field of text summarization, feature-based
approaches using prosodic information are able to
utilize characteristics unique to speech data. We also
investigate how the summarization results might
deteriorate when carried out on ASR output as opposed
to manual transcripts. All of the summaries are of an
extractive variety, and are compared using the software
ROUGE.
|
|
[13]
|
G. Murray, S. Renals, J. Carletta, and J. Moore.
Evaluating automatic summaries of meeting recordings.
In Proceedings of the 43rd Annual Meeting of the Association for
Computational Linguistics, Ann Arbor, MI, USA, June 2005.
The research below explores schemes for evaluating
automatic summaries of business meetings, using the
ICSI Meeting Corpus. Both automatic and subjective
evaluations were carried out, with a central interest
being whether or not the two types of evaluations
correlate with each other. The evaluation metrics were
used to compare and contrast differing approaches to
automatic summarization, the deterioration of summary
quality on ASR output versus manual transcripts, and to
determine whether manual extracts are rated
significantly higher than automatic extracts.
|