Stochastic Language Generation Using WIDL-expressions and its
Application in Machine Translation and Summarization
Radu Soricut
Information Sciences Institute University of Southern California
4676 Admiralty Way, Suite 1001 Marina del Rey, CA 90292
radu@isi.edu
Daniel Marcu
Information Sciences Institute University of Southern California
4676 Admiralty Way, Suite 1001 Marina del Rey, CA 90292
marcu@isi.edu
Abstract
We propose WIDL-expressions as a flexible formalism that facilitates the integration of a generic sentence realization system within end-to-end language processing applications. WIDL-expressions represent compactly probability distributions over finite sets of candidate realizations, and have optimal algorithms for realization via interpolation with language model probability distributions. We show the effectiveness of a WIDL-based NLG system in two sentence realization tasks: automatic translation and headline generation.
1 Introduction
The Natural Language Generation (NLG) community has produced over the years a considerable number of generic sentence realization systems: Penman (Matthiessen and Bateman, 1991), FUF (Elhadad, 1991), Nitrogen (Knight and Hatzivassiloglou, 1995), Fergus (Bangalore and Rambow, 2000), HALogen (Langkilde-Geary, 2002), Amalgam (Corston-Oliver et al., 2002), etc.
However, when it comes to end-to-end, text-to-text applications – Machine Translation, Summarization, Question Answering – these generic systems either cannot be employed, or, in instances where they can be, their results are significantly below those of state-of-the-art, application-specific systems (Hajic et al., 2002; Habash, 2003). We believe two reasons explain this state of affairs.
First, these generic NLG systems use input representation languages with complex syntax and semantics. These languages involve deep, semantic-based subject-verb or verb-object relations (such as ACTOR, AGENT, PATIENT, etc., for Penman and FUF), shallow syntactic relations (such as subject, object, premod, etc., for HALogen), or lexical dependencies (Fergus, Amalgam). Such inputs cannot be accurately produced by state-of-the-art analysis components from arbitrary textual input in the context of text-to-text applications.
Second, most of the recent systems (starting with Nitrogen) have adopted a hybrid approach to generation, which has increased their robustness. These hybrid systems use, in a first phase, symbolic knowledge to (over)generate a large set of candidate realizations, and, in a second phase, statistical knowledge about the target language (such as stochastic language models) to rank the candidate realizations and find the best scoring one. The disadvantage of these hybrid systems – from the perspective of integrating them within end-to-end applications – is that the two generation phases cannot be tightly coupled. More precisely, input-driven preferences and target language–driven preferences cannot be integrated in a true probabilistic model that can be trained and tuned for maximum performance.
In this paper, we propose WIDL-expressions (WIDL stands for Weighted Interleave, Disjunction, and Lock, after the names of the main operators) as a representation formalism that facilitates the integration of a generic sentence realization system within end-to-end language applications. The WIDL formalism, an extension of the IDL-expressions formalism of Nederhof and Satta (2004), has several crucial properties that differentiate it from previously-proposed NLG representation formalisms. First, it has a simple syntax (expressions are built using four operators) and a simple, formal semantics (probability distributions over finite sets of strings). Second, it is a compact representation that grows linearly
in the number of words available for generation
(see Section 2). (In contrast, representations such as word lattices (Knight and Hatzivassiloglou, 1995) or non-recursive CFGs (Langkilde-Geary, 2002) require exponential space in the number of words available for generation (Nederhof and Satta, 2004).) Third, it has good computational properties, such as optimal algorithms for intersection with $n$-gram language models (Section 3). Fourth, it is flexible with respect to the amount of linguistic processing required to produce WIDL-expressions directly from text (Sections 4 and 5). Fifth, it allows for a tight integration of input-specific preferences and target-language preferences via interpolation of probability distributions. We show the effectiveness of our proposal by directly employing a generic WIDL-based generation system in two end-to-end tasks: machine translation and automatic headline generation.
2 The WIDL Representation Language
2.1 WIDL-expressions
In this section, we introduce WIDL-expressions, a formal language used to compactly represent probability distributions over finite sets of strings. Given a finite alphabet $\Sigma$, atomic WIDL-expressions are of the form $a$, with $a \in \Sigma$. The semantics of an atomic expression is a probability distribution $\sigma : \{a\} \to [0,1]$ that assigns probability 1 to the string $a$.
Complex WIDL-expressions are created from other WIDL-expressions, by employing the following four operators, as well as operator distribution functions $\delta_i$ from a set $\Delta$.

Weighted Disjunction. If $w_1, \ldots, w_n$ are WIDL-expressions, then $\vee_{\delta_0}(w_1, \ldots, w_n)$, with $\delta_0 \in \Delta$ specified by a probability distribution $\delta_0 : \{1, \ldots, n\} \to [0,1]$, is a WIDL-expression. Its semantics is a probability distribution $\sigma : L \to [0,1]$, where $L = \bigcup_{i=1}^{n} L(w_i)$ and the probability values are induced by $\delta_0$ and the $\sigma(w_i)$, $1 \le i \le n$. For example, if $w = \vee_{\delta_0}(a, b)$ with $\delta_0 = \{1 \mapsto 0.8, 2 \mapsto 0.2\}$, its semantics is a probability distribution over $L(w) = \{a, b\}$, defined by $p_{\sigma(w)}(a) = 0.8$ and $p_{\sigma(w)}(b) = 0.2$.
Precedence. If $w_1, w_2$ are WIDL-expressions, then $w_1 \cdot w_2$ is a WIDL-expression. Its semantics is a probability distribution $\sigma : L \to [0,1]$, where $L$ consists of the strings that obey the precedence imposed over the arguments (each string of $L(w_1)$ followed by each string of $L(w_2)$), and the probability values are the products of the corresponding argument probabilities. For example, if $w_1 = \vee_{\delta_1}(a, b)$ with $\delta_1 = \{1 \mapsto 0.8, 2 \mapsto 0.2\}$ and $w_2 = c$, then $w_1 \cdot w_2$ represents the probability distribution over $L = \{ac, bc\}$ defined by $p(ac) = 0.8$ and $p(bc) = 0.2$.
Weighted Interleave. If $w_1, \ldots, w_n$ are WIDL-expressions, then $\|_{\delta}(w_1, \ldots, w_n)$, with $\delta \in \Delta$ specified over $\mathrm{perms}(1, \ldots, n) \cup \{\textit{shuffles}\}$, is a WIDL-expression. Its semantics is a probability distribution $\sigma : L \to [0,1]$, where $L$ consists of all the possible interleavings of the strings in $L(w_i)$, $1 \le i \le n$, and the probability values are induced by $\delta$ and the $\sigma(w_i)$. The distribution function $\delta$ is specified over $\mathrm{perms}(1, \ldots, n)$ (the set of all permutations of the $n$ arguments). Because the set of argument permutations is a strict subset of all possible interleavings, $\delta$ also needs to specify the probability mass left for the strings that are not argument permutations (the shuffles). For example, if $w = \|_{\delta}(a \cdot b, c)$ with $\delta = \{2\,1 \mapsto 0.80,\ 1\,2 \mapsto 0.15,\ \textit{shuffles} \mapsto 0.05\}$, its semantics is a probability distribution over $L = \{cab, abc, acb\}$, defined by $p_{\sigma(w)}(cab) = 0.80$, $p_{\sigma(w)}(abc) = 0.15$, and $p_{\sigma(w)}(acb) = 0.05$.
Lock. If $w'$ is a WIDL-expression, then $\lfloor w' \rfloor$ is a WIDL-expression. The semantics is the same as that of $w'$, except that $L(\lfloor w' \rfloor)$ contains strings in which no additional symbols can be interleaved: the strings of the locked argument can only be used as whole units. For example, if $w = \|_{\delta}(\lfloor a \cdot b \rfloor, c)$ with $\delta = \{2\,1 \mapsto 0.80,\ 1\,2 \mapsto 0.20\}$, its semantics is a probability distribution over $L = \{cab, abc\}$, defined by $p_{\sigma(w)}(cab) = 0.80$ and $p_{\sigma(w)}(abc) = 0.20$; the shuffle $acb$ is no longer in the domain, because the lock keeps $a \cdot b$ contiguous.
In Figure 1, we show a more complex WIDL-expression. Its distribution function $\delta_1$ assigns probability 0.2 to the argument order 2 1 3; from a probability mass of 0.7, it assigns uniformly, for each of the remaining 5 argument permutations, a permutation probability value of $0.7/5 = 0.14$.
Trang 3¢?~: ¿ºMÀÁ² ¾5» I±D²Á ±-º¢ >?A ¾ »:º -ºwºM± ¾Mb z²Á
3:7+:á
&oâ
$9
Dº)»:±D²½¼P±D² ¾
JÛL
$9 Á%
¾5»ÁÀ'Âñ¾!
J!ÛL
$9X&
3 +
$9#"
á !
$9Ùâ åÁ.
Figure 1: An example of a WIDL-expression
The remaining probability mass of 0.1 is left for the 12 shuffles associated with the unlocked expression $in \cdot iraq$, for a shuffle probability of $0.1/12 \approx 0.008$. Among the strings in the probability distribution defined by our example are the following:

rebels fighting turkish government in iraq 0.130
in iraq attacked rebels turkish government 0.049
in turkish government iraq rebels fighting 0.005
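To make these definitions concrete, the following is a minimal Python sketch of the WIDL semantics, under our own naming (atom, lock, seq, disj, interleave) rather than anything from the paper; it assumes, as one simple reading of the definition, that the shuffle mass is split uniformly among the shuffles of each combination of argument outcomes.

import itertools

# A minimal interpreter for the WIDL semantics above (an illustrative
# sketch, not the authors' implementation). An outcome is (blocks, p):
# a tuple of locked blocks plus a probability; each block is a tuple of
# words that interleaving may not split.

def atom(word):
    return [(((word,),), 1.0)]

def lock(sem):
    # Lock: fuse each outcome's blocks into one contiguous block.
    return [((tuple(w for b in blocks for w in b),), p) for blocks, p in sem]

def seq(a, b):
    # Precedence: concatenate block sequences, multiply probabilities.
    return [(u + v, p * q) for u, p in a for v, q in b]

def disj(delta, sems):
    # Weighted disjunction: delta[i] is the mass of the i-th argument (1-based).
    return [(blocks, delta[i + 1] * p)
            for i, sem in enumerate(sems) for blocks, p in sem]

def merges(seqs):
    # All order-preserving merges of the block sequences; yields
    # (merged blocks, trace of 1-based argument indices).
    if all(not s for s in seqs):
        yield (), ()
        return
    for i, s in enumerate(seqs):
        if s:
            rest = seqs[:i] + [s[1:]] + seqs[i + 1:]
            for blocks, trace in merges(rest):
                yield (s[0],) + blocks, (i + 1,) + trace

def interleave(delta, sems):
    # Weighted interleave: delta must map every argument order (a tuple)
    # to its mass, plus 'shuffles' to the mass shared by non-contiguous
    # interleavings (split uniformly per outcome combination here).
    out = []
    for combo in itertools.product(*sems):
        p_args = 1.0
        for _, p in combo:
            p_args *= p
        options = list(merges([list(blocks) for blocks, _ in combo]))
        orders = [tuple(k for k, _ in itertools.groupby(t)) for _, t in options]
        n_shuffles = sum(1 for o in orders if len(o) != len(sems))
        for (blocks, _), order in zip(options, orders):
            mass = (delta[order] if len(order) == len(sems)
                    else delta['shuffles'] / n_shuffles)
            out.append((blocks, mass * p_args))
    return out

# The interleave example from the text: ||_delta(a.b, c).
w = interleave({(2, 1): 0.80, (1, 2): 0.15, 'shuffles': 0.05},
               [seq(atom('a'), atom('b')), atom('c')])
for blocks, p in sorted(w, key=lambda x: -x[1]):
    print(' '.join(word for b in blocks for word in b), p)
# -> c a b 0.8 / a b c 0.15 / a c b 0.05; locking a.b via lock(seq(...))
#    removes "a c b" from the domain, as in the Lock example.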
The following result characterizes an important representation property of WIDL-expressions.

Theorem 1 A WIDL-expression over $\Sigma$ and $\Delta$ using $n$ atomic expressions has space complexity $O(n)$, if the operator distribution functions of the expression have space complexity at most $O(n)$.

For proofs and more details regarding WIDL-expressions, we refer the interested reader to (Soricut, 2006). Theorem 1 ensures that high-complexity hypothesis spaces can be represented efficiently by WIDL-expressions (Section 5).
2.2 WIDL-graphs and Probabilistic
Finite-State Acceptors
WIDL-graphs. Equivalent at the representation level with WIDL-expressions, WIDL-graphs allow for formulations of algorithms that process them. As an example, we illustrate in Figure 2(a) the WIDL-graph corresponding to the WIDL-expression in Figure 1.
A WIDL-graph has a start vertex $v_s$ and an end vertex $v_e$, as well as pairs of vertices with $\delta$-labeled out-going and in-going edges that mark where the operators of the corresponding WIDL-expression begin and end. With each WIDL-graph, we associate a probability distribution. The domain of this distribution is the finite collection of strings that can be generated from the paths of a WIDL-specific traversal of the graph, from $v_s$ to $v_e$. Each path (and its associated string) has a probability value induced by the probability distribution of the corresponding WIDL-expression.
WIDL-graphs and Probabilistic FSA. Probabilistic finite-state acceptors (pFSA) are a well-known formalism for representing probability distributions over finite sets of strings (Mohri et al., 2002). We map WIDL-graphs into pFSAs. A state of the pFSA corresponding to a WIDL-graph consists of the set of WIDL-graph vertices that can be reached simultaneously when traversing the graph (see the WIDL-graph in Figure 2(a) and its corresponding pFSA in Figure 2(b)), annotated with bookkeeping information about the interleave operators currently being processed. A transition between two pFSA states is labeled with a word and with the probability mass contributed by the $\delta$-labeled transitions unfolded along the corresponding WIDL-graph path.
[Figure 2 appears here.]

Figure 2: The WIDL-graph corresponding to the WIDL-expression in Figure 1 is shown in (a). The probabilistic finite-state acceptor (pFSA) that corresponds to the WIDL-graph is shown in (b).
The $\epsilon$-transitions of the WIDL-graph are responsible for adding and removing, respectively, the bookkeeping annotations attached to the pFSA states. For a detailed presentation of the mapping between WIDL-graphs and pFSAs, we refer the interested reader to (Soricut, 2006).
3 Stochastic Language Generation from
WIDL-expressions
3.1 Interpolating Probability Distributions in
a Log-linear Framework
Let us assume a finite set $E$ of strings over a finite alphabet $\Sigma$, representing the set of possible sentence realizations. In a log-linear framework, we have a vector of feature functions $h = (h_1, \ldots, h_M)$ over $E$, and a vector of parameters $\lambda = (\lambda_1, \ldots, \lambda_M)$. The interpolated probability of a realization $e \in E$ can be written under a log-linear model as in Equation 1:

$$p(e) = \frac{\exp\big(\sum_{m=1}^{M} \lambda_m h_m(e)\big)}{\sum_{e' \in E} \exp\big(\sum_{m=1}^{M} \lambda_m h_m(e')\big)} \qquad (1)$$

We can formulate the search problem of finding the most likely realization $e^*$ as shown in Equation 2, and therefore we do not need to be concerned about computing expensive normalization factors:

$$e^* = \arg\max_{e \in E} p(e) = \arg\max_{e \in E} \exp\Big(\sum_{m=1}^{M} \lambda_m h_m(e)\Big) \qquad (2)$$

For a given WIDL-expression, one feature function is the probability distribution defined by the expression, and another is an $n$-gram language model distribution; any additional feature functions we want to employ may be added in Equation 2 as extra terms $h_i$.
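As an illustration of Equation 2, here is a small Python sketch (all names and the toy features are ours, not the system's) that ranks candidates by the unnormalized log-linear score:

import math

# Sketch of the search in Equation 2: rank candidate realizations by the
# unnormalized log-linear score. The two features below are illustrative
# stand-ins for the WIDL distribution and an n-gram language model.
def loglinear_best(candidates, features, lambdas):
    def score(e):
        return sum(lam * h(e) for lam, h in zip(lambdas, features))
    return max(candidates, key=score)

widl_logp = {'rebels attacked': math.log(0.8), 'attacked rebels': math.log(0.2)}
features = [lambda e: widl_logp[e],           # "WIDL" feature (toy table)
            lambda e: -0.5 * len(e.split())]  # stand-in for an LM feature
print(loglinear_best(widl_logp, features, [1.0, 1.0]))  # -> rebels attacked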
3.2 Algorithms for Intersecting WIDL-expressions with Language Models
Our algorithm solves the search problem defined by Equation 2 for a WIDL-expression and $n$-gram language models. It works on the corresponding pFSA, whose states are unfolded only as needed (i.e., on-demand computation of the pFSA), and uses an admissible heuristic function to compute cost estimates for the states unfolded from the current one, pushing them into a priority queue $q$, which sorts the states according to their total cost. The admissible heuristic function we use is the one defined in (Soricut and Marcu, 2005), using Equation 1 (unnormalized) for computing the event costs. Given the existence of the admissible heuristic and the monotonicity property of the unfolding, the algorithm, called WIDL-NGLM-A*, is guaranteed to find the optimal solution (Russell and Norvig, 1995); its pseudocode is given in Figure 3.
Trang 5WIDL-NGLM-AV*
A
1 XZY5[^]o_la +
+%,
+A.
2
X
3 while
X
4 do bIc\ejfNgkh UNFOLD *
XZY5[^]o_laD
A|
6 if XZY\[^]`_Na +
+0.
+A.
X
8 for each Xl[ a inbIc\ejfNgih
doPUSH
Xl[ aP
X Y\[^]`_Na POP
9 returnXZY5[^]o_la
WIDL-expressions with -gram language models
The states of the pFSA are computed only partially, for those states for which the total cost is less than the cost of the optimal path. This results in important savings, both in space and time, over simply running a single-source shortest-path algorithm for directed acyclic graphs (Cormen et al., 2001) over the fully expanded pFSA.
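The following Python sketch illustrates this style of best-first search with on-demand unfolding; it is a generic A* skeleton under our own naming, not the paper's implementation, and the toy graph stands in for the partially computed pFSA:

import heapq

# Sketch of A*-style search with on-demand state unfolding. Entries are
# (g + h(state), g, state, path); h must be admissible (it never
# overestimates the remaining cost) for the result to be optimal.
def astar(start, unfold, is_final, h):
    queue = [(h(start), 0.0, start, [start])]
    best_g = {}
    while queue:
        f, g, state, path = heapq.heappop(queue)
        if is_final(state):
            return path, g
        if g >= best_g.get(state, float('inf')):
            continue                      # already expanded more cheaply
        best_g[state] = g
        for nxt, cost in unfold(state):   # successors computed on demand
            heapq.heappush(queue,
                           (g + cost + h(nxt), g + cost, nxt, path + [nxt]))
    return None, float('inf')

# Toy demo: the graph stands in for an on-demand pFSA; h = 0 is admissible.
edges = {'s': [('a', 1.0), ('b', 4.0)], 'a': [('e', 2.0)],
         'b': [('e', 1.0)], 'e': []}
path, cost = astar('s', lambda v: edges[v], lambda v: v == 'e', lambda v: 0.0)
print(path, cost)   # ['s', 'a', 'e'] 3.0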
4 Headline Generation using
WIDL-expressions
We employ the WIDL formalism (Section 2) and the realization algorithm described in Section 3 in a summarization application that aims at producing both informative and fluent headlines. Our headlines are generated in an abstractive, bottom-up manner, starting from words and phrases. A more common, extractive approach operates top-down, by starting from an extracted sentence that is compressed (Dorr et al., 2003) and annotated with additional information (Zajic et al., 2004).
Automatic Creation of WIDL-expressions for Headline Generation. We generate WIDL-expressions starting from an input document. First, we extract a weighted list of topic keywords from the input document, using the same topic identification algorithm as Zhou and Hovy (2003). We enrich the list of keywords with phrases created from the lexical dependencies the topic keywords have in the input document. We associate probability distributions with these phrases using their frequency (we assume that higher frequency is indicative of increased importance) and their position in the document (we assume that proximity to the beginning of the document is also indicative of importance). In Figure 4, we present an example of input keywords and lexical-dependency phrases automatically extracted from a document describing incidents at the Turkey-Iraq border.

Keywords: kurdish 0.17, turkish 0.14, attack 0.10
Phrases: iraq + in iraq 0.4, northern iraq 0.5, iraq and iran 0.1,
         syria + into syria 0.6, and syria 0.4,
         rebels + attacked rebels 0.7, rebels fighting 0.3
WIDL-expression & trigram interpolation:
TURKISH GOVERNMENT ATTACKED REBELS IN IRAQ AND SYRIA

Figure 4: Input and output for our automatic headline generation system.
The algorithm then combines the lexical-dependency phrases of each topic keyword under a weighted disjunction operator (the associated probability values for each phrase are multiplied with the probability value of each topic keyword), and combines the keyword-specific expressions into a single WIDL-expression using a weighted interleave operator. The WIDL-expression in Figure 1 is a (scaled-down) example of the expressions created by this algorithm.

On average, a WIDL-expression created by this algorithm, using the extracted topic keywords and their lexical-dependency phrases, compactly encodes a candidate set of about 3 million possible realizations; Theorem 1 guarantees that the space complexity of these expressions remains linear in the number of atomic expressions. Finally, we generate headlines from WIDL-expressions using the WIDL-NGLM-A* algorithm, which interpolates the probability distributions represented by the WIDL-expressions with $n$-gram language model distributions. The output presented in Figure 4 is the most likely headline realization produced by our system.
Headline Generation Evaluation. To evaluate the accuracy of our headline generation system, we use the documents from the DUC 2003 evaluation competition. Half of these documents are used as development set (283 documents), and the other half is used as test set (273 documents). We automatically measure performance by comparing the produced headlines against one reference headline produced by a human, using the ROUGE metric (Lin, 2004).

[Table 1 appears here: headline length (Len) and unigram and bigram Rouge scores for each algorithm (ALG), grouped into extractive and abstractive algorithms.]

Table 1: Headline generation evaluation. We compare extractive algorithms against abstractive algorithms, including our WIDL-based algorithm.
For each input document, we train two language models, using the SRI Language Model Toolkit (with modified Kneser-Ney smoothing). A general trigram language model, trained on 170M English words from the Wall Street Journal, is used to model fluency. A document-specific trigram language model, trained on-the-fly for each input document, accounts for both fluency and content validity. We also employ a word-count model (which counts the number of words in a proposed realization) and a phrase-count model (which counts the number of phrases in a proposed realization), which allow us to learn to produce headlines that observe the restriction on the number of words allowed (10, in our case). The interpolation weights are trained to maximize the objective function on the development set.
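Concretely, these four features can be packaged for the log-linear scorer of Section 3 along the following lines (a sketch: the Candidate container and function names are ours, and the LM callables stand in for the SRILM-trained trigram models):

from collections import namedtuple

# Hypothetical container: a candidate realization carries its words and
# its phrase segmentation (how it was assembled from phrases).
Candidate = namedtuple('Candidate', ['words', 'phrases'])

def headline_features(general_lm, doc_lm):
    # Each feature maps a candidate to a real value; the interpolation
    # weights trained on the development set decide how word and phrase
    # counts trade off against the two LM scores, steering outputs
    # toward the 10-word restriction.
    return [lambda c: general_lm(c.words),   # fluency (general trigram LM)
            lambda c: doc_lm(c.words),       # fluency + content validity
            lambda c: len(c.words),          # word-count model
            lambda c: len(c.phrases)]        # phrase-count model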
The results are presented in Table 1. We compare the performance of several extractive algorithms (which operate on an extracted sentence to arrive at a headline) against several abstractive algorithms (which create headlines starting from words and phrases). The lead baseline simply proposes as headline the lead sentence, cut after the first 10 words. HedgeTrimmer is our implementation of the Hedge Trimmer system (Dorr et al., 2003), and Topiary is our implementation of the Topiary system (Zajic et al., 2004).

WATER IS LINK BETWEEN CLUSTER OF E COLI CASES
SRI LANKA 'S JOINT VENTURE TO EXPAND EXPORTS
OPPOSITION TO EUROPEAN UNION SINGLE CURRENCY EURO
OF INDIA AND BANGLADESH WATER BARRAGE

Figure 5: Headlines generated automatically using a WIDL-based sentence realization system.
This evaluation shows that our WIDL-based approach to generation is capable of obtaining headlines that compare favorably, in both content and fluency, with extractive, state-of-the-art results (Zajic et al., 2004), while it outperforms a previously-proposed abstractive system by a wide margin (Zhou and Hovy, 2003). Also note that our evaluation makes these results directly comparable, as they use the same parsing and topic identification algorithms. In Figure 5, we present a sample of headlines produced by our system, which includes both good and not-so-good outputs.
5 Machine Translation using WIDL-expressions
We also employ our WIDL-based realization engine in a machine translation application that uses a two-phase generation approach: in a first phase, WIDL-expressions representing large sets of possible translations are created from input foreign-language sentences. In a second phase, we use our generic, WIDL-based sentence realization engine to interpolate the probability distributions of the WIDL-expressions with an $n$-gram language model. In the experiments reported here, we translate between Chinese (source language) and English (target language).
Automatic Creation of WIDL-expressions for Machine Translation. We create WIDL-expressions from Chinese strings by exploiting a phrase-based translation table (Koehn et al., 2003). We use an algorithm resembling probabilistic bottom-up parsing to build a WIDL-expression for an input Chinese string: each contiguous span of the input covered by the translation table acts as a "constituent", and the "non-terminals" associated with each constituent are the English phrase translations available for it in the translation table.
[Figure 6 appears here: an input Chinese sentence, the tiles derived from the phrase-based translation table with their weighted English translation alternatives, and the resulting output.]

WIDL-expression & trigram interpolation:
gunman was killed by police

Figure 6: A Chinese string is converted into a WIDL-expression, which provides a translation as the best scoring hypothesis under the interpolation with a trigram language model.
We use the probability distributions from the translation table to filter out low-probability translation alternatives. At this point, we assemble the tiles into a WIDL-expression. Tiles that are adjacent are joined together using precedence operators, while reorderings of the tiles are made possible by weighted interleave operators (assigned non-zero probability), but the longer the movement from the original order of the tiles, the lower the probability. (This distortion model is similar with the one used in (Koehn, 2004).) When multiple tiles are available for the same input span, they are combined using weighted disjunction operators, with the distributions specified in the translation table. Usually, statistical phrase-based translation tables specify not only one, but multiple distributions that characterize each phrase pair; in the experiments reported here, we consider four probability distributions: $\phi(e|f)$, $\phi(f|e)$, $lex(e|f)$, and $lex(f|e)$, in the order in which they appear in the translation table. In Figure 6, we show an example of a WIDL-expression created by this algorithm.
On average, a WIDL-expression created by this algorithm, which uses tens of tiles per sentence and multiple candidate translations per tile, encodes a candidate set on the order of $10^{30}$ possible translations. Theorem 1 guarantees that these WIDL-expressions encode such candidate sets using only linear space.
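The following sketch illustrates phase one on two hypothetical tiles, again reusing the toy interpreter from Section 2.1 and the keyword_disjunction helper from Section 4; the exponential-decay distortion delta and the phrase-table entries are our own stand-ins:

import itertools

# Sketch of phase one (our simplification): each source tile becomes a
# weighted disjunction over its English translations from a phrase table;
# the tiles are joined under an interleave whose delta decays with
# distortion from the monotone order, loosely in the spirit of the
# distortion model of (Koehn, 2004).
def distortion_delta(n, alpha=0.5):
    # Weight each tile permutation by alpha ** total displacement, then
    # normalize so the permutation masses sum to one.
    raw = {perm: alpha ** sum(abs(pos + 1 - arg) for pos, arg in enumerate(perm))
           for perm in itertools.permutations(range(1, n + 1))}
    z = sum(raw.values())
    delta = {perm: weight / z for perm, weight in raw.items()}
    delta['shuffles'] = 0.0   # locked tiles never split in this sketch
    return delta

# Hypothetical phrase-table entries for two source tiles.
table = [{'the gunman': 0.6, 'gunman': 0.4},
         {'was killed by police': 0.7, 'police killed': 0.3}]
tiles = [keyword_disjunction(entry) for entry in table]
translation_space = interleave(distortion_delta(len(tiles)), tiles)
blocks, p = max(translation_space, key=lambda outcome: outcome[1])
print(' '.join(word for block in blocks for word in block), round(p, 3))
# -> the gunman was killed by police 0.336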
In the second phase, we employ our WIDL-based realization engine to interpolate the distribution probabilities of WIDL-expressions with a trigram language model. In the notation of Equation 2, we use feature functions $h_1, \ldots, h_4$ for the WIDL-expression distributions (one for each probability distribution encoded); a feature function $h_5$ for a trigram language model; a feature function $h_6$ for a word-count model, and a feature function $h_7$ for a phrase-count model.

As acknowledged in the Machine Translation literature (Germann et al., 2003), optimal decoding is not usually possible, due to the large size of the search spaces. We therefore use an approximate search procedure, which considers for unfolding only the nodes extracted from the priority queue whose corresponding paths have a length greater than or equal to the maximum length already unfolded, minus a small beam constant (fixed across the experiments reported here).
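A minimal way to express this restriction on top of the A* sketch from Section 3 is the fragment below (the function and the default value of b are ours; the paper's exact threshold is a tuned constant):

# Beam condition over the A* unfolding (sketch; parameter b is ours):
# only unfold states whose path length is within b of the longest path
# unfolded so far.
def within_beam(path_len, max_unfolded_len, b=2):
    return path_len >= max_unfolded_len - b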
MT Performance Evaluation. When evaluated against the state-of-the-art, phrase-based decoder Pharaoh (Koehn, 2004), using the same experimental conditions – translation table trained on the FBIS corpus (7.2M Chinese words and 9.2M English words of parallel text), trigram language model trained on 155M words of English, interpolation weights trained using discriminative training (Och, 2003) on the 2002 NIST MT evaluation set – and BLEU (Papineni et al., 2002) as our evaluation metric, our WIDL-based system produces translations that have a BLEU score of 0.2570, while Pharaoh translations have a BLEU score of 0.2635. The difference is not statistically significant at the 95% confidence level.
These results show that the WIDL-based approach to machine translation is powerful enough to achieve translation accuracy comparable with state-of-the-art systems in machine translation.
6 Conclusions
The approach to sentence realization we advocate in this paper relies on WIDL-expressions, a formal language with convenient theoretical properties that can accommodate a wide range of generation scenarios. In the worst case, one can work with simple bags of words that encode no context preferences (Soricut and Marcu, 2005). One can also work with bags of words and phrases that encode context preferences, a scenario that applies to current approaches in statistical machine translation (Section 5). And one can also encode context and ordering preferences typically used in summarization (Section 4).
The generation engine we describe enables a tight coupling of content selection with sentence realization preferences. Its algorithm comes with theoretical guarantees about its optimality. Because the requirements for producing WIDL-expressions are minimal, our WIDL-based generation engine can be employed, with state-of-the-art results, in a variety of text-to-text applications.
Acknowledgments. This work was partially supported under the GALE program of the Defense Advanced Research Projects Agency, Contract No. HR0011-06-C-0022.
References
Srinivas Bangalore and Owen Rambow. 2000. Using TAG, a tree model, and a language model for generation. In Proceedings of the Fifth International Workshop on Tree-Adjoining Grammars (TAG+).

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2001. Introduction to Algorithms. The MIT Press and McGraw-Hill.

Simon Corston-Oliver, Michael Gamon, Eric K. Ringger, and Robert Moore. 2002. An overview of Amalgam: A machine-learned generation module. In Proceedings of the INLG.

Bonnie Dorr, David Zajic, and Richard Schwartz. 2003. Hedge trimmer: a parse-and-trim approach to headline generation. In Proceedings of the HLT-NAACL Text Summarization Workshop, pages 1–8.

Michael Elhadad. 1991. FUF User manual — version 5.0. Technical Report CUCS-038-91, Department of Computer Science, Columbia University.

Ulrich Germann, Mike Jahr, Kevin Knight, Daniel Marcu, and Kenji Yamada. 2003. Fast decoding and optimal decoding for machine translation. Artificial Intelligence, 154(1–2):127–143.

Nizar Habash. 2003. Matador: A large-scale Spanish-English GHMT system. In Proceedings of AMTA.

J. Hajic, M. Cmejrek, B. Dorr, Y. Ding, J. Eisner, D. Gildea, T. Koo, K. Parton, G. Penn, D. Radev, and O. Rambow. 2002. Natural language generation in the context of machine translation. Summer workshop final report, Johns Hopkins University.

K. Knight and V. Hatzivassiloglou. 1995. Two-level, many-path generation. In Proceedings of the ACL.

Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. Statistical phrase based translation. In Proceedings of the HLT-NAACL, pages 127–133.

Philipp Koehn. 2004. Pharaoh: a beam search decoder for phrase-based statistical machine translation models. In Proceedings of the AMTA, pages 115–124.

I. Langkilde-Geary. 2002. A foundation for general-purpose natural language generation: sentence realization using probabilistic models of language. Ph.D. thesis, University of Southern California.

Chin-Yew Lin. 2004. ROUGE: a package for automatic evaluation of summaries. In Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004).

Christian Matthiessen and John Bateman. 1991. Text Generation and Systemic-Functional Linguistics. Pinter Publishers, London.

Mehryar Mohri, Fernando Pereira, and Michael Riley. 2002. Weighted finite-state transducers in speech recognition. Computer Speech and Language, 16(1):69–88.

Mark-Jan Nederhof and Giorgio Satta. 2004. IDL-expressions: a formalism for representing and parsing finite languages in natural language processing. Journal of Artificial Intelligence Research, pages 287–317.

Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the ACL, pages 160–167.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the ACL, pages 311–318.

Stuart Russell and Peter Norvig. 1995. Artificial Intelligence. A Modern Approach. Prentice Hall.

Radu Soricut and Daniel Marcu. 2005. Towards developing generation algorithms for text-to-text applications. In Proceedings of the ACL, pages 66–74.

Radu Soricut. 2006. Natural Language Generation for Text-to-Text Applications Using an Information-Slim Representation. Ph.D. thesis, University of Southern California.

David Zajic, Bonnie J. Dorr, and Richard Schwartz. 2004. BBN/UMD at DUC-2004: Topiary. In Proceedings of the NAACL Workshop on Document Understanding, pages 112–119.

Liang Zhou and Eduard Hovy. 2003. Headline summarization at ISI. In Proceedings of the NAACL Workshop on Document Understanding.