Natural language generation: the natural language processing task of generating text from a meaning representation.
\begin{align} & \text{Predicate: INFORM}\\ & \text{______________________}\\ & \color{blue}{\text{type = "hotel"}}\\ & \color{green}{\text{count = "182"}}\\ & \color{red}{\text{dogs_allowed = dont_care}} \end{align}
There are 182 hotels if you do not care whether dogs are allowed.
State: meaning representation, incomplete sentence.
Action space:
\begin{align} & \text{Predicate: INFORM}\\ & \text{______________________}\\ & \text{name = X-name-1}\\ & \text{serves = food}\\ & \text{eattype = restaurant}\\ & \text{near = riverside}\\ & \text{area = X-area-1, X-area-2} \end{align}
\begin{align} & \text{Predicate: INFORM}\\ & \text{______________________}\\ & \color{red}{\textbf{name}} \text{ = X-name-1}\\ & \text{serves = food}\\ & \text{eattype = restaurant}\\ & \text{near = riverside}\\ & \text{area = X-area-1, X-area-2} \end{align}
\begin{align} & \text{Predicate: INFORM}\\ & \text{______________________}\\ & \color{red}{\textbf{name}}\text{ = X-name-1}\\ & \text{serves = food}\\ & \color{red}{\textbf{eattype}}\text{ = restaurant}\\ & \text{near = riverside}\\ & \text{area = X-area-1, X-area-2} \end{align}
\begin{align} & \text{Predicate: INFORM}\\ & \text{______________________}\\ & \color{red}{\textbf{name}}\text{ = X-name-1}\\ & \text{serves = food}\\ & \color{red}{\textbf{eattype}}\text{ = restaurant}\\ & \color{red}{\textbf{near}}\text{ = riverside}\\ & \color{red}{\textbf{area}}\text{ = X-area-1, X-area-2} \end{align}
\begin{align} & \text{Predicate: INFORM}\\ & \text{______________________}\\ & \color{red}{\textbf{name = X-name-1}}\\ & \text{serves = food}\\ & \color{red}{\textbf{eattype}}\text{ = restaurant}\\ & \color{red}{\textbf{near}}\text{ = riverside}\\ & \color{red}{\textbf{area}}\text{ = X-area-1, X-area-2} \end{align}
\begin{align} & \text{Predicate: INFORM}\\ & \text{______________________}\\ & \color{red}{\textbf{name = X-name-1}}\\ & \text{serves = food}\\ & \color{red}{\textbf{eattype = restaurant}}\\ & \color{red}{\textbf{near}}\text{ = riverside}\\ & \color{red}{\textbf{area}}\text{ = X-area-1, X-area-2} \end{align}
\begin{align} & \text{Predicate: INFORM}\\ & \text{______________________}\\ & \color{red}{\textbf{name = X-name-1}}\\ & \text{serves = food}\\ & \color{red}{\textbf{eattype = restaurant}}\\ & \color{red}{\textbf{near = riverside}}\\ & \color{red}{\textbf{area = X-area-1, X-area-2}} \end{align}
Word actions are chosen based on content actions.
Different classifiers are required for content and word actions.
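This two-level decomposition can be sketched as follows; `content_clf` and `word_clf` are hypothetical stand-ins for the two classifiers, not the actual models.

```python
def generate(mr, content_clf, word_clf):
    # Hypothetical two-level decoding: a content classifier first decides
    # which MR attribute to realise next; a word classifier then emits
    # the words for that attribute, conditioned on the words so far.
    words = []
    for attr in content_clf(mr):                 # content actions
        words.extend(word_clf(mr, attr, words))  # word actions
    return words
```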
BLEU: the percentage of predicted n-grams that are present in the gold standard,
i.e. $L=1-BLEU(s_{final}, \mathbf{y})$
The same metric used to evaluate NLG.
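A minimal sketch of such a loss, using unsmoothed modified n-gram precision and omitting BLEU's brevity penalty for simplicity:

```python
from collections import Counter

def bleu_loss(predicted, reference, max_n=4):
    """L = 1 - BLEU over token lists (no smoothing, no brevity penalty)."""
    precisions = []
    for n in range(1, max_n + 1):
        pred = Counter(tuple(predicted[i:i + n])
                       for i in range(len(predicted) - n + 1))
        ref = Counter(tuple(reference[i:i + n])
                      for i in range(len(reference) - n + 1))
        clipped = sum((pred & ref).values())   # clipped n-gram matches
        total = sum(pred.values())
        precisions.append(clipped / total if total else 0.0)
    if not all(precisions):                    # geometric mean is zero
        return 1.0
    bleu = 1.0
    for p in precisions:
        bleu *= p
    return 1.0 - bleu ** (1.0 / max_n)
```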
Content actions are ignored by the loss function.
There is no explicit supervision on which attribute each word is aligned to.
The loss function also penalizes undesirable behaviour:
For word actions: the loss can be computed directly against the gold standard.
For content actions: we would need alignments between the MR and the words in the gold standard.
\begin{align} & \text{Predicate: INFORM}\\ & \text{______________________}\\ & \color{blue}{\text{type = "hotel"}}\\ & \color{green}{\text{count = "182"}}\\ & \color{red}{\text{dogs_allowed = dont_care}} \end{align}
There are 182 hotels if you do not care whether dogs are allowed.
The expert policy relies on naive heuristics.
\begin{align} & \text{Predicate: INFORM}\\ & \text{______________________}\\ & \color{blue}{\text{type = "hotel"}}\\ & \color{green}{\text{count = "182"}}\\ & \color{red}{\text{dogs_allowed = false}} \end{align}
There are 182 hotels if you are traveling without animals.
Imitating the suboptimal expert policy is not enough.
\begin{align} & \textbf{Input:} \; D_{train} = \{(\mathbf{x}^1,\mathbf{y}^1)...(\mathbf{x}^M,\mathbf{y}^M)\}, \; expert\; \pi^{\star}, \; loss \; function \; L\\ & \text{set} \; training\; examples\; \cal E = \emptyset\\ & \color{red}{\text{initialize a classifier } H_{0}}\\ & \mathbf{for}\; i = 0 \;\mathbf{to} \; N\; \mathbf{do}\\ & \quad \color{red}{\text{set} \; rollin \; policy \; \pi^{in} = H_i}\\ & \quad \color{red}{\text{set} \; rollout \; policy \; \pi^{out} = mix(H_i,\pi^{\star})}\\ & \quad \mathbf{for} \; (\mathbf{x},\mathbf{y}) \in D_{train} \; \mathbf{do}\\ & \quad \quad \text{rollin to predict} \; \hat \alpha_1\dots\hat \alpha_T = \pi^{in}(\mathbf{x},\mathbf{y})\\ & \quad \quad \mathbf{for} \; \hat \alpha_t \in \hat \alpha_1\dots\hat \alpha_T \; \mathbf{do}\\ & \quad \quad \quad \text{rollout to obtain costs}\; c \; \text{for all possible actions using}\; L\; \\ & \quad \quad \quad \text{extract features}\; f=\phi(\mathbf{x},S_{t-1}) \\ & \quad \quad \quad \cal E = \cal E \cup (f,c)\\ & \quad \color{red}{\text{learn classifier} \; H_{i+1} \; \text{from}\; \cal E}\\ & \color{red}{H = \; \mathbf{avg}\{H_{0}\mathbf{\dots}H_{N}\}}\\ \end{align}
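A toy, self-contained instantiation of this loop. The MR is omitted and the task is simply to emit a gold token sequence; the classifier is a lookup table from (position, prefix) features to actions, and the loss is per-position mismatch as a crude stand-in for $1-BLEU$. All names and simplifications are illustrative, not an actual implementation.

```python
import random

random.seed(0)

VOCAB = ["there", "are", "182", "hotels"]

def loss_fn(seq, gold):
    # per-position mismatch count, a crude stand-in for 1 - BLEU
    return sum(a != b for a, b in zip(seq, gold))

class TableClassifier:
    """Cost-sensitive 'classifier' backed by a feature -> action table."""
    def __init__(self):
        self.table = {}
    def predict(self, feat):
        return self.table.get(feat, VOCAB[0])
    @staticmethod
    def learn(examples):
        h = TableClassifier()
        votes = {}
        for feat, costs in examples:
            best = min(costs, key=costs.get)   # lowest-cost action
            votes.setdefault(feat, []).append(best)
        for feat, actions in votes.items():
            h.table[feat] = max(set(actions), key=actions.count)
        return h

def rollout(policy, gold, prefix):
    # complete the sequence from the given prefix using `policy`
    seq = list(prefix)
    for u in range(len(prefix), len(gold)):
        seq.append(policy(gold, u, tuple(seq)))
    return seq

def train(D_train, n_iterations=4, mix=0.5):
    examples, H = [], TableClassifier()
    for i in range(n_iterations):
        for gold in D_train:
            # rollin with the current classifier H_i
            seq = []
            for t in range(len(gold)):
                seq.append(H.predict((t, tuple(seq))))
            for t in range(len(gold)):
                # rollout policy: sampled once per timestep from the
                # mixture of the expert and the current classifier
                if random.random() < mix:
                    pol = lambda g, u, f: g[u]                  # expert
                else:
                    pol = lambda g, u, f: H.predict((u, f))     # classifier
                costs = {a: loss_fn(rollout(pol, gold, seq[:t] + [a]), gold)
                         for a in VOCAB}
                examples.append(((t, tuple(seq[:t])), costs))
        H = TableClassifier.learn(examples)    # H_{i+1}
    return H
```

After a few iterations the learned table reproduces the gold sequence when rolled in on its own predictions.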
\begin{align} & \text{Predicate: INFORM}\\ & \text{______________________}\\ & \text{name = X-name-1}\\ & \text{eattype = restaurant}\\ & \text{food = chinese} \end{align}
Exponential decay schedule
Introduced in SEARN (Daumé III et al., 2009)
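One way to realise this schedule, as a sketch (the decay rate and function names are illustrative):

```python
import random

def rollout_action(classifier_action, expert_action, iteration, decay=0.3):
    """SEARN-style mixture: at training iteration i, follow the expert with
    probability beta = (1 - decay)**i, otherwise the learned classifier,
    so the expert's influence decays exponentially over iterations."""
    beta = (1.0 - decay) ** iteration
    return expert_action if random.random() < beta else classifier_action
```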
LOLS uses the same policy throughout each rollout.
Rollin with the classifiers:
\begin{align} & \text{Predicate: INFORM}\\ & \text{______________________}\\ & \text{name = X-name-1}\\ & \text{dogs_allowed = yes} \end{align}
After a suboptimal action, we apply sequence correction:
If suboptimal actions are encountered later in the new sequence, sequence correction may be performed again.
Before applying sequence correction, we allow at most $E$ actions after the suboptimal one.
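Under these assumptions, sequence correction might be sketched as follows; `is_suboptimal` and `expert_completion` are hypothetical callbacks, and this sketch handles only the first suboptimal action:

```python
def sequence_correct(actions, is_suboptimal, expert_completion, E=2):
    # Let the learned policy continue for at most E actions after the
    # first suboptimal one, then hand the remainder to the expert.
    for t, a in enumerate(actions):
        if is_suboptimal(t, a):
            cutoff = min(t + 1 + E, len(actions))
            return actions[:cutoff] + expert_completion(actions[:cutoff])
    return actions
```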