Imitation Learning
Meta-learning for action-based models:
improve the classifier by generating training data from demonstrations
For each training instance:
Or perform rollouts to cost the actions:
Dependency parsing: represent the syntax of a sentence as directed labeled arcs between words.
\begin{align}
& \textbf{Input:} \; D_{train} = \{(\mathbf{x}^1,\mathbf{y}^1)...(\mathbf{x}^M,\mathbf{y}^M)\}, \; \text{expert}\; \pi^{\star}, \; \text{classifier} \; H\\
& \text{set training examples}\; \cal E = \emptyset ,\; \color{red}{\pi^{\star}\; \mathrm{probability}\; \beta=1}\\
& \mathbf{while}\; \text{termination condition not reached}\; \mathbf{do}\\
& \quad \color{red}{\text{set rollin policy} \; \pi^{in} = \beta\pi^{\star} + (1-\beta)H}\\
& \quad \mathbf{for} \; (\mathbf{x},\mathbf{y}) \in D_{train} \; \mathbf{do}\\
& \quad \quad \color{red}{\text{generate trajectory} \; \hat \alpha_1\dots\hat \alpha_T = \pi^{in}(\mathbf{x},\mathbf{y})}\\
& \quad \quad \mathbf{for} \; \hat \alpha_t \in \hat \alpha_1\dots\hat \alpha_T \; \mathbf{do}\\
& \quad \quad \quad \color{red}{\text{ask expert for } \underline{\text{a set of best actions}}\; \{\alpha_{1}^{\star}\dots\alpha_{k}^{\star}\} = \pi^{\star}(\mathbf{x},S_{t-1})} \\
& \quad \quad \quad \text{extract features} \; \mathit{feat}=\phi(\mathbf{x},S_{t-1}) \\
& \quad \quad \quad \cal E = \cal E \cup (\mathit{feat},\alpha^{\star})\\
& \quad \text{learn}\; H \; \text{from}\; \cal E\\
& \quad \color{red}{\text{decrease} \; \beta}\\
\end{align}
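A minimal Python sketch of this loop, under the assumption that the roll-in mixture is applied stochastically per action; `initial_state`, `extract_features`, and the fit/predict-style classifier interface are hypothetical helpers, not from the original:

```python
import random

def imitation_learning(train_data, expert_policy, classifier,
                       n_iterations=10, beta_decay=0.9):
    """DAgger-style training loop (a sketch; helper names are illustrative)."""
    examples = []   # aggregated training set E
    beta = 1.0      # probability of querying the expert during roll-in

    for _ in range(n_iterations):
        for x, y in train_data:
            state = initial_state(x)               # assumed helper
            while not state.is_terminal():
                # Roll-in: mix expert and learned classifier
                if random.random() < beta:
                    action = expert_policy(x, y, state)
                else:
                    action = classifier.predict(extract_features(x, state))
                # Query the expert for the best action at this state
                best_action = expert_policy(x, y, state)
                examples.append((extract_features(x, state), best_action))
                state = state.apply(action)
        # Learn a new classifier from all collected examples
        feats, labels = zip(*examples)
        classifier.fit(feats, labels)
        beta *= beta_decay  # decrease reliance on the expert
    return classifier
```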
To apply Imitation Learning to any task, we need to define:
We can assume any transition-based system (e.g. Arc-Eager).
State: arcs, stack, and buffer.
Action space:
Shift, Reduce, Arc-Left, and Arc-Right.
The length of the transition sequence is variable.
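A sketch of such a state and the four Arc-Eager transitions, assuming words are indexed by position with 0 as a synthetic root (illustrative, not a full parser):

```python
class ParserState:
    """Arc-Eager parser state: arcs, stack, and buffer (a sketch)."""
    def __init__(self, n_words):
        self.stack = [0]                       # 0 = synthetic root
        self.buffer = list(range(1, n_words))  # remaining word indices
        self.arcs = {}                         # dependent -> (head, label)

    def shift(self):
        self.stack.append(self.buffer.pop(0))

    def reduce(self):
        self.stack.pop()                       # stack top must already have a head

    def arc_left(self, label):
        dep = self.stack.pop()                 # head is the buffer front
        self.arcs[dep] = (self.buffer[0], label)

    def arc_right(self, label):
        dep = self.buffer.pop(0)               # head is the stack top
        self.arcs[dep] = (self.stack[-1], label)
        self.stack.append(dep)

    def is_terminal(self):
        return not self.buffer                 # hence a variable-length sequence
```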
Hamming loss: given predicted arcs, how many parents and labels were incorrectly predicted?
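A possible implementation, assuming predicted and gold arcs are given as dictionaries mapping each dependent to its (head, label) pair:

```python
def hamming_loss(predicted_arcs, gold_arcs):
    """Count tokens whose predicted head or label differs from the gold standard.
    Both arguments map a dependent index to a (head, label) pair (an assumed format)."""
    return sum(1 for dep, gold in gold_arcs.items()
               if predicted_arcs.get(dep) != gold)
```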
Returns the best action at the current state by looking at the gold standard assuming future actions are also optimal:
$$\alpha^{\star}=\pi^{\star}(S_t, \mathbf{y}) = \mathop{\arg \min}_{\alpha \in {\cal A}} L(S_t(\alpha,\pi^{\star}),\mathbf{y})$$
We can derive a static transition sequence from the initial to the terminal state using the gold standard.
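For Arc-Eager, such a static expert can be read off the gold tree directly; the sketch below assumes the `ParserState` above and a hypothetical `gold_heads` map from each word to its gold parent:

```python
def static_expert(state, gold_heads):
    """Static Arc-Eager expert: returns the action implied by the gold tree,
    assuming all previous actions were also optimal (a sketch)."""
    s = state.stack[-1] if state.stack else None
    b = state.buffer[0] if state.buffer else None
    if s is not None and b is not None and gold_heads.get(s) == b:
        return "arc_left"
    if s is not None and b is not None and gold_heads.get(b) == s:
        return "arc_right"
    # Reduce once the stack top has its head and no remaining dependents in the buffer
    if s in state.arcs and all(gold_heads.get(w) != s for w in state.buffer):
        return "reduce"
    return "shift"
```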
Static expert does not take previous actions into account.
Static expert has not encountered this state before.
A static expert may be sufficient for tasks where we do not care whether the previous actions were optimal.
Let's assume that we roll in using the classifier.
Static expert policy cannot recover from errors in the rollin.
Stack: 'her'
Buffer: 'a', 'letter', '.'
Two possible actions: Reduce 'her' / Shift 'a'
The static expert policy chooses arbitrarily.
The dynamic expert determines the best action by considering the previous actions.
Reachable terminal state:
For each possible action at a time-step:
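A rollout-based dynamic expert following the arg-min definition above might look like this; `legal_actions` and `apply_action` are assumed helpers, and `hamming_loss` is the sketch from earlier:

```python
import copy

def dynamic_expert(state, gold_arcs, expert_rollout_policy):
    """Dynamic expert via rollouts (a sketch): for every legal action, apply it,
    roll out to a terminal state with the expert, and keep the lowest-loss action."""
    best_action, best_loss = None, float("inf")
    for action in legal_actions(state):            # assumed helper
        rollout_state = copy.deepcopy(state)
        rollout_state = apply_action(rollout_state, action)  # assumed helper
        # Complete the trajectory with the expert policy
        while not rollout_state.is_terminal():
            rollout_state = apply_action(rollout_state,
                                         expert_rollout_policy(rollout_state))
        loss = hamming_loss(rollout_state.arcs, gold_arcs)
        if loss < best_loss:
            best_action, best_loss = action, loss
    return best_action
```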
Instead of a full expert rollout, we may use heuristics!