Context-free = Stack Recognizable
This lecture is going to be of a familiar nature, I think. We are going to show that two families of languages are equal.
Given an alphabet \(A\) of input letters, recall that a language \(L \subseteq A^*\) is context-free if there is a (context-free) grammar \(\mathcal G = (X, A, R)\) with a variable \(x \in X\) such that \(\mathcal L(\mathcal G, x) = L\), and that the family of context-free languages is written \(\mathsf{CFL}\). Recall also that \(L\) is stack recognizable if there is a stack automaton with finitely many states \(\mathcal S = (Q, A, \Sigma, \delta, F)\) with a state \(x\in Q\) such that \(\mathcal L(\mathcal S, x) = L\), and that the family of stack recognizable languages is written \(\mathsf{Stack}\). What we are going to prove is the following theorem:
The proof proceeds in the usual way one proves that two sets are equal: we are going to show that \(\mathsf{CFL} \subseteq \mathsf{Stack}\) and then we are going to show that \(\mathsf{CFL} \supseteq \mathsf{Stack}\).
Every Context-free Language is a Stack Recognizable Language
Let's start by showing that \(\mathsf{CFL} \subseteq \mathsf{Stack}\). As might be expected, this is going to involve a construction that turns a grammar into a finite stack automaton.
The basic idea behind the construction can be framed as matching the obligations of a derivation. Every rewrite rule \(y \to \mu\) is going to correspond to a new obligation that the derivation has to fulfill. Note that this implies that both \(A\) and \(X\) are going to be contained in our set of stack symbols. The rewrite rule is modelled in the stack automaton by a path that pops \(y\) from the stack and pushes the characters of \(\mu\) onto the stack in reverse order (because stacks are last-in-first-out); each letter is then popped from the stack as it is read from the input. This captures that every derivation, say \(x_0 \Rightarrow \mu_1 \Rightarrow \cdots \Rightarrow \mu_n \Rightarrow w\), must end with a word \(w \in A^*\) (as opposed to a monomial expression \(\mu \in (X \cup A)^*\)). Every rewrite step, say \(\mu_i \Rightarrow \mu_{i+1}\), eliminates a variable from the previous expression, but it may introduce new ones. If a rewrite step introduces a variable to the derivation, i.e., \(\mu_{i+1}\) has a variable in it, the deriver is obliged to later remove that variable with a rewrite rule. Formally, the construction looks like this:
- For each \(a \in A\), add a self-loop \(s_{\circlearrowleft} \xrightarrow{a \mid {\uparrow} a} s_{\circlearrowleft}\).
- For each rewrite rule \(y \to b_1b_2\cdots b_{n-1}b_n\), where \(y \in X\) and \(b_i \in X \cup A\) for each \(i\), add a cycle \[ s_{\circlearrowleft} \xrightarrow{{\uparrow} y} s_{n+1} \xrightarrow{{\downarrow} b_n} s_{n} \xrightarrow{{\downarrow} b_{n-1}} s_{n-1} \xrightarrow{{\downarrow} b_{n-2}} \cdots \xrightarrow{{\downarrow} b_{2}} s_2 \xrightarrow{{\downarrow} b_{1}} s_{{\circlearrowleft}} \] Note that this cycle only runs stack programs; it does not read any letters.
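To make the construction concrete, here is a small sketch in Python. The encoding is my own, not fixed by the lecture: a grammar is a dict mapping each variable to a list of right-hand-side strings, and each transition is a tuple (source state, letter read or `None`, stack operation, target state).

```python
# Sketch of the Grammar-to-Automaton construction. Letters and variables
# are single characters; ("pop", c) and ("push", c) are the basic stack ops.
def grammar_to_automaton(grammar, alphabet):
    LOOP = "s"                  # the single looping state s_circlearrowleft
    transitions = []            # (state, letter_or_None, stack_op, next_state)
    # Self-loops: reading a letter pops that same letter off the stack.
    for a in sorted(alphabet):
        transitions.append((LOOP, a, ("pop", a), LOOP))
    fresh = 0
    for y, rhss in grammar.items():
        for rhs in rhss:
            # Cycle for the rule y -> rhs: pop y, then push the characters
            # of rhs in reverse order, reading no input along the way.
            steps = [("pop", y)] + [("push", b) for b in reversed(rhs)]
            prev = LOOP
            for i, op in enumerate(steps):
                if i == len(steps) - 1:
                    nxt = LOOP
                else:
                    nxt = f"t{fresh}"   # fresh intermediate state
                    fresh += 1
                transitions.append((prev, None, op, nxt))
                prev = nxt
    return LOOP, transitions
```

Running this on the grammar \(\mathcal G_2\) below (\(x \to 0x1 \mid 1x0 \mid \varepsilon\)) produces the two reading self-loops, one cycle per rule, and a single self-loop for the \(\varepsilon\)-rule.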
Note the extra symbol \(\bot\), called "bottom". The role it plays in the structure of the automaton is going to be a running theme in later types of automata: it is really just there to delineate between two different parts of the program. In this case, it delineates between states in which the stack can be empty and states where it cannot.
- The word is \(w_1 = abc\), and the grammar \[\mathcal G_1 = \left\{ \begin{aligned} x &\to axy \mid \varepsilon \\ y &\to byc \mid \varepsilon \end{aligned} \right\} \]
- The word is \(w_2 = 0110\), and the grammar \[\mathcal G_2 = \left\{ \begin{aligned} x &\to 0x1 \mid 1 x 0 \mid \varepsilon \end{aligned} \right\} \]
- Use the Grammar-to-Automaton construction to design a stack automaton with a state that accepts \(L\).
- Can you think of a smaller stack automaton with a state that accepts \(L\)?
Every Stack Recognizable Language is a Context-free Language
Now let's show that \(\mathsf{Stack} \subseteq \mathsf{CFL}\). This is going to involve a construction that turns a stack automaton into a grammar. In fact, as we will see, a stack automaton is kind of already itself a compact representation of a grammar. But be warned, the word "compact" here is an understatement: the grammar corresponding to a stack automaton can be extremely large compared to the original automaton (although still finite). In particular, if the stack automaton has \(n\) states, the grammar corresponding to it has \(n^2\) variables (although a smaller grammar may be possible) and \(O(n^3)\) rules.
- The variables are \(X = \{v_{xy} \mid x,y \in Q\}\). (The variable \(v_{xy}\) is just a formal name for the pair \((x,y)\).)
- The rules \(R\) of the grammar are of three types:
- For every state \(x \in Q\), we have the rewrite rule \[ v_{xx} \to \varepsilon \] that lets you eliminate the variable \(v_{xx}\).
- For every triple of states, \(x,y,z \in Q\), we have the rewrite rule \[ v_{xz} \to v_{xy}v_{yz} \] that lets you split the variable into "intermediate paths".
- For every pair of transitions \(x \xrightarrow{a \mid p} z\) and \(u \xrightarrow{b \mid q} y\) in \(\mathcal S\) such that \(p{.}q = \mathtt{skip}\), we have the rewrite rule \[ v_{xy} \to av_{zu}b \]
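The three rule types can be generated mechanically. Here is a sketch in Python (the encoding is my own): each transition is a tuple (source, letter, stack operation, target), and we assume every stack program on a transition is a single push or pop, so that \(p{.}q = \mathtt{skip}\) exactly when \(p\) pushes the symbol that \(q\) pops.

```python
# Sketch of the Automaton-to-Grammar construction. A rule is a pair
# (variable, right-hand side), where the rhs is a tuple of letters and
# variables; the variable v_xy is encoded as the tuple ("v", x, y).
def automaton_to_grammar(states, transitions):
    def V(x, y):
        return ("v", x, y)
    rules = set()
    for x in states:
        rules.add((V(x, x), ()))                          # type 1: v_xx -> eps
    for x in states:
        for y in states:
            for z in states:
                rules.add((V(x, z), (V(x, y), V(y, z))))  # type 2: split
    for (x, a, p, z) in transitions:
        for (u, b, q, y) in transitions:
            # For basic programs, p.q = skip exactly when p pushes the
            # same symbol that q pops.
            if p[0] == "push" and q[0] == "pop" and p[1] == q[1]:
                rules.add((V(x, y), (a, V(z, u), b)))     # type 3: v_xy -> a v_zu b
    return rules
```

The \(n^3\) blow-up mentioned above is visible in the triply nested loop for the type 2 rules.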
This might seem a bit opaque at first, so let's start with the idea behind the construction. The main idea is that for any states \(x,y \in Q\), if \(y \in F\), i.e., \(y\) is an accepting state of \(\mathcal S\), then \[ \mathcal L(\mathcal S, x) = \mathcal L(\mathcal G, v_{xy}) \] Technically, this will require some conditions on \(\mathcal S\) to get working, but for now let's stick to understanding the idea. Every accepting run of the automaton should match a left-most derivation starting with the variable \(v_{xy}\). Let's start from the state \(x \in Q\) and look at such a run of the automaton: \[ x \xrightarrow{a_1 \mid p_1} x_1 \xrightarrow{a_2 \mid p_2} \cdots \xrightarrow{a_{n-1} \mid p_{n-1}} x_{n-1} \xrightarrow{a_n \mid p_n} y \] We would like to somehow turn this into a derivation in our grammar. The tactic this construction takes is to reduce our derivation to one of several types of shorter derivations (think about how mathematical induction works):
- In the simplest case, \(n = 0\) and \(y = x\). There are no shorter derivations, so this is a kind of "base case" for our reduction to shorter derivations. This situation suggests we include the rewrite rule \[ v_{xx} \to \varepsilon \]
- In the second case, if the stack is empty at some intermediate step in the run, i.e., \(p_1\dots p_i = \mathtt{skip}\) for some \(0 < i < n\), then we would like to reduce our current derivation to two shorter derivations that we concatenate afterward: one for the first half of the run and one for the second, as in \[ \overbrace{x \xrightarrow{a_1 \mid p_1} \cdots \xrightarrow{a_{i} \mid p_i}}^{v_{xx_i}}~ x_{i} ~ \overbrace{\xrightarrow{a_{i+1} \mid p_{i+1}} \cdots \xrightarrow{a_n \mid p_n} y}^{v_{x_iy}} \] This suggests the rewrite rule \[ v_{xy} \to v_{xx_i} v_{x_iy} \]
- In the final case, the stack is only empty at the beginning and the end, never empty in-between (formally, this means that \(p_1\dots p_i = \mathtt{skip}\) if and only if \(i = 0\) or \(i = n\)). In this case, we would like it to be true that \(p_1p_n = \mathtt{skip}\) because \(p_n\) is the only stack program that has the opportunity to undo \(p_1\). In this case, we would also like it to be true that \(p_2 \dots p_{n-1} = \mathtt{skip}\). In such a case, we could add a derivation rule of the form \[ v_{xy} \to a_1 v_{x_1 x_{n-1}} a_n \] and reduce our derivation to one that starts with the variable \(v_{x_1 x_{n-1}}\).
Unfortunately, Statement 1 is not always true! Already for \(n = 3\) it can fail: there are stack programs \(p_1, p_2, p_3\) such that
- \(p_1p_2p_3 = \mathtt{skip}\)
- \(p_1 \neq \mathtt{skip}\)
- \(p_2 \neq \mathtt{skip}\)
- \(p_1p_2 \neq \mathtt{skip}\)
- \(p_1p_3 \neq \mathtt{skip}\)
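Here is one concrete witness (my own example, not from the lecture): take \(p_1 = \mathtt{push}~a\), \(p_2 = (\mathtt{pop}~a;\mathtt{push}~b)\), and \(p_3 = \mathtt{pop}~b\). A few lines of Python, modelling stack programs as partial functions on stacks, confirm all five properties; for these particular programs, checking behaviour on the empty stack is enough to distinguish them from \(\mathtt{skip}\).

```python
# Stack programs as partial functions on stacks (Python lists);
# None marks a failed run (e.g. popping the wrong symbol).
def push(sym):
    return lambda s: s + [sym]

def pop(sym):
    return lambda s: s[:-1] if s and s[-1] == sym else None

def compose(*progs):
    def run(s):
        for p in progs:
            if s is None:       # a failure propagates
                return None
            s = p(s)
        return s
    return run

p1 = push("a")
p2 = compose(pop("a"), push("b"))   # a compound program, not a basic one
p3 = pop("b")
```

Note that \(p_2\) has to be compound: three basic operations each change the stack height by \(\pm 1\), so no three basic programs can compose to \(\mathtt{skip}\).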
What do we need to assume about our stack automaton to ensure that Statement 1 holds? Here is a partial answer:
- \(p_1 p_n = \mathtt{skip}\)
- \(p_2\dots p_{n-1} = \mathtt{skip}\)
- For any transition \(x \xrightarrow{\xi \mid p} y\) in \(\mathcal S\), \(p \in \{\mathtt{push}~\sigma, \mathtt{pop}~\sigma \mid \sigma \in \Sigma\}\), i.e., it is a basic stack program.
- There is exactly one accepting state, \(F = \{x_{acc}\}\).
- In the base case, \(n = 0\), and \(a_1 \cdots a_n = \varepsilon\). The empty run is a run that starts and ends with the empty stack. Also, in \(\mathcal G\), we have the rule \(v_{xx} \to \varepsilon\), so \(v_{xy} \Rightarrow^* \varepsilon\) as required.
- For an induction hypothesis, assume that for any \(k < n\), the Claim holds (with \(k\) in place of \(n\)).
Let's start by showing the "if" direction. Suppose there is a run \[ x \xrightarrow{a_1 \mid p_1} x_1 \xrightarrow{a_2 \mid p_2} \cdots \xrightarrow{a_n \mid p_n} x_n = y \] such that \(p_1 \dots p_n = \mathtt{skip}\). There are two cases to consider:
- In the first case, there is some \(k < n\) such that \(p_1\dots p_k = \mathtt{skip}\) (the stack becomes empty before the end of the run). This means there are runs \[\begin{gathered} x \xrightarrow{a_1 \mid p_1} x_1 \xrightarrow{a_2 \mid p_2} \cdots \xrightarrow{a_k \mid p_k} x_k \\ x_{k} \xrightarrow{a_{k+1} \mid p_{k+1}} x_{k+1} \xrightarrow{a_{k+2} \mid p_{k+2}} \cdots \xrightarrow{a_n \mid p_n} x_n = y \end{gathered}\] such that \(p_1 \dots p_k = p_{k+1} \dots p_n = \mathtt{skip}\). But \(k < n\) and \((n-k) < n\), so we can apply the induction hypothesis to both: By the induction hypothesis, \[ v_{xx_k} \Rightarrow^* a_1\cdots a_k \qquad v_{x_ky} \Rightarrow^* a_{k+1} \cdots a_n \] In \(\mathcal G\), we therefore have the derivation \[\begin{aligned} v_{xy} &\Rightarrow v_{xx_k} v_{x_ky} &&(v_{xy} \to v_{xx_k} v_{x_ky}) \\ &\Rightarrow^* a_1 \cdots a_k v_{x_ky} &&\text{(ind. hyp.)} \\ &\Rightarrow^* a_1 \cdots a_k a_{k+1} \cdots a_n &&\text{(ind. hyp.)} \\ &= w \end{aligned}\]
- In the second case, \(p_1 \dots p_i = \mathtt{skip}\) implies \(i = 0\) or \(i = n\), i.e., the stack is empty only at the start and the end. Since \(\mathcal S\) is in grammar-ready form, every \(p_i\) is a basic stack program, either \(\mathtt{push}~\sigma\) or \(\mathtt{pop}~\sigma\). From the Basic Stack Programs Unite problem, we know that \(p_1 = \mathtt{push}~\sigma\) and \(p_n = \mathtt{pop}~\sigma\) for some \(\sigma \in \Sigma\). Hence, we have \[ x \xrightarrow{a_1 \mid {\downarrow}\sigma} x_1 \text{ and } x_{n-1} \xrightarrow{a_n \mid {\uparrow}\sigma} y \] In \(\mathcal G\), we therefore have the rewrite rule \[ v_{xy} \to a_1 v_{x_1x_{n-1}} a_n \] It therefore suffices to see that \(v_{x_1x_{n-1}} \Rightarrow^* a_2 \cdots a_{n-1}\). This follows from the induction hypothesis and Basic Stack Programs Unite: since \(p_2 \dots p_{n-1} = \mathtt{skip}\), \(x_1 \xrightarrow{a_2 \mid p_2} \cdots \xrightarrow{a_{n-1}\mid p_{n-1}} x_{n-1}\) begins and ends with an empty stack. By the induction hypothesis, \(v_{x_1x_{n-1}} \Rightarrow^* a_2 \cdots a_{n-1}\), so \[\begin{aligned} v_{xy} &\Rightarrow a_1 v_{x_1x_{n-1}} a_n &&(v_{xy} \to a_1 v_{x_1x_{n-1}} a_n) \\ &\Rightarrow a_1 a_2 \cdots a_{n-1} a_n &&\text{(ind. hyp.)} \\ &= w \end{aligned}\]
Now let's show the "only if" direction. Suppose \(v_{xy} \Rightarrow^* a_1 \cdots a_n\). Then there are three ways this derivation could start: with \(n = 0\) and \(x = y\) (we have already covered this in the base case), with the rule \(v_{xy} \to v_{xz}v_{zy}\), or with the rule \(v_{xy} \to a_1v_{zu} a_n\). We consider the latter two cases separately.
- Suppose \(v_{xy} \Rightarrow^* a_1 \cdots a_n\) starts with the rule \(v_{xy} \to v_{xz}v_{zy}\). Then there is a \(k < n\) such that \(v_{xz} \Rightarrow^* a_1 \cdots a_k\) and \(v_{zy} \Rightarrow^* a_{k+1} \cdots a_n\). By the induction hypothesis, there are runs \[\begin{gathered} x \xrightarrow{a_1 \mid p_1} \cdots \xrightarrow{a_k \mid p_k} z \\ \hspace{4em} z \xrightarrow{a_{k+1} \mid p_{k+1}} \cdots \xrightarrow{a_n \mid p_n} y \end{gathered}\] that begin and end with an empty stack. Chain these together to complete this step.
- Suppose that \(v_{xy} \Rightarrow^* a_1 \cdots a_n\) starts with \(v_{xy} \to a_1v_{zu} a_n\). Then \(v_{zu} \Rightarrow^* a_2 \cdots a_{n-1}\) and for some \(\sigma \in \Sigma\), there are transitions \(x \xrightarrow{a_1 \mid {\downarrow}\sigma} z\) and \(u \xrightarrow{a_n \mid {\uparrow}\sigma} y\). The induction hypothesis tells us that there is a run \[ z \xrightarrow{a_{2} \mid p_{2}} \cdots \xrightarrow{a_{n-1} \mid p_{n-1}} u \] that begins and ends with an empty stack. Slapping the other two transitions on, we have \[ x \xrightarrow{a_1 \mid {\downarrow}\sigma} z \xrightarrow{a_{2} \mid p_{2}} \cdots \xrightarrow{a_{n-1} \mid p_{n-1}} u \xrightarrow{a_n \mid {\uparrow}\sigma} y \] This run begins and ends with an empty stack, so the proof is complete.
OK, this is all good, but didn't we make some assumptions about \(\mathcal S\)? Well, it turns out that every stack automaton is equivalent to one in grammar-ready form.
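One ingredient of that equivalence can be sketched in code (the encoding and the dummy symbol `#` are my own): every transition whose stack program is a sequence of basic operations is flattened into a chain of basic transitions through fresh states, with the \(\mathtt{skip}\) program replaced by a push-then-pop of a dummy symbol.

```python
# Sketch of one normalization step towards grammar-ready form: each
# transition carries a stack program given as a list of basic operations
# ("push"/"pop", symbol); we split it into a chain of basic transitions.
def flatten(transitions):
    out, fresh = [], 0
    for (x, a, prog, y) in transitions:
        if not prog:                          # the skip program: push then
            prog = [("push", "#"), ("pop", "#")]  # pop a dummy symbol
        prev, letter = x, a
        for i, op in enumerate(prog):
            if i == len(prog) - 1:
                nxt = y
            else:
                nxt = f"n{fresh}"             # fresh intermediate state
                fresh += 1
            out.append((prev, letter, op, nxt))
            prev, letter = nxt, None          # only the first step reads input
    return out
```

The other condition of grammar-ready form, a single accepting state, is handled separately and is not shown here.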
The three lemmas we have seen above prove the CFL=Stack theorem stated at the beginning.
Consequences of CFL=Stack
Knowing that context-free languages are the same as stack recognizable languages opens up our study of context-free languages by allowing us to use automata-theoretic tools to study them. In particular, we can make use of constructions like the product construction from earlier, although it does not apply quite as directly as before.
Here is the construction of \(\mathcal S \otimes \mathcal A\): the states of the automaton are \[ Q_1 \times Q_2 = \{(x, y) \mid x \in Q_1 \text{ and } y \in Q_2\} \] The alphabet is \(A\) and the stack symbols are \(\Sigma\). The transition relation is given by the following rule: we have \[ (x, y) \xrightarrow{a \mid p} (x', y') \] if and only if \[ x \xrightarrow{a \mid p} x' \text{ and } y \xrightarrow{a} y' \] The final states are \[ F = \{(x, y) \mid x \in F_1 \text{ and } y \in F_2\} \] The following claim finishes the proof.
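The product construction can be sketched as follows. The encoding is my own, and one clause is an assumption: the definition above only covers transitions that read a letter, so for transitions of \(\mathcal S\) that read no letter I let the finite-automaton component idle.

```python
# Sketch of the product S (x) A of a stack automaton and a finite automaton.
# stack_trans: (src, letter_or_None, stack_program, tgt)
# dfa_trans:   (src, letter, tgt)
def product(stack_trans, dfa_states, dfa_trans, stack_final, dfa_final):
    trans = []
    for (x, a, p, x2) in stack_trans:
        if a is None:
            # Silent transition (assumption): the DFA component stays put.
            for y in dfa_states:
                trans.append(((x, y), None, p, (x2, y)))
        else:
            # Both components read the same letter; only the stack
            # automaton's component runs the stack program p.
            for (y, b, y2) in dfa_trans:
                if a == b:
                    trans.append(((x, y), a, p, (x2, y2)))
    final = {(x, y) for x in stack_final for y in dfa_final}
    return trans, final
```

As in the finite-automaton case, the state set is the cartesian product \(Q_1 \times Q_2\), but the stack belongs to one side only.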
- Use the Intersection-Product construction to design a stack automaton \(\mathcal S = (Q, A, \Sigma, \delta, F)\) with a state \(x\in Q\) such that \(L = \mathcal L(\mathcal S, x)\).
- Design a grammar \(\mathcal G = (X, A, R)\) with a variable that derives \(L\).
We can also make use of closure properties of \(\mathsf{CFL}\) to study \(\mathsf{Stack}\).
- Write down three words in each of \(L_1\) and \(L_2\).
- Use the Grammar-to-Automaton construction to turn each of \(\mathcal G_1\) and \(\mathcal G_2\) into stack automata with a state that accepts \(L_1\) and \(L_2\) respectively.
- Design a grammar \(\mathcal G_3\) with a variable that derives \(L_1 \cup L_2\), and use the Grammar-to-Automaton construction on \(\mathcal G_3\) to build a stack automaton with a state that accepts \(L_1 \cup L_2\).
- Design a grammar \(\mathcal G_4\) with a variable that derives \(L_1 L_2\), and use the Grammar-to-Automaton construction on \(\mathcal G_4\) to build a stack automaton with a state that accepts \(L_1L_2\).