Polynomial Systems and Context-free Grammars
Last time, we saw that a couple of very simple languages (balanced parentheses, the same number of \(a\)s as \(b\)s, and so on) are nonregular! By Kleene's Theorem, this amounts to the statement that no state of any finite automaton accepts them (how rude!). The general slogan you hear people say about these languages is that they "require memory to recognize". This is true (in a precise sense), but I think it is much easier to understand these languages from a systems-of-equations perspective. So that's what we are going to do today.
We have already seen one way of generalizing our left-affine systems of equations: by allowing them to be unguarded (for there to be terms \(r_{ij}x_j\) where \(r_{ij}\) does have the empty word property). Because of the dagger construction, we can always transform one of these unguarded systems into a guarded one, so this generalization didn't buy us a larger family of languages than \(\mathsf{Reg}\). Today we are going to generalize even more: we're going to let go of "left-affine" and replace it with "polynomial". This will lead us to a much bigger family of languages, including some languages you will be very familiar with.
Remember that a solution to this equation is a language \(L \subseteq A^*\) such that \[ L = \{\varepsilon\} \cup (\{a\} \cdot L \cdot \{b\}) \hspace{4em}\text{(*)} \] and that the least solution is the solution that is contained in all other solutions. The language \(L_{a = b}\) satisfies this equation: \(\varepsilon \in L_{a = b}\) (set \(n = 0\)), matching the first term on the right-hand side of \(\text{(*)}\), and for any \(n \in \mathbb N\), \(a(a^nb^n)b = a^{n+1}b^{n+1} \in L_{a=b}\); conversely, every nonempty word of \(L_{a = b}\) is of this form, so it appears in the second term. In other words, the two sides of \(\text{(*)}\) contain exactly the same words.
To see that \(L_{a = b}\) is the least solution, we need to check that it only contains the words it has to. To that end, let's consider an arbitrary solution \(L\). We know that \(\varepsilon \in L\), because the first term on the right-hand side puts \(\varepsilon\) in \(L\). Now, since \(\varepsilon \in L\), the second term on the right-hand side tells us that \[ a \varepsilon b = ab \in L \] Again, since \(ab \in L\), the equation also tells us that \[ a(ab)b = a^2b^2 \in L \] Continuing to plug words of \(L\) back into \(a \cdot (-) \cdot b\), it's not too hard to see that \(L_{a = b} \subseteq L\). Since \(L\) was an arbitrary solution, this tells us that \(L_{a = b}\) is the smallest language that satisfies this equation!
In the example above, we just argued something very important: going from "left-affine" to "polynomial" does buy us more languages! That is, since \(L_{a=b}\) is not regular, there are polynomial systems of equations whose least solutions are not regular.
- \(\emptyset\), \(\varepsilon\), and \(a\) are polynomial expressions for any \(a \in A\)
- \(x\) is a polynomial expression for every \(x \in X\)
- If \(p_1(x_1, \dots, x_n), p_2(x_1, \dots, x_n)\) are polynomial expressions, then so are
- \(p_1(x_1, \dots, x_n) + p_2(x_1, \dots, x_n)\)
- \(p_1(x_1, \dots, x_n) \cdot p_2(x_1, \dots, x_n)\)
A polynomial system of equations in \(X\) variables is a system of equations of the form \[\begin{aligned} x_1 &= p_1(x_1, \dots, x_n) \\ &\ \vdots \\ x_n &= p_n(x_1, \dots, x_n) \end{aligned}\] where \(x_1, \dots, x_n \in X\) and \(p_i(x_1,\dots, x_n) \in \mathit{Poly}(X)\) for each \(i=1,\dots, n\). A solution to the polynomial system of equations in \(X\) variables above is a sequence of languages \(L_1, \dots, L_n\) such that \[\begin{aligned} L_1 &= p_1(L_1, \dots, L_n) \\ &\ \vdots \\ L_n &= p_n(L_1, \dots, L_n) \end{aligned}\] That is, when you plug \(L_i\) into \(x_i\) for each \(x_i \in X\), then the equations between languages you obtain are true. A least solution to the polynomial system of equations in \(X\) variables above is a solution \(U_1, \dots, U_n\) such that for any solution \(L_1, \dots, L_n\), \(U_i \subseteq L_i\) for all \(i = 1, \dots, n\).
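Nothing below depends on it, but it can help to see a least solution computed concretely. Here is a minimal Python sketch, purely for illustration, that approximates the least solution of a polynomial system: it starts every variable at \(\emptyset\), repeatedly re-evaluates the right-hand sides, and throws away words longer than a fixed bound so that the iteration stops. Representing the right-hand sides as Python functions is an assumption made just for this sketch.

```python
# A hedged sketch: approximate the least solution of a polynomial system by
# starting every variable at the empty language and re-evaluating the
# right-hand sides until nothing changes. Only words of length <= max_len are
# kept, so every step manipulates finite sets and the loop terminates.

def cat(L1, L2, max_len):
    """Concatenation of two finite languages, truncated by length."""
    return {u + v for u in L1 for v in L2 if len(u + v) <= max_len}

def approx_least_solution(rhs, variables, max_len=8):
    """Iterate x_i := p_i(x_1, ..., x_n) starting from empty languages."""
    sol = {x: set() for x in variables}
    while True:
        new = {x: {w for w in rhs[x](sol) if len(w) <= max_len}
               for x in variables}
        if new == sol:            # a fixed point, up to the length bound
            return sol
        sol = new

# The single equation x = epsilon + a x b from the example above
# (the length bound 8 is repeated inside the lambda just to keep it short):
rhs = {"x": lambda s: {""} | cat({"a"}, cat(s["x"], {"b"}, 8), 8)}
print(sorted(approx_least_solution(rhs, ["x"])["x"], key=len))
# -> ['', 'ab', 'aabb', 'aaabbb', 'aaaabbbb']
```

Up to the length bound, the words it finds for \(x\) are exactly the words of \(L_{a=b}\) that we argued must be in every solution.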
The following individual exercise shows us that every regular language is the least solution to some polynomial system of equations.
The thing about polynomial systems is that they're hard to solve. And unfortunately, Arden's Rule can't save us when it comes to finding these least solutions. We need a slightly different, more direct approach.
Monomials and Grammar-Ready Polynomials
The more direct approach just uses the same type of reasoning as we have been using up until now (including in the A=B: Resurrection example): the words that are in the least solution are the ones that must be there. So, if we have an equation \[ x = \varepsilon + axb + baxb \qquad (\dagger) \] in our system, and we know that a word \(w\) is in the least solution of this equation, then we also know that \(awb\) and \(bawb\) are in the least solution as well. That is, we can rewrite \(w\) in either of two ways \[ w \to awb \quad w \to bawb \] without leaving the least solution. It stands to reason that all of the words in the least solution can be derived this way: write \(L\) for the least solution to \((\dagger)\). We know that \(\varepsilon \in L\), just by looking at the right-hand side. We also know (from what we said above) that \(a\varepsilon b = ab\) and \(ba\varepsilon b = bab\) are in \(L\).
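For instance, one possible chain of rewrites starting from \(\varepsilon\) is \[ \varepsilon \to a\varepsilon b = ab \to ba(ab)b = baabb \to a(baabb)b = abaabbb \] and every word appearing along the way is guaranteed to be in \(L\).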
However, this "rewriting" approach to finding all of the words in a least solution only works in \((\dagger)\) because the right-hand side is written in a certain way: there is a \(+\) separating each of the different possible ways of discovering new words in the least solution. Let's make this more precise.
The set \((X \cup A)^*\) consists of all of the words you can form by expanding your alphabet to include \(X\). That is, every monomial expression \(\mu \in (X \cup A)^*\) is of the form \[ \mu = u_0 x_1 u_1 x_2 \cdots x_n u_n \] for some variables \(x_1, \dots, x_n \in X\) and \(u_0, \dots, u_n \in A^*\). In other words, monomials are polynomials formed without using the \(+\) formation rule.
For example, all of \[ axb \qquad xxyabxybc + \varepsilon \qquad by + \varepsilon x y a \] are in grammar-ready form, but \[ ax(b + y) \qquad xx(a + b)xybc + \varepsilon \qquad b(y + \varepsilon) x y a \] are not. However, every polynomial expression has a grammar-ready equivalent.
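For example, distributing concatenation over \(+\) (and remembering that \(\varepsilon\) is the unit for concatenation) puts the first and third expressions above into grammar-ready form: \[ ax(b + y) = axb + axy \qquad\qquad b(y + \varepsilon)xya = byxya + b\varepsilon xya = byxya + bxya \]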
As you can probably guess, the translation from non-grammar-ready to grammar-ready is always the same sequence of steps. This is the basic idea behind proving the following lemma.
Grammars
Finding the least solution to a polynomial system of equations is always a matter of determining which words have to be in every solution, and this comes down to plugging words we have already determined are in the language back into the right-hand sides of the equations. There is a more systematic way of writing this process down: it's called a rewrite rule. Roughly speaking, a set of rewrite rules is called a grammar.
- A set \(X\) of variables
- An alphabet \(A\) of letters
- A set of rewrite rules \(R\), which relates variables to monomial expressions over \(A\) in \(X\) variables, \[ R \subseteq X \times (X \cup A)^* \]
For example, consider the grammar \[ \mathcal G = (\{x\}, \{a, b\}, \{(x, \varepsilon), (x, axb)\}) \] This grammar has \(X = \{x\}\) for its set of variables (just one), \(A = \{a, b\}\) for its alphabet, and two rewrite rules, \(x \to \varepsilon\) and \(x \to axb\). The more typical way you would see this grammar written is as a kind of table, as in \[\begin{aligned} x &\to \varepsilon \mid axb \end{aligned}\] Here, the \(\mid\) just indicates a separation into one of several rules. When there are multiple variables in the grammar, multiple lines will appear: for example, the grammar \[ \mathcal G = (\{x, y\}, \{a, b\}, \{(x, \varepsilon), (x, ayb), (y, bxa)\}) \] can equivalently be written as \[\begin{aligned} x &\to \varepsilon \mid ayb \\ y &\to bxa \end{aligned}\]
Rewrite rules are essentially just translations of variables into words over \(X \cup A\). This translation extends to a larger rewriting system that lets us transform whole words over \(X \cup A\) into one another.
In the future, if \(\mathcal G\) is clear from context, we are always going to write rules and rewrites as \(x \to \mu\) and \(\mu_1 \Rightarrow \mu_2\) instead of \(x \to_R \mu\) and \(\mu_1 \Rightarrow_{\mathcal G} \mu_2\) (that is, the subscripts aren't strictly necessary).
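If you want to play with this rewriting process on a computer, here is a minimal Python sketch, again just an illustration (it assumes single-character variable names, and the naive search need not terminate for every grammar): it stores the rules \(R\) as pairs, applies them to monomial expressions breadth-first, and collects the words over \(A\) alone that can be reached from a starting variable.

```python
# A hedged sketch of rewriting in a grammar G = (X, A, R): rules are pairs
# (x, rho) with rho a word over X ∪ A, and we search breadth-first through
# the rewrite relation, keeping the results that contain no variables.

from collections import deque

def derivable_words(variables, rules, start, max_len=6):
    """Words over A derivable from `start`, up to length max_len.
    Variables are assumed to be single characters, purely for simplicity."""
    words, seen, queue = set(), {start}, deque([start])
    while queue:
        mu = queue.popleft()
        if not any(v in mu for v in variables):
            words.add(mu)                      # mu is a word over A alone
            continue
        for (x, rho) in rules:                 # try each rule x -> rho...
            i = mu.find(x)
            while i != -1:                     # ...at each occurrence of x
                nxt = mu[:i] + rho + mu[i + 1:]
                letters = sum(1 for c in nxt if c not in variables)
                if letters <= max_len and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                i = mu.find(x, i + 1)
    return words

# The grammar G = ({x}, {a, b}, {(x, eps), (x, axb)}) from above:
rules = [("x", ""), ("x", "axb")]
print(sorted(derivable_words({"x"}, rules, "x"), key=len))
# -> ['', 'ab', 'aabb', 'aaabbb']
```

Since a rewrite step never erases letters, the length bound keeps the search finite here; swapping in the two-variable grammar above is just a matter of changing the rules list.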
- What are the variables \(X\)?
- What is the alphabet \(A\)?
- Derive the word \(aabba\) from the variable \(x\).
- What are the three shortest words that can be derived from \(x\) in \(\mathcal G\)?
Similar to how an automaton accepts words, a grammar derives them.
The family of context-free languages is written \[ \mathsf{CFL} = \{L \subseteq A^* \mid L\text{ is context-free} \} \]
- \(L_1 = \{(ab)^n \mid n \in \mathbb N\}\)
- \(L_2 = \{a^nb^m \mid n < m\}\)
- \(L_3 = \{a^nb^{2n + 1} \mid n \in \mathbb N\}\)
- \(L_4 = \{w \mid w = w^{\mathrm{op}} \text{ (i.e., \(w\) is a palindrome)}\}\)
- Write down a grammar \(\mathcal G = (X, A, R)\) with a variable that derives \(L\), i.e., for some \(x \in X\), \(\mathcal L(\mathcal G, x) = L\).
- Use your grammar to derive each of the words in (*).
- Describe what prevents each of the words in (**) from being derivable from your grammar \(\mathcal G\).
It was probably mentioned at some point that a lot of the initial development of the theory of computation we've seen so far was done by linguists. It turns out that context-free grammars are much older than we initially thought: we've been writing language grammars like this since antiquity.
- \(\texttt{the nice girl likes the mean dog}\)
- \(\texttt{the boy likes the nice boy}\)
- \(\texttt{the mean boy hates the dog}\)
Anyway. It should hopefully be clear at this point that grammars are precisely the tools we need to understand the least solutions to polynomial systems of equations!