Polynomial Systems and Context-free Grammars
Last time, we saw that a couple of very simple languages (balanced parentheses, the same number of \(a\)s as \(b\)s, and so on) are nonregular! By Kleene's Theorem, this amounts to the statement that no state of any finite automaton accepts them (how rude!). The general slogan you hear people say about these languages is that they "require memory to recognize". This is true (in a precise sense), but I think it is much easier to understand these languages from a systems-of-equations perspective. So that's what we are going to do today.
We have already seen one way of generalizing our left-affine systems of equations: by allowing them to be unguarded (for there to be terms \(r_{ij}x_j\) where \(r_{ij}\) does have the empty word property). Because of the dagger construction, we can always transform one of these unguarded systems into a guarded one, so this generalization didn't buy us a larger family of languages than \(\mathsf{Reg}\). Today we are going to generalize even more: we're going to let go of "left-affine" and replace it with "polynomial". This will lead us to a much bigger family of languages, including some languages you will be very familiar with.
Remember that a solution to this equation is a language \(L \subseteq A^*\) such that \[ L = \{\varepsilon\} \cup (\{a\} \cdot L \cdot \{b\}) \hspace{4em}\text{(*)} \] and that the least solution is the solution that is contained in all other solutions. The language \(L_{a = b}\) satisfies this equation: \(\varepsilon \in L_{a = b}\) (set \(n = 0\)), matching the first term on the right-hand side of \(\text{(*)}\), and for any \(n \in \mathbb N\), \(a(a^nb^n)b = a^{n+1}b^{n+1} \in L_{a=b}\); conversely, every nonempty word of \(L_{a = b}\) is of this form, so it appears in the second term. In other words, the two sides of \(\text{(*)}\) contain exactly the same words.
To see that \(L_{a = b}\) is the least solution, we need to check that it only contains the words it has to. To that end, let's consider an arbitrary solution \(L\). We know that \(\varepsilon \in L\), because the first term on the right-hand side puts \(\varepsilon\) in \(L\). Now, since \(\varepsilon \in L\), the second term on the right-hand side tells us that \[ a \varepsilon b = ab \in L \] Again, since \(ab \in L\), the equation also tells us that \[ a(ab)b = a^2b^2 \in L \] Continuing to plug words of \(L\) back into \(a \cdot (-) \cdot b\), it's not too hard to see that \(L_{a = b} \subseteq L\). Since \(L\) was an arbitrary solution, this tells us that \(L_{a = b}\) is the smallest language that satisfies this equation!
In the example above, we just argued something very important: going from "left-affine" to "polynomial" does buy us more languages! That is, since \(L_{a=b}\) is not regular, there are polynomial systems of equations whose least solutions are not regular.
- \(\emptyset\), \(\varepsilon\), and \(a\) are polynomial expressions for any \(a \in A\)
- \(x\) is a polynomial expression for every \(x \in X\)
- If \(p_1(x_1, \dots, x_n), p_2(x_1, \dots, x_n)\) are polynomial expressions, then so are
- \(p_1(x_1, \dots, x_n) + p_2(x_1, \dots, x_n)\)
- \(p_1(x_1, \dots, x_n) \cdot p_2(x_1, \dots, x_n)\)
A polynomial system of equations in \(X\) variables is a system of equations of the form \[\begin{aligned} x_1 &= p_1(x_1, \dots, x_n) \\ &\ \vdots \\ x_n &= p_n(x_1, \dots, x_n) \end{aligned}\] where \(x_1, \dots, x_n \in X\) and \(p_i(x_1,\dots, x_n) \in \mathit{Poly}(X)\) for each \(i=1,\dots, n\). A solution to the polynomial system of equations in \(X\) variables above is a sequence of languages \(L_1, \dots, L_n\) such that \[\begin{aligned} L_1 &= p_1(L_1, \dots, L_n) \\ &\ \vdots \\ L_n &= p_n(L_1, \dots, L_n) \end{aligned}\] That is, when you plug \(L_i\) into \(x_i\) for each \(x_i \in X\), then the equations between languages you obtain are true. A least solution to the polynomial system of equations in \(X\) variables above is a solution \(U_1, \dots, U_n\) such that for any solution \(L_1, \dots, L_n\), \(U_i \subseteq L_i\) for all \(i = 1, \dots, n\).
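Nothing below depends on it, but it can help to see a least solution computed concretely. Here is a minimal Python sketch, purely for illustration, that approximates the least solution of a polynomial system: it starts every variable at \(\emptyset\), repeatedly re-evaluates the right-hand sides, and throws away words longer than a fixed bound so that the iteration stops. Representing the right-hand sides as Python functions is an assumption made just for this sketch.

```python
# A hedged sketch: approximate the least solution of a polynomial system by
# starting every variable at the empty language and re-evaluating the
# right-hand sides until nothing changes. Only words of length <= max_len are
# kept, so every step manipulates finite sets and the loop terminates.

def cat(L1, L2, max_len):
    """Concatenation of two finite languages, truncated by length."""
    return {u + v for u in L1 for v in L2 if len(u + v) <= max_len}

def approx_least_solution(rhs, variables, max_len=8):
    """Iterate x_i := p_i(x_1, ..., x_n) starting from empty languages."""
    sol = {x: set() for x in variables}
    while True:
        new = {x: {w for w in rhs[x](sol) if len(w) <= max_len}
               for x in variables}
        if new == sol:            # a fixed point, up to the length bound
            return sol
        sol = new

# The single equation x = epsilon + a x b from the example above
# (the length bound 8 is repeated inside the lambda just to keep it short):
rhs = {"x": lambda s: {""} | cat({"a"}, cat(s["x"], {"b"}, 8), 8)}
print(sorted(approx_least_solution(rhs, ["x"])["x"], key=len))
# -> ['', 'ab', 'aabb', 'aaabbb', 'aaaabbbb']
```

Up to the length bound, the words it finds for \(x\) are exactly the words of \(L_{a=b}\) that we argued must be in every solution.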
The following individual exercise shows us that every regular language is the least solution to some polynomial system of equations.
The thing about polynomial systems is that they're hard to solve. And unfortunately, Arden's Rule can't save us when it comes to finding these least solutions. We need a slightly different, more direct approach.
Monomials and Grammar-Ready Polynomials
The more direct approach just uses the same type of reasoning as we have been using up until now (including in the A=B: Resurrection example): the words that are in the least solution are the ones that must be there. So, if we have an equation \[ x = \varepsilon + axb + baxb \qquad (\dagger) \] in our system, and we know that a word \(w\) is in the least solution of this equation, then we also know that \(awb\) and \(bawb\) are in the least solution as well. That is, we can rewrite \(w\) in either of two ways \[ w \to awb \quad w \to bawb \] without leaving the least solution. It stands to reason that all of the words in the least solution can be derived this way: write \(L\) for the least solution to \((\dagger)\). We know that \(\varepsilon \in L\), just by looking at the right-hand side. We also know (from what we said above) that \(a\varepsilon b = ab\) and \(ba\varepsilon b = bab\) are in \(L\).
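For instance, one possible chain of rewrites starting from \(\varepsilon\) is \[ \varepsilon \to a\varepsilon b = ab \to ba(ab)b = baabb \to a(baabb)b = abaabbb \] and every word appearing along the way is guaranteed to be in \(L\).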
However, this "rewriting" approach to finding all of the words in a least solution only works in \((\dagger)\) because the right-hand side is written in a certain way: there is a \(+\) separating each of the different possible ways of discovering new words in the least solution. Let's make this more precise.
The set \((X \cup A)^*\) consists of all of the words you can form by expanding your alphabet to include \(X\). That is, every monomial expression \(\mu \in (X \cup A)^*\) is of the form \[ \mu = u_0 x_1 u_1 x_2 \cdots x_n u_n \] for some variables \(x_1, \dots, x_n \in X\) and \(u_0, \dots, u_n \in A^*\). In other words, monomials are polynomials formed without using the \(+\) formation rule.
For example, all of \[ axb \qquad xxyabxybc + \varepsilon \qquad by + \varepsilon x y a \] are in grammar-ready form, but \[ ax(b + y) \qquad xx(a + b)xybc + \varepsilon \qquad b(y + \varepsilon) x y a \] are not. However, every polynomial expression has a grammar-ready equivalent.
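For example, distributing concatenation over \(+\) (and remembering that \(\varepsilon\) is the unit for concatenation) puts the first and third expressions above into grammar-ready form: \[ ax(b + y) = axb + axy \qquad\qquad b(y + \varepsilon)xya = byxya + b\varepsilon xya = byxya + bxya \]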
As you can probably guess, the translation from non-grammar-ready to grammar-ready is always the same sequence of steps. This is the basic idea behind proving the following lemma.
Grammars
Finding the least solution to a polynomial system of equations is always a matter of determining which words have to be in every solution, and this comes down to plugging words we have already determined are in the language back into the right-hand sides of the equations. There is a more systematic way of writing this process down: it's called a rewrite rule. Roughly speaking, a set of rewrite rules is called a grammar.
- A set \(X\) of variables
- An alphabet \(A\) of letters
- A set of rewrite rules \(R\), which relates variables to monomial expressions over \(A\) in \(X\) variables, \[ R \subseteq X \times (X \cup A)^* \]
For example, consider the grammar \[ \mathcal G = (\{x\}, \{a, b\}, \{(x, \varepsilon), (x, axb)\}) \] This grammar has \(X = \{x\}\) for its set of variables (just one), \(A = \{a, b\}\) for its alphabet, and two rewrite rules, \(x \to \varepsilon\) and \(x \to axb\). The more typical way you would see this grammar written is as a kind of table, as in \[\begin{aligned} x &\to \varepsilon \mid axb \end{aligned}\] Here, the \(\mid\) just indicates a separation into one of several rules. When there are multiple variables in the grammar, multiple lines will appear: for example, the grammar \[ \mathcal G = (\{x, y\}, \{a, b\}, \{(x, \varepsilon), (x, ayb), (y, bxa)\}) \] can equivalently be written as \[\begin{aligned} x &\to \varepsilon \mid ayb \\ y &\to bxa \end{aligned}\]
Rewrite rules are essentially just translations of variables into words over \(X \cup A\). This translation extends to a larger rewriting system that lets us transform whole words over \(X \cup A\) into one another.
In the future, if \(\mathcal G\) is clear from context, we are always going to write rules and rewrites as \(x \to \mu\) and \(\mu_1 \Rightarrow \mu_2\) instead of \(x \to_R \mu\) and \(\mu_1 \Rightarrow_{\mathcal G} \mu_2\) (that is, the subscripts aren't strictly necessary).
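If you want to play with this rewriting process on a computer, here is a minimal Python sketch, again just an illustration (it assumes single-character variable names, and the naive search need not terminate for every grammar): it stores the rules \(R\) as pairs, applies them to monomial expressions breadth-first, and collects the words over \(A\) alone that can be reached from a starting variable.

```python
# A hedged sketch of rewriting in a grammar G = (X, A, R): rules are pairs
# (x, rho) with rho a word over X ∪ A, and we search breadth-first through
# the rewrite relation, keeping the results that contain no variables.

from collections import deque

def derivable_words(variables, rules, start, max_len=6):
    """Words over A derivable from `start`, up to length max_len.
    Variables are assumed to be single characters, purely for simplicity."""
    words, seen, queue = set(), {start}, deque([start])
    while queue:
        mu = queue.popleft()
        if not any(v in mu for v in variables):
            words.add(mu)                      # mu is a word over A alone
            continue
        for (x, rho) in rules:                 # try each rule x -> rho...
            i = mu.find(x)
            while i != -1:                     # ...at each occurrence of x
                nxt = mu[:i] + rho + mu[i + 1:]
                letters = sum(1 for c in nxt if c not in variables)
                if letters <= max_len and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                i = mu.find(x, i + 1)
    return words

# The grammar G = ({x}, {a, b}, {(x, eps), (x, axb)}) from above:
rules = [("x", ""), ("x", "axb")]
print(sorted(derivable_words({"x"}, rules, "x"), key=len))
# -> ['', 'ab', 'aabb', 'aaabbb']
```

Since a rewrite step never erases letters, the length bound keeps the search finite here; swapping in the two-variable grammar above is just a matter of changing the rules list.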
- What are the variables \(X\)?
- What is the alphabet \(A\)?
- Derive the word \(aabba\) from the variable \(x\).
- What are the three shortest words that can be derived from \(x\) in \(\mathcal G\)?
Similar to how an automaton accepts words, a grammar derives them.
The family of context-free languages is written \[ \mathsf{CFL} = \{L \subseteq A^* \mid L\text{ is context-free} \} \]
- \(L_1 = \{(ab)^n \mid n \in \mathbb N\}\)
- \(L_2 = \{a^nb^m \mid n < m\}\)
- \(L_3 = \{a^nb^{2n + 1} \mid n \in \mathbb N\}\)
- \(L_4 = \{w \mid w = w^{\mathrm{op}} \text{ (i.e., \(w\) is a palindrome)}\}\)
- Write down a grammar \(\mathcal G = (X, A, R)\) with a variable that derives \(L\), i.e., for some \(x \in X\), \(\mathcal L(\mathcal G, x) = L\).
- Use your grammar to derive each of the words in (*).
- Describe what prevents each of the words in (**) from being derivable from your grammar \(\mathcal G\).
It was probably mentioned at some point that a lot of the initial development of the theory of computation we've seen so far was done by linguists. It turns out that context-free grammars are much older than we initially thought: we've been writing language grammars like this since antiquity.
- \(\texttt{the nice girl likes the mean dog}\)
- \(\texttt{the boy likes the nice boy}\)
- \(\texttt{the mean boy hates the dog}\)
Anyway. It should hopefully be clear at this point that grammars are precisely the tools we need to understand the least solutions to polynomial systems of equations!