Finite and Infinite Automata
If you can write down the whole state diagram of an automaton, then we call it a finite automaton, because it has a finite number of states and transitions. If there are not finitely many states and transitions, then we call the automaton infinite. As we are about to see, there are some very natural examples of infinite automata!
By "natural", we mean something specific, actually: in the examples below, states are not "abstract" in the same way as we were writing before, when we wrote "\(x_0\)", or "\(s_1\)", or whatever. The states in the examples are little bit like the states of a puzzle game: they carry some concrete interpretation, and the same can be said about the transitions between these states. For a quick example, recall that the state of a maze is determined by the position of the player---the position is the semantics of the state. Then the transitions between the states are the moves the player was allowed to make---the allowable moves are the semantics of the transitions. (The word semantics is a fancy word for "meaning".)
The Brzozowski Construction
We defined languages abstractly, as arbitrary sets of words from an alphabet. But then we talked about languages accepted by states of automata. Are these the same? In other words, is every language the language accepted by some automaton? As a matter of fact, the answer to this question is "yes"! And moreover, it just takes one automaton to see this in action. The automaton we are going to construct below is called the "Brzozowski automaton", and it is very infinite.
The basic idea is that the set of all languages is the state space of an automaton. That is, we are going to define an automaton whose states are languages (observe that these states have a semantics, just liike in the Reverse Naturals automaton!), and the transitions between states (i.e., languages) are Brzozowski derivatives, in the following sense.
Remember that we want to define an automaton where the languages are the states. This means that we want to think of languages as programs (because they are states and states are programs). The program that corresponds to a language \(L\) should really be thought of as "eating letters" as it runs, and accepting when it has nothing left to eat. In the simplest case, this is how you should imagine the \(a\)-derivative of just a word: \(a^{-1}\{a\ b\ c\} = \{b\ c\}\). For arbitrary languages, taking the \(a\)-derivative is what is meant by "eating all the \(a\)'s".
Ok, but what is the \(a\)-derivative of the word \(a\) (of length \(1\))? Well, remember that \(a^{-1}aw = w\) for any word \(w\). This includes the empty word, or \(w = \varepsilon\), so \[ a^{-1}\{a\} = a^{-1}\{a\varepsilon\} = \{\varepsilon\} \]
- \(L_1 = \{\varepsilon, aa, ba, cab, c, acab\}\)
- \(L_2 = \{aa, aaa, aaaa, caaa\}\)
- \(L_3 = \{\varepsilon, bac, abc, cab\}\)
- \(L_4 = \{b^n \mid n \in \mathbb N\}\) where \(b^n = bbb\dots b\) is \(n\) consecutive \(b\)'s
- \(L_5 = L_2 \cup L_3\)
- \(L_6 = L_4 \cup \{a^n \mid n \in \mathbb n\}\)
Alright, now that you've had some practice with derivatives, let's take a look at the formal definition of the Brzozowski automaton.
- \(2^{A^*} = \{L \mid L \subseteq A^*\}\)
- \(\delta = \{(L, a, a^{-1}L) \mid \text{\(a \in A\) and \(L \subseteq A^*\)}\} \subseteq 2^{A^*} \times A \times 2^{A^*}\)
- \(F = \{L \subseteq A^* \mid \varepsilon \in L\}\)
In particular, there is the empty language \(\emptyset \in 2^{A^*}\).
In the problem below, we are going to consider the states that are reachable from a given state. Formally, given states \(x,y \in Q\) in an automaton \(\mathcal A = (Q, A, \delta, F)\), \(y\) is reachable from \(x\) if there is a path \[ x \xrightarrow{a_1} x_1 \xrightarrow{a_2} x_2 \xrightarrow{a_3} \cdots \xrightarrow{a_k} y \] that starts at \(x\) and ends at \(y\).
What we are calling the "Brzozowski Fixed Point Theorem" below states that every language accepts itself in the Brzozowski automaton.
We are going to show that every word \(w \in L\) is also an element of \(\mathcal L(\mathcal A_{Brz}, L)\) (this is the definition of \(\subseteq\)). To paraphrase, we want to show that for any language \(L \subseteq A^*\) and for any word \(w \in A^*\), if \(w \in L\), then \(L\) accepts \(w\) in \(\mathcal A_{Brz}\). So, let's collect all of the words that satisfy this statement into their own language, \[ U = \{w \in A^* \mid \text{for any language \(L \subseteq A^*\), if \(w \in L\), then \(L\) accepts \(w\)} \} \] We want to show that every word is in \(U\), i.e., \(U = A^*\). This is a situation where we can use Induction on Words!
In the base case, we want to show that \(\varepsilon \in U\). So, let \(L \subseteq A^*\) and suppose that \(\varepsilon \in L\). We want to show that \(\varepsilon \in \mathcal L(\mathcal A_{Brz}, L)\), which is to say that \(L\) accepts \(\varepsilon\). In the Brzozowski automaton, the accepting states (languages in \(F\)) are the languages that contain \(\varepsilon\). Since we assumed that \(\varepsilon \in L\), we know that \(L \in F\), which tells us that \(L\) accepts \(\varepsilon\). By definition, \(\varepsilon \in \mathcal L(\mathcal A_{Brz}, L)\). We have just shown that if \(\varepsilon \in L\), then \(\varepsilon \in \mathcal L(\mathcal A_{Brz}, L)\). By our definition of \(U\), \(\varepsilon \in U\). This concludes the base case of Induction on Words.
For the induction step, we are going to show that if \(w \in U\) and \(a \in A\), then \(aw \in U\). This will allow us to conclude that \(U = A^*\) by Induction on Words.
Let \(w \in U\) and \(a \in A\). Let \(L \subseteq A^*\) and suppose that \(aw \in L\). We want to show that \(aw \in \mathcal L(\mathcal A_{Brz}, L)\), since this would tell us that \(aw \in U\) like we wanted. Now, if \(aw \in L\), then \(w \in a^{-1}L\) by the definition of \(a\)-derivative. Since \(w \in U\), we also know that \(w \in \mathcal L(\mathcal A_{Brz}, a^{-1}L)\), i.e., \(a^{-1}L\) accepts \(w\). If we write \(w\) as a sequence of letters \(w = b_1b_2b_3\dots b_n\), then this means that there is a path \[ a^{-1}L \xrightarrow{b_1} L_1 \xrightarrow{b_2} \cdots \xrightarrow{b_n} L_n \qquad L_n \in F \] such that \(L_n\) is an accepting state. But by definition of \(\mathcal A_{Brz}\), \(L \xrightarrow{a} a^{-1}L\), so actually we have a path \[ L \xrightarrow{a} a^{-1}L \xrightarrow{b_1} L_1 \xrightarrow{b_2} \cdots \xrightarrow{b_n} L_n \qquad L_n \in F \] This tells us that \(L\) accepts \(aw\), or equivalently, \(aw \in \mathcal L(\mathcal A_{Brz}, L)\). We have just shown that if \(aw \in L\), then \(L\) accepts \(aw\). By definition of \(U\), this tells us \(aw \in U\).
We have shown two things: that \(\varepsilon \in U\), and if \(w\in U\) and \(a \in A\), then \(aw \in U\). By Induction on words, \(U = A^*\), or in other words, \(L \subseteq \mathcal L(\mathcal A_{Brz}, L)\) for every language \(L \subseteq A^*\).
- Show that if \(\varepsilon \in a^{-1}L\), then \(a \in L\)
- Show that if \(\varepsilon \in b^{-1}a^{-1}L\), then \(ab \in L\)
- Show that if \(a \in b^{-1}b^{-1}a^{-1}L\), then \(abba \in L\)