CSCI 341 Theory of Computation

Fall 2025, with Schmid


Determinization

Remember that a language \(L\) is finitely recognizable if there is a finite automaton \(\mathcal A = (Q, A, \delta, F)\) and a state \(x \in Q\) such that \(L = \mathcal L(\mathcal A, x)\). Last time, we saw that \(\mathsf{TDFin} \subseteq \mathsf{Fin}\). Today, we are going to show the converse: that \(\mathsf{Fin} \subseteq \mathsf{TDFin}\), which proves the following theorem:

(Finite Recognizability) Let \(L \subseteq A^*\) be a language. Then \(L\) is finitely recognizable if and only if \(L\) is total deterministic finitely recognizable. In other words, \[\mathsf{TDFin} = \mathsf{DFin} = \mathsf{Fin}\]

The difficult part of the proof of the Finite Recognizability Theorem is showing that \(\mathsf{Fin} \subseteq \mathsf{TDFin}\). The idea behind the proof is an automaton construction: this is a kind of function that takes an automaton and a state as input, and produces an automaton and a state as output. The automaton construction we need has the following form:

Input: a finite automaton \(\mathcal A = (Q, A, \delta, F)\) and a state \(x \in Q\)
Output: a finite automaton \(\mathrm{Det}(\mathcal A) = (Q', A, \delta', F')\) and a state \(S \in Q'\) such that \[\mathcal L(\mathcal A, x) = \mathcal L(\mathrm{Det}(\mathcal A), S)\]

The automaton construction we are going to describe below, \(\mathrm{Det}\), is called determinization. Before we get into it, it helps to see a bit of intuition for where it comes from. Remember how a state of an automaton reads a word:

1. Write the word to a tape and place a marker on the starting state.
2. If there is no letter under the tape head, stop. Otherwise, read the letter under the tape head. Call it \(a_i\).
3. For each active state \(y\) (a state with a marker on it) with a transition \(y \xrightarrow{a_i} z\), remove the marker from \(y\) and place a marker on \(z\). Now \(z\) is active.
4. Move the tape head to the right and go back to step 2.
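The procedure above can be sketched in Python. This is only a minimal sketch under two assumptions that are not stated in the notes: transitions are stored as a set of triples `(y, a, z)`, and a marker on a state with no matching transition is simply discarded. The names `accepts`, `delta`, and `accepting`, as well as the small automaton, are hypothetical and chosen for illustration.

```python
def accepts(delta, accepting, start, word):
    """Return True if some active state is accepting once the word is read."""
    active = {start}                      # place a marker on the starting state
    for letter in word:                   # read the tape from left to right
        # Move markers along every transition labelled with the current letter.
        # Assumption: markers with no matching transition are dropped.
        active = {z for (y, a, z) in delta if y in active and a == letter}
    return any(state in accepting for state in active)


# Hypothetical automaton, not one from the notes.
delta = {("p", "a", "p"), ("p", "a", "q"), ("q", "b", "q")}
accepting = {"q"}

print(accepts(delta, accepting, "p", "aab"))  # True
print(accepts(delta, accepting, "p", "ba"))   # False
```

The variable `active` is exactly the "collection of active states" that the rest of this section turns into a single state.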

In the example below, let's just run this on \(x_0\) for the word \(aa\).

We know that \(x_0\) accepts \(aa\) here because one of the active states is accepting at the end. But how do we turn this into a deterministic path?

At the moment, multiple states are active at each step. The idea behind turning this into a deterministic automaton is to create a state in our determinized automaton for each possible collection of active states. So, the first collection would just be \(\{x_0\}\), since that's where we start. The outgoing transitions from \(x_0\) are now collected into a single state: after reading an \(a\), the next active state would be the collection \(\{x_1, x_2\}\). Now repeat this again: after reading a second \(a\), collect all the outgoing transitions from \(\{x_1, x_2\}\) into a single transition to the collection \(\{x_3, x_4, x_5\}\). The collections \(\{x_0\}\), \(\{x_1, x_2\}\), and \(\{x_3, x_4, x_5\}\) are three of the states that we get from determinizing the automaton.

collect the set of states into a single state

Above, the collection \(\{x_3, x_4, x_5\}\) is an accepting state because one of the states it contains is accepting: indeed, observe that \(x_0\) accepts \(aa\). This is where the definition of \(\mathrm{Det}(\mathcal A)\) below comes from.

(Determinization) Let \(\mathcal A = (Q, A, \delta, F)\) be an automaton. The determinization of \(\mathcal A\) is the automaton \(\mathrm{Det}(\mathcal A) = (Q', A, \delta', F')\), where

\(Q' = 2^Q\) is the set of subsets of \(Q\), i.e., the set of all collections of states of \(\mathcal A\),
\(\delta' = \big\{ (U, a, V) \mid V = \bigcup_{x \in U} \delta(x, a) \big\}\), i.e., \(U \xrightarrow{a} V\) if \(V\) is the set of all states with incoming \(a\) transitions from states in \(U\),
\(F' = \big\{ U \mid F \cap U \neq \emptyset\big\}\), i.e., a collection of states is accepting if it contains an accepting state of \(\mathcal A\)

The set \(2^Q\), consisting of all the subsets of \(Q\), is called the powerset of \(Q\). You might sometimes hear this determinization construction called the powerset construction.
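The definition of \(\mathrm{Det}(\mathcal A)\) translates fairly directly into code. The following is a minimal sketch, assuming that \(\delta\) is given as a set of triples and that subsets are represented by `frozenset` (Python sets cannot contain ordinary sets). The function names `powerset` and `determinize`, the choice to store \(\delta'\) as a dictionary, and the small example automaton are all assumptions made for illustration.

```python
from itertools import chain, combinations


def powerset(states):
    """All subsets of `states`, each as a frozenset."""
    items = sorted(states)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


def determinize(states, alphabet, delta, accepting):
    """Return (Q', A, delta', F') following the definition of Det."""
    new_states = powerset(states)
    new_delta = {}
    for U in new_states:
        for a in alphabet:
            # V is the union of delta(x, a) over all x in U.
            new_delta[(U, a)] = frozenset(
                z for (y, b, z) in delta if y in U and b == a
            )
    # A subset is accepting if it contains an accepting state.
    new_accepting = {U for U in new_states if U & accepting}
    return new_states, alphabet, new_delta, new_accepting


# Hypothetical automaton, not one from the notes.
Q = {"p", "q"}
A = {"a", "b"}
delta = {("p", "a", "p"), ("p", "a", "q"), ("q", "b", "q")}
F = {"q"}

Q_det, _, delta_det, F_det = determinize(Q, A, delta, F)
print(len(Q_det))                                  # 4
print(sorted(delta_det[(frozenset({"p"}), "a")]))  # ['p', 'q']
```

Building all of \(2^Q\) up front is faithful to the definition but grows exponentially in the number of states, so it is only practical for small automata.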

(Understanding the Powerset) Calculate the powerset \(2^Q\), where

\(Q = \{x\}\)
\(Q = \{x, y\}\)
\(Q = \{x, y, z\}\)
\(Q = \{\}\)

These sets have \(2, 4, 8\), and \(1\) elements respectively.

Consider the two-state automaton over the alphabet \(A = \{a, b\}\) below.

two states, x-naught and x-1, with mutual a-transitions, and an a-loop on x-naught. x-1 is accepting.

A two-state automaton \(\mathcal A_{eg}\) over the alphabet \(A = \{a, b\}\). Notice that there are no \(b\)-transitions.

Formally, this state diagram represents the automaton \(\mathcal A_{eg} = (Q, A, \delta, F)\) given by \[\begin{aligned} Q &= \{x_0, x_1\} \\ A &= \{a, b\} \\ \delta &= \{(x_0, a, x_0), (x_0, a, x_1), (x_1, a, x_0)\}\\ F &= \{x_1\} \end{aligned}\] This is not a deterministic automaton: there are two \(a\)-transitions leaving \(x_0\).

In the determinization of \(\mathcal A_{eg}\), the states are collections of states from \(\mathcal A_{eg}\) (in other words, subsets of \(Q\)). There are two states in \(\mathcal A_{eg}\), so there are \(2^2 = 4\) states in \(\mathrm{Det}(\mathcal A_{eg})\). Specifically, \[ Q' = 2^Q = \big\{ \{\}, \{x_0\}, \{x_1\}, \{x_0,x_1\} \big\} \] Let's look at an example of a transition: since \(x_0 \xrightarrow{a} x_0\) and \(x_0 \xrightarrow{a} x_1\), the transition \(\{x_0\} \xrightarrow{a} \{x_0, x_1\}\) appears in \(\mathrm{Det}(\mathcal A_{eg})\). Similarly, since \(x_1 \xrightarrow{a} x_0\) is the only \(a\)-transition leaving \(x_1\), the \(a\)-successors of \(\{x_0, x_1\}\) are \(\{x_0, x_1\} \cup \{x_0\} = \{x_0, x_1\}\), so there is a self \(a\)-transition \(\{x_0, x_1\} \xrightarrow{a} \{x_0, x_1\}\) in \(\mathrm{Det}(\mathcal A_{eg})\). In total, the state diagram looks like this:

two state automaton determinized, with four states

A state diagram for the determinization of the two state automaton, \(\mathrm{Det}(\mathcal A_{eg})\).
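The same information can be listed as a transition table. The sketch below recomputes \(\mathrm{Det}(\mathcal A_{eg})\) from the formal description of \(\mathcal A_{eg}\) given above; the helper names `step` and `show` and the dictionary `table` are hypothetical, and the printed layout is just one possible way to display the table.

```python
from itertools import combinations

# The automaton A_eg, copied from its formal description.
Q = ["x0", "x1"]
A = ["a", "b"]
delta = {("x0", "a", "x0"), ("x0", "a", "x1"), ("x1", "a", "x0")}
F = {"x1"}


def step(U, a):
    """The set of all a-successors of states in U."""
    return frozenset(z for (y, b, z) in delta if y in U and b == a)


def show(U):
    """Render a subset in the notation of the notes, e.g. {x0, x1}."""
    return "{" + ", ".join(sorted(U)) + "}"


# All subsets of Q, smallest first.
subsets = [frozenset(c) for k in range(len(Q) + 1) for c in combinations(Q, k)]
table = {(U, a): step(U, a) for U in subsets for a in A}

for U in subsets:
    mark = "accepting" if U & F else ""
    row = "  ".join(f"{a} -> {show(table[(U, a)])}" for a in A)
    print(f"{show(U):10} {row}  {mark}")
```

Each row contains exactly one target per letter, and the rows for \(\{x_0\}\) and \(\{x_0, x_1\}\) reproduce the two \(a\)-transitions worked out above.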

(Determinize for Yourself) Consider the following automaton \(\mathcal A\).

A two-state nondeterministic automaton \(\mathcal A = (Q, A, \delta, F)\), where \[\begin{aligned} Q &= \{x_0, x_1\} \\ A &= \{a, b\} \\ \delta &= \{ (x_0, a, x_0), (x_0, a, x_1), (x_1, a, x_0), (x_1, b, x_1) \} \\ F &= \{x_0, x_1\} \end{aligned}\]

Our goal is to draw the state diagram for \(\mathrm{Det}(\mathcal A)\). The states of \(\mathrm{Det}(\mathcal A)\) are subsets of \(Q\), \[ Q' = 2^Q = \{\{\}, \{x_0\}, \{x_1\}, \{x_0, x_1\}\} \] like before. But what about the rest of it?

\(\{x_0\} \xrightarrow{a} \{x_0,x_1\}\) is a transition in \(\mathrm{Det}(\mathcal A)\). What about \(\{x_0\} \xrightarrow{a} \{x_1\}\)?
What are the accepting states of \(\mathrm{Det}(\mathcal A)\)?
Draw the complete state diagram for \(\mathrm{Det}(\mathcal A)\).

(Yeah, Totally) Explain the \(b\)-transitions into the empty set, \(\xrightarrow{b} \{\}\), in \(\mathrm{Det}(\mathcal A_{eg})\) and \(\mathrm{Det}(\mathcal A)\) above. Is \(\mathrm{Det}(\mathcal A)\) total?

(Determinizing is Deterministic) Prove that for any automaton \(\mathcal A = (Q, A, \delta, F)\), \(\mathrm{Det}(\mathcal A)\) is total deterministic.

Let \(U,V,W \in 2^Q\) be states of \(\mathrm{Det}(\mathcal A)\) (i.e., subsets of \(Q\)), and let \(a \in A\). Assume that \(U \xrightarrow{a} V\) and \(U \xrightarrow{a} W\) are two transitions in \(\mathrm{Det}(\mathcal A)\). Show that \(V \subseteq W\). Do we separately need to show that \(W \subseteq V\)?

(Running on Empty) Determinize the empty automaton. Which automaton do you get (it has a name)?

(A Bigger One) Consider the automaton \(\mathcal A = (Q, A, \delta, F)\) where \[\begin{aligned} Q &= \{s_1, s_2, s_3\} \\ A &= \{0, 1\} \\ \delta &= \{ (s_1, 0, s_2), (s_1, 0, s_3), (s_2, 0, s_3), (s_3, 0, s_2) \} \\ F &= \{s_2\} \end{aligned}\] Write down the transition table for \(\mathrm{Det}(\mathcal A)\). Draw the subautomaton of \(\mathrm{Det}(\mathcal A)\) that is reachable from the state \(\{s_1\}\). What language is accepted by \(\{s_1\}\)?

Determinization Works

The point of the previous few exercises and problems is to show that \(\mathrm{Det}(\mathcal A)\) is a total deterministic finite automaton when \(\mathcal A\) is a finite automaton. Our next goal is to show that this automaton construction is correct, in the sense that it accomplishes our goal for this construction. We designed \(\mathrm{Det}(\mathcal A)\) so that for any state \(x \in Q\), \(x\) accepts the same language as \(\{x\}\).

(Determinized State) Let \(\mathcal A = (Q, A, \delta, F)\) be an automaton, and let \(\mathrm{Det}(\mathcal A)\) be its determinization. Then for any state \(x \in Q\), \[ \mathcal L(\mathcal A, x) = \mathcal L(\mathrm{Det}(\mathcal A), \{x\}) \]

We will prove one inclusion, \(\mathcal L(\mathcal A, x) \subseteq \mathcal L(\mathrm{Det}(\mathcal A), \{x\})\) here. The other inclusion is the problem below.

Let \(w \in \mathcal L(\mathcal A, x)\). Since \(w\) is a word, there are letters \(a_1, a_2, \dots, a_n \in A\) such that \(w = a_1a_2\cdots a_n\). Now \(x\) accepts \(w\), which means that there is a path through \(\mathcal A\) of the form \[ x \xrightarrow{a_1} x_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} x_n \] where \(x_n \in F\). Let's build the path from \(\{x\}\) to an accepting state of \(\mathrm{Det}(\mathcal A)\). Recursively, we define \(U_0 = \{x\}\), and for each \(0 \le i < n\), \[ U_{i+1} = \bigcup_{y \in U_i} \delta(y, a_{i+1}) \] So, for example, \(U_1\) is the set of states with incoming \(a_1\)-transitions from \(x\), \(U_2\) is the set of states with incoming \(a_2\)-transitions from states in \(U_1\), and so on. Notice that \(x_i \in U_i\) for each \(1 \le i \le n\): writing \(x_0 = x\), if \(x_{i-1} \in U_{i-1}\), then the transition \(x_{i-1} \xrightarrow{a_i} x_i\) puts \(x_i\) in \(U_i\).

By the definition of the transition relation \(\delta'\) of \(\mathrm{Det}(\mathcal A)\), the \(U_i\) form the path \[ \{x\} = U_0 \xrightarrow{a_1} U_1 \xrightarrow{a_2} U_2 \xrightarrow{a_3} \cdots \xrightarrow{a_{n}} U_n \] Since \(x_n \in U_n\) and \(x_n \in F\) (is an accepting state of \(\mathcal A\)), \(U_n\) is an accepting state of \(\mathrm{Det}(\mathcal A)\). This means that \(w = a_1a_2\cdots a_n \in \mathcal L(\mathrm{Det}(\mathcal A), \{x\})\).

We have just shown that every element of \(\mathcal L(\mathcal A, x)\) is an element of \(\mathcal L(\mathrm{Det}(\mathcal A), \{x\})\). In other words, \(\mathcal L(\mathcal A, x) \subseteq \mathcal L(\mathrm{Det}(\mathcal A), \{x\})\).
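To see the sets \(U_i\) from this proof on a concrete case, here is a small sketch that computes them for the automaton \(\mathcal A_{eg}\) from earlier in this section, the word \(aa\), and the accepting path \(x_0 \xrightarrow{a} x_0 \xrightarrow{a} x_1\). The function name `subset_run` and the variable names are hypothetical; the sketch illustrates the argument on one input and is not a substitute for the proof.

```python
# The automaton A_eg from earlier in the section.
delta = {("x0", "a", "x0"), ("x0", "a", "x1"), ("x1", "a", "x0")}
F = {"x1"}


def subset_run(x, word):
    """Return [U_0, ..., U_n], where U_0 = {x} and each next set collects
    the successors of the previous one under the next letter of the word."""
    run = [frozenset({x})]
    for letter in word:
        run.append(
            frozenset(z for (y, a, z) in delta if y in run[-1] and a == letter)
        )
    return run


word = "aa"
path = ["x0", "x0", "x1"]   # an accepting path for "aa" in A_eg
run = subset_run("x0", word)

# Every state on the path lies in the corresponding subset.
print(all(state in U for state, U in zip(path, run)))  # True
# The last subset contains an accepting state, so it is accepting in Det(A_eg).
print(bool(run[-1] & F))                               # True
```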

(Determinized State, Completing the Proof) Let \(\mathcal A = (Q, A, \delta, F)\) be an automaton, and let \(\mathrm{Det}(\mathcal A)\) be its determinization. Prove that for any state \(x \in Q\), \[ \mathcal L(\mathcal A, x) \supseteq \mathcal L(\mathrm{Det}(\mathcal A), \{x\}) \]

Let \(a_1a_2 \cdots a_n \in \mathcal L(\mathrm{Det}(\mathcal A), \{x\})\) and consider a path \(\{x\} \xrightarrow{a_1} U_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} U_n\) in \(\mathrm{Det}(\mathcal A)\) where \(U_n\) is an accepting state of \(\mathrm{Det}(\mathcal A)\). Recursively define a path from \(x\) to an accepting state of \(\mathcal A\) by picking out a desirable state from each \(U_i\).

This allows for the following proof of the Finite Recognizability Theorem.

(of the Finite Recognizability Theorem) We have already seen that \(\mathsf{TDFin} \subseteq \mathsf{DFin} \subseteq \mathsf{Fin}\). It suffices to show that \(\mathsf{Fin} \subseteq \mathsf{TDFin}\).

Let \(L \in \mathsf{Fin}\). Then by definition of finite recognizability, there is a finite automaton \(\mathcal A = (Q, A, \delta, F)\) and a state \(x \in Q\) such that \(L = \mathcal L(\mathcal A, x)\). We need to find a total deterministic finite automaton \(\mathcal A'\) and a state \(z\) of \(\mathcal A'\) such that \(\mathcal L(\mathcal A', z) = L\). By the Determinized State Theorem, \(\mathcal A' = \mathrm{Det}(\mathcal A)\) is such an automaton: it is a total deterministic automaton with the state \(\{x\}\), and \[ L = \mathcal L(\mathcal A, x) = \mathcal L(\mathrm{Det}(\mathcal A), \{x\}) \] It follows that \(L\) is a total deterministic finitely recognizable language, i.e., \(L \in \mathsf{TDFin}\).
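The equality \(\mathcal L(\mathcal A, x) = \mathcal L(\mathrm{Det}(\mathcal A), \{x\})\) used in this proof can be sanity-checked by brute force on a small automaton. The sketch below does this for \(\mathcal A_{eg}\) and all words of length at most 5. The functions `path_accepts` and `det_accepts` are hypothetical names: the first searches for an accepting path directly in \(\mathcal A_{eg}\), and the second follows subsets as in the determinization. A finite check like this is evidence, not a proof.

```python
from itertools import product

# The automaton A_eg from earlier in the section.
A = ["a", "b"]
delta = {("x0", "a", "x0"), ("x0", "a", "x1"), ("x1", "a", "x0")}
F = {"x1"}


def path_accepts(x, word):
    """True if some path from x labelled by `word` ends in an accepting state."""
    if not word:
        return x in F
    return any(
        path_accepts(z, word[1:])
        for (y, a, z) in delta
        if y == x and a == word[0]
    )


def det_accepts(U, word):
    """Follow the single path from the subset U in the determinization."""
    for letter in word:
        U = {z for (y, a, z) in delta if y in U and a == letter}
    return bool(U & F)


# All words over A of length 0 through 5.
words = ["".join(w) for n in range(6) for w in product(A, repeat=n)]
print(all(path_accepts("x0", w) == det_accepts({"x0"}, w) for w in words))  # True
```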

(Is Determinization Optimal?) Remember that a minimal automaton is a total deterministic automaton where no two states accept the same language, and that a minimal automaton is the smallest total deterministic automaton that accepts all its states' languages. Does determinization find the smallest automaton? If it does, explain why. If it does not, give an example of a determinized automaton that is not minimal.

(You Got Options) Find the smallest automaton (not necessarily total or deterministic) with a state that accepts the language \[ L = \{ab^n \mid n \in \mathbb N\} \cup \{ac^n \mid n \in \mathbb N\} \cup \{a(bc)^n \mid n \in \mathbb N\} \] over the alphabet \(A = \{a,b,c\}\). Use determinization to find a deterministic automaton with a state that accepts the same language.

Possibly a life-saving hint... You do not need to draw every state of the automaton \(\mathrm{Det}(\mathcal A)\), only the ones reachable from \(\{x\}\). I also don't necessarily want a total automaton!
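The hint suggests a variant of the construction that only builds the subsets reachable from \(\{x\}\). A minimal sketch of that idea is below, assuming a breadth-first exploration and assuming that transitions into the empty set are left out (so the result need not be total). The function name `reachable_determinization` and the example automaton are hypothetical and unrelated to the exercise.

```python
from collections import deque


def reachable_determinization(alphabet, delta, accepting, start):
    """Explore only the subsets reachable from {start}."""
    start_set = frozenset({start})
    seen = {start_set}
    queue = deque([start_set])
    new_delta = {}
    while queue:
        U = queue.popleft()
        for a in sorted(alphabet):
            V = frozenset(z for (y, b, z) in delta if y in U and b == a)
            if not V:
                continue  # assumption: omit transitions into the empty set
            new_delta[(U, a)] = V
            if V not in seen:
                seen.add(V)
                queue.append(V)
    new_accepting = {U for U in seen if U & accepting}
    return seen, new_delta, new_accepting


# Hypothetical three-state automaton, not the one from the exercise.
delta = {("p", "0", "p"), ("p", "0", "q"), ("q", "1", "r")}
states, trans, acc = reachable_determinization({"0", "1"}, delta, {"r"}, "p")
print(len(states))  # 3, compared with 2**3 = 8 subsets in total
```

Here only \(\{p\}\), \(\{p, q\}\), and \(\{r\}\) are ever constructed, which is the saving the hint is pointing at.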
