CSCI 341 Theory of Computation

Fall 2025, with Schmid

Turing Machines

So far, we've seen what it means to design string representations of decision problems and functions. We also saw the \(\lambda\)-calculus, which is a way of writing down functions syntactically and using rewrite rules as a form of computation. This is one way to design string-functions \(g \colon A^* \to A^*\) (using rewrite rules). Today we're going to look at a different way, something called a Turing machine.

Roughly, a Turing machine is a kind of "mathematical model of computer hardware". It beefs up the stack automaton concept by adding a more powerful type of data structure to automata: a tape machine.

Tape Machines

In the case of stacks, there were three basic commands, namely \(\mathtt{skip}\), \(\mathtt{pop}~\sigma\), and \(\mathtt{push}~\sigma\), that operated on a finite list of symbols representing a stack. A tape machine (as we will define below) is a lot like a stack: it can run a set of basic commands that alter the state of a particular type of memory, and these commands can be combined in sequence to form programs. We fix ahead of time a set \(A\) of tape symbols.

(Tape Machine) Given a set \(A\) of tape symbols, a (two-way) tape machine is a pair \((t, i)\) consisting of a function \(t \colon \mathbb Z \to A \cup \{\_\}\) called the tape, and an integer \(i\) called the position (of the tape head). The symbol \(\_\) is called blank.

The set of tape head programs \(\mathtt{Tape}\) is derived from the following grammar: \[ E \to \mathtt{skip} \mid \mathtt{write}~\sigma \mid \mathtt{move~left} \mid \mathtt{move~right} \mid E{.}E \] Above, \(\sigma\) ranges over \(A \cup \{\_\}\). Given a tape machine \((t, i)\), we define \[\begin{aligned} (t,i).\mathtt{skip} &= (t, i) \\ (t,i).\mathtt{write}~\sigma &= (t', i) &\text{where } t'(j) = \begin{cases} \sigma & \text{if } i = j \\ t(j) & \text{if } i \neq j \end{cases}\\ (t, i).\mathtt{move~left} &= (t, i-1) \\ (t, i).\mathtt{move~right} &= (t, i+1) \end{aligned}\]
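To make the definition concrete, here is a minimal Python sketch of a tape machine (the names `Tape` and `BLANK` are ours, not part of the notes). Since only finitely many cells are ever non-blank, a dictionary from integers to symbols works as the tape \(t\):

```python
BLANK = "_"

class Tape:
    """A two-way tape machine (t, i): a tape t : Z -> A ∪ {_} and a head position i."""

    def __init__(self, cells=None, position=0):
        # Only finitely many cells are non-blank, so a dict suffices;
        # every index not in the dict reads as blank.
        self.cells = dict(cells or {})
        self.position = position

    def read(self):
        return self.cells.get(self.position, BLANK)

    # The four basic commands.
    def skip(self):
        pass

    def write(self, symbol):
        self.cells[self.position] = symbol

    def move_left(self):
        self.position -= 1

    def move_right(self):
        self.position += 1

tape = Tape({0: "0", 1: "1"})
tape.write("1")      # overwrite the symbol under the head
tape.move_right()
print(tape.read())   # 1
```

For example, `Tape({0: "0", 1: "1"})` is the tape that reads \(01\) at positions \(0\) and \(1\) and is blank everywhere else.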

When convenient, we will use the notation

  • \({\downarrow}\sigma = \mathtt{write}~\sigma\)
  • \(\not{\!\downarrow} = \mathtt{erase} = \mathtt{write}~\_\)
  • \({\lhd} = \mathtt{move~left}\)
  • \({\rhd} = \mathtt{move~right}\)

Intuitively, \((t, i)\) represents a list of symbols (the tape) that stretches infinitely far in both directions, paired with a read/write-enabled device (the tape head). At each position (represented as an integer), a symbol \(\sigma \in A\) can appear on the tape, or the tape at that position could be blank (formally represented as "\(\_\)"). Below, the tape head is represented in pink/purple and the tape is represented as the array of squares (called cells) that stretches in both directions.

For an example of a tape machine running a tape program, let the tape machine directly above be called \((t, 0)\). Then \(t(0) = 0\), \(t(1) = 1\), and \(t(-1) = 0\). Running the program \(p\) given by \[ p = \mathtt{write}~1.\mathtt{move~right}.\mathtt{write}~0.\mathtt{move~left}.\mathtt{move~left}.\mathtt{write}~1 \] on \((t, 0)\) flips all the bits. This would produce the tape below:
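If you want to check the bit-flipping example mechanically, here is a small self-contained sketch (our own illustration, not part of the notes): the tape is a dictionary, and a tape program is a list of commands.

```python
BLANK = "_"

def run(tape, position, program):
    """Run a sequence of basic tape commands on the tape machine (tape, position)."""
    for command, argument in program:
        if command == "write":
            tape[position] = argument
        elif command == "move left":
            position -= 1
        elif command == "move right":
            position += 1
        # "skip" does nothing
    return tape, position

# The tape (t, 0) from the example: t(-1) = 0, t(0) = 0, t(1) = 1.
t = {-1: "0", 0: "0", 1: "1"}

# p = write 1 . move right . write 0 . move left . move left . write 1
p = [("write", "1"), ("move right", None), ("write", "0"),
     ("move left", None), ("move left", None), ("write", "1")]

t, i = run(t, 0, p)
print(t)  # every bit is flipped: {-1: '1', 0: '1', 1: '0'}
```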

(Nuked Tape) Write a tape program that clears the tape in the last image above, in the sense that it erases all three symbols on the tape.
(Two Steps Forward, Two Steps Back) Show that the following tape program is equivalent to \(\mathtt{skip}\) by running the program on the tape machine \((t,0)\) one step at a time. In other words, show that \((t,0).p = (t,0)\). \[p = \mathtt{move~right}.\mathtt{move~right}.\mathtt{move~left}.\mathtt{move~left} \]

Turing Machines

A Turing machine is to tapes and automata what a stack automaton was to stacks and nondeterministic automata.

(Turing Machine) A Turing machine is a triple \(\mathcal T = (Q, A, \delta)\), where
  • \(Q\) is a finite set of states, or programs,
  • \(A\) is a set of tape symbols, and
  • \(\delta\) is a relation \[ \delta \subseteq Q \times (A \cup \{\_\}) \times \mathtt{Tape} \times Q \] called the transition relation.
If \((x, \sigma, p, y) \in \delta\), then we write \(x\xrightarrow{\sigma \mid p} y\) and say that \(x\) runs \(p\) and transitions to \(y\) if \(\sigma\) is read.

A Turing machine \(\mathcal T = (Q, A, \delta)\) is deterministic if for any \(x \in Q\) and \(\sigma \in A \cup \{\_\}\), there is at most one transition of the form \(x \xrightarrow{\sigma \mid p} y\), i.e., if we write \[ \delta(x, \sigma) = \{y \in Q \mid \text{there exists \(p \in \mathtt{Tape}\) such that } x \xrightarrow{\sigma \mid p} y\} \] then \[ |\delta(x, \sigma)| \le 1 \] Note that we are allowing \(\delta(x, \sigma) = \{\}\) in the definition of deterministic.
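Determinism is a property we can check mechanically. The following sketch (ours, including the example relation) represents \(\delta\) as a set of quadruples \((x, \sigma, p, y)\) and verifies that each pair \((x, \sigma)\) has at most one outgoing transition:

```python
from collections import Counter

def is_deterministic(delta):
    """delta: a set of (state, symbol, tape_program, next_state) quadruples.
    Deterministic means at most one transition out of each (state, symbol) pair."""
    counts = Counter((x, sigma) for (x, sigma, p, y) in delta)
    return all(n <= 1 for n in counts.values())

# Two transitions out of ("x", "0") make this relation nondeterministic.
delta = {("x", "0", "move right", "y"),
         ("x", "0", "skip", "x"),
         ("y", "_", "skip", "y")}
print(is_deterministic(delta))  # False
```

Note that an empty transition relation is (vacuously) deterministic, matching the remark that \(\delta(x, \sigma) = \{\}\) is allowed.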

We say that \(x\) immediately halts on input \(\sigma\) if \(\delta(x, \sigma) = \{\}\).

A Turing machine should be thought of as "hardware" (as opposed to \(\lambda\)-terms, which should be thought of as "software") in the sense that we would normally think of hardware. Indeed, your actual, physical computer has a hard drive (or SSD, whatever) + RAM for its tape, and its states and transitions consist of all of the different programs that you can run on it. In this picture of Turing machines, the tape programs themselves are the possible machine instructions, and the tape head is the physical position of the reader/writer in the memory hardware!

States as Programs

The states of a Turing machine are honest-to-goodness programs, in the following sense: the transition \(x \xrightarrow{a \mid \rhd{.}{\downarrow}0} y\) should really be interpreted as a "line of code in the program \(x\)", namely

x = if a move right.write 0.goto y

Just like in ordinary programming, it's useful to have a library of "common procedures" at your disposal. Let's take a look at a couple examples.

(Halt) The program halt is the only state of the Turing machine \(\mathcal T_{x_\mathtt{hlt}} = (\{x_\mathtt{hlt}\}, A, \{\})\). The state \(x_{\mathtt{hlt}}\) halts on all inputs. In code, this might look like
x_hlt =
... it's the program that does nothing and then stops! This is such a special program that if we want it to appear as a state in another Turing machine, we'll just write \(\mathtt{halt}\) instead of \(\mathtt{goto~x\_hlt}\). If we were to draw the program halt and its Turing machine, it would be pretty boring: a single, transition-less state. If you look at the transition relation \(\delta\) one more time, you will notice that this state has no outgoing transitions.

In the next example, we revisit the program you wrote near the beginning, that "nukes" the tape. As it is presented, we assume that \(A = \{0,1\}\) (more on that later), but it can be adapted to any set of tape symbols.

(Clear) The program clear moves to the nearest blank cell to the left, and then erases everything between that cell and the nearest blank cell on the right. As a program, it can be written as follows:
x_clr = if _ move right.goto y
if 0 move left.goto x_clr
if 1 move left.goto x_clr
y = if _ halt
if 0 erase.move right.goto y
if 1 erase.move right.goto y
This is also such a special program that if we want it to appear as a state in another Turing machine, we'll just write \(\mathtt{clear}\) instead of \(\mathtt{goto}~x_{\mathtt{clr}}\).

If we were to draw the program \(\mathtt{clear}\) and its Turing machine, it would consist of three states (remember \(\mathtt{halt}\)!).

The program \(\mathtt{clear}\) is really useful if we are careful to have all of the cells used by our program appear contiguously. Then it really clears the tape!
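To watch \(\mathtt{clear}\) run, here is a sketch of a deterministic Turing machine simulator (our own illustration; it only supports the commands \(\mathtt{clear}\) uses). The table `CLEAR` transcribes the program above, and a missing entry means the state immediately halts on that symbol:

```python
BLANK = "_"

# delta[(state, symbol)] = (tape_program, next_state); a missing key means halt.
# This table transcribes the clear program for A = {0, 1}.
CLEAR = {
    ("x_clr", BLANK): (["move right"], "y"),
    ("x_clr", "0"):   (["move left"], "x_clr"),
    ("x_clr", "1"):   (["move left"], "x_clr"),
    ("y", "0"):       (["erase", "move right"], "y"),
    ("y", "1"):       (["erase", "move right"], "y"),
    # ("y", BLANK) is absent: y immediately halts on blank.
}

def run_machine(delta, state, tape, position):
    """Run a deterministic Turing machine until the current state halts."""
    while (state, tape.get(position, BLANK)) in delta:
        program, state = delta[(state, tape.get(position, BLANK))]
        for command in program:
            if command == "erase":
                tape.pop(position, None)   # erasing writes a blank
            elif command == "move left":
                position -= 1
            elif command == "move right":
                position += 1
    return tape, position

tape, _ = run_machine(CLEAR, "x_clr", {-1: "1", 0: "1", 1: "0"}, 0)
print(tape)  # {} -- every cell is blank again
```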

(Rewind) Design a program rewind that moves the tape head all the way to the left, until it reaches the nearest blank.
(Shift) Design a program shift that moves all of the symbols on the tape (i.e., between the nearest blank to the left, all the way to the nearest blank to the right) one index to the right.
(Remove) Design a program remove that deletes a cell from the tape without introducing whitespace in the middle of other non-blank cells.
Similar to shift!

The most important thing to note about deterministic Turing machine programs is that at every step, the next step is completely determined by the current state and the tape symbol under the tape head at that very instant.

String Transformers, Recognizability, and Decidability

So, we've focused a lot on decision procedures throughout the course, but at this point, Turing machines don't have any "accept" or "reject" states. What's up with that? This subsection is going to show you how everything we have done so far can be placed in the context of Turing machines: decision procedures for languages, string functions, and all.

As we mentioned at the top, a state in a Turing machine (i.e., a program!) acts as a "string transformer". It works like this: write your desired input string on a tape machine, move the tape head to \(0\), and run the machine until it halts!

(Halting on Input) Let \(w \in A^*\) be a word (note that we assume it does not include blanks), write \(w = a_1 \cdots a_n\), and let \(t \colon \mathbb{Z} \to A \cup \{\_\}\) be the tape \[ t(i) = \begin{cases} a_i & \text{ if \(1 \le i \le n\)} \\ \mathtt{\_} & \text{ otherwise} \\ \end{cases} \] Now let \(\mathcal T = (Q, A, \delta)\) be a deterministic Turing machine with a state \(x \in Q\).

A run of \(\mathcal T\) starting from \(x\) on input \(w\) is a path through \(\mathcal T\) of the form \[ (x, t, 0) \xrightarrow{b_1 \mid p_1} (x_1, t_1, i_1) \xrightarrow{b_2 \mid p_2} (x_2, t_2, i_2) \xrightarrow{b_3 \mid p_3} \cdots \xrightarrow{b_k \mid p_k} (x_k, t_k, i_k) \qquad\qquad (\dagger) \] such that the following conditions are met:

  1. for any \(0 \le j < k\), \(b_{j+1} = t_{j}(i_j)\)
    (i.e., the symbol read by the tape head at position \(i_j\) is the input symbol \(b_{j+1}\) for the next step)
  2. for any \(0 \le j < k\), \((t_{j+1}, i_{j+1}) = (t_j, i_j).p_{j+1}\)
    (i.e., the state of the tape machine at each step is obtained by running the tape program on the previous state of the tape machine)
Above, we are assuming \(t_0 = t\) and \(i_0 = 0\).

The run \((\dagger)\) is called a halting run, and we say that \(x\) halts on input \(w\), if furthermore,

  1. \(x_{k}\) immediately halts on input \(t_{k}(i_k)\)
    (i.e., there are no outgoing transitions from \(x_k\) for the tape symbol under the tape head at the end of the run)
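A run, in the sense of \((\dagger)\), can be computed step by step. The sketch below (ours; it caps the number of steps so divergent programs still return) records each configuration \((x_j, t_j, i_j)\) of a deterministic run, illustrated on a tiny machine of our own that scans right until it sees a blank:

```python
BLANK = "_"

def trace(delta, state, tape, position, max_steps=100):
    """Return the configurations (x_j, t_j, i_j) of a deterministic run,
    stopping when the current state immediately halts on the symbol read."""
    configurations = [(state, dict(tape), position)]
    for _ in range(max_steps):  # guard against divergent programs
        symbol = tape.get(position, BLANK)
        if (state, symbol) not in delta:  # immediately halts: the run is over
            break
        program, state = delta[(state, symbol)]
        for command, argument in program:
            if command == "write":
                tape[position] = argument
            elif command == "move left":
                position -= 1
            elif command == "move right":
                position += 1
        configurations.append((state, dict(tape), position))
    return configurations

# A machine that steps off the initial blank, scans right over 0s and 1s,
# and halts at the first blank after the input.
SCAN = {
    ("x", BLANK): ([("move right", None)], "y"),
    ("y", "0"):   ([("move right", None)], "y"),
    ("y", "1"):   ([("move right", None)], "y"),
}

for configuration in trace(SCAN, "x", {1: "0", 2: "1"}, 0):
    print(configuration)  # each line is one configuration (x_j, t_j, i_j)
```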

If a program halts on every input string, it really determines a string function: given a string \(w\), we run the program \(x\) in the machine \(\mathcal T\) on input \(w\), and then when it halts, we read the tape again. If we ignore all of the blanks (remove all the whitespace), then we obtain a new string, \(u\), and call it \(\mathcal T_x(w)\). The only technicality to take care of is the awkward issue of halting. Indeed, it is easy to come up with programs that never halt!

(Divergent Loop) Write a program that never halts. In the future, we're going to call this program \(\mathtt{diverge}\).
(String Transformer, Turing computable) A string transformer is a partial function \(f \colon A^* \rightharpoonup A^*\). This means that it is a relation \(f \subseteq A^* \times A^*\) such that if \((x, y) \in f\) and \((x, y') \in f\), then \(y = y'\). If \((x, y) \in f\), we say that \(f\) is well-defined at \(x\) and write \(f(x) = y\).

Let \(\mathcal T\) be a deterministic Turing machine with a state \(x\). We define the string transformer \(\mathcal T_x \colon A^* \rightharpoonup A^*\) as follows: let \(w \in A^*\) be a word. If \(x\) halts on input \(w\), and the non-blank cells on the tape at the end of the halting run spell out a word \(u \in A^*\) (i.e., after removing whitespace), then we say that \(u\) is the result of running \(x\) on input \(w\), and define \[ \mathcal T_x(w) = u \]

A string transformer \(f \colon A^* \rightharpoonup A^*\) is said to be Turing computable (or just computable) if there is a deterministic Turing machine \(\mathcal T\) with a state \(x\) such that \(f = \mathcal T_x\).
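Putting the pieces together, here is a hedged sketch (ours, with \(A = \{0,1\}\) assumed) of computing \(\mathcal T_x(w)\): write \(w = a_1 \cdots a_n\) at positions \(1\) through \(n\), run until the current state immediately halts, then read off the non-blank cells from left to right. The example machine just scans over its input and halts, so it computes the identity transformer:

```python
BLANK = "_"

def transform(delta, state, w):
    """Compute T_x(w): run the machine on input w, then read off the non-blank cells."""
    # Write w = a_1 ... a_n at positions 1..n; the head starts at 0.
    tape = {i + 1: a for i, a in enumerate(w)}
    position = 0
    while (state, tape.get(position, BLANK)) in delta:
        program, state = delta[(state, tape.get(position, BLANK))]
        for command, argument in program:
            if command == "write":
                tape[position] = argument
            elif command == "erase":
                tape.pop(position, None)
            elif command == "move left":
                position -= 1
            elif command == "move right":
                position += 1
    # Ignore the blanks: read the remaining symbols left to right.
    return "".join(tape[i] for i in sorted(tape))

# A machine that scans right over its input and halts: T_x(w) = w.
IDENTITY = {
    ("x", BLANK): ([("move right", None)], "y"),
    ("y", "0"):   ([("move right", None)], "y"),
    ("y", "1"):   ([("move right", None)], "y"),
}

print(transform(IDENTITY, "x", "0110"))  # 0110
```

Note that if the machine never halts on \(w\), this `transform` loops forever, exactly as \(\mathcal T_x\) is undefined at \(w\).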
(1's Complement) Show that the following function is computable: \[ g \colon A^* \to A^* \quad g(w) = \begin{cases} 0~g(u) & w = 1~u \\ 1~g(u) & w = 0~u \\ \varepsilon & w = \varepsilon \end{cases} \] where \(A = \{0,1\}\). To check your construction, compute \(g(0100110)\).

We are going to call this program \(\mathtt{complement}\).
(\(b\)-Stretch) Let \(b \in A\). Show that the following function is computable: \[ g \colon A^* \to A^* \quad g(w) = \begin{cases} 0~b~g(u) & w = 0~u \\ 1~b~g(u) & w = 1~u \\ \varepsilon & w = \varepsilon \end{cases} \] where \(A = \{0,1\}\). To check your construction, compute \(g(0100110)\).

We are going to call this program \(\mathtt{stretch}~b\).

As a brief aside, we can now define what it means for a general function to be computable.

(General Computable Function) Let \(f \colon S_1 \to S_2\) be any function between two sets \(S_1\) and \(S_2\). A computable representation of \(f\) is a representation \((\rho_1, g, \rho_2)\) of \(f\) such that \(g\) is computable.
(Doubling is Computable) Show that \(f \colon \mathbb N \to \mathbb N\) defined by \(f(n) = 2n\) is computable.

Recognizability

The set of strings on which a particular program halts has a special name, which we can write down now.

(Recognizable) A language \(L \subseteq A^*\) is Turing recognizable (or just recognizable) if there is a Turing machine \(\mathcal T\) with a state \(x\) such that \[ L = \{w \in A^* \mid \text{\(x\) halts on input \(w\)}\} \] In general, we write \[ \mathcal R(\mathcal T, x) = \{w \in A^* \mid \text{\(x\) halts on input \(w\)}\} \] The family of recognizable languages is denoted \(\mathsf{TRec}\).
(Recognition = Definition) Given a string transformer \(f \colon A^* \rightharpoonup A^*\), define the domain of \(f\) to be \[ \mathrm{dom}(f) = \{w \in A^* \mid \text{\(f\) is well-defined at \(w\)}\} \] Let \(\mathcal T\) be a Turing machine with a state \(x\). Show that \(\mathrm{dom}(\mathcal T_x) = \mathcal R(\mathcal T, x)\).

Decision Procedures

Note that recognition has nothing to do with accept/reject. It really just determines the set of strings that a computable string transformer is defined on. Decision procedures are a bit different: let's fix two symbols in our alphabet of tape symbols, call them accept \(\top \in A\) and reject \(\bot \in A\).

(Decider, Turing Decidable) A string transformer \(f \colon A^* \rightharpoonup A^*\) is called a decider if for any \(w \in A^*\), either \(f(w) = \bot\) (\(f\) rejects \(w\)) or \(f(w) = \top\) (\(f\) accepts \(w\)). Note that a decider is a (total) function, in the sense that \(\mathrm{dom}(f) = A^*\).

If the decider \(f\) is computable and \(f = \mathcal T_x\) for some Turing machine \(\mathcal T\) with the state \(x\), then we write \[ \mathcal L(\mathcal T, x) = \{w \in A^* \mid \mathcal T_x(w) = \top\} \] and say that \(x\) is a decision procedure for \(\mathcal L(\mathcal T, x)\). A language \(L \subseteq A^*\) is called Turing decidable (or just decidable) if it has a decision procedure.

The family of decidable languages is written \(\mathsf{Dec}\).

We use \(\top\) and \(\bot\) just so that we can call these symbols whatever we like, but we'll usually take \(1 = \top\) and \(0 = \bot\).

The next example contains two programs that will come up a lot in the context of decision procedures.

(Accept and Reject) The program accept is intended to clear the tape and then only leave a \(1\), indicating a "successful run", in the same sense as with stack automata. As a program, it can be written as follows:
x_acc = clear.goto y
y = write \(\top\).halt
The reject program is similar but writes a \(0\).
x_rej = clear.goto y
y = write \(\bot\).halt
These are also such special programs that they deserve their own names: we'll just write \(\mathtt{accept}\) and \(\mathtt{reject}\) instead of \(\mathtt{goto~x\_acc}\) and \(\mathtt{goto~x\_rej}\).

Note that in the programs above, we wrote \(\mathtt{clear.goto~y}\). This doesn't quite make sense at this point, because \(\mathtt{clear}\) is the program corresponding to a state, not a tape program. The way you should read this is: run the program \(\mathtt{clear}\) and, when it halts, run the program \(\mathtt{y}\).

(ABCs Done Right) Show that the language \(\{a^nb^nc^n \mid n \in \mathbb N\}\) is decidable.

Decidability is a much stronger condition than recognizability.

(Decidable Languages are Recognizable) Let \(L \subseteq A^*\) be a decidable language. Then \(L\) is recognizable as well. In other words, \[\mathsf{Dec} \subseteq \mathsf{TRec}\]
Since \(L\) is decidable, it has a decision procedure. Let \(\mathcal T\) be a Turing machine with a state \(x_L\) that is a decision procedure for the language \(L\). Call the decision procedure \(\mathtt{decide}~L\). Without loss of generality, we can assume that the tape head is either reading \(\top\) or \(\bot\) when it halts. Consider the following Turing program:
x = decide \(L\).goto y
y = if \(\top\) halt
if \(\bot\) diverge
You made the program \(\mathtt{diverge}\) in the Divergent Loop exercise. The state \(x\) halts on an input word \(w\) if and only if \(x_L\) halts and accepts the word \(w\). This is because (1) if \(x_L\) accepts the word \(w\), then the tape head is reading \(\top\) when \(x_L\) halts on the input \(w\), and (2) if \(x_L\) rejects the word \(w\), then the tape head is reading \(\bot\) when \(x_L\) halts on the input \(w\).

That being said, basically every language we have seen so far is decidable! The easiest examples to see are the regular languages, but I will let you figure that out yourself.

(Regular Languages are Decidable) Prove that \(\mathsf{Reg} \subseteq \mathsf{Dec}\).

We could show that every context-free language is decidable at this point, but that proof is easier if we have a couple extra tools. We'll see those next time.

A brief historical note

Strangely, in the history of computing, Turing machines came after the \(\lambda\)-calculus, and a little while later it was shown that they are equally as powerful as the \(\lambda\)-calculus, in the sense that they compute all of the same functions on \(\mathbb N\).
