CSCI 341 Theory of Computation

Fall 2025, with Schmid

Pure Computation: The \(\lambda\)-calculus

Last lecture, we spent some time understanding how to represent decision problems and functions syntactically, i.e., as properties and transformations of strings. This allowed us to classify problems about numbers and trees using the hierarchy of families of languages we have seen so far (e.g., we were able to make sense of the statement, "deciding the parity of a natural number is a regular problem"). We also talked about how we can represent arbitrary functions, including any function of the form \(f \colon \mathbb N \to \mathbb N\), as functions on words.

Today we are going to focus on the latter item: we have decided to conceive of computation as "the manipulation of semantic objects via syntactic means". We have a clear conception, at this point, of what it means to solve language membership problems syntactically (via automata, for example), but we haven't said anything about how to do this for functions on strings. Next time, we will see that automata can also be used to define functions on strings, but today we are going to do something a bit different: we are going to use purely syntactic transformations---rewrite rules, nothing else---to define functions on strings. This is what the \(\lambda\)-calculus is for.

\(\lambda\)-terms

Let's start with a specific description of the syntax of the \(\lambda\)-calculus, what are called \(\lambda\)-terms. (Recall that we saw these in the Parse Trees section!)

(\(\lambda\)-terms) Fix a set \(\mathit{Var}\) of what we are going to call input variables. The set of \(\lambda\)-terms, \(\lambda\mathit{Term}\), is the language that is derived from the variable \(E\) in the grammar below. \[ E \to x \mid (E \circ E) \mid (\lambda x . E) \] where \(x \in \mathit{Var}\) is any input variable.

The \(\lambda\)-term \(x\) is called an isolated input variable. Given \(\lambda\)-terms \(t,s\), the \(\lambda\)-term \((t \circ s)\) is called the composition of \(t\) and \(s\). Given a \(\lambda\)-term \(t\), the \(\lambda\)-term \((\lambda x.t)\) is called the \(\lambda\)-abstraction of \(t\).

For example, \((\lambda x.(y \circ (\lambda y. y)))\) is a \(\lambda\)-term.

(Penmanship) Write down five more \(\lambda\)-terms using the input variables \(x\) and \(y\).

We are going to simplify notation a bit sometimes, with the tacit understanding that the simpler strings of symbols representing \(\lambda\)-terms always represent unique honest-to-goodness \(\lambda\)-terms. For \(\lambda\)-terms \(t_1\) and \(t_2\), instead of \((t_1 \circ t_2)\) we will probably just write \(t_1t_2\). We also typically eliminate the outermost brackets. So, for example, the two expressions below represent the same \(\lambda\)-term: \[ \lambda x.y(\lambda y. y) = (\lambda x.(y \circ (\lambda y. y))) \] When we are feeling very lazy, we are also going to use the shorthands \[ t_1~t_2~t_3 = ((t_1 ~ t_2)~t_3) \qquad \lambda xy.t = (\lambda x.(\lambda y.t)) \]

(Shorthand) Expand the following shorthands into full \(\lambda\)-terms.
  1. \(\lambda x.xy\)
  2. \(\lambda fx.fy\)
  3. \(\lambda xy.z(\lambda zy.x)\)

At this point, \(\lambda\)-terms are just syntax. They're meaningless strings of symbols. This is going to make them feel a bit abstract, which is a good thing at this point, but to help you understand what's going on here: every \(\lambda\)-term represents a function. In fact, every \(\lambda\)-term is a program that computes a function. Specifically, the term \(\lambda x.t\) represents a program that computes a function whose parameter is \(x\). For example, if we allowed ourselves arithmetic operations, we might write \[ f = \lambda x. (x + 1) \quad f(2) = 2 + 1 \] The idea is that the \(x\) is a symbol that we are supposed to replace when we are "evaluating" the function. Many programming languages have a feature like \(\lambda x\), sometimes just called \(\texttt{lambda x}\) (and in that context, it is called abstraction). For now, \(\lambda\)-terms are just strings of symbols, but it doesn't hurt to keep this in mind.
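For a concrete point of reference, here is that arithmetic example written with Python's \(\texttt{lambda}\); the name \(\texttt{f}\) is just ours, and Python's \(\texttt{lambda}\) plays exactly the role of \(\lambda x\).

```python
# The function  f = \x.(x + 1)  as an anonymous Python function.
f = lambda x: x + 1

print(f(2))  # 3: evaluating replaces the parameter x with 2
```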

Substitution

Probably the most important notion to learn in the coming days is input variable substitution. This requires a bit of care: to begin with, observe that every \(\lambda\)-term is just a string of symbols, so in particular, every character in the string has an index. Given a \(\lambda\)-term \(t\), if the character at index \(i\) in \(t\) is \(a\), then we call the pair \((a, i)\) an instance of \(a\) in \(t\). We are being careful about this now because different instances of the same character in a \(\lambda\)-term can mean different things.

The following definition is a bit technical, so let's start with the basic idea: an instance of a variable in a \(\lambda\)-term is bound if it is in the scope of a \(\lambda x\), and free otherwise. The exact indices won't matter for the bookkeeping in the end, so don't drive yourself nuts counting the symbols in a string.

(Free, Bound) Given a \(\lambda\)-term \(t\), the set of free input variables in \(t\), \(\mathsf{fv}(t)\), is defined recursively on \(\lambda\)-terms: \[\begin{aligned} \mathsf{fv}(x) &= \{(x, 0)\} \\ \mathsf{fv}((t_1 \circ t_2)) &= \{(y, i+1) \mid (y,i) \in \mathsf{fv}(t_1)\} \cup \{(y, i + |t_1| + 2) \mid (y,i) \in \mathsf{fv}(t_2)\} \\ \mathsf{fv}((\lambda x.t)) &= \{(y, i + 4) \mid (y, i) \in \mathsf{fv}(t) \text{ and } y \neq x\} \end{aligned}\] If \((x,i) \in \mathsf{fv}(t)\), we say that \((x, i)\) is a free instance of \(x\) in \(t\).

The scope of \(x\) in \(\lambda x.t\) is the term \(t\). We say that every free instance of \(x\) in \(t\) is bound by \(\lambda x\) in \(\lambda x.t\), and we refer to \(\lambda x\) as the binder of the instance.
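If you prefer to see the bookkeeping in code, here is a minimal Python sketch of \(\lambda\)-terms as syntax trees; the class names \(\texttt{Var}\), \(\texttt{App}\), \(\texttt{Lam}\) are our own invention, and the function below tracks only the names of the free variables, dropping the indices, as the remark above permits.

```python
# Requires Python 3.10+ (structural pattern matching).
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:              # an isolated input variable  x
    name: str

@dataclass(frozen=True)
class App:              # a composition  (t1 o t2)
    fn: "Term"
    arg: "Term"

@dataclass(frozen=True)
class Lam:              # a lambda-abstraction  (\x.t)
    var: str
    body: "Term"

Term = Var | App | Lam

def fv(t: Term) -> set[str]:
    """Names of the free input variables of t (index bookkeeping dropped)."""
    match t:
        case Var(x):
            return {x}
        case App(t1, t2):
            return fv(t1) | fv(t2)
        case Lam(x, body):
            return fv(body) - {x}

# fv of  (\x. y (\y. y))  is {'y'}: the other instances of y are bound.
print(fv(Lam("x", App(Var("y"), Lam("y", Var("y"))))))
```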
(Over the Line!) For each of the following \(\lambda\)-terms, determine the free instances of each variable and determine the bound instances of each variable as well as the \(\lambda\) that binds them.
  1. \(\lambda x.xy\)
  2. \(\lambda x.yx(\lambda x.x)\)
  3. \(\lambda xy.z(\lambda zy.x)\)

Now that we have some idea about what counts as bound and free, let's talk about the key idea on which the \(\lambda\)-calculus is built: substitution.

(Substitution) Let \(t\) and \(s\) be \(\lambda\)-terms and \(x\) be an input variable. The substitution of \(s\) for \(x\) in \(t\) is the \(\lambda\)-term \(t[s/x]\) obtained by replacing every free instance of the variable \(x\) in \(t\) with \(s\).

For example, given any \(\lambda\)-term \(s\), we can compute the substitution \[ (\lambda x.y(\lambda y.y))[s / y] = (\lambda x.s(\lambda y.y)) \] The only free instance of \(y\) in \((\lambda x.y(\lambda y.y))\) is \((y, 4)\), so that is the only instance that is substituted. So, if \(s = \lambda x.x\), then we would have \[ (\lambda x.y(\lambda y.y))[(\lambda x.x) / y] = (\lambda x.(\lambda x.x)(\lambda y.y)) \]
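In the same Python sketch (reusing \(\texttt{Var}\), \(\texttt{App}\), \(\texttt{Lam}\) from above), substitution is one recursive pass that stops at any binder for the variable being replaced. This naive version leaves it to the caller to respect the capture condition that appears in the definition of \(\beta\)-reduction below.

```python
def subst(t: Term, x: str, s: Term) -> Term:
    """t[s/x]: replace every free instance of x in t with s."""
    match t:
        case Var(y):
            return s if y == x else t
        case App(t1, t2):
            return App(subst(t1, x, s), subst(t2, x, s))
        case Lam(y, body):
            # Below a binder for x itself, instances of x are bound, not free.
            return t if y == x else Lam(y, subst(body, x, s))

# (\x. y (\y. y))[(\x.x) / y]  =  (\x. (\x.x) (\y. y)), matching the example.
t = Lam("x", App(Var("y"), Lam("y", Var("y"))))
print(subst(t, "y", Lam("x", Var("x"))))
```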

(Substitute Teacher) Calculate the following \(\lambda\)-terms.
  1. \((\lambda x.xy)[(\lambda f.f)/y]\)
  2. \((\lambda x.yx(\lambda x.x))[(\lambda f.f)/x]\)
  3. \((\lambda xy.z(\lambda zy.x))[(\lambda f.f)/z]\)

\(\beta\)-reduction and \(\alpha\)-equivalence

We are now ready to give some meaning to \(\lambda\)-terms. We already stated that every \(\lambda\)-term represents a function, and that the function it represents is somehow given by substituting variables. But what exactly does that mean? To make sense of this, we define a rewrite system, similar to how grammars are defined, called \(\beta\)-reduction.

(\(\beta\)-reduction) Let \(t,s\) be \(\lambda\)-terms such that there are no free variables in \(s\) that are bound in \(t\). In such a case, we call the \(\lambda\)-term \((\lambda x.t)s\) a \(\beta\)-redex, define the rewrite rule \[ (\lambda x.t)s \to_\beta t[s / x] \] and call \(t[s / x]\) the \(\beta\)-reduction of \((\lambda x.t)s\).

The \(\beta\)-reduction relation extends this rewrite rule to \(\Rightarrow_\beta\), which includes compositions and \(\lambda\)-abstractions as follows:
  • If \(t_1 \Rightarrow_\beta t_2\), then \(t_1s \Rightarrow_\beta t_2 s\) and \(st_1 \Rightarrow_\beta st_2\)
  • If \(t_1 \Rightarrow_\beta t_2\), then \(\lambda x.t_1 \Rightarrow_\beta \lambda x.t_2\)
For example, we can perform a \(\beta\)-reduction of \((\lambda y.yy)a\) by substituting the free instances of \(y\) in the scope of \(\lambda y\) with the \(a\) on the right: \[ (\lambda {\color{blue} y}.{\color{red} yy}){\color{blue} a} \Rightarrow_\beta aa \]
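Continuing the Python sketch (reusing \(\texttt{fv}\) and \(\texttt{subst}\)), the rewrite rule itself is a single call to \(\texttt{subst}\), guarded by the side condition from the definition. For brevity this version only fires at the root of the term; the two closure rules for \(\Rightarrow_\beta\) above are omitted.

```python
def binders(t: Term) -> set[str]:
    """All variables bound by some lambda occurring inside t."""
    match t:
        case Var(_):
            return set()
        case App(t1, t2):
            return binders(t1) | binders(t2)
        case Lam(x, body):
            return {x} | binders(body)

def beta_step(t: Term) -> Term | None:
    """One beta-step at the root: (lam x. body) s  ->  body[s/x], if allowed."""
    match t:
        case App(Lam(x, body), s) if not (fv(s) & binders(body)):
            return subst(body, x, s)
    return None  # not a beta-redex, or the side condition fails

# (\y. y y) a  =>_beta  a a
print(beta_step(App(Lam("y", App(Var("y"), Var("y"))), Var("a"))))
```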
(Calculating for the First Time) Find all of the \(\beta\)-reductions of the following \(\lambda\)-terms.
  1. \((\lambda x.xy)(\lambda f.f)\)
  2. \((\lambda x.yx(\lambda x.x))(\lambda f.f)\)
  3. \(((\lambda zy.z(\lambda x.xy))(\lambda f.f))(\lambda f.f)\)

With a bit of care, the technical condition in the definition of \(\beta\)-reduction (that free variables cannot become bound after \(\beta\)-reduction) can be ducked. This is what is called \(\alpha\)-equivalence, although it's not quite as fancy as it sounds. The basic idea is that two functions like \(f(x) = x + 1\) and \(f(y) = y + 1\) are "really the same function". The only difference is what you have named the parameters.

(\(\alpha\)-equivalence) Let \(t,s\) be \(\lambda\)-terms and \(x,y\) be input variables. If either \(y = x\) or \(y\) does not occur in \(t\) at all (neither free nor bound), then we define the rewrite rule \[ \lambda x.t \to_\alpha \lambda y.(t[y/x]) \] and call this relation \(\alpha\)-reduction. This rewrite rule extends to \(\Rightarrow_\alpha\), defined across composition and \(\lambda\)-abstractions as follows:
  • If \(t_1 \Rightarrow_\alpha t_2\), then \(t_1s \Rightarrow_\alpha t_2 s\) and \(st_1 \Rightarrow_\alpha st_2\)
  • If \(t_1 \Rightarrow_\alpha t_2\), then \(\lambda x.t_1 \Rightarrow_\alpha \lambda x.t_2\)
We write \(=_\alpha\) for the relation \[ t =_\alpha s \text{ if and only if } t \Rightarrow_\alpha^* s \] That is, there is a sequence of \(\alpha\)-reductions that turns \(t\) into \(s\). The relation \(=_\alpha\) is called \(\alpha\)-equivalence.
(Alpha-Equivalence) Determine which of the following \(\lambda\)-terms are \(\alpha\)-equivalent.
  1. \((\lambda x.xy)\)
  2. \((\lambda y.xy)\)
  3. \((\lambda x.yx)\)
  4. \((\lambda y.yy)\)
  5. \((\lambda x.xx)\)

We have given \(\alpha\)-reduction a "direction", but it really doesn't have one. This is why we call it an equivalence.

(\(\alpha\)-reduction is an equivalence) The relation \(=_\alpha\) is an equivalence relation. That is, for any \(\lambda\)-terms \(t,s,r\),
  1. \(t =_\alpha t\)
  2. if \(t =_\alpha s\), then \(s =_\alpha t\)
  3. if \(t =_\alpha s\) and \(s =_\alpha r\), then \(t =_\alpha r\)

Up to \(\alpha\)-equivalence, all of the ingredients for using \(\lambda\)-terms as functions are actually buried in the definition of \(\beta\)-reduction. In fact, the stance that the \(\lambda\)-calculus takes is that "complete \(\beta\)-reduction = function computation". By "complete", we mean "no further reductions can be performed".

(\(\alpha\beta\)-Normal Form) A \(\lambda\)-term \(t\) is said to be in \(\alpha\beta\)-normal form if for every \(\lambda\)-term \(s\) such that \(t =_\alpha s\), \(s\) admits no further \(\beta\)-reductions, i.e., \(s \not\Rightarrow_\beta\).

This leads to the definition of function evaluation stipulated by the \(\lambda\)-calculus.

(Evaluation) Let \(t \Rightarrow s\) denote that either \(t \Rightarrow_\beta s\) or \(t =_\alpha s\). Then we say that \(t\) evaluates to \(s\) if \(t \Rightarrow^* s\) and \(s\) is in \(\alpha\beta\)-normal form. In this case, we write \(t \Downarrow s\).

It is important to note that if \(s_1 =_\alpha s_2\) and \(t \Downarrow s_1\), then \(t\Downarrow s_2\) as well. In other words, evaluation is not unique. However, it is unique up to \(\alpha\)-equivalence. This is what's known as the Church-Rosser Theorem.

(Church-Rosser) Let \(t\) be a \(\lambda\)-term, and let \(t \Downarrow s_1\) and \(t \Downarrow s_2\). Then \(s_1 =_\alpha s_2\).

We aren't going to prove this theorem; that would go beyond the scope of the course. But what it does tell us is that \(\beta\)-reduction does, up to \(\alpha\)-equivalence, give us a useful notion of "running a program to calculate the value of a function".

(Evaluation 1) The \(\lambda\)-term below can be evaluated as follows: \[\begin{aligned} ((\lambda yx.yx)((\lambda y.yy)a))d &=_\alpha ((\lambda {\color{blue} b}c.{\color{red} b}c){\color{blue}( (\lambda y.yy)a)})d \\ &\Rightarrow_\beta (\lambda {\color{blue} c}.((\lambda y.yy)a)~{\color{red} c}){\color{blue} d} \\ &\Rightarrow_\beta ((\lambda {\color{blue} y}.{\color{red} yy}){\color{blue} a})d \\ &\Rightarrow_\beta (aa)d \end{aligned}\] That last \(\lambda\)-term is in \(\alpha\beta\)-normal form, so we are done: \(((\lambda yx.yx)((\lambda y.yy)a))d \Downarrow aad\).

One consequence of the Church-Rosser theorem is that it does not matter what order you apply your \(\beta\)-reductions in. Try it!

(Evaluation) Evaluate the following \(\lambda\)-terms in two different ways by applying \(\beta\)-reductions in different orders.
  1. \((\lambda a.aa)((\lambda y.y)x)\)
  2. \(((\lambda ab.aba)(\lambda y.y))x\)

Computing in the \(\lambda\)-calculus

So far, we have seen evaluation in the \(\lambda\)-calculus as a process of taking \(\beta\)-reductions of \(\lambda\)-terms up to \(\alpha\)-equivalence. This is exactly the kind of thing we were talking about when we said that computation operated "by syntactic means": strings becoming strings via purely syntactic transformations. What we do with the syntax, i.e., how we interpret the syntax, is now up to us!

Given an alphabet \(A\) and sets \(S_1,S_2\), recall that a string representation of a function \(f \colon S_1 \to S_2\) consists of two injective representations \(\rho_1 \colon S_1 \to A^*\) and \(\rho_2 \colon S_2 \to A^*\), together with a function \(g \colon A^* \to A^*\) such that \(g(\rho_1(s)) = \rho_2(f(s))\) for all \(s \in S_1\). In order to get a notion of representation in \(\lambda\)-terms, we need to relax this strict equality to evaluation, which is only defined up to \(\alpha\)-equivalence.

(\(\lambda\)-representation) A \(\lambda\)-term representation of a set \(S\) is a function \(\rho \colon S \to \lambda\mathit{Term}\). Now, for arbitrary sets \(S_1,S_2\), let \(f \colon S_1 \to S_2\) be a function. A \(\lambda\)-representation of \(f\) consists of two \(\lambda\)-term representations \(\rho_1 \colon S_1 \to \lambda\mathit{Term}\) and \(\rho_2 \colon S_2 \to \lambda\mathit{Term}\), and a \(\lambda\)-term \(\mathsf{g}\) such that for any \(s \in S_1\), \[ \mathsf{g}~\rho_1(s) \Downarrow \rho_2(f(s)) \]

Notice that nothing is said for how an arbitrary \(\lambda\)-term is evaluated by \(\mathsf{g}\), only the ones that are representatives of elements of \(S_1\). Any arbitrary junk could be shoved next to \(\mathsf{g}\), and it would output something. It just might not have a meaning. Think about Chomsky's famous sentence, "Colourless green ideas sleep furiously". Yes, it's syntactically technically correct, but totally devoid of any interpretation!

The next thing we are going to do is look at a whole lot of different \(\lambda\)-term representations of everyday objects, starting with a pretty simple type.

Boolean Logic

Let \(B = \{\mathtt{True}, \mathtt{False}\}\) be the set containing the two Boolean truth values. Consider the following \(\lambda\)-term representation of this set: \[\begin{gathered} \rho \colon B \to \lambda\mathit{Term} \\ \rho(\mathtt{True}) = \mathsf{T} = \lambda xy.x \qquad \rho(\mathtt{False}) = \mathsf{F} = \lambda xy.y \end{gathered}\] In some shape or form, these two \(\lambda\)-terms are the representatives of the true/false logic that we know and love. But how? The only way to find out is to combine them with the logical operators we know and love.
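Because these representatives are just nested \(\lambda\)-abstractions, the encoding runs as-is in Python, and we can decode a Church boolean by handing it Python's own truth values (the helper \(\texttt{decode}\) is our name, not standard).

```python
# Church booleans as curried Python functions.
T = lambda x: lambda y: x   # \xy.x
F = lambda x: lambda y: y   # \xy.y

def decode(b):
    """A Church boolean picks its first or second argument."""
    return b(True)(False)

print(decode(T), decode(F))  # True False
```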

(AND) For example, let's define the "and" operation. Consider the \(\lambda\)-term \[ \mathsf{AND} = (\lambda xy.yxy) \] Where did this come from? Who cares! Does it work? Well, let's see: let's try computing \(\mathsf{AND}~\mathsf{T}~\mathsf{T}\), which we think of as "true and true". Intuitively, this should evaluate to \(\mathsf{T}\), right? Let's apply the definitions and \(\beta\)-reduce: \[\begin{aligned} \mathsf{AND}~\mathsf{T}~\mathsf{T} &=_\alpha (\lambda {\color{blue} a}b.b{\color{red} a}b){\color{blue} (\lambda cd.c)}(\lambda xy.x) \\ &\Rightarrow_\beta (\lambda {\color{blue} b}.{\color{red} b}(\lambda cd.c){\color{red} b}){\color{blue} (\lambda xy.x)} \\ &\Rightarrow_\beta (\lambda {\color{blue} x}y.{\color{red} x}){\color{blue} (\lambda cd.c)}(\lambda xy.x) \\ &\Rightarrow_\beta (\lambda {\color{blue} y}.(\lambda cd.c)){\color{blue} (\lambda xy.x)} \\ &\Rightarrow_\beta (\lambda cd.c) \\ &=_\alpha \mathsf{T} \\ \end{aligned}\]
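In the Python encoding above, the whole truth table is a two-line loop. Do the \(\beta\)-reductions in the next exercise by hand first, then use something like this to check your answers.

```python
# AND = \xy. y x y, transcribed directly.
AND = lambda x: lambda y: y(x)(y)

for a, s in ((T, "T"), (F, "F")):
    for b, r in ((T, "T"), (F, "F")):
        print(f"AND {s} {r} -> {decode(AND(a)(b))}")
# Only  AND T T  prints True.
```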
(\(\lambda\)-circuits) Evaluate the following \(\lambda\)-terms.
  1. \(\mathsf{AND}~\mathsf{F}~\mathsf{T} \)
  2. \(\mathsf{AND}~\mathsf{T}~\mathsf{F} \)
  3. \(\mathsf{AND}~\mathsf{F}~\mathsf{F} \)

This confirms that \(\mathsf{AND}\) is a \(\lambda\)-term representation of the "and" function from Boolean logic. Formally speaking, let \(\wedge \colon B \times B \to B\) be the usual logical "and". We can define the \(\lambda\)-representation of pairs of Booleans by setting \(\langle \rho,\rho \rangle \colon B \times B \to \lambda\mathit{Term}\) equal to \[\begin{aligned} \langle \rho,\rho \rangle(\mathtt{True}, \mathtt{True}) &= \mathsf{T}\mathsf{T} \\ \langle \rho,\rho \rangle(\mathtt{False}, \mathtt{True}) &= \mathsf{F}\mathsf{T} \\ \langle \rho,\rho \rangle(\mathtt{True}, \mathtt{False}) &= \mathsf{T}\mathsf{F} \\ \langle \rho,\rho \rangle(\mathtt{False}, \mathtt{False}) &= \mathsf{F}\mathsf{F} \end{aligned}\] Then we have just verified that \[ \mathsf{AND} \langle \rho,\rho \rangle(a, b) \Downarrow \rho(a \wedge b) \] This is the formal statement that says that \(\mathsf{AND}\) is a \(\lambda\)-representation of \(\wedge\).

(OR WHAT) Let \(\vee \colon B \times B \to B\) be the logical "or" function. Find a \(\lambda\)-representation \(\mathsf{OR}\) of \(\vee\), and evaluate its truth table.

Number-theoretic Functions

We can also represent numbers as \(\lambda\)-terms. Let \(\mathsf{C} \colon \mathbb N \to \lambda\mathit{Term}\) be the function \[\begin{aligned} \mathsf{C_0} &= \lambda fx.x \\ \mathsf{C_1} &= \lambda fx.fx \\ \mathsf{C_2} &= \lambda fx.f(fx) \\ \mathsf{C_3} &= \lambda fx.f(f(fx)) \\ &\vdots \\ \mathsf{C_n} &= \lambda fx.\overbrace{f(f\cdots f(f}^{\text{\(n\) times}} x) \cdots ) \\ \end{aligned}\] The \(\lambda\)-term \(\mathsf{C_n}\) is called the Church numeral of \(n\). The functions \(\mathbb N \to \mathbb N\) that are representable by \(\lambda\)-terms are called Church-computable, as in the following example.
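Like the Booleans, Church numerals run directly in Python; \(\texttt{church}\) and \(\texttt{to\_int}\) below are our own helper names for encoding and decoding.

```python
def church(n):
    """The Church numeral C_n: apply f to x exactly n times."""
    return lambda f: lambda x: x if n == 0 else f(church(n - 1)(f)(x))

def to_int(c):
    """Decode a Church numeral by counting applications of k -> k + 1."""
    return c(lambda k: k + 1)(0)

print(to_int(church(0)), to_int(church(3)))  # 0 3
```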

(Successor) Consider the \(\lambda\)-term below, \[ \mathsf{S} = \lambda abc.b(abc) \] Applying this \(\lambda\)-term to \(\mathsf{C_1}\), we evaluate as follows \[\begin{aligned} \mathsf{S}\mathsf{C_1} &= (\lambda abc.b(abc))(\lambda fx.fx) \\ &\Rightarrow_\beta \lambda bc.b((\lambda fx.fx)bc) \\ &\Rightarrow_\beta \lambda bc.b((\lambda x.bx)c) \\ &\Rightarrow_\beta \lambda bc.b(bc) \\ &=_\alpha \lambda fx.f(fx) \\ &= \mathsf{C_2} \end{aligned}\] In other words, \(\mathsf{S} \mathsf{C_1} \Downarrow \mathsf{C_2}\). This \(\lambda\)-term \(\mathsf{S}\) represents the function \((x + 1) \colon \mathbb N \to \mathbb N\)!
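The successor term transcribes directly into the same Python encoding:

```python
# S = \abc. b (a b c): perform a's applications of b, then apply b once more.
S = lambda a: lambda b: lambda c: b(a(b)(c))

print(to_int(S(church(1))))        # 2, matching  S C_1 evaluates to C_2
print(to_int(S(S(S(church(0))))))  # 3
```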
(Greg the Egg) Evaluate the following \(\lambda\)-term to verify that it is the Church numeral for \(3\). \[ \mathsf{S}~(\mathsf{S}~(\mathsf{S}~\mathsf{C_0} ) ) \]

One of the funny things about Church numerals is that they really represent "repeated application".

(Repeat Application) Evaluate the \(\lambda\)-term \[ (\mathsf{C_2}~\mathsf{S})~\mathsf{C_1} \]

Using this idea, a lot of other number-theoretic functions can be defined. For example, addition can be represented as \[ \mathsf{ADD} = \lambda n m.(n~\mathsf{S})m \]
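Transcribed into the Python encoding:

```python
# ADD = \nm. (n S) m: apply the successor S, n times over, to m.
ADD = lambda n: lambda m: n(S)(m)

print(to_int(ADD(church(1))(church(1))))  # 2
print(to_int(ADD(church(2))(church(3))))  # 5
```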

(1+1) Evaluate the \(\lambda\)-term \[ \mathsf{ADD}~\mathsf{C_1}~\mathsf{C_1} \]

Using the same idea, multiplication can be defined as repeated addition. But I'll let you figure this out!

(Multiply by 3) Find a \(\lambda\)-term \(\mathsf{M}_3\) that represents multiplication by \(3\). That is, if \(\mathsf{C} \colon \mathbb N \to \lambda\mathit{Term}\) is the Church-numeral representation of natural numbers, \[ \mathsf{M}_3\mathsf{C_n} \Downarrow \mathsf{C_{3n}} \] Verify that \(\mathsf{M_3} \mathsf{C_3} \Downarrow \mathsf{C_9}\) using your definition of \(\mathsf{M_3}\).
Hint: Try this first for \(2\). You can do this in multiple ways, but it's easiest (since you just care about \(3\)) to do this with \(\mathsf{S}\) and avoid \(\mathsf{ADD}\). For \(2\), you want to apply \(\mathsf{S}\) twice over, exactly \(n\) times, to get \(2n\). How would you replicate this for \(3\)?