Pure Computation: The \(\lambda\)-calculus
Last lecture, we spent some time understanding how to represent decision problems and functions syntactically, i.e., as properties and transformations of strings. This allowed us to classify problems about numbers and trees using the hierarchy of families of languages we have seen so far (e.g., we were able to make sense of the statement, "deciding the parity of a natural number is a regular problem"). We also talked about how we can represent arbitrary functions, including any function of the form \(f \colon \mathbb N \to \mathbb N\), as a function on words.
Today we are going to focus on the latter item: we have decided to conceive of computation as "the manipulation of semantical objects via syntactic means". We have a clear conception, at this point, of what it means to solve language membership problems syntactically (via automata, for example), but we haven't said anything about how to do this for functions on strings. Next time, we will see that automata can also be used to define functions on strings, but today we are going to do something a bit different: we are going to use purely syntactic transformations---rewrite rules, nothing else---to define functions on strings. This is what the \(\lambda\)-calculus is for.
\(\lambda\)-terms
Let's start with a precise description of the syntax of the \(\lambda\)-calculus, namely of what are called \(\lambda\)-terms. (Recall that we saw these in the Parse Trees section!)
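As a rough reminder of the shape of that definition (this is only a sketch; the official version is the one from the Parse Trees section), \(\lambda\)-terms are generated from a supply of variables \(x, y, z, \dots\) by the grammar \[ t \;::=\; x \;\mid\; (t_1 \circ t_2) \;\mid\; (\lambda x.t) \] where \(t_1\) and \(t_2\) range over \(\lambda\)-terms: a \(\lambda\)-term is either a variable, an application of one term to another, or an abstraction.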
For example, \((\lambda x.(y \circ (\lambda y. y)))\) is a \(\lambda\)-term.
We are going to simplify notation a bit sometimes, with the tacit understanding that the simpler strings of symbols representing \(\lambda\)-terms always represent unique honest-to-goodness \(\lambda\)-terms. For \(\lambda\)-terms \(t_1\) and \(t_2\), instead of \((t_1 \circ t_2)\) we will probably just write \(t_1t_2\). We also typically eliminate the outermost brackets. So, for example, the two expressions below represent the same \(\lambda\)-term: \[ \lambda x.y(\lambda y. y) = (\lambda x.(y \circ (\lambda y. y))) \] When we are feeling very lazy, we are also going to use the shorthands \[ t_1~t_2~t_3 = ((t_1 ~ t_2)~t_3) \qquad \lambda xy.t = (\lambda x.(\lambda y.t)) \]
- \(\lambda x.xy\)
- \(\lambda fx.fy\)
- \(\lambda xy.z(\lambda zy.x)\)
At this point, \(\lambda\)-terms are just syntax. They're meaningless strings of symbols. This is going to make them feel a bit abstract, which is a good thing at this point, but to help you understand what's going on here: every \(\lambda\)-term represents a function. In fact, every \(\lambda\)-term is a program that computes a function. Specifically, the term \(\lambda x.t\) represents a program that computes a function whose parameter is \(x\). For example, if we allowed ourselves arithmetic operations, we might write \[ f = \lambda x. (x + 1) \quad f(2) = 2 + 1 \] The idea is that the \(x\) is a symbol that we are supposed to replace when we are "evaluating" the function. Many programming languages have a feature like \(\lambda x\), sometimes just called \(\texttt{lambda x}\) (and in that context, it is called abstraction). For now, \(\lambda\)-terms are just strings of symbols, but it doesn't hurt to keep this in mind.
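For instance, here is a minimal sketch of the same idea in Python (purely for intuition; the name \(\texttt{f}\) and the use of arithmetic are ours and are not part of the \(\lambda\)-calculus):

```python
# The lambda keyword builds an anonymous function of one parameter x,
# much like the abstraction "lambda x" described above.
f = lambda x: x + 1

# "Evaluating" the function replaces the parameter x by the argument 2.
print(f(2))  # prints 3, i.e., 2 + 1
```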
Substitution
Probably the most important notation to learn in the coming days is input variable substitution. This requires a bit of care: to begin with, observe that every \(\lambda\)-term is just a string of symbols, so in particular, every character in the string has an index. Given a \(\lambda\)-term \(t\), if the character at index \(i\) in \(t\) is \(a\), then we call the pair \((a, i)\) an instance of \(a\) in \(t\). We are being careful about this now because different instances of the same character in a \(\lambda\)-term can mean different things.
The following definition is a bit technical, so let's start with the basic idea: an instance of a variable \(x\) in a \(\lambda\)-term is bound if it is in the scope of a \(\lambda x\), and free otherwise. The exact indices are only there for bookkeeping and make no difference in the end, so don't drive yourself nuts counting the symbols in a string.
- \(\lambda x.xy\)
- \(\lambda x.yx(\lambda x.x)\)
- \(\lambda xy.z(\lambda zy.x)\)
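For instance, in the first term above, \(\lambda x.xy\), the instance of \(x\) in the body is bound (it lies in the scope of the \(\lambda x\)), while the instance of \(y\) is free, since no \(\lambda y\) encloses it. In the second term, \(\lambda x.yx(\lambda x.x)\), both instances of \(x\) in the body are bound, but by different binders: the middle one by the outer \(\lambda x\), the last one by the inner \(\lambda x\); the instance of \(y\) is again free.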
Now that we have some idea about what counts as bound and free, let's talk about the key idea on which the \(\lambda\)-calculus is built: substitution.
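Roughly, the substitution \(t[s/x]\) ("replace every free instance of \(x\) in \(t\) by \(s\)") can be described by recursion on \(t\); the following is only a sketch, glossing over the capture issue that we return to below: \[\begin{aligned} x[s/x] &= s \\ y[s/x] &= y \quad (y \neq x) \\ (t_1 \circ t_2)[s/x] &= (t_1[s/x] \circ t_2[s/x]) \\ (\lambda x.t)[s/x] &= \lambda x.t \\ (\lambda y.t)[s/x] &= \lambda y.(t[s/x]) \quad (y \neq x) \end{aligned}\] The last clause is the delicate one: it should only be applied when no free variable of \(s\) would become bound by the \(\lambda y\).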
For example, given any \(\lambda\)-term \(s\), we can compute the substitution \[ (\lambda x.y(\lambda y.y))[s / y] = (\lambda x.s(\lambda y.y)) \] The only free instance of \(y\) in \((\lambda x.y(\lambda y.y))\) is \((y, 4)\), so that is the only instance that is substituted. So, if \(s = \lambda x.x\), then we would have \[ (\lambda x.y(\lambda y.y))[(\lambda x.x) / y] = (\lambda x.(\lambda x.x)(\lambda y.y)) \]
- \((\lambda x.xy)[(\lambda f.f)/y]\)
- \((\lambda x.yx(\lambda x.x))[(\lambda f.f)/x]\)
- \((\lambda xy.z(\lambda zy.x))[(\lambda f.f)/z]\)
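As a sanity check on the first of these: the only free variable in \(\lambda x.xy\) is \(y\), so \[ (\lambda x.xy)[(\lambda f.f)/y] = \lambda x.x(\lambda f.f) \] The bound instance of \(x\) is left untouched.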
\(\beta\)-reduction and \(\alpha\)-equivalence
We are now ready to give some meaning to \(\lambda\)-terms. We already stated that every \(\lambda\)-term represents a function, and that the function it represents is somehow given by substituting variables. But what exactly does that mean? To make sense of this, we define a rewrite system, similar in spirit to how grammars are defined, called \(\beta\)-reduction.
- \((\lambda x.t)s \Rightarrow_\beta t[s/x]\), provided no free variable of \(s\) becomes bound in \(t[s/x]\)
- If \(t_1 \Rightarrow_\beta t_2\), then \(t_1s \Rightarrow_\beta t_2 s\) and \(st_1 \Rightarrow_\beta st_2\)
- If \(t_1 \Rightarrow_\beta t_2\), then \(\lambda x.t_1 \Rightarrow_\beta \lambda x.t_2\)
- \((\lambda x.xy)(\lambda f.f)\)
- \((\lambda x.yx(\lambda x.x))(\lambda f.f)\)
- \(((\lambda zy.z(\lambda x.xy))(\lambda f.f))(\lambda f.f)\)
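To illustrate with the first of these: \[ (\lambda x.xy)(\lambda f.f) \;\Rightarrow_\beta\; (\lambda f.f)y \;\Rightarrow_\beta\; y \] In the first step the argument \(\lambda f.f\) is substituted for the bound \(x\); in the second, \(y\) is substituted for \(f\).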
With a bit of care, the technical condition in the definition of \(\beta\)-reduction (that free variables cannot become bound after \(\beta\)-reduction) can be ducked. This is what is called \(\alpha\)-equivalence, although it's not quite as fancy as it sounds. The basic idea is that two functions like \(f(x) = x + 1\) and \(f(y) = y + 1\) are "really the same function". The only difference is what you have named the parameters.
- \(\lambda x.t \Rightarrow_\alpha \lambda y.(t[y/x])\), provided \(y\) does not occur in \(t\)
- If \(t_1 \Rightarrow_\alpha t_2\), then \(t_1s \Rightarrow_\alpha t_2 s\) and \(st_1 \Rightarrow_\alpha st_2\)
- If \(t_1 \Rightarrow_\alpha t_2\), then \(\lambda x.t_1 \Rightarrow_\alpha \lambda x.t_2\)
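For example, renaming the bound variable in \(\lambda x.xz\) gives \[ \lambda x.xz \;\Rightarrow_\alpha\; \lambda y.yz \] since \(y\) does not occur in the body \(xz\); the free variable \(z\) is left alone.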
- \((\lambda x.xy)\)
- \((\lambda y.xy)\)
- \((\lambda x.yx)\)
- \((\lambda y.yy)\)
- \((\lambda x.xx)\)
We have given \(\alpha\)-reduction a "direction", but it really doesn't have one. This is why we call it an equivalence: writing \(t =_\alpha s\) to mean that \(t\) can be turned into \(s\) by some finite sequence of \(\alpha\)-reduction steps applied in either direction, we obtain a relation satisfying
- \(t =_\alpha t\)
- if \(t =_\alpha s\), then \(s =_\alpha t\)
- if \(t =_\alpha s\) and \(s =_\alpha r\), then \(t =_\alpha r\)
Up to \(\alpha\)-equivalence, all of the ingredients for using \(\lambda\)-terms as functions are actually buried in the definition of \(\beta\)-reduction. In fact, the stance that the \(\lambda\)-calculus takes is that "complete \(\beta\)-reduction = function computation". By "complete", we mean "no further reductions can be performed".
This leads to the definition of function evaluation stipulated by the \(\lambda\)-calculus: roughly, we write \(t \Downarrow s\) ("\(t\) evaluates to \(s\)") when \(t\) can be rewritten to \(s\) by some finite sequence of \(\beta\)-reductions and no further \(\beta\)-reduction applies to \(s\).
It is important to note that if \(s_1 =_\alpha s_2\) and \(t \Downarrow s_1\), then \(t\Downarrow s_2\) as well. In other words, evaluation is not unique. However, it is unique up to \(\alpha\)-equivalence. This is what's known as the Church-Rosser Theorem.
We aren't going to prove this theorem; that would go beyond the scope of the course. But what it does tell us is that \(\beta\)-reduction does, up to \(\alpha\)-equivalence, give us a useful notion of "running a program to calculate the value of a function".
One consequence of the Church-Rosser theorem is that it does not matter what order you apply your \(\beta\)-reductions in. Try it!
- \((\lambda a.aa)((\lambda y.y)x)\)
- \(((\lambda ab.aba)(\lambda y.y))x\)
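For instance, the first term above can be reduced in two different orders, and both land on the same result: \[\begin{aligned} (\lambda a.aa)((\lambda y.y)x) &\Rightarrow_\beta (\lambda a.aa)x \Rightarrow_\beta xx \\ (\lambda a.aa)((\lambda y.y)x) &\Rightarrow_\beta ((\lambda y.y)x)((\lambda y.y)x) \Rightarrow_\beta x((\lambda y.y)x) \Rightarrow_\beta xx \end{aligned}\]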
Computing in the \(\lambda\)-calculus
So far, we have seen evaluation in the \(\lambda\)-calculus as a process of taking \(\beta\)-reductions of \(\lambda\)-terms up to \(\alpha\)-equivalence. This is exactly the kind of thing we were talking about when we said that computation operated "by syntactic means": strings becoming strings via purely syntactic transformations. What we do with the syntax, i.e., how we interpret the syntax, is now up to us!
Given an alphabet \(A\) and sets \(S_1,S_2\), recall that a string representation of a function \(f \colon S_1 \to S_2\) consists of two injective representations \(\rho_1 \colon S_1 \to A^*\) and \(\rho_2 \colon S_2 \to A^*\), together with a function \(g \colon A^* \to A^*\) such that \(g(\rho_1(s)) = \rho_2(f(s))\) for all \(s \in S_1\). In order to get a notion of representation in \(\lambda\)-terms, we need to relax the strict equality to \(\alpha\)-equivalence instead.
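Roughly, then, a \(\lambda\)-term \(\mathsf{g}\) represents \(f \colon S_1 \to S_2\) with respect to representations \(\rho_1 \colon S_1 \to \lambda\mathit{Term}\) and \(\rho_2 \colon S_2 \to \lambda\mathit{Term}\) when applying \(\mathsf{g}\) to the representative of an input evaluates, up to \(\alpha\)-equivalence, to the representative of the output: \[ \mathsf{g}~\rho_1(s) \Downarrow t \quad \text{with} \quad t =_\alpha \rho_2(f(s)) \qquad \text{for all } s \in S_1. \]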
Notice that nothing is said about how an arbitrary \(\lambda\)-term is evaluated by \(\mathsf{g}\), only the ones that are representatives of elements of \(S_1\). Any arbitrary junk could be shoved next to \(\mathsf{g}\), and it would output something. It just might not have a meaning. Think about Chomsky's famous sentence, "Colourless green ideas sleep furiously". Yes, it's syntactically correct, but totally devoid of any interpretation!
The next thing we are going to do is look at a whole lot of different \(\lambda\)-term representations of everyday objects, starting with a pretty simple type.
Boolean Logic
Let \(B = \{\mathtt{True}, \mathtt{False}\}\) be the set containing the two Boolean truth values. Consider the following \(\lambda\)-interpretation of this set: \[\begin{gathered} \rho \colon B \to \lambda\mathit{Term} \\ \rho(\mathtt{True}) = \mathsf{T} = \lambda xy.x \qquad \rho(\mathtt{False}) = \mathsf{F} = \lambda xy.y \end{gathered}\] In some shape or form, these two \(\lambda\)-terms are the representatives of the true/false logic that we know and love. But how? The only way we can find out is by combining them with the logical operators we know and love.
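One standard way to encode logical "and" as a \(\lambda\)-term (there are several equivalent choices; this is one common one) is \[ \mathsf{AND} = \lambda pq.p~q~p \] The idea is that \(\mathsf{T}\) selects its first argument and \(\mathsf{F}\) selects its second: if \(p\) is \(\mathsf{T}\), the answer is whatever \(q\) is, and if \(p\) is \(\mathsf{F}\), the answer is \(p = \mathsf{F}\) itself. For instance, \(\mathsf{AND}~\mathsf{T}~\mathsf{T}\) reduces in a few steps to \(\mathsf{T}~\mathsf{T}~\mathsf{T}\), and then to \(\mathsf{T}\).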
- \(\mathsf{AND}~\mathsf{F}~\mathsf{T} \)
- \(\mathsf{AND}~\mathsf{T}~\mathsf{F} \)
- \(\mathsf{AND}~\mathsf{F}~\mathsf{F} \)
This confirms that \(\mathsf{AND}\) is a \(\lambda\)-term representation of the "and" function from Boolean logic. Formally speaking, let \(\wedge \colon B \times B \to B\) be the usual logical "and". We can define the \(\lambda\)-representation of pairs of Booleans by setting \(\langle \rho,\rho \rangle \colon B \times B \to \lambda\mathit{Term}\) equal to \[\begin{aligned} \langle \rho,\rho \rangle(\mathtt{True}, \mathtt{True}) &= \mathsf{T}\mathsf{T} \\ \langle \rho,\rho \rangle(\mathtt{False}, \mathtt{True}) &= \mathsf{F}\mathsf{T} \\ \langle \rho,\rho \rangle(\mathtt{True}, \mathtt{False}) &= \mathsf{T}\mathsf{F} \\ \langle \rho,\rho \rangle(\mathtt{False}, \mathtt{False}) &= \mathsf{F}\mathsf{F} \end{aligned}\] Then we have just verified that \[ \mathsf{AND} \langle \rho,\rho \rangle(a, b) \Downarrow \rho(a \wedge b) \] This is the formal statement that says that \(\mathsf{AND}\) is a \(\lambda\)-representation of \(\wedge\).
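If you want to experiment with these encodings, here is a small Python sketch (the names \(\texttt{T}\), \(\texttt{F}\), \(\texttt{AND}\), and the \(\texttt{decode}\) helper are ours for illustration, and \(\texttt{AND}\) uses the "selection" encoding suggested above):

```python
# Church booleans as curried Python functions (illustrative sketch).
T = lambda x: lambda y: x   # represents True:  select the first argument
F = lambda x: lambda y: y   # represents False: select the second argument

# One common encoding of logical "and": apply p to (q, p).
AND = lambda p: lambda q: p(q)(p)

# Convert a Church boolean back to a Python bool for checking.
def decode(b):
    return b(True)(False)

# These mirror the verification above: AND represents the usual "and".
assert decode(AND(T)(T)) is True
assert decode(AND(T)(F)) is False
assert decode(AND(F)(T)) is False
assert decode(AND(F)(F)) is False
```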
Number-theoretic Functions
We can also represent numbers as \(\lambda\)-terms. Let \(\mathsf{C} \colon \mathbb N \to \lambda\mathit{Term}\) be the function \[\begin{aligned} \mathsf{C_0} &= \lambda fx.x \\ \mathsf{C_1} &= \lambda fx.fx \\ \mathsf{C_2} &= \lambda fx.f(fx) \\ \mathsf{C_3} &= \lambda fx.f(f(fx)) \\ &\vdots \\ \mathsf{C_n} &= \lambda fx.\overbrace{f(f\cdots f(f}^{\text{\(n\) times}} x) \cdots ) \\ \end{aligned}\] The \(\lambda\)-term \(\mathsf{C_n}\) is called the Church numeral of \(n\). The functions \(\mathbb N \to \mathbb N\) that are representable by \(\lambda\)-terms are called Church-computable, like the following example.
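A standard first example is the successor function \(n \mapsto n + 1\), which can be represented (in one common form) by the term \[ \mathsf{S} = \lambda nfx.f(n~f~x) \] Intuitively, \(\mathsf{S}\) takes a numeral and produces a numeral that applies \(f\) one extra time; the term \(\mathsf{S}\) appearing in the addition below is a successor term of this kind. For example, \[ \mathsf{S}~\mathsf{C_1} \;\Rightarrow_\beta\; \lambda fx.f(\mathsf{C_1}~f~x) \;\Rightarrow_\beta\; \lambda fx.f((\lambda x.fx)~x) \;\Rightarrow_\beta\; \lambda fx.f(fx) \;=\; \mathsf{C_2} \]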
One of the funny things about Church numerals is that they really represent "repeated application".
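Here is another small Python sketch (the names are again ours) that makes this concrete: a Church numeral, fed a function and a starting value, simply iterates the function.

```python
# Church numerals as curried Python functions (illustrative sketch).
C0 = lambda f: lambda x: x
C1 = lambda f: lambda x: f(x)
C2 = lambda f: lambda x: f(f(x))
C3 = lambda f: lambda x: f(f(f(x)))

# Feeding a numeral the "add one" function and 0 recovers the number:
# C_n means "apply f, n times".
to_int = lambda c: c(lambda k: k + 1)(0)
assert [to_int(C0), to_int(C1), to_int(C2), to_int(C3)] == [0, 1, 2, 3]

# The same numeral can repeat any operation, e.g. doubling, three times.
assert C3(lambda k: 2 * k)(5) == 40   # ((5 * 2) * 2) * 2
```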
Using this idea, a lot of other number-theoretic functions can be defined. For example, addition can be represented as \[ \mathsf{ADD} = \lambda n m.(n~\mathsf{S})m \]
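To see why this works, remember that \(\mathsf{C_n}~\mathsf{S}\) means "apply \(\mathsf{S}\) \(n\) times" (with \(\mathsf{S}\) a successor term as sketched above), so \((\mathsf{C_n}~\mathsf{S})~\mathsf{C_m}\) takes the successor of \(\mathsf{C_m}\) exactly \(n\) times. For instance, writing several reduction steps at once, \[ \mathsf{ADD}~\mathsf{C_2}~\mathsf{C_1} \;\Rightarrow_\beta\; (\mathsf{C_2}~\mathsf{S})~\mathsf{C_1} \;\Rightarrow_\beta\; \mathsf{S}(\mathsf{S}~\mathsf{C_1}) \;\Downarrow\; \mathsf{C_3} \]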
Using the same idea, multiplication can be defined as repeated addition. But I'll let you figure this out!