What is functional programming?
In the early days of my interest in Computer Science (in the late 1980's), flushed with enthusiasm for the connections between Category Theory and Computer Science, I wrote a paper with that title. It was accepted for the first volume of a new journal, Mathematical Structures in Computer Science, and was promised even in the advertisements for the new journal. But then the editors had a change of heart and withdrew their acceptance. (I must find in my papers the letters.)
I accepted this rejection without a whimper because my ex student Mike Johnson was against the paper (or at least its title) and even I came to believe that I had been rather presumptuous (as I tend to be when I enter a new subject - see recently probability theory). I received further criticism later for my impudence from Phil Wadler when he visited us in Sydney. However I did include some hints about the idea in my book "Categories and Computer Science" in sections 5.5 and 6.2, under the modest title "The specification of functions".
I am persuaded to write more about the topic by the fact that, judging by exchanges on the categories mailing list, the average category theorist doesn't have any idea what functional programming is. The matter has not been well explained. Those who do know functional programming speak what seems to be a foreign language when they refer to it. In addition a functional programmer is likely to talk about the most advanced aspects, whereas it is possible to begin with very simple ideas. One of the striking advantages of category theory is that one can begin simply, already seeing part of the idea, and then add structure revealing more richness.
Don't be put off by the simplicity of my first remarks.
The very simplest version - Categories As one might imagine from the name functional programming regards the computation of functions. Consider the category Sets of sets and functions. We notice that there is an object I, a one point set, and an object N, the natural numbers, together with two functions o:I → N (zero) and s:N → N (the successor). We are interested to calculate the function f(x)=2x.
There are some equations this function satisfies namely fo=o and fs=ssf.
The graph of these two objects with the arrows o,s and f, together with the equations, form the generators and relations defining an abstract (syntactical) category which naturally has an induced functor to Sets taking s to successor, o to 0, the symbol f to the concrete doubling function f.
We can calculate in this syntactical category as follows:
fssso=ssfsso=ssssfso=ssssssfo=sssssso, which sequence of deductions is a calculation of the fact that f(2)=6.
So the program consists of the graph with objects I,N and arrows o,s,f ; the execution is the deduction fssso=ssfsso=ssssfso=ssssssfo=sssssso.
Let's make clearer what the programming language is, and what an execution is for this primitive language.
A program is a graph, and a finite number of equations u=v between paths in the graph. An execution of the program on a path is a sequence of deductions from the equations.
To be practical, we might modify the definitions slightly. Instead of equations take ordered equations (or rewrites) thus distinguishing between u=v and v=u (but possibly including both). Further an execution may only use the rewrites in the order u=>v given, and termination consists when the deduced path is irreducible.
With care a programmer can now organize that there are unique executions from paths in a desired set of paths, thus computing a function. If the programmer has in mind some actual functions which satisfy the equations then the correctness of the computation is clear. The more difficult things to manage are uniqueness of the result, the termination and the efficiency.
Remark What we have described is just traditional algebra, but applied to an algebra of functions. Suppose you want to calculate with numbers rather than functions. You notice they form a ring. You can talk about elements in a concrete ring by using symbols to describe some elements, forming the free ring to talk about other elements, and state equational properties. Then a calculation is a sequence of equational deductions. Example; (1+1)(1+1)=1(1+1)+1(1+1)=(1+1+1+1). The deductions may be made more efficient by ordering the equations.
The programming language above is very easy to implement; one just needs to program rewriting of words. There is a question as to how useful (as distinct from powerful) it is. Maybe with this amount of structure not much, but with a little more categorical structure we can really program. The resulting programs will resemble programs in Lisp.
Categories with products The category of Sets has finite products, and we would like to write programs to calculate such functions as addition :NxN → N, mutiplication, factorial and so on. More generally we would like to program functions on lists, and even program term rewriting itself, but for that see later.
I will tell you immediately what the programming language and its execution are. a program is a finite multigraph (in the sense of my book - that is, a set of objects, and a set of arrows each of which has as codomain an object but whose domain may be a finite product of objects) together with a finite number of (ordered) equations between terms (arrows in the free category with products on the multigraph). An execution starting with a term is a sequence of rewrites of the terms until arriving at an irreducible term.
I repeat: to write a program one should have in mind the particular functions in Sets one is describing and equations which involve them. It is all to easy to forget the semantics and end up in mindless symbol pushing. .
To see the strength of the language we will first write a program to calculate HCF : NxN → N starting with nothing. It will be clear from this example that such constructions as "while", "if then else", "sequence" can be programmed.
The objects in the multigraph are I and N as above. I will gradually be introducing new arrows into the multigraph, but the initial ones will be o : I → N, s : N → N and add : NxN → N (symbol for addition).
The first equations which will sufficient to calculate addition are:
add(s(x),y)=add(x,s(y))
add(o,y)=y.
An execution:
add(s(s(o)),s(o))=add(s(o),s(s(o)))=add(o,s(s(s(o))))=s(s(s(o))).
We don't need multiplication for calculating the GCD of two numbers but I will describe a program for it as an illustration (see also my post on tail recursion).
Adjoin to the multigraph two arrows mult : NxN → N and auxmult : NxNxN → N. For auxmult the function we have in mind is (m,n,p) → mn+p.
The further program equations which allow the computation of multiplication are:
auxmult(s(m),n,p)=auxmult(m,n,add(n,p))
auxmult(o,n,p)=p.
mult(m,n)=auxmult(m,n,o).
Now the algorithm I have in mind for GCD will require a Boolean test function, namely geq(m,n) – “greater than or equal”, and a partial function on NxN, namely minus. To deal with these it is convenient to introduce a new object M in the multigraph, to represent the set N+{true, false}. We will need an arrow i:N → M, j:M → N and two arrows true:I → M, false:I → M. The functions we have in mind are i(n)=n, and j(n)=n, j(true)=1, j(false)=0.
We adjoin an equation to relate the arrows into M with those in N:
j(i(n))=n
j(true)=s(o)
j(false)=o.
Now the function minus from NxN to M takes (m,n) to m-n if m is greater than or equal to n, otherwise minus(m,n)= false. We adjoin minus ; NxN → M to the multigraph. Its program is:
minus(m,o)=i(m)
minus(o,s(n))=false
minus(s(m),s(n))=minus(m,n).
We adjoin geq:MxN → M to the multigraph and the program:
geq(i(m),o)=true
geq(i(o),s(n))=false
geq(i(s(m)),s(n))=geq(i(m),n)
geq(false,n)=false
geq(true,n)=false.
Finally, the HCF of two natural numbers is defined using two new arrows in the multigraph, auxHCF: MxMxN → M, HCF : NxN → N. The function I have in mind for auxHCF takes (true,m,n) to HCF(m,n) if m ≥ n; takes (false, m,n) to HCF(m,n) if m < n; and is false for all other values.
auxHCF(true, i(m),s(n))=auxHCF(geq(minus(m,s(n)),s(n)),minus(m,s(n)),s(n))
auxHCF(false,i(m),n)=auxHCF(true,i(n),m)
auxHCF(true,i(m),o)=i(m).
HCF(m,n)=j(auxHCF(geq(i(m),n),i(m),n)).
Let's look at an execution, assuming that minus and geq execute correctly, and writing k for s^k(o).
HCF(6,9)=j(auxHCF(false,6,9))=j(auxHCF(true,9,6))=j(auxHCF(false,3,6))
=j(auxHCF(true,6,3))=j(auxHCF(true,3,3))=j(auxHCF(false,0,3))
=j(auxHCF(true,3,0))=j(i(3))=3.
Notice that I have not written down enough equations to calculate auxHCF(u,m,n) for all values of u,m,n, but enough to calculate HCF. I didn't say, but the value for HCF of (0,0) I have in mind is 0.
Remark It is obvious that adding more structure to the category (distributive, cartesian closed, ...) adds more linguistic mechanisms (if then else, higher order, ...) for writing programs. However the idea is already present here.
Notice another thing: programs can be written which may apply to different sets. For example, a sort algorithm may apply to many types of data. It can be written with unspecified objects and arrows, and later these may be identified with objects an arrows in another graph. I am saying that pushouts of graphs may be used to combine programs.
The same idea might be applied also to quite different semantic categories.
Remark Functional programming is sometimes promoted by saying that it is easier to check the correctness of such programs. That may be so if the programmer has in mind clearly all the functions involved. However if functional programming involves translating while loops into tail recursion it cannot be easier to check than the imperative program. Knowing the functions involved is like knowing invariants of imperative programs.
I accepted this rejection without a whimper because my ex student Mike Johnson was against the paper (or at least its title) and even I came to believe that I had been rather presumptuous (as I tend to be when I enter a new subject - see recently probability theory). I received further criticism later for my impudence from Phil Wadler when he visited us in Sydney. However I did include some hints about the idea in my book "Categories and Computer Science" in sections 5.5 and 6.2, under the modest title "The specification of functions".
I am persuaded to write more about the topic by the fact that, judging by exchanges on the categories mailing list, the average category theorist doesn't have any idea what functional programming is. The matter has not been well explained. Those who do know functional programming speak what seems to be a foreign language when they refer to it. In addition a functional programmer is likely to talk about the most advanced aspects, whereas it is possible to begin with very simple ideas. One of the striking advantages of category theory is that one can begin simply, already seeing part of the idea, and then add structure revealing more richness.
Don't be put off by the simplicity of my first remarks.
The very simplest version - Categories As one might imagine from the name functional programming regards the computation of functions. Consider the category Sets of sets and functions. We notice that there is an object I, a one point set, and an object N, the natural numbers, together with two functions o:I → N (zero) and s:N → N (the successor). We are interested to calculate the function f(x)=2x.
There are some equations this function satisfies namely fo=o and fs=ssf.
The graph of these two objects with the arrows o,s and f, together with the equations, form the generators and relations defining an abstract (syntactical) category which naturally has an induced functor to Sets taking s to successor, o to 0, the symbol f to the concrete doubling function f.
We can calculate in this syntactical category as follows:
fssso=ssfsso=ssssfso=ssssssfo=sssssso, which sequence of deductions is a calculation of the fact that f(2)=6.
So the program consists of the graph with objects I,N and arrows o,s,f ; the execution is the deduction fssso=ssfsso=ssssfso=ssssssfo=sssssso.
Let's make clearer what the programming language is, and what an execution is for this primitive language.
A program is a graph, and a finite number of equations u=v between paths in the graph. An execution of the program on a path is a sequence of deductions from the equations.
To be practical, we might modify the definitions slightly. Instead of equations take ordered equations (or rewrites) thus distinguishing between u=v and v=u (but possibly including both). Further an execution may only use the rewrites in the order u=>v given, and termination consists when the deduced path is irreducible.
With care a programmer can now organize that there are unique executions from paths in a desired set of paths, thus computing a function. If the programmer has in mind some actual functions which satisfy the equations then the correctness of the computation is clear. The more difficult things to manage are uniqueness of the result, the termination and the efficiency.
Remark What we have described is just traditional algebra, but applied to an algebra of functions. Suppose you want to calculate with numbers rather than functions. You notice they form a ring. You can talk about elements in a concrete ring by using symbols to describe some elements, forming the free ring to talk about other elements, and state equational properties. Then a calculation is a sequence of equational deductions. Example; (1+1)(1+1)=1(1+1)+1(1+1)=(1+1+1+1). The deductions may be made more efficient by ordering the equations.
The programming language above is very easy to implement; one just needs to program rewriting of words. There is a question as to how useful (as distinct from powerful) it is. Maybe with this amount of structure not much, but with a little more categorical structure we can really program. The resulting programs will resemble programs in Lisp.
Categories with products The category of Sets has finite products, and we would like to write programs to calculate such functions as addition :NxN → N, mutiplication, factorial and so on. More generally we would like to program functions on lists, and even program term rewriting itself, but for that see later.
I will tell you immediately what the programming language and its execution are. a program is a finite multigraph (in the sense of my book - that is, a set of objects, and a set of arrows each of which has as codomain an object but whose domain may be a finite product of objects) together with a finite number of (ordered) equations between terms (arrows in the free category with products on the multigraph). An execution starting with a term is a sequence of rewrites of the terms until arriving at an irreducible term.
I repeat: to write a program one should have in mind the particular functions in Sets one is describing and equations which involve them. It is all to easy to forget the semantics and end up in mindless symbol pushing. .
To see the strength of the language we will first write a program to calculate HCF : NxN → N starting with nothing. It will be clear from this example that such constructions as "while", "if then else", "sequence" can be programmed.
The objects in the multigraph are I and N as above. I will gradually be introducing new arrows into the multigraph, but the initial ones will be o : I → N, s : N → N and add : NxN → N (symbol for addition).
The first equations which will sufficient to calculate addition are:
add(s(x),y)=add(x,s(y))
add(o,y)=y.
An execution:
add(s(s(o)),s(o))=add(s(o),s(s(o)))=add(o,s(s(s(o))))=s(s(s(o))).
We don't need multiplication for calculating the GCD of two numbers but I will describe a program for it as an illustration (see also my post on tail recursion).
Adjoin to the multigraph two arrows mult : NxN → N and auxmult : NxNxN → N. For auxmult the function we have in mind is (m,n,p) → mn+p.
The further program equations which allow the computation of multiplication are:
auxmult(s(m),n,p)=auxmult(m,n,add(n,p))
auxmult(o,n,p)=p.
mult(m,n)=auxmult(m,n,o).
Now the algorithm I have in mind for GCD will require a Boolean test function, namely geq(m,n) – “greater than or equal”, and a partial function on NxN, namely minus. To deal with these it is convenient to introduce a new object M in the multigraph, to represent the set N+{true, false}. We will need an arrow i:N → M, j:M → N and two arrows true:I → M, false:I → M. The functions we have in mind are i(n)=n, and j(n)=n, j(true)=1, j(false)=0.
We adjoin an equation to relate the arrows into M with those in N:
j(i(n))=n
j(true)=s(o)
j(false)=o.
Now the function minus from NxN to M takes (m,n) to m-n if m is greater than or equal to n, otherwise minus(m,n)= false. We adjoin minus ; NxN → M to the multigraph. Its program is:
minus(m,o)=i(m)
minus(o,s(n))=false
minus(s(m),s(n))=minus(m,n).
We adjoin geq:MxN → M to the multigraph and the program:
geq(i(m),o)=true
geq(i(o),s(n))=false
geq(i(s(m)),s(n))=geq(i(m),n)
geq(false,n)=false
geq(true,n)=false.
Finally, the HCF of two natural numbers is defined using two new arrows in the multigraph, auxHCF: MxMxN → M, HCF : NxN → N. The function I have in mind for auxHCF takes (true,m,n) to HCF(m,n) if m ≥ n; takes (false, m,n) to HCF(m,n) if m < n; and is false for all other values.
auxHCF(true, i(m),s(n))=auxHCF(geq(minus(m,s(n)),s(n)),minus(m,s(n)),s(n))
auxHCF(false,i(m),n)=auxHCF(true,i(n),m)
auxHCF(true,i(m),o)=i(m).
HCF(m,n)=j(auxHCF(geq(i(m),n),i(m),n)).
Let's look at an execution, assuming that minus and geq execute correctly, and writing k for s^k(o).
HCF(6,9)=j(auxHCF(false,6,9))=j(auxHCF(true,9,6))=j(auxHCF(false,3,6))
=j(auxHCF(true,6,3))=j(auxHCF(true,3,3))=j(auxHCF(false,0,3))
=j(auxHCF(true,3,0))=j(i(3))=3.
Notice that I have not written down enough equations to calculate auxHCF(u,m,n) for all values of u,m,n, but enough to calculate HCF. I didn't say, but the value for HCF of (0,0) I have in mind is 0.
Remark It is obvious that adding more structure to the category (distributive, cartesian closed, ...) adds more linguistic mechanisms (if then else, higher order, ...) for writing programs. However the idea is already present here.
Notice another thing: programs can be written which may apply to different sets. For example, a sort algorithm may apply to many types of data. It can be written with unspecified objects and arrows, and later these may be identified with objects an arrows in another graph. I am saying that pushouts of graphs may be used to combine programs.
The same idea might be applied also to quite different semantic categories.
Remark Functional programming is sometimes promoted by saying that it is easier to check the correctness of such programs. That may be so if the programmer has in mind clearly all the functions involved. However if functional programming involves translating while loops into tail recursion it cannot be easier to check than the imperative program. Knowing the functions involved is like knowing invariants of imperative programs.
Labels: category theory, computing, mathematics
0 Comments:
Post a Comment
<< Home