The How Mathematicians Work group, of which I am a member, has been concerned for some time that exploration of the nature of mathematics, whether as philosophical enquiry or for educational objectives (or both, as POME exemplifies), should be grounded in the actual behaviour of mathematicians.
I’ve always been less interested in studying such behaviour by objective methods; rather, I want mathematicians themselves, the people best placed to register the nuances of this esoteric activity, to reflect upon what they do. Not being a philosopher myself, I am unable to say what might result for the philosophy of mathematics, but I feel sure that reflective reports by mathematicians should form some part of its raw material.
As with any activity, it is difficult simultaneously to perform well and to reflect upon the performance, so an external registration of the activity is essential for later consideration. Fortunately the very process of doing maths leaves an automatic trace in the pieces of paper which accumulate when tackling a problem; this forms a kind of automatic protocolling of the process. If you review and annotate those papers very soon afterwards, you can sometimes recollect what has been going on within your thought processes.
The most striking thing about doing this is the sheer quantity of scribbled material one gets from the simplest of problems (and considerably more paper would be necessary to explicate fully what underlies them). Unless you're very clear about how to proceed with a problem, which probably only occurs when an attempt is not truly exploratory, being merely a rehash of well-known procedures, you struggle up innumerable blind alleys, making all sorts of false starts, mistakes, re-workings and sudden changes of direction. Maybe you do arrive somewhere in the end, but in any case you report to others a polished version which disguises the struggle. Indeed, such is the ethos prevailing within mathematics that we find it positively embarrassing to reveal how stupid we feel ourselves to be for not seeing our final conclusions from the outset. The tongue-in-cheek, bracketed words in my title express my own embarrassment within the present paper. I still cannot escape the dreadful feeling that what I report below merely reveals my shameful inadequacy as a mathematician, since the instances chosen describe either woefully little progress or more substantial progress to the obvious.
Nevertheless I also believe that this form of confession may be vital if we wish fully to understand the process of mathematics. We normally disguise from others, and perhaps even from ourselves, that the majority of mathematical work is a struggle through uncharted bog, most of which peters out in boredom or disillusion. (Again I feel a collapse of confidence — do I speak only for myself here?) If this is indeed the major part of mathematics as a process, then our efforts to understand that process should focus, in proportional measure, on all those things which get left out from the final presentation --- errors, misunderstandings and the like; and not just these but also social and inter-personal motivations for one’s work (sometimes, as exhibited below, rather petty and pathetic).
Of course, I am not asking for this to be the standard presentational discourse of mathematics, unless it is to tell a particularly illuminating story about possible sources of error to be avoided. Rather, I'm thinking of the burgeoning studies of the mathematical process directed mainly toward pedagogy. For if we don't reveal to students how much error and misunderstanding is to be expected, we simply tell them a lie, which exacerbates feelings of inadequacy. So we need to share more accounts of the detailed doing of mathematics, less to arrive at objective knowledge of the process than to stimulate, within the mathematical community, an unembarrassed discourse about such matters.
My aim here is to look at the processes involved in tackling a few questions in mathematics which actually occurred during my normal mathematical activity. This is part of an ongoing attempt to develop my self-awareness about my problem-solving moments. This might seem self-indulgent rather than self-aware, perhaps even a particularly unsightly form of self-abuse, but I am operating on an analogy I have drawn between recollecting mathematical processes and recollecting dreams. It has often been remarked that practice in recalling dreams develops the capacity for that recall; my hypothesis is that the same will be true of mathematics. The objective is two-fold: firstly, to develop my own self-knowledge of my processes in doing mathematics; secondly, to suggest the same activity to other mathematicians and stimulate them to it. Except perhaps in top-level mathematicians, each of us will experience only part of the variety of forms of mathematical intuition, so a corresponding variety of mathematicians might be required to report upon their procedures. Were we to become able to discourse freely about our difficulties, errors, blockages and insecurities, a beneficial impact on the teaching of mathematics might occur automatically.
I will describe a few specific cases and how I tackled (or am trying to tackle) them, interspersing the description with italicised comments on general matters of mathematical procedure; indeed, it is these italicised remarks which form the real message of this paper. The work described takes the form of short episodes concerning encapsulated, separable segments of mathematics, since for such short, isolated efforts it is easier to survey the task and its treatment, without being too overloaded with paper to be annotated. Of course, this means we will inevitably have a rather myopic focus and fail to study those aspects of mathematical work which concern large, strategic goals.
The first problem, on which I’ve made little progress as yet, comes from my relatively new interest in the mathematics of control theory. It arises from attempts to control the smoothness of sheet-material produced by fluid extrusion from a set of outlets, the number m of points where the thickness is measured being greater than the number n of extrusion points where control may be exercised. Assume that the m-vector of measurements has linear form y = A x + b + e, where x is the n-vector of controls, A, b are constants and e is a vector of sensor noise. If y = 0 is the desired state, because of dimension-deficiency we can only hope to minimise some distance-measure of y from 0; this is taken to be the maximum modulus of the components of y. Because of stochasticity we need then to take expectations of this distance over e to get the criterion which is to be minimised with respect to x.
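In symbols, the criterion just described is

$$
\min_{x \in \mathbb{R}^{n}} \; \mathrm{E}_{e}\Big[\, \max_{1 \le i \le m} \big|\,(Ax + b + e)_i\,\big| \Big],
$$

with the expectation taken over the noise vector e.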
Note that this is pretty typical of how quite nasty mathematical problems arise naturally from simply posed problems occurring in practical situations. I think quite a lot needs to be studied about the source of our problems, not just in terms of the physical things we try to model but, more importantly, the social situations that bring them to our attention. With respect to this problem, it needs to be said that I am working with engineers for the first time in my life, so that I suffer from a great deal of anxiety about whether I can really be of use to them, since I am merely a beginner in the subject and I have no engineering expertise.
A first stab at the problem involved using the triangle inequality on the expected distance to separate it into the part involving A x + b and the part involving e. The former is non-stochastic and, by itself, minimising the maximum modulus yields to linear programming; the latter is independent of x and, being merely added on, can be dropped from consideration. Of course, the inequality means that this is not the original problem but, rather, some solvable alternative which, it was hoped, would put some limits on the true solution.
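For the record, the familiar reduction runs as follows: minimising the maximum modulus of the components of Ax + b is the linear programme

$$
\begin{aligned}
\min_{x,\,t} \quad & t\\
\text{subject to} \quad & -t \;\le\; (Ax+b)_i \;\le\; t, \qquad i = 1,\dots,m,
\end{aligned}
$$

in the n + 1 unknowns (x, t).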
Here we have a nice example of the desire to get things into familiar form. I suppose it is inevitable that we try to transform new problems into forms that we know about. In this case, though, the reduction was too drastic; I must confess that it took an embarrassing length of time to see that it merely ignored the noise, a vital feature of the model.
I then tried to work directly on the problem as given. I began with no prior knowledge of what was known and didn't know where to consult the literature, so I spent a good deal of time working out from scratch the fact that the objective function could be rewritten, in the case where the components of e were independently and identically distributed with distribution function F(x), as an expectation over a one-dimensional distribution K which was an m-fold product of differences G of values of F.
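The text leaves the formula implicit; a plausible reconstruction, writing c = Ax + b for the noise-free part, is that the maximum modulus M = max_i |c_i + e_i| has distribution function

$$
K(t) \;=\; \prod_{i=1}^{m} G_i(t), \qquad G_i(t) \;=\; F(t - c_i) \;-\; F(-t - c_i),
$$

each G_i being the probability that |c_i + e_i| ≤ t: an m-fold product of differences of values of F, over which the objective becomes a one-dimensional expectation.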
To do this I wanted to envisage how the m-dimensional space was carved up into regions in each of which one of the components of e had the largest modulus. This I found hard to do; I worked out the m = 2 case easily but, for some reason, couldn't see how it carried over to general m. I suppose this was because I was also grappling with probability theory, which wasn't very familiar to me. It wasn't until later that I realised that the formula had a much easier, purely logical derivation.
However, the importance of good geometric visualisation is instanced by this small step in my understanding, as is the importance of getting a good feel for a simple case as a stepping stone to seeing some regularity in the general case. I remember that, having got a good picture of m = 2, the visualisation of m = 3 worried me even as I went about other business. It seemed very necessary to reach the state where I could call up an immediate picture easily and familiarly; it was very satisfying when I got there.
Another point may be made here. When one doesn't know the answer beforehand, or even whether there is a usable answer, one remains nervous of believing things, even with a proof in hand. It isn't simply a matter of carefully checking the purported proof; you want a variety of ways of seeing the same thing, so that the intuitive evidence, which we probably trust more than a formal proof-check, has a density of cross-relating, mutually supporting parts.
Only when I'd realised that K had the properties of a distribution did I start reading around for material on minimising expectations with respect to parameters, but this wasted quite a bit of time, through not realising that the problem didn't lie in the general idea of an algorithm for minimising expectations, but rather in the very calculation of K itself. In fact, I had been working quite formally without considering which particular noise distribution might be present. The obvious and usual choice was the normal distribution, but for this choice F was the error function, which made the calculation of K (typically with m = 80) a product of very many error functions. And it was that kind of thing (or its gradient) which would need calculating at each step of a gradient descent to the desired minimum.
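To make the computational burden concrete, here is a minimal sketch (my own illustration, with invented names and small dimensions, not the calculation from the paper) of evaluating the objective for Gaussian noise, using the identity E[M] = ∫₀^∞ (1 − K(t)) dt for the non-negative variable M:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def expected_max_abs(c, sigma=1.0):
    """E[max_i |c_i + e_i|] for i.i.d. Gaussian noise e_i ~ N(0, sigma^2).

    Uses E[M] = integral_0^inf (1 - K(t)) dt, where
    K(t) = prod_i [F(t - c_i) - F(-t - c_i)] is the distribution
    function of M = max_i |c_i + e_i|.
    """
    c = np.asarray(c, dtype=float)

    def K(t):
        # Each factor is P(|c_i + e_i| <= t): a difference of normal CDFs.
        return np.prod(norm.cdf(t, loc=c, scale=sigma)
                       - norm.cdf(-t, loc=c, scale=sigma))

    value, _ = quad(lambda t: 1.0 - K(t), 0.0, np.inf)
    return value

# With m = 80 measurement points, K is a product of 80 differences of
# error functions, to be re-evaluated (with its gradient) at every
# step of a descent method.
c = np.linspace(-1.0, 1.0, 80)   # stand-in for Ax + b
print(expected_max_abs(c, sigma=0.1))
```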
It was actually only at this time that I tumbled to the fact that the previous approach had effectively ignored the noise. So I started wondering whether I could solve things with any noise distribution at all. I felt, intuitively, that any noise was better than none and I checked out this intuition with some statisticians, while not troubling them with the details of the problem. Then, in order to persuade myself that the exercise was worth doing at all, I calculated explicitly, for some low-dimensional cases, the difference between using a Gaussian noise, as against having no noise.
Here again we have a falling back on simple cases to buttress intuition, but there was an added twist of error. I satisfied myself that for quite small variances the minimum in the noisy case was substantially larger than in the noise-free case. It took some time before I realised that this was not the issue. It wasn't a case of checking the minimum of some equation (1) against that of another equation (2), but rather of checking the minimum of equation (2) against what you got if the minimising argument of (1) was substituted into (2), since the real situation was that of modelling something genuinely noisy by a noise-free model.
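In the notation of the text, write E_1 for the criterion of equation (1), the noise-free model, and E_2 for that of equation (2), the noisy one. The relevant quantity is then not min E_1 against min E_2 but the excess

$$
E_2\big(x_1^{*}\big) \;-\; E_2\big(x_2^{*}\big), \qquad x_k^{*} = \operatorname*{arg\,min}_{x} E_k(x),
$$

the cost of running the genuinely noisy system at the minimiser of the noise-free model.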
I decided to try to work out a detailed formula for the difference in the simplest specific case, a noise whose density was uniform and symmetric. I did detailed hand calculations for m = 2, n = 1 and for m = 3, n = 2, but the latter got very messy. I then disappeared up a blind alley by thinking it might be better to work with a simpler G than one derived from a given F. I reasoned that everything I was doing was qualitative anyway, since the noise would be unknown except for some empirically estimated parameters, but I spent a bit of time trying to characterise the G's which were derivable from an F.
What was happening here was an attempt to take seriously an entity, G, which had been initially introduced merely as an algebraic convenience. This is an instance of the way in which notation can take on a life of its own, distorting one's ability to see past the notation to what it describes. The need, generally, to keep meaning in mind is surely the best rebuttal to formalism.
Now, I thought, was the time to learn MATHEMATICA, something I'd been putting off for ages. But as soon as I started I ran into trouble; it simply refused to do anything in the way of purely algebraic integration with the piecewise linear functions which I had adopted for F. I found out later that it did have a loadable package written for the handling of delta functions and the like, within which were some lines of MATHEMATICA code about the step function which might have enabled symbolic handling of my integrals; but I was still reluctant to use it, since these few useful lines were embedded in a nine-page printout and I would have felt obliged to understand it all to see how the bits I wanted fitted in. I felt too daunted.
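As an aside, a present-day system will handle this sort of piecewise symbolic integration directly. A minimal sketch in SymPy (my own stand-in, with a hypothetical triangular density for the piecewise linear F, not the MATHEMATICA package described):

```python
from sympy import symbols, Piecewise, integrate, simplify

x, t = symbols('x t', real=True)

# A hypothetical piecewise-linear density (triangular on [-1, 1]),
# standing in for the piecewise linear functions adopted for F.
f = Piecewise((1 + x, (x >= -1) & (x < 0)),
              (1 - x, (x >= 0) & (x <= 1)),
              (0, True))

# The distribution function F(t) = integral from -1 to t of f,
# returned symbolically as a Piecewise expression in t.
F = simplify(integrate(f, (x, -1, t)))
print(F)
```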
Here we might observe a problem with mathematical tools, of which MATHEMATICA is merely an exemplary case. Generally, I suppose, mathematicians more than most other people will not take things on trust; if we are to use a theorem or lemma or method, we will spend some time trying to understand it in our own terms. With earlier tools, even if not built by ourselves --- say tabulated functions --- we would expect to spot any glaring errors. But the computer is somewhat more complex and hidden in its workings, so the closest we might hope to come to our traditional approach would be to study any algorithms which the machine employed. Whether attitudes are changing, perhaps with growing tendencies to team work in mathematics, I don't know, but I'm still struck by the number of people who prefer to write their own routines, rather than lift things off the shelf. For some interesting remarks on the links between computer-dependent developments in mathematics and the growing scientisation of mathematics, see Tymoczko (1995).
This is about the position currently reached, so I'll turn now to another example.
The second example is more in the nature of a curiosity which illustrates a different kind of error, that of misunderstanding, from the outset, what has to be proved. It concerns a relatively unimportant question which was put to me only incidentally by a colleague and which should have been settled quite rapidly.
This colleague was telling me that he was introducing some chaos theory into his dynamical systems course, but that he'd found mention of a curious form of converse to the main idea of chaos. It occurred in a short note by May (1976), discussing the significance of the following probability result:
Begin with an urn containing 1 black and 1 white ball. At each move, draw a ball at random, note its colour and replace it together with another ball of the same colour. May remarks that at stage N, when the urn contains N+2 balls,
“.... for large N the proportion of white balls will tend to converge to some limiting value, p say: but this limiting value p is equally likely to take any value between 0 and 1. That is, when I play the game my plot of the proportion of white balls in the urn as a function of the number of rounds (or, equivalently, time) will exhibit some initial wiggles but will eventually settle to a steady flat line corresponding to, say, p=0.391”
He continues
“Each run of this experiment will yield to the experimenter an illusion of determinism, as his results settle to a steady limiting value; ..... In short, this game provides an example where an underlying stochastic process gives results which appear deterministic .... It is, in a sense, the mirror image of the phenomenon .... whereby simple deterministic processes can produce results which appear indistinguishable from random fluctuations.”
I can't take too seriously this presumed analogy to a converse of chaotic behaviour, there being no need to go beyond counting the relative total occurrence of heads in repeated coin tossing to get a settling to a definite limit. But I was provoked to re-solve the problem as posed.
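May's observation is easily reproduced. A small simulation (my own illustration, not part of the original note) shows separate runs each settling to a different limiting proportion:

```python
import random

def polya_urn_run(rounds, seed):
    """One run of the urn: start with 1 white and 1 black ball; each
    drawn ball is replaced together with another of the same colour."""
    rng = random.Random(seed)
    white, total = 1, 2
    for _ in range(rounds):
        if rng.random() < white / total:   # drew a white ball
            white += 1
        total += 1
    return white / total

# Each run settles near a different limit; over many runs the limits
# are (approximately) uniformly distributed on (0, 1).
for seed in range(5):
    print(round(polya_urn_run(100_000, seed), 3))
```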
Confession time again! A major reason for wishing to do this was to engage socially with my colleague, who was a new, young and quite dynamic research person in his own field, and who, I fantasised, perceived me to be dull, old and useless. Since I saw that the problem was a standard probabilistic one, whereas he seemed mystified by its meaning, I was tempted by a cheap way to be seen as useful.
I straightway wrote down the recurrence relation and proceeded to solve it by the standard method of getting a first order differential equation for the generating function and solving that. But this is where my stupidity began. This equation, as will be seen, has a very simple solution, but my misunderstanding of what May was saying had led me completely astray. Because he framed his statement in terms of behaviour for large N, I formed the idea that the solution was going to be a bit complicated, only becoming simple in an asymptotic limit.
So instead of doing a few early cases, to try to see a pattern, I went straight into general algebra. Then I made a simple transcription error when getting the general solution of the partial differential equation. I might have noticed this if I hadn't been pre-disposed toward finding a more intractable solution.
Here we see a rather ridiculous interaction between two pretty common errors: misunderstanding what one is after, and making an elementary algebraic mistake. The latter, or, more accurately, the worry that we might have made such an error, drives us repeatedly to re-work and scrutinise our calculations, and probably uses up most of our working time.
Anyway, I ended up running into a bog and had to back off. After that I spent quite a bit of time trying to get an asymptotic form for the distribution for large N, directly from the recurrence equation. Starting with an expansion in inverse powers of the number of balls, you can see that the first few coefficients beyond the (−1)th power are zero, but their calculation gets more and more messy. Once again I failed to spot the obvious: they were all zero except the first, because the distribution is uniform for all N, large or small.
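Indeed, the "obvious" fact has a one-step induction (a reconstruction, not taken from my original notes). Write W_N for the number of white balls among the N+2 at stage N, and suppose P(W_N = k) = 1/(N+1) for k = 1, ..., N+1, which holds at N = 0. Then

$$
P(W_{N+1} = k) \;=\; \frac{1}{N+1}\cdot\frac{k-1}{N+2} \;+\; \frac{1}{N+1}\cdot\frac{N+2-k}{N+2} \;=\; \frac{1}{N+2},
$$

so the number of white balls is uniformly distributed at every stage, and in the limit the proportion is uniform on (0, 1), exactly as May describes.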
This is another small example of what can go wrong when you have too fixed an idea of what it is you are after. The whole scenario described only took hours but it contains a number of good instances of commonly occurring errors.
The third example can be stated as follows. Let F be a family of positive continuous functions from the real line to itself, all of which are unimodal with a single maximum at x = 0. We are allowed to make real-valued functions of an n-vector x by composing elements of F with maximum rank linear functions. Does it necessarily follow that any such function has only a single relative maximum, at x = 0?
Once again the solution was quite straightforward, at least in the differentiable case, from which the general case comes by a density argument. But this straightforward result once again cost me some time and effort, because I began with the wrong intuition — that it might be possible, by judicious choice of the linear functions, to create another stationary point.
Why might I have formed that intuition, when actually I was asked to prove that there was only the zero solution? Probably a curious blend of a social interaction and a personal preference. To merely prove what was asked would not create the same dependence of the asker upon oneself; and, anyway, the simple answer seemed too boring. Wouldn't it be nicer if there was a surprise lurking in the problem? Now what sort of psychological inversion is this? I was effectively dismissing my own intuition of what should be true because it looked insufficiently exciting. In the context of the origin of the problem, it would have made a system's behaviour more intriguing. So the desire for simple intuition to be false led, almost unthinkingly, to a reversal of intuition.
So I began by trying directly to construct a simple counterexample, but a difficulty was created at the outset by the fact that the problem had been posed for a very specific family of functions built up from exponentials and various parameters. So I got immediately bogged down in algebra, trying to match the conditions for the desired non-zero stationary point in this specific case.
A clear case here of failing to see the wood for the trees, though this stage didn't last long. Indeed it was the very unpleasantness of the algebra that forced me away from the specific to the more general, although this didn't happen immediately but in stages; perhaps this is always the motivation for abstraction: clearing the trees away the better to see the wood!
After a bit I eased off the mindless calculation.
In fact, it is notable that the pleasure to be gained from mindless calculation can override one's perception that it might be a waste of time. It has the merit of any good displacement activity: postponing the pain of having to think in favour of the comfort of knowing what you're doing --- even if you hardly know why.
But I couldn't let go of the enterprise of finding a counter-example, so what I did instead was look for another class of functions where the algebra would be more tractable. After a few stabs at things like Gaussians, I hit upon looking at sech x.
Things had begun to slip a bit. I was tacitly allowing that the problem was really qualitative and so using other functions might be illuminating on how to give a general argument, but I was certainly not framing it so clearly to myself in this way --- rather I was just blundering. Also I'd got a bit desperate about handling a class of functions, so I was trying to deal with just a singleton F.
The point about using sech x was that the equations could be converted to polynomial ones by setting X = sinh x, Y = cosh x. I found myself in a mire of numerical calculation, trying to get some numbers to yield me a non-zero solution. At the lowest ebb, I ended up looking at the simplest possible case to assure myself that, in this case at least, there were no other solutions than the zero one. Unfortunately, this showed me clearly that all my numerical efforts had been a waste of time, since I'd accidentally dropped a square from a term in doing the algebra.
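The conversion is elementary but worth displaying: with X = sinh x and Y = cosh x,

$$
\operatorname{sech} x = \frac{1}{Y}, \qquad \frac{d}{dx}\,\operatorname{sech} x = -\frac{X}{Y^{2}}, \qquad Y^{2} - X^{2} = 1,
$$

so stationarity conditions built from sech x and its derivative become polynomial equations in X and Y, subject to the single polynomial constraint between them.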
I do feel that a key problem in doing any kind of research, no matter how low grade, is that, since you have no prior knowledge of what the outcome will be, you are inevitably drawing upon intuition. It is this that makes intuition more important than formal proof in the process of doing mathematics. As Hadamard said: “Logic merely sanctions the conquests of the intuition.” Unfortunately, in this state of unknowing, it is very difficult to avoid making mistakes of a purely mechanical nature and, as noted above, the interaction of such mistakes with a false intuition perhaps occupies the majority of our working effort. (Mine, at least, as these dismal confessions reveal.)
It's difficult to say what, precisely, is the key that makes one suddenly switch intuition. I was still looking for a counter-example and saw this particular case as illuminating why the zero solution would be forced in some circumstances, but still thinking that this might point to a boundary beyond which one could find other solutions.
At some point on the sheets of paper which I ended up annotating, I'd written “It's really all about signs” and proceeded to write down various cases for the signs of derivatives of elements of F, putting in various numbers. Then there was a sudden jump to looking at four functions and two variables, as if I'd persuaded myself that there was no counterexample to be found in the case of three functions, though it's not obvious from what's written why.
Here, I suppose, I'm pouring cold water on my idea that the scribbles one makes might enable reconstruction of thought processes, since I can't recall what was going on in this simple case, although I annotated the pieces of paper immediately afterwards. However, it does, at least, help us to be honest about where a jump is made and may help to avoid too glib a rational reconstruction of the process.
The idea of multiplying the first order conditions by the variables and summing, the key thought in the final simple proof, suddenly appeared on the penultimate page of my scribbles and then was immediately written up in the general case.
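The write-up itself is not reproduced in this paper, but the key thought can be sketched under one concrete reading of "composing" (my assumption: the built-up function is a product, which positivity of the f_j converts, via logarithms, into a sum). Suppose H(x) = ∏_j f_j(a_j^T x), with each f_j in F differentiable and the vectors a_j spanning R^n. Multiplying the first order conditions by the variables and summing gives, at any stationary point,

$$
0 \;=\; x^{T}\,\nabla \log H(x) \;=\; \sum_{j} \frac{f_j'(u_j)}{f_j(u_j)}\,u_j, \qquad u_j = a_j^{T} x.
$$

If unimodality is strict, so that f_j'(u)u < 0 for u ≠ 0, every term is non-positive and a zero sum forces each u_j = 0; the maximum rank condition then forces x = 0.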
It's noteworthy that, in the result as written up, the statement of conditions under which the result holds, as well as the proof, were rationally reconstructed, since they were not given in the problem as originally stated. The issue here was: what was it about the given family which allowed the proof? This is in line with Lakatos' thinking about the back-and-forth moves between assumptions and conclusions in the production of a proof, but here I'm trying to get more at the psychological factors prompting any move in the process.
I’d be interested to hear from anyone who has any comments on the above, particularly other accounts of the experience of doing mathematics. (E-mail a.muir@city.ac.uk)
May, R. M. (1976) Irreproducible results. Nature, 262, p. 646.
Tymoczko, T. (1995) Review of The Art of Mathematics by J. P. King. Philosophia Mathematica, 3, 120-126.