PHILOSOPHY OF MATHEMATICS EDUCATION JOURNAL 11 (1999)
STATISTICIANS AND THE DRIVE FOR INTEGRITY
Ray Thomas
Open University
r.thomas@open.ac.uk
ABSTRACT
1 The Royal Statistical Society

The original aims of the Royal Statistical Society, established in the 19th century, were 'to collect, arrange, digest and publish facts, illustrating the condition and prospect of society in its material, social, and moral relations' (RSS, 1985). These aims are inscribed on the wall of the main meeting room of the Society in its building near the Barbican in London, but they have not been printed in any of the regularly produced publications of the RSS since 1985. The meaning of statistics has also changed since the RSS acquired its Royal Charter in the late 19th century. Statisticians use the term statistics as a singular noun to refer to statistical method, rather than as a plural noun to mean facts about society. The address of the RSS president in 1996 called for an "evidence-based society in which informed quantitive reasoning is the dominant modality in public debate", and referred to these 19th century aims as an 'original, rather humble, purview' (see Smith, A.F.M., p. 368). Smith assumes that the evidence provided by statistics as facts about society is unproblematic, and that the work of the statistician is to apply statistical method to such facts. This assumption is consistent with standard statistical texts that define statistics as being concerned with natural phenomena, rather than 'facts about society'.

Professor David Hand, in his recent article 'Statistics and the theory of measurement', indicates a lack of recognition on the part of statisticians of the differences between natural phenomena and facts about society. Formally, Hand addresses statistics as facts about society as well as natural phenomena. He mentions 'constructs' such as school league tables, research assessment exercises, quality-of-life scales, and price and quantity indices. But the main examples used in discussion relate to a set of rigid rods, and to a collection of rocks of varying hardness (Hand, 1996).
Mainstream social science sees statistics, in the sense of facts about society, as artefacts. Such statistics are produced by society as well as being about society. Except perhaps for statistics of births and deaths, it is assumed that all statistics about society (not just school league tables, research assessment exercises, and quality-of-life scales) are social or organisational products or constructs. Facts about society, in other words, do not belong to the natural world. The difference between 'natural phenomena' and 'facts about society' is related to the relevance of statistical theory. In statistical theory it is common to assume that errors of observation, or unobserved phenomena affecting the data being examined, are randomly distributed. But statistics on human activities are typically subject to systematic influences of unknowable magnitude. It cannot be assumed that errors of observation or exogenous influences on such statistics are randomly distributed.

The RSS has moved away from its original objectives in part because of the growth of the activities of the GSS and other survey organisations that have increasingly taken responsibility for the production of statistics as facts about society. The growth of official statistics has not led to a renewal of interest by statisticians in facts about society. Government statisticians depend upon their skills in statistical method to produce what are generally called official statistics. And government statisticians cannot avoid some responsibility for the presentation, interpretation, and uses made of official statistics. But in exercising these wider responsibilities government statisticians are influenced by what has come to be seen as the meaning of the term statistics, i.e. statistical method, and by the professional orientation associated with statistical method.
This paper argues that the education, training, and professional orientation of statisticians distance government statisticians from the exercise of responsibilities beyond the production of statistics. First, statistical education and training draw a boundary around statistical method that often excludes dealing with facts about society. The boundary does not include, for example, the concept of the population at risk, which is central to demography. Second, the growing popularity of Bayesian methods moves responsibility for the evaluation of evidence from the statistician to the user. Third, the Green Paper plea for the integrity of official statistics indicates, in the context of the measurement of unemployment, that government statisticians have a drive to distance themselves from the statistics they produce.
2 The problem of the denominator

Most people probably think of demography, the study of human populations, as part of statistics. But demography belongs to 'facts about society' rather than statistical method. The boundary drawn around statistical method does not include one of the fundamental concepts of demography: that of population at risk. In order to make comparisons over time or space between populations it is common to divide the variables being investigated by a measure of the population from which they are drawn. In demography the choice of denominator is commonly spoken of as the population at risk. There is often a choice in the selection of a population at risk. Demographers calculate crude death rates on the basis of the total population, age specific rates for particular age groups, and standardised mortality rates for the whole population, which also take into account the age structure of the population. Demographers calculate crude birth rates on the basis of total population or total female population, as well as birth rates per 1000 women of childbearing age, and age specific rates. The precise definition of the population at risk can reveal important characteristics of human behaviour. For example, birth rates outside marriage, when calculated on a specific population at risk basis (i.e. using numbers of single, widowed and divorced women as the denominator), consistently show a peak in the 25-29 age group. Contrary to the public image, and it often seems the governmental image, relatively mature women are more likely to become single mothers than teenagers. The use of the concept of population at risk identifies the principle that the denominator used should include the numerator. But this principle is not articulated in any of the standard statistical texts. It may be that the principle seems generally to be so obvious as to be taken for granted.
But the closest to a statement of the relevant principle I have been able to find is in an Open University social science teaching text, which states: "The aim, as in all statistical analysis, is to ensure, as far as possible, that comparisons are made between like and like. (The term 'like and like' is a statistical expression of fundamental importance.)" (Gore, 1975, p. 21). The concept of population at risk is recognised in epidemiology, and in many publications of the Office of Population Censuses and Surveys, but the concept does not seem to be recognised by statisticians outside these demography related areas. This lack of recognition is crucial in the measurement of unemployment for sub-groups of the population.

Statistics related to long term unemployment, often defined as those who have been unemployed for more than a year, provide an example. The ONS publication Labour Market Trends, and its predecessor The Employment Gazette, have long published statistics for the Count of Claimants giving the numbers with different durations of unemployment as a percentage of the current level of unemployment. The ONS has also published statistics for LFS unemployment giving the numbers unemployed for more than a year as a percentage of current unemployment (ONS, 1996, Table 25b, p. 65). The publication of these percentages reflects and reinforces the common use by economists of the variable LAPU to refer to the ratio of long-term unemployment to total unemployment. That is:

LAPU = 100 * LTU_t / U_t

where LAPU is the ratio used, LTU_t is the number of long term unemployed (usually a year or more) in year t, and U_t is the total number of unemployed in year t. This formulation violates the population at risk principle. The population at risk of being unemployed for more than a year (for example) is not the current number of unemployed but the number who were unemployed a year ago. The true long term unemployment rate is:

TLTU = 100 * LTU_t / U_{t-1}

This is not a nit-picking matter.
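The difference between the two ratios can be seen in a small numerical sketch. The figures below are hypothetical (they are not taken from Labour Market Trends or any other source), and the function names are chosen only for illustration. The number unemployed a year ago and the number still unemployed now are held fixed, while the current total is allowed to vary.

```python
def lapu(ltu_t, u_t):
    """Long-term unemployed as a percentage of CURRENT unemployment."""
    return 100.0 * ltu_t / u_t


def tltu(ltu_t, u_prev):
    """Long-term unemployed as a percentage of those unemployed a year AGO."""
    return 100.0 * ltu_t / u_prev


# Hypothetical figures, in thousands (assumptions for illustration only).
U_PREV = 2000.0   # unemployed one year ago: the population at risk
LTU_T = 800.0     # of those, still unemployed today

# Three scenarios for what happened to total unemployment over the year.
for label, u_t in [("falling", 1600.0), ("stable", 2000.0), ("rising", 2500.0)]:
    print(f"{label:8s} U_t={u_t:6.0f}  "
          f"LAPU={lapu(LTU_T, u_t):5.1f}%  TLTU={tltu(LTU_T, U_PREV):5.1f}%")
```

In this sketch TLTU stays at 40 per cent in every scenario, because the same share of last year's unemployed is still out of work. LAPU moves from 50 to 40 to 32 per cent purely because the current total changes.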
TLTU, or LTU_t / U_{t-1}, the true long term unemployment rate, tells us straightforwardly what proportion of those who were unemployed a year ago are still unemployed. LAPU, or LTU_t / U_t, literally tells us what proportion of the current unemployed have been unemployed for more than a year. LTU_t / U_t confounds statistics about those who have been unemployed for more than a year with statistics of what has happened to the general level of unemployment over the past year. It is difficult to imagine any statistical model for which LTU_t / U_t would be an appropriate measure. A true long term unemployment series, properly calculated using the number unemployed a year ago as a denominator, behaves quite differently from those calculated on a current basis. But ratios improperly calculated on unlagged statistics have been widely used in economic studies - in particular to support the idea of 'hysteresis'. The hysteresis explanation interprets the long-term growth of unemployment in terms of the unemployability of the long-term unemployed (see Cross, 1995). The hysteresis interpretation and the associated 'withered flowers' theory of unemployability have had, and continue to have, a considerable influence on current debates about unemployment and the management of the economy (see, for example, Layard 1998).

Unemployment statistics for small areas provide another example of an incorrect choice of denominator. In the case of the Count of Claimants the statistics for the numbers unemployed are available in great detail (down to postcode level), but the problem has been to find a suitable denominator. As a matter of statistical convenience it has been traditional to use what are called workforce based rates. With workforce based rates the major item of the denominator is an estimate of the number employed in the area, which does not, because of commuting flows, correspond to the population at risk.
The use of workforce based rates is not problematic at the national or regional level, where the number employed in the area approximates to the number of residents in employment. But the use of workforce based rates is associated with the production of unemployment statistics for Travel to Work Areas, which have been widely criticised for disguising local concentrations of unemployment (see Webster 1997, Webster & Turok 1997, Thomas 1998a). In the latest development the ONS is publishing workforce-based rates for local authority areas (see Hickson, 1998). These rates systematically mislead users on the location of unemployment. Nearly all major cities are daily importers of labour from a surrounding suburban/exurban hinterland. The use of workforce rates systematically understates unemployment rates in the cities and overstates unemployment rates in the local authorities in the surrounding hinterland (see Webster, 1998).
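The effect of commuting on workforce based rates can be illustrated with a deliberately simplified sketch. The two areas and all figures are hypothetical, and the denominators are reduced to their major items (jobs located in the area, or employed residents, plus the unemployed); the official definitions contain further components that are not modelled here.

```python
def workforce_rate(unemployed, jobs_in_area):
    """Unemployed residents over (jobs located in the area + unemployed)."""
    return 100.0 * unemployed / (jobs_in_area + unemployed)


def residence_rate(unemployed, employed_residents):
    """Unemployed residents over (employed residents + unemployed),
    i.e. a denominator that approximates the population at risk."""
    return 100.0 * unemployed / (employed_residents + unemployed)


# Hypothetical figures in thousands. The city imports 30 thousand commuters
# from the hinterland each day, so jobs and employed residents differ locally
# but balance across the two areas taken together (140 = 140).
areas = {
    "city":       {"unemployed": 10, "jobs_in_area": 100, "employed_residents": 70},
    "hinterland": {"unemployed": 5,  "jobs_in_area": 40,  "employed_residents": 70},
}

for name, a in areas.items():
    wf = workforce_rate(a["unemployed"], a["jobs_in_area"])
    res = residence_rate(a["unemployed"], a["employed_residents"])
    print(f"{name:10s} workforce-based={wf:5.1f}%  residence-based={res:5.1f}%")
```

With these assumed figures the workforce based rate is lower than the residence based rate in the city and higher in the hinterland, which is the direction of bias described above.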
3 Bayesian developments

Over recent decades being a Bayesian has changed from being a minority interest among statisticians to the dominant interest. The growing popularity of Bayesian statistics appears from the outside to be more like a change in style than a change in the boundary of statistical method. But this change of style seems to represent a move away from the examination of statistical evidence. Bayes' theorem dates from the 18th century. In essence the theorem expresses a posterior probability as proportional to the product of a prior probability and a likelihood function.

A study of the occurrence of a rare disease, childhood leukaemia, provides an example of a Bayesian approach. A preliminary investigation produces a list of areas ranked according to the number of occurrences. Are the areas at the top of the list the most dangerous areas? The answer is arguably negative, because the list does not take into account the population sizes of the areas. If a new list is drawn up which expresses the number of occurrences as a percentage of the population at risk, then the ranked list of dangerous areas is quite different (see Langford, 1994). The new list seems to accord with common sense, but the term 'arguably' seems appropriate because taking into account the size of population does not always seem appropriate. Does it make sense to say, for example, that urban populations are in more danger when the 'danger' being considered is that of being killed by a meteor? The expected distribution in this study of childhood leukaemia is the prior probability in the Bayesian terminology. The prior probability would be modified in accordance with the actual incidence of the disease to arrive at a posterior probability, which gives the risk of getting the disease in different areas. The use of statistical evidence on the distribution of population leads its author to call this study empirical Bayes.
But its status as an empirical study seems to depend upon the correctness of the assumptions underlying the calculation of the expected distribution. More commonly the concept of prior probability used is a judgement based on experience or special knowledge. At what may be the other extreme from this empirical use of Bayes theory, juries have been invited to express their prior probabilities on the guilt of the person being tried and to modify these prior probabilities in the light of the likelihood ratio derived from DNA blood matching tests. The use of such Bayesian procedures was thrown out in the Appeal Court precisely because the Court believed that juries should assess items of evidence separately (Balding, 1998, pp 1-2). The use of Bayesian statistics affects the division of responsibility between the statistician and the user. The typical Bayesian statistician asks users about their expectation and suggests how that expectation should be modified in the light of the statistical evidence provided by the statistician. It is possible to interpret this situation as one of the statistician giving responsibility to the user instead of making an independent judgement. Thus the growth in popularity of Bayesian methods among statisticians in general can be interpreted as taking the statistician away from the consideration of the user's data, and in general away from the assessment of evidence provided by facts about society or human activities.
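The division of labour described in this section can be made concrete with the odds form of Bayes' theorem: posterior odds equal prior odds multiplied by the likelihood ratio. The sketch below is illustrative only. The likelihood ratio of 1000 and the prior probabilities are assumed values, not figures from any court case or from Balding (1998), and the function name is hypothetical.

```python
def posterior_probability(prior_prob, likelihood_ratio):
    """Update a prior probability using the odds form of Bayes' theorem.

    posterior odds = prior odds * likelihood ratio
    """
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# The statistician supplies the likelihood ratio (assumed value here);
# the user supplies the prior. Different priors give very different answers.
LIKELIHOOD_RATIO = 1000.0

for prior in (0.0001, 0.001, 0.01, 0.5):
    post = posterior_probability(prior, LIKELIHOOD_RATIO)
    print(f"prior={prior:<7} posterior={post:.4f}")
```

The same statistical evidence leads to quite different posterior probabilities depending on the prior the user brings, which is the sense in which responsibility for the conclusion passes from the statistician to the user.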
4 The Count of Claimants and 'the drive for integrity'

The use of the term 'integrity' in the title of this paper comes from the Green Paper published in February 1998, Statistics: A Matter of Trust, on the Government Statistical Service (GSS). Helen Liddell, at that time Economic Secretary to the Treasury and Minister responsible for the Office for National Statistics (ONS), stated that the Green Paper presents 'the Government's ideas for enhancing integrity, both actual and perceived, through improvements to their overall framework' (Treasury, 1998, p 1). Earlier papers by Heads of the GSS also address the issue of integrity (notably Moser, 1980, and Hibbert 1990). The integrity issue centres upon statistics of unemployment and specifically on the Count of Claimants. The Green Paper mentions unemployment statistics four times more than any other kind of statistic. The message of the Green Paper echoes that given by the mass media - and that of some members of the PUM Seminar. The unemployment figures have been fiddled. The integrity of the statistics has to be protected because of this fiddling.

This message is misleading. The Count of Claimants was created as an administrative statistic in 1982. In terms of validity, reliability, and accuracy the Count of Claimants is in the same league as other administrative statistics - such as those for births, marriages and deaths. No one disputes that it is the responsibility of government to decide who is entitled to unemployment benefit. No one seriously disputes that the Count of Claimants validly and reliably covers the numbers receiving unemployment benefit. The problem in the 1980s and 1990s has been that government exercised its power, and there were many changes in the regulations which determined entitlement to unemployment benefit (see Fenwick & Denman, 1995). The Government Statistical Service (GSS) failed to find a defence to the criticisms that the unemployment statistics were being fiddled.
Peter Stibbard, Director of Statistics at the Department of Employment at the time, reported "abuse levelled at my colleagues in the House of Commons, where it was said that if (his statisticians) were a football team they would be banned for bringing their profession into disrepute" (Working Party, 1995, p 405).

Lack of continuity is a problem shared with other administrative statistics. When the legal definition of abortion changed, there was a discontinuity in the statistics. The coverage of the Department of Inland Revenue's statistics of incomes changes every year with budget changes affecting the numbers liable to income tax. Hospital waiting lists are vulnerable to administrative checking: some of those on the list have gone private, some have moved away, some have died, some no longer want their operations. In all of these areas legal changes, or what is seen as administrative necessity, override the desirability of achieving statistical continuity.

This scapegoating of the GSS about the Count of Claimants is in part explicable in terms of the status of government statisticians as civil servants, which could not have made it easy for them to tell journalists or members of the opposition 'It is not our fault, but that of the Government'. But it seems likely that the strength of public support for the charge of fiddling, and the difficulty of dealing with the charge, is attributable to a failure on the part of the GSS to give credible explanations. Against the charge of breaks in the series, the ONS produce SAUCC: the Seasonally Adjusted count of Unemployment Consistent with Current coverage. The ONS claim that this series is retrospectively consistent. The series indicates that changes in the regulations affecting entitlement to unemployment benefit were a minor influence on the series. There seem to be no good grounds for disputing this claim.
The reason why the Count was believed to be understating unemployment was not changes in the regulations affecting entitlement to unemployment benefit, but a transfer of claimants from the Count to sickness benefits. For most of the 80s and 90s local employment offices were following performance targets which included transfer to another benefit as a successful outcome. By the mid 1990s Britain had more people of working age on the sick than on the dole (the main study is Beatty et al 1996; see also Thomas, 1998b and 1998c).
5 Embracing the LFS Series

The embarrassment of the GSS at the charges made of fiddling the unemployment statistics led the Royal Statistical Society to establish a Working Party to investigate and report on unemployment statistics. The contribution made by that report in putting unemployment statistics on a serious statistical agenda was immense. But in accordance with the orientation of the statistical profession the report treated the measurement of unemployment as a matter of statistical method rather than a matter of getting facts about employment. The Report favoured the Labour Force Survey unemployment series, produced in accordance with International Labour Office and Eurostat criteria, over the Count of Claimants, and recommended that the sample size of the LFS be increased in order to produce a monthly 'headline' figure for LFS unemployment. The RSS followed up its 1995 report with a letter to Tim Holt, Director of the ONS, which ended: 'We believe that a new series estimating the trend in unemployment based on the LFS definition would be a major advance in the presentation of unemployment statistics. It should take unemployment statistics out of the political and media arenas to the benefit of everyone, especially the ONS and the GSS' (quoted in Bartholomew, 1997, p 388). Tim Holt responded very positively.

There are many good reasons for giving more emphasis to the LFS statistics. But there are problems with the LFS and the LFS unemployment series (see Thomas, 1997a, 1997b, 1998b and 1998c). The plea for integrity, at least in the context of unemployment statistics, is consistent with favouring the LFS series because it is believed that it will avoid controversy, not because it provides better facts about employment.
6 The responsibilities of statisticians

The ideal mathematician, according to Davis & Hersh (1983), is dependent on a very small circle of fellow specialists. According to Sir Claus Moser (head of the GSS 1967-78), in his presidential address to the Royal Statistical Society, the position of the statistician is similar:
The issue of 'MAD' uses of mathematics was briefly discussed at the first seminar in June. MAD uses included the use of mathematics for models for share dealing which brought instability to the stock market, the covering-up of data on CJD dangers, as well as the use of game theory for Mutually Assured Destruction simulations. Statistics as facts about society discussed in this paper do not relate to such extraordinary or exceptional situations, which are hopefully rare, but to ordinary situations. Government statisticians differ from the ideal mathematician in that their work has an impact on society through the creation of official statistics. The questions posed by this paper are about the effects of drawing a boundary round statistical method, favouring a Bayesian approach, and seeking to take statistics out of politics.

One set of questions relates to the use of official statistics and other statistics as facts about society. The democratic tradition assumes that reason and evidence, including statistical evidence, should play a part in debates about public policy. If official statistics are taken out of politics, for example, does this mean that statistical evidence will play a smaller part in debates about public policy? These questions are discussed in Thomas, 1998d. A second set of questions is for this PUM seminar. They relate to the effects of the statisticians' handling of facts about society on the public image and understanding of statisticians and of mathematics.
At this point I can't resist going back to Sir Claus Moser's presidential address to the RSS in 1980. In his opening remarks he said:
It may be that by making these remarks Moser was intending to lighten the mood of the occasion, but he allowed the words to stand in the record of the proceedings. Is this vignette far from the public image of statisticians? Is the image of mathematicians affected by that of statisticians? If so, it can be said that nothing that has happened among statisticians since 1980 seems likely to have improved the image of statisticians.
References:
Ray Thomas © The Author 1999