Statistical Inference as Severe Testing


by Deborah G Mayo


  Four Philosophical Positions

  Admitting there is no unanimity as to either the definition or goal of “objective” (default) Bayesianism, Berger (2006, p. 386) outlines “four philosophical positions” that default Bayesianism might be seen to provide:

  1. A complete coherent objective Bayesian methodology for learning from data.

  2. The best method for objectively synthesizing and communicating the uncertainties that arise in a specific scenario, but is not necessarily coherent.

  3. A convention we should adopt in scenarios in which a subjective analysis is not tenable.

  4. A collection of ad hoc but useful methodologies for learning from data.

  Berger regards (1) as unattainable; (2) as often attainable, and to be done if possible; but concedes that often the best we can hope for is (3), or maybe (4). Lindley would have gone with (1).

  Is a collection of ad hoc but useful methodologies good enough? There is a fascinating philosophical tension in Berger’s work: while in his heart of hearts he holds “the (arguably correct) view that science should embrace subjective statistics”, he realizes this “falls on deaf ears” (ibid., p. 388). When scientists demur: “I do not want to do a subjective analysis, and hence I will not use Bayesian methodology,” Berger convincingly argues they can have it both ways (p. 389).

  Among the advantages to adopting a default Bayesian methodology is avoiding a subjective elicitation of experts. Berger finds elicitation does not work out too well. Far from providing a route within which to describe background knowledge in terms of prior probabilities, he finds elicitation foibles are common even with statistically sophisticated practitioners. “[V]irtually never would different experts give prior distributions that even overlapped; there would be massive confusions over statistical definitions (e.g., what does a positive correlation mean?)” coupled with the difficulty of eliciting priors when, as is typical, “the expert has already seen the data” (ibid., p. 392). But if the prior is determined post-data, one wonders how it can be seen to reflect information independent of the data. I come back to this. In his own experience Berger found:

  … for the many parameters for which there was data … all of the expert time was used to assist model building. It was necessary to consider many different models, and expert insight was key to obtaining good models; there simply was no extra available expert time for prior elicitation.

  (ibid.)

  He argues that the default choices have the advantage over trying to elicit a subjective prior:

  The problem is that, to elicit all features of a subjective prior π(θ), one must infinitely accurately specify a (typically) infinite number of things. In practice, only a modest number of (never fully accurate) subjective elicitations are possible, so practical Bayesian analysis must somehow construct the entire prior distribution π(θ) from these elicitations.

  (ibid., p. 397)

  A standard way to turn elicitations into full prior distributions is to use mathematically convenient priors (as with default priors). The trouble is that this leads to Bayesian incoherence, in violation of the Likelihood Principle (LP). Why? Because “depending on the experiment designed to study θ, the subjective Bayesian following this ‘prior completion’ strategy would be constructing different priors for the same θ, clearly incoherent” (ibid.). Ironically, this LP violation is not directly driven by the need to compute the sampling distribution to obtain frequentist error probabilities: it is a way to try to capture a reasonably non-informative prior – as this is thought to depend on the experiment to be performed.6 Any good error properties are touted as a nice bonus, not the deliberate aim, except for the special case of error probability matching priors.
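  How a would-be non-informative prior can depend on the experiment to be performed is shown by a standard textbook illustration (not drawn from Berger’s paper; the sympy sketch below is my own): the Jeffreys default prior for a success probability θ comes out differently under a binomial design (fixed number of trials) than under a negative binomial design (sample until the r-th success).

    # Standard textbook illustration (not Berger's example): the Jeffreys
    # default prior for the same theta differs with the sampling plan.
    import sympy as sp

    theta = sp.symbols("theta", positive=True)
    n, r, x = sp.symbols("n r x", positive=True)

    # Binomial design: x successes in n fixed trials; E[x] = n*theta.
    loglik_bin = x * sp.log(theta) + (n - x) * sp.log(1 - theta)
    info_bin = sp.simplify(-sp.diff(loglik_bin, theta, 2).subs(x, n * theta))
    jeffreys_bin = sp.sqrt(info_bin)   # proportional to theta**(-1/2) * (1 - theta)**(-1/2)

    # Negative binomial design: x failures observed before the r-th success;
    # E[x] = r*(1 - theta)/theta.
    loglik_nb = r * sp.log(theta) + x * sp.log(1 - theta)
    info_nb = sp.simplify(-sp.diff(loglik_nb, theta, 2).subs(x, r * (1 - theta) / theta))
    jeffreys_nb = sp.sqrt(info_nb)     # proportional to theta**(-1) * (1 - theta)**(-1/2)

    print("Jeffreys prior, binomial design:", sp.simplify(jeffreys_bin))
    print("Jeffreys prior, negative binomial design:", sp.simplify(jeffreys_nb))

  Since the two designs can yield proportional likelihoods in θ, the LP says they should license the same inference; a default prior tied to the sampling plan says otherwise.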

  Berger maintains that the default, at best, achieves “a readily understandable communication of the information in the observed data, as communicated through a statistical model, for any scientific question that is posed” (ibid., p. 388). We’ve seen that there’s considerable latitude open to the default Bayesian – the source of arguments that P-values overstate the evidence. It’s hard to view those spiked priors as merely conveying what the data say (especially when they use a two-sided test). Another issue is that it often distinguishes parameters of “interest” from additional “nuisance” parameters, each of which must be ordered. In Bernardo’s system of reference priors, there are as many reference priors as possible parameters of interest (1997, p. 169). That’s because what counts as data dominance (he calls it “maximizing the missing information”) will differ for different parameters. Each ordering of parameters will yield different posteriors. Despite Berger’s own misgivings about elicitation bias:

  A common and reasonable practice is to develop subjective priors for the important parameters or quantities of interest in a problem, with the unimportant or ‘ nuisance’ parameters being given objective priors.

  (ibid., p. 393)

  Here again we see the default Bayesian inviting both types of priors: if you have information, put it in the elicitation; if not, keep it out and choose one of the conventional priors. You might say there’s nothing really schizophrenic in this; even subjectivists argue that default priors are kosher as approximations to what they would have arrived at in cases of minimal information (O’Hagan, this forum). It’s just faster. Should they be so sanguine? The tasks are quite different.

  [O]bjective priors can vary depending on the goal of the analysis for a given model. For instance, in a normal model, the reference prior will be different if inference is desired for the mean µ or if inference is desired for µ/σ. This, of course, does not happen with subjective Bayesianism.

  (Berger 2006, p. 394)

  Trying to describe your beliefs is different from trying to make the data dominant relative to a given model and ordering of parameters. Subjectivists hold that prior beliefs in H shouldn’t change according to the experiment to be performed. However, if they incorporate default priors, when required by complex problems, this changes. Since priors of different sorts are then combined in a posterior, how do you tell what’s what? If nothing else, the simplicity that led Dawid to joke that Bayesianism is boring disappears.
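  The practical upshot of this latitude can be shown with a small numerical sketch (the data, grid, and choice of priors are illustrative assumptions of mine, not an example from Berger or Bernardo): two common default conventions for a normal model, π(µ, σ) ∝ 1/σ and π(µ, σ) ∝ 1/σ², applied to the same data, give marginal posteriors for µ with different spreads.

    # Minimal numerical sketch (illustrative data and priors): two common
    # default priors for a normal model yield different posterior spreads for mu.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=1.0, scale=2.0, size=5)   # hypothetical data, n = 5

    mu_grid = np.linspace(-6.0, 8.0, 561)
    sigma_grid = np.linspace(0.2, 12.0, 590)
    M, S = np.meshgrid(mu_grid, sigma_grid, indexing="ij")

    # Log-likelihood of the normal sample on the (mu, sigma) grid.
    loglik = -x.size * np.log(S) - ((x[None, None, :] - M[..., None]) ** 2).sum(-1) / (2.0 * S ** 2)

    for power, label in [(1, "prior 1/sigma  "), (2, "prior 1/sigma^2")]:
        logpost = loglik - power * np.log(S)      # add log prior up to a constant
        post = np.exp(logpost - logpost.max())
        post /= post.sum()
        marg = post.sum(axis=1)                   # marginalize sigma out
        mean = float((mu_grid * marg).sum())
        sd = float(np.sqrt(((mu_grid - mean) ** 2 * marg).sum()))
        print(f"{label}: posterior mean of mu = {mean:.2f}, posterior sd = {sd:.2f}")

  Both conventions center the posterior for µ at the sample mean, but the reported uncertainty differs, so a default posterior cannot be appraised without knowing which convention, and which ordering of parameters, produced it.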

  Ironic and Bad Faith Bayesianism

  A major impetus for developing default Bayesian methods, for Berger, is to combat what he calls “casual Bayesianism” or pseudo-Bayesianism.

  One of the mysteries of modern Bayesianism is the lip service that is often paid to subjective Bayesian analysis as opposed to objective Bayesian analysis, but then the practical analysis actually uses a very ad hoc version of objective Bayes, including use of constant priors, vague proper priors, choosing priors to ‘span’ the range of the likelihood, and choosing priors with tuning parameters that are adjusted until the answer ‘looks nice.’ I call such analyses pseudo-Bayes because, while they utilize Bayesian machinery, they do not carry with them any of the guarantees of good performance that come with either true subjective analysis (with a very extensive elicitation effort) or (well-studied) objective Bayesian analysis.

  (Berger 2006, pp. 397–8)

  Berger stops short of prohibiting casual Bayesianism, but warns that it “must be validated by some other route” (ibid.), left open. One thing to keep in mind: “good performance guarantees” mean disparate things to Bayesians and to frequentist error statisticians. Remember those subscripts. “In general reference priors have some good frequentist properties but except in one-dimensional problems it is unclear that they have any special merit in that regard” (Cox 2006b, p. 6). Judging from the ensuing discussion, Berger’s concern here is with resulting improper posteriors that can remain hidden in the use of computer packages. Improper priors are often not problematic, but posteriors that are not probabilities (because they don’t integrate to 1) are a disaster.

  Interestingly, Lindley came to his subjective Bayesian stance after he was shown that conventional priors can lead to improper posteriors and thus to violations of probability theory (Dawid, Stone, and Zidek 1973). A remark that is especially puzzling or revealing, depending on your take:

  Too often I see people pretending to be subjectivists, and then using ‘weakly informative’ priors that the objective Bayesian community knows are terrible and will give ridiculous answers; subjectivism is then being used as a shield to hide ignorance … In my own more provocative moments, I claim that the only true subjectivists are the objective Bayesians, because they refuse to use subjectivism as a shield against criticism of sloppy pseudo-Bayesian practice.

  (Berger 2006, pp. 462–3)

  How shall we deconstruct this fantastic piece of apparent doublespeak? I take him to mean that a subjectivist who properly recognizes her limits and biases and opts to be responsible for her priors would accept the constraints of default priors. A pseudo-Bayesian uses priors as if these really reflected properly elicited subjective judgments. In doing so, she (thinks that she) doesn’t have to justify them – she claims that they reflect subjective judgments (and so who can argue with them?).

  Although most Bayesians these days disavow classic subjective Bayesian foundations, even the most hard-nosed, “we’re not squishy” Bayesians retain the view that a prior distribution is an important if not the best way to bring in background information. Here’s Christian Robert:

  The importance of the prior distribution in a Bayesian statistical analysis is not at all that the parameter of interest θ can (or cannot) be perceived as generated from [prior distribution π] … but rather that the use of a prior distribution is the best way to summarize the available information (or even the lack of information) about this parameter.

  (Robert 2007, p. 10)

  But is it? To suppose it is pulls in the opposite direction from the goal of the default prior, which is to reflect just the data.

  Grace and Amen Bayesians

  I edit an applied statistics journal. Perhaps one quarter of the papers employs Bayes’ theorem, and most of these do not begin with genuine prior information.

  (Efron 2013, p. 134)

  Stephen Senn wrote a paper “You Might Believe You Are a Bayesian But You Are Probably Wrong.” More than a clever play on words, Senn’s title highlights the common claim of researchers to have carried out a (subjective) Bayesian analysis when they have actually done something very different. They start and end with thanking the (subjective?) Bayesian account for housing all their uncertainties within prior probability distributions; in between, the analysis immediately turns to default priors, coupled with ordinary statistical modeling considerations that may well enter without being put in probabilistic form. “It is this sort of author who believes that he or she is Bayesian but in practice is wrong” (Senn 2011, p. 58). In one example Senn cites Lambert et al. (2005, p. 2402):

  [T]he authors make various introductory statements about Bayesian inference. For example, ‘In addition to the philosophical advantages of the Bayesian approach, the use of these methods has led to increasingly complex, but realistic, models being fitted,’ and ‘an advantage of the Bayesian approach is that the uncertainty in all parameter estimates is taken into account’ … but whereas one can neither deny that more complex models are being fitted than had been the case until fairly recently, nor that the sort of investigations presented in this paper are of interest, these claims are clearly misleading…

  (Senn 2011, p. 62)

  While the authors “considered thirteen different Bayesian approaches to the estimation of the so-called random effects variance in meta-analysis …” (techniques fully available to the frequentist), “[n]one of the thirteen prior distributions considered can possibly reflect what the authors believe about the random effect variance” (ibid., pp. 62–3).

  Ironically, Senn says, a person who takes into account the specifics of the case in their statistical modeling is “being more Bayesian in the de Finetti sense” (ibid.) than the default/non-subjective Bayesian. By focusing on how to dress the case in ill-fitting probabilistic clothing, Senn is insinuating, the Bayesians may miss context-dependent details solely because they were not framed probabilistically. Leo Breiman, an early leader in machine learning, needled Bayesians:

  The Bayesian claim that priors are the only (or best) way to incorporate domain knowledge into the algorithms is simply not true. Domain knowledge is often incorporated into the structure of the method used. … In handwritten digit recognition, one of the most accurate algorithms uses nearest-neighbor classification with a distance that is locally invariant to things such as rotations, translations, and thickness.

  (Breiman 1997, p. 22)

  Nor need context-dependent information or a repertoire of mistakes and pitfalls be cashed out in terms of priors. They’d surely be reflected in a post-data assessment of severity, which would be open to model builders from any camp.

  Finally, the “lip service that is often paid to subjective Bayesian analysis as opposed to objective Bayesian analysis,” far from being the “modern mystery” Berger (2006, p. 397) dubs it, might reflect the degree of schizophrenia of default Bayesianism. After all, Berger7 says that the default prior “is used to describe an individual’s (or group’s) ‘degree of belief’” (ibid., p. 385), while ensuring the influence of subjective belief is minimal. Moreover, in using default priors, he maintains, you’re getting closer to the subjective Bayesian ideal (absent a full elicitation). So there should be no surprise when a default Bayesian says she’s being a good subjective Bayesian. The default Bayesians attain an aura of subjective foundations for philosophical appeal, and non-subjective foundations for scientific appeal. If you come face to face with a default posterior probability, you need to ask which default method was used, the ordering of parameters, the mixture of subjective and default priors, and so on. Even a transparent description of all that may not help you appraise whether a high default posterior in H indicates warranted grounds for H.

  6.4 What Happened to Updating by Bayes’ Rule?

  In striving to understand how today’s Bayesians view their foundations, we find even some true-blue subjective Bayesians reject some principles thought to be important, such as Dutch book arguments. If it is agreed that we have degrees of belief in any and all propositions, then it is argued that if your beliefs do not conform to the probability calculus you are being incoherent. We can grant that, if we had degrees of belief and were required to take any bets on them, then, given we prefer not to lose, we would not agree to a series of bets that ensures losing. This is just a tautologous claim and entails nothing about degree of belief assignments. “That an agent ought not to accept a set of wagers according to which she loses come what may, if she would prefer not to lose, is a matter of deductive logic and not a property of beliefs” (Bacchus, Kyburg, and Thalos 1990, pp. 504–5).
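  For readers who want the betting construction spelled out, here is a minimal sketch of a classic static Dutch book (a standard textbook construction; the numbers are illustrative assumptions, not drawn from the text): an agent whose degrees of belief violate additivity, and who prices a $1 ticket on an event at its probability, accepts a pair of bets that loses in every possible state.

    # Classic static Dutch book (illustrative numbers): incoherent degrees of
    # belief, Pr(A) = 0.6 and Pr(not-A) = 0.6, priced as bets, guarantee a loss.
    pr_A, pr_not_A = 0.6, 0.6   # incoherent: they sum to 1.2

    for state, a_true in [("A true", True), ("A false", False)]:
        ticket_A = (1.0 if a_true else 0.0) - pr_A           # buy a $1 ticket on A at price pr_A
        ticket_not_A = (0.0 if a_true else 1.0) - pr_not_A   # buy a $1 ticket on not-A at price pr_not_A
        print(f"{state}: agent's net payoff = {ticket_A + ticket_not_A:+.2f}")
    # Both states print -0.20: a sure loss, i.e., a Dutch book.

  As the Bacchus, Kyburg, and Thalos remark stresses, refusing such a book is a matter of deductive logic about payoffs; by itself it entails nothing about degree of belief assignments.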

  The dynamic Dutch book argument was to show that the rational agent, upon learning some data E, would update by Bayes’ Rule, else be guilty of irrationality. Confronted with counterexamples in which violating Bayes’ Rule seems perfectly rational on intuitive grounds, many if not most Bayesian philosophers dismiss threats of being Dutch-booked as irrelevant. “It is the entirely rational claim that I may be induced to act irrationally that the dynamic Dutch book argument, absurdly, would condemn as incoherent” (Howson 1997a, p. 287). Howson declares it was absurd all along to consider it irrational to be induced to act irrationally. It’s insisting on updating by Bayes’ Rule that is irrational. “I am not inconsistent in planning … to entertain a degree of belief [that is inconsistent with what I now hold], I have merely changed my mind” (ibid.). One thought the job of Bayesian updating was to show how to change one’s mind reasonably.

  Counterexamples to Bayes’ Rule often take the following form: while an agent assigns probability 1 to event S at time t, i.e., Pr(S) = 1, he also believes that at some future time t′ he may assign a low probability, say 0.1, to S, i.e., Pr′(S) = 0.1, where Pr′ is the agent’s belief function at the later time t′.

  Let E be the assertion: Pr′(S) = 0.1.

  So at time t, Pr(E) > 0.

  But Pr(S|E) = 1, since Pr(S) = 1.

  Now, Bayesian updating says:

  If Pr(E) > 0, then Pr′(·) = Pr(·|E).

  But at t′ we have Pr′(S) = 0.1,

  which contradicts Pr′(S) = Pr(S | Pr′(S) = 0.1) = 1, obtained by Bayesian updating. It is assumed, by the way, that learning E does not change any of the other degree of belief assignments held at t – never mind how one knows this.

  The kind of example at the heart of this version of the counterexample was given by William Talbott (1991, p. 139). In one of his examples: S is “Mayo ate spaghetti at 6 p.m., April 6, 2016”. Pr(S) = 1, where Pr is my degree of belief in S now (time t), and E is “Pr′(S) = r”, where r is the proportion of times Mayo eats spaghetti (over an appropriate time period); say r = 0.1. As vivid as eating spaghetti is today, April 6, 2016, as Talbott explains, I believe, rationally, that next year at this time I will have forgotten, and will (rationally) turn to the relative frequency with which I eat spaghetti to obtain Pr′. Variations on the counterexample involve current beliefs about impairment at t′ through alcohol or drugs. This is temporal incoherency.

 
