Statistical Inference as Severe Testing

by Deborah G Mayo


  Seidenfeld , T. (1979 ). ‘ Why I Am Not an Objective Bayesian; Some Reflections Prompted by Rosenkrantz ’ , Theory and Decision 11 (4 ), 413– 40 .

  Sellke , T. , Bayarri , M. , and Berger , J. (2001 ). ‘ Calibration of P Values for Testing Precise Null Hypotheses ’ , The American Statistician 55 (1 ), 62 – 71 .

  Selvin , H. (1970 ). ‘ A Critique of Tests of Significance in Survey Research’ in Morrison , D. and Henkel , R. (eds.), pp. 94 – 106 .

  Senn , S. (1994 a). ‘ Testing for Baseline Balance in Clinical Trials ’ , Statistics in Medicine 13 , 1715– 26 .

  Senn , S. (1994 b). ‘ Fisher’ s Game with the Devil ’ , Statistics in Medicine 13 (3 ), 217– 30 .

  Senn , S. (2001 a). ‘ Statistical Issues in Bioequivalence ’ , Statistics in Medicine 20 (17 – 18), 2785 – 2799 .

  Senn , S. (2001 b). ‘ Two Cheers for P-values? ’ Journal of Epidemiology and Biostatistics 6 (2 ), 193 – 204 .

  Senn , S. (2002 ). ‘ A Comment on Replication, P-values and Evidence, S. N. Goodman, Statistics in Medicine 1992; 11:875-879 ’ , Statistics in Medicine 21 (16 ), 2437– 44 .

  Senn , S. (2007 ). Statistical Issues in Drug Development , 2nd edn. Chichester, UK : Wiley Interscience .

  Senn , S. (2008 ). ‘ Comment on an Article by Gelman ’ , Bayesian Analysis 3 (3 ), 459– 62 .

  Senn , S. (2011 ). ‘ You May Believe You Are a Bayesian But You Are Probably Wrong ’ , Rationality, Markets and Morals (RMM) 2 , 48 – 66 .

  Senn , S. (2013 a). ‘ Comment on Gelman and Shalizi ’ , British Journal of Mathematical and Statistical Psychology 66 , 65– 7 .

  Senn , S. (2013 b). ‘ Seven Myths of Randomisation in Clinical Trials ’ , Statistics in Medicine 32 (9 ), 1439– 50 .

  Senn , S. (2014 ). ‘ Blood Simple? The Complicated and Controversial World of Bioequivalence’ , Guest Blogpost on Errorstatistics.com (6/5/2014).

  Senn , S. (2015 a). ‘ Double Jeopardy?: Judge Jeffreys Upholds the Law’ , Guest Blogpost on Errorstatistics.com (5/9/2015).

  Senn , S. (2015 b). ‘ Comment’ on Blogpost ‘ Can You Change Your Bayesian Prior?’ on Errorstatistics.com (6/18/2015).

  Senn , S. (2019 ). Statistical Issues in Drug Development , 3rd edn. Chichester, UK : Wiley Interscience .

  Sewell , W. (1952 ). ‘ Infant Training and the Personality of the Child ’ , American Journal of Sociology 58 , 150– 9 .

  Shaffer , J. (1995 ). ‘ Multiple Hypothesis-Testing ’ , Annual Review of Psychology 46 (1 ), 561– 84 .

  Silberstein , L. (1919 ). ‘ Contribution to “ Joint Eclipse Meeting of the Royal Society and the Royal Astronomical Society” ’ , The Observatory 42 , 389– 98 .

  Silver , N. (2017 ). ‘ There Really Was a Liberal Media Bubble’ , on FiveThirtyEight.com (3/10/2017).

  Simmons , J. , Nelson , L. , and Simonsohn , U. (2011 ). ‘ False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allow Presenting Anything as Significant ’ , Psychological Science 22 (11 ), 1359– 66 .

  Simmons , J. , Nelson , L. , and Simonsohn , U. (2012 ). ‘ A 21 word solution ’ , Dialogue: The Official Newsletter of the Society for Personality and Social Psychology 26 (2 ), 4 – 7 .

  Simonsohn , U. (2013 ). ‘ Just Post It: The Lesson from Two Cases of Fabricated Data Detected by Statistics Alone ’ , Psychological Science 24 (10 ), 1875– 88 .

  Simonsohn , U. , Nelson , L. , and Simmons , J. (2014 ). ‘ P-Curve: A Key to the File-Drawer ’ , Journal of Experimental Psychology: General 143 (2 ), 534– 47 .

  Singh , K. , Xie , M. , and Strawderman , W. (2007 ). ‘ Confidence Distribution (CD)– Distribution Estimator of a Parameter’ , IMS Lecture Notes– Monograph Series, Volume 54, Complex Datasets and Inverse Problems: Tomography, Networks and Beyond , pp. 132– 50 .

  Skyrms , B. (1986 ). Choice and Chance: An Introduction to Inductive Logic , 3rd edn. Belmont, CA : Wadsworth .

  Sober , E. (2001 ). ‘ Venetian Sea Levels, British Bread Prices, and the Principle of the Common Cause ’ , The British Journal for the Philosophy of Science 52 (2 ), 331– 46 .

  Sober , E. (2008 ). Evidence and Evolution: The Logic behind the Science . Cambridge : Cambridge University Press .

  Spanos , A. (1986 ). Statistical Foundations of Econometric Modeling . Cambridge : Cambridge University Press .

  Spanos , A. (1999 ). Probability Theory and Statistical Inference: Econometric Modeling with Observational Data . Cambridge : Cambridge University Press .

  Spanos , A. (2000 ). ‘ Revisiting Data Mining: “ Hunting” with or without a License ’ , Journal of Economic Methodology 7 (2 ), 231– 64 .

  Spanos , A. (2007 ). ‘ Curve Fitting, the Reliability of Inductive Inference, and the Error‐ Statistical Approach ’ , Philosophy of Science 74 (5 ), 1046– 66 .

  Spanos , A. (2008 a). ‘ Review of S. T. Ziliak and D. N. McCloskey’ s The Cult of Statistical Significance ’ , Erasmus Journal for Philosophy and Economics 1 (1 ), 154– 64 .

  Spanos , A. (2008 b). ‘ Statistics and Economics ’ , in Durlauf , S. and Blume , L. (eds.), The New Palgrave Dictionary of Economics , 2nd edn., London : Palgrave Macmillan , pp. 1057– 97 .

  Spanos , A. (2010 a). ‘ Akaike-type Criteria and the Reliability of Inference: Model Selection Versus Statistical Model Specification ’ , Journal of Econometrics 158 (2 ), 204– 20 .

  Spanos , A. (2010 b). ‘ Is Frequentist Testing Vulnerable to the Base-Rate Fallacy? ’ , Philosophy of Science 77 (4 ), 565– 83 .

  Spanos , A. (2010 c). ‘ Theory Testing in Economics and the Error-Statistical Perspective’ , in Mayo , D. and Spanos , A. (eds.), pp. 202– 46 .

  Spanos , A. (2010 d). ‘ Graphical Causal Modeling and Error Statistics: Exchanges with Clark Glymour’ , in Mayo, D. and Spanos, A. (eds.), 364– 75 .

  Spanos , A. (2011 a). ‘ Revisiting the Welch Uniform Model: A Case for Conditional Inference? ’ , Advances and Applications in Statistical Science 5 , 33 – 52 .

  Spanos , A. (2011 b). ‘ Foundational Issues in Statistical Modeling: Statistical Model Specification and Validation ’ , Rationality, Markets and Morals (RMM) 2 , 146– 78 .

  Spanos , A. (2012 ). ‘ Revisiting the Berger Location Model: Fallacious Confidence Interval or a Rigged Example? ’ , Statistical Methodology , 9 , 555– 61 .

  Spanos , A. (2013 a). ‘ R. A. Fisher: How an Outsider Revolutionized Statistics’ , on Error Statistics blog (2/17/13). errorstatistics.com/2013/02/17/r-a-fisher-how-an-outsider-revolutionized-statistics/ .

  Spanos , A. (2013 b). ‘ Who Should Be Afraid of the Jeffreys-Lindley Paradox? ’ , Philosophy of Science 80 , 73 – 93 .

  Spanos , A. (2013 c). ‘ A Frequentist Interpretation of Probability for Model-based Inductive Inference ’ , Synthese 190 (9 ), 1555– 85 .

  Spanos , A. (2014 ). ‘ Recurring Controversies about P values and Confidence Intervals Revisited ’ , Ecology 95 (3 ), 645– 51 .

  Spanos , A. (2019 ). Probability Theory and Statistical Inference: Empirical Modeling with Observational Data . Cambridge : Cambridge University Press .

  Spanos , A. and Mayo , D. (2015 ). ‘ Error Statistical Modeling and Inference: Where Methodology Meets Ontology ’ , Synthese 192 (11 ), 3533– 55 .

  Spiegelhalter , D. (2004 ). ‘ Incorporating Bayesian Ideas into Health-Care Evaluation ’ , Statistical Science 19 (1 ), 156– 74 .

  Spiegelhalter , D. (2012 ). ‘ Explaining 5 Sigma for the Higgs: How Well Did They Do?’ , Blogpost on Understandinguncertainty.org (8/7/2012).

  Spiegelhalter , D. , Freedman , L. , and Parmar , M. (1994 ). ‘ Bayesian Approaches to Randomized Trials ’ , Journal of the Royal Statistical Society, Series A 157 (3 ), 357 – 416 .

  Sprenger , J. (2011 ). ‘ Science without (Parametric) Models: The Case of Bootstrap Resampling ’ , Synthese 180 (1 ), 65 – 76 .

  Sprott , D. (2000 ). ‘ Comments on the Paper by Lindley ’ , Journal of the Royal Statistical Society, Series D 49 (3 ), 331– 2 .

  Staley , K. (2014 ). An Introduction to the Philosophy of Science . Cambridge : Cambridge University Press .

  Staley , K. (2017 ). ‘ Pragmatic Warrant for Frequentist Statistical Practice: The Case of High Energy Physics’ , Synthese 194 (2 ), 355– 76 .

  Staley , K. and Cobb , A. (2011 ). ‘ Internalist and Externalist Aspects of Justification in Scientific Inquiry ’ , Synthese 182 (3 ), 475– 92 .

  Stapel , D. (2014 ). Faking Science: A True Story of Academic Fraud . Translated by Brown , N. from the original 2012 Dutch Ontsporing (Derailment), http://nick.brown.free.fr/stapel .

  Steegen , S. , Tuerlinckx , F. , Gelman , A. , and Vanpaemel , W. (2016 ). ‘ Increasing Transparency Through a Multiverse Analysis ’ , Perspectives on Psychological Science 11 (5 ), 702– 12 .

  Stevens , S. (1946 ). ‘ On the Theory of Scales of Measurement ’ , Science 103 , 677– 80 .

  Stigler , S. (2016 ). The Seven Pillars of Statistical Wisdom , Cambridge, MA : Harvard University Press .

  Stone , M. (1997 ). ‘ Discussion of Papers by Dempster and Aitkin ’ , Statistics and Computing 7 , 263– 4 .

  Strassler , M. (2013 a). ‘ CMS Sees No Excess in Higgs Decays to Photons’ , Blogpost on Of Particular Significance (profmattstrassler.com ) (3/14/2013).

  Strassler , M. (2013 b). ‘ A Second Higgs Particle’ , Blogpost on Of Particular Significance (profmattstrassler.com ) (7/2/2013).

  Sugden , R. (2005 ). ‘ Experiments as Exhibits and Experiments as Tests ’ , Journal of Economic Methodology 12 (2 ), 291 – 302 .

  Suppes , P. (1969 ). ‘ Models of Data ’ , in Studies in the Methodology and Foundations of Science , Dordrecht, The Netherlands : D. Reidel , pp. 24 – 35 .

  Talbott , W. (1991 ). ‘ Two Principles of Bayesian Epistemology ’ , Philosophical Studies: An International Journal for Philosophy in the Analytic Tradition 62 (2 ), 135– 50 .

  Taleb , N. (2013 ). ‘ Beware the Big Errors of “ Big Data”’ , WIRED/Opinion . Blogpost on Wired.com (2/8/2013).

  Taleb , N. (2018 ). Skin in the Game: The Thrills and Logic of Risk Taking . London : Penguin Books.

  Thaler , R. H. (2013 ). ‘ Breadwinning Wives and Nervous Husbands’ , New York Times (June 2, 2013: 3(L)).

  van Belle , G. (2008 ). Statistical Rules of Thumb , 2nd edn. Hoboken, NJ : John Wiley and Sons .

  Vigen , T. (2015 ). ‘ Tangled Bedsheets & Consumption of Cheese ’ , in Spurious Correlations , New York : Hyperion (tylervigen.com/spurious-correlations ).

  von Mises , R. (1957 ). Probability, Statistics and Truth . New York : Dover .

  Wagenmakers , E.-J. (2007 ). ‘ A Practical Solution to the Pervasive Problems of P values ’ , Psychonomic Bulletin & Review 14 (5 ), 779 – 804 .

  Wagenmakers , E.-J. and Grünwald , P. (2006 ). ‘ A Bayesian Perspective on Hypothesis Testing: A Comment on Killeen (2005) ’ , Psychological Science 17 (7 ), 641– 2 .

  Wagenmakers , E.-J. , Wetzels , R. , Borsboom , D. , and van der Maas , H. (2011 ). ‘ Why Psychologists Must Change the Way They Analyze Their Data: The Case of Psi: Comment on Bem ( 2011) ’ , Journal of Personality and Social Psychology 100 , 426– 32 .

  Wasserman , L. (2006 ). ‘ Frequentist Bayes is Objective ’ , Bayesian Analysis , 1 , 451– 6 .

  Wasserman , L. (2007 ). ‘ Why Isn’ t Everyone a Bayesian? ’ , in Morris , C. and Tibshirani , R. (eds.), The Science of Bradley Efron , New York : Springer , pp. 260– 1 .

  Wasserman , L. (2008 ). ‘ Comment on an Article by Gelman ’ , Bayesian Analysis 3 (3 ), 463– 6 .

  Wasserman , L. (2012 a). ‘ The Higgs Boson and the P-value Police’ , Blogpost on normaldeviate.wordpress.com (7/11/2012).

  Wasserman , L. (2012 b). ‘ What is Bayesian/Frequentist Inference?’ , Blogpost on normaldeviate.wordpress.com (11/7/2012).

  Wasserman , L. (2012 c). ‘ Nate Silver is a Frequentist: Review of “ The Signal and the Noise”’ , Blogpost on normaldeviate.wordpress.com (12/4/2012).

  Wasserman , L. (2013 ). ‘ The Value of Adding Randomness’ , Blogpost on normaldeviate.wordpress.com (6/9/2013).

  Wasserstein , R. and Lazar , N. (2016 ). ‘ The ASA’ s Statement on P-values: Context, Process and Purpose ’ , (and supplemental materials), The American Statistician 70 (2 ), 129– 33 .

  Wellek , S. (2010 ). Testing Statistical Hypotheses of Equivalence and Noninferiority , 2nd edn. Boca Raton, FL : Chapman and Hall, CRC Press .

  Wertheimer , N. and Leeper E. (1979 ). ‘ Electrical Wiring Configurations and Childhood Cancer ’ , American Journal of Epidemiology 109 , 273– 84 .

  Westfall , P. and Young , S. (1993 ). Resampling-Based Multiple Testing: Examples and Methods for P-Value Adjustment . New York : Wiley .

  Wiki How to Do Anything (2017 ). ‘ How to Get Out of Quicksand’ (wikihow.com/Get-out-of-Quicksand ).

  Wilks , S. (1938 ). ‘ The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses ’ , Annals of Mathematical Statistics 9 , 60– 2 .

  Wilks , S. (1962 ). Mathematical Statistics . New York : John Wiley & Sons .

  Will , C. M. (1986 ). Was Einstein Right?: Putting General Relativity to the Test , 1st edn. New York : Basic Books .

  Will , C. M. (1993 ). Theory and Experiment in Gravitational Physics . Cambridge : Cambridge University Press .

  Williamson , J. (2010 ). In Defence of Objective Bayesianism . Oxford : Oxford University Press .

  Wilson , R. A. (1971 ). Feminine Forever , 3rd edn. New York : Pocket Books . (First published 1968 by M. Evans & Company.)

  Woodward , J. (2000 ). ‘ Data, Phenomena, and Reliability ’ , Philosophy of Science 67 , S163 – S179 .

  Worrall , J. (1978 ). ‘ Research Programmes, Empirical Support, and the Duhem Problem: Replies to Criticism ’ , in Radnitzky , G. and Andersson , G. (eds.), Progress and Rationality in Science , Dordrecht, The Netherlands : D. Reidel , pp. 321– 38 .

  Worrall , J. (1989 ). ‘ Fresnel, Poisson and the White Spot: The Role of Successful Predictions in the Acceptance of Scientific Theories ’ , in Gooding , D. , Pinch , T. and Schaffer , S. (eds.), The Uses of Experiment: Studies in the Natural Sciences , Cambridge : Cambridge University Press , pp. 135– 57 .

  Worrall , J. (2002 ). ‘ What Evidence in Evidence-Based Medicine ?’ , Philosophy of Science 69 (S3 ), S316 – S330 .

  Worrall , J. (2010 ). ‘ Error, Tests, and Theory Confirmation’ , in Mayo , D. and Spanos , A. (eds.), pp. 125– 54 .

  Xie , M. and Singh , K. (2013 ). ‘ Confidence Distribution, the Frequentist Distribution Estimator of a Parameter: A Review ’ , International Statistical Review 81 (1 ), 3 – 39 .

  Xu , F. and Garcia , V. (2008 ). ‘ From the Cover: Intuitive Statistics by 8-month-old Infants ’ , Proceedings of the National Academy of Sciences 105 (13 ), 5012– 15 .

  Young , S. (2013 ). ‘ Better P-values Through Randomization in Microarrays’ , Guest Blogpost on Errorstatistics.com (6/19/2013).

  Zabell , S. L. (1992 ). ‘ R. A. Fisher and Fiducial Argument ’ , Statistical Science 7 (3 ), 369– 87 .

  Ziliak , S. and McCloskey , D. (2008 a). The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives . Ann Arbor, MI : University of Michigan Press .

  Ziliak , S. and McCloskey , D. (2008 b). ‘ Science is Judgment, Not Only Calculation: A Reply to Aris Spanos’ s review of “ The Cult of Statistical Significance” ’ , Erasmus Journal for Philosophy and Economics 1 (1 ), 165– 70 .

  Index

  21 word solution, Simmons, Nelson, and Simonsohn, 237

  Achinstein, P., 68 – 69 , 296 – 297 , 369n10

  ad hoc hypothesis, 90 , 107

  adequacy for a problem, 297

  adjustments for selection, 19 , 106 , 276

  appeal procedure in, 279

  when not needed, 281

  Adler, A., 76 , 96

  affirming the consequent

  invalid, 63

  probabilistic, 62 – 63 , 67

  severe tester rejects, 71

  Akaike, H., 318

  information criteria (AIC), 318 – 319

  score (AIC), 318n5

  all flesh is grass, 222 – 223 , 296

  Altman, D., 264

  American Statistical Association

  Six Principles, 215n4

  statement on P -values (ASA 2016 Guide), 17 , 215 – 216 , 272 , 395

  ancillary statistic, 200

  Anderson, D., 159 , 162 , 318 – 319

  apophenia, 19 , 301

  approximate vs. probable model, 296 – 297

  argument, 60

  convergent, 15 , 107 , 320

  linked vs. convergent, 15

  soundness/unsoundness, 61

  validity/invalidity, 60

  argument from error and coincidence, 14 – 16 , 85 , 210 , 285

  and stringent tests, 237

  argument from intentions, see intentions, argument from

  Armitage, P., 46 – 50 , 303 , 398 , 430

  array of models, 86 – 87

  in 1919 eclipse experiments, 121 – 122

  hierarchy, 109 , 237 , 300

  auditing, 20 , 122 , 267

  an inquiry (three questions), 122

  of P -value, 43 , 94 , 154

  tasks, 269 , 308

  autoregressive model, 316

  Bacchus, F., 415

  background information

  and auditing, 127

  and causal attribution, 233 – 235

  Cox on, 233

  in FIRST interpretations and Cox’ s taxonomy, 150 – 158

  use of in frequentist inference, 86 , 94 , 150 , 158 , 317

  Gelman on Cox, 234

  bad evidence, no test (BENT), 5 – 6 , 20 , 78 , 149 , 201 , 221 , 223 , 232 , 235 , 247 , 277 , 301 , 362

  Baggerly, K., 6 , 18

  Bailar, J., 281n7

  Bakan, D., 239 – 240

  Baker, M., 286

  Banerjee, A., 290 – 292

  Barnard, G., 13 , 31 , 46 , 48 , 126 – 127 , 139 , 156n8 , 181 – 182 , 324 , 388 , 420 – 421

  Bartlett, M., 47 , 389

  batch effects, 293 – 294 , 361

  Bayarri, M., 175 , 179 , 183 , 305 , 338 , 434

  Bayes B-boost (incremental), 63 , 66 – 73 , 95

  vs. absolute, 67 ; see also affirming the consequent (probabilistic)

 
