Statistical Inference as Severe Testing


by Deborah G Mayo


  Fraser, D., 23 , 391 , 392n10 , 405 , 428 , 430

  fraud busting, 21 , 54 , 284 – 285

  role of error probabilities, 285

  Frayn, M., 375

  and Copenhagen, 375

  Freedman, D., 272 , 305

  Freiman, J., 339n4

  frequentist matching priors, 254 – 256 , 264 , 400 , 411 , 428 – 430

  frequentist evidence principle (FEV), 148 – 150 , 212

  applications to Cox’s taxonomy, 152 – 158

  FEV(i): evidence of discrepancy, 149 , 152 – 153

  FEV(ii): evidence for the absence of a discrepancy, 150 , 152 – 153 , 157 , 343

  and severity (FEV/SEV), 153 , 159 , 352n10

  frequentist statistics, 13 , 53 – 55 , 110 – 111 , 407

  as pessimistic, 298

  as piecemeal, 85 , 392 ; see also error statistics

  frequentist theory of induction, 111 , 115

  Freud, S., 76 , 92 , 96 , 280 – 281

  Freudian metaphor for significance tests (Gigerenzer), 178 – 180

  Fukushima, 69

  Gaifman, H., 419

  Gaito, J., 239

  Garcia, V., 79

  Geller, U., 31 , 235

  Gellerized hypothesis, 31 – 34 , 38 , 235

  Gelman, A., 27 , 83 , 94n4 , 105 , 228 , 234 , 256 , 289 , 304 – 305 , 359 , 361 , 401 , 403 , 407 , 417 , 419 , 432 – 435

  General Theory of Relativity (GTR), 69 , 119 – 129 , 156 , 160 – 162 , 217 , 226 , 238 , 300 ; see also eclipse tests

  getting beyond I’m rubber and you’re glue, 247 , 440 – 441

  getting beyond statistics wars, xii , 9 , 11 – 12 , 178 , 283 , 436 , 444

  Ghosh, J., 51 , 124 , 305n2 , 402 – 403 , 405 , 431 , 434

  Gianotti, F., 210

  Gibbons, J., 175 , 339

  Gibney, E., 214

  Giere, R., 285 , 422

  Gigerenzer, G., 22 , 93 , 96 , 136 , 178 – 180 , 182 , 224 , 422

  Gill, R., 284

  Gillies, D., 299 , 421

  Glymour, C., 51 , 69 , 70 , 73 , 156 , 227

  Goldacre, B., 20 , 40 , 276

  Goldman, A., 235

  Good, I. J., 31 , 46 – 47 , 223 , 282 , 389 , 418

  Goodman, S., 264 , 269 , 332 – 337 , 365 – 366

  Gopnik, A., 79

  Gorroochurn, P., 390n7

  Gosset, W. (aka Student), 132 – 133 , 141 , 384

  Greenland, S., 26 , 252 , 256 – 257 , 264 , 353 – 354 , 356 , 365 , 406

  Grünwald, P., 253

  Guggenheim Museum, 59

  Gurney, J., 158

  GWAS, genome-wide association studies, 277 , 294

  Hacking, I., 30 , 32 , 37 , 45 , 64 – 65 , 136 , 222 – 223 , 378 – 380 , 417

  Haidt, J., 7

  Haig, B., 264

  Hand, D., 166

  Hannig, J., 392n10

  Hardie, J., 291 – 292 , 294

  Harlow, H., 100

  Hawking, S., 157

  Hawthorne, J., 69 – 70

  Heisey, D., 355 – 358

  Hempel, C., 90

  Hendry, D., 279

  Henkel, R., 239 , 274 , 280

  Hennig, C., 228 , 407 , 419 , 435

  heuristics and biases literature, 422

  Higgs discovery, 10 , 189 , 202 – 207 , 210 – 215 , 217 , 331 , 405

  ATLAS, 203 , 206 , 209 , 217

  ATLAS and CMS, 202 , 211 , 213

  null hypotheses in, 203 , 211

  severity principle, implicit, 208 – 210

  statistical controversies in, 202 – 217

  subjective Bayesians on, 202 – 203 , 213 ; see also statistical fluctuations

  Higgs, P., 202

  Hinkley, D., 45 , 53 , 153n7 , 172 , 174 – 175 , 194 , 198 , 250 , 252 , 440

  Hitchcock, C., 92n3

  HIV/AIDS, 292

  Hjort, N., 195 , 392n10

  Hochberg, Y., 277 – 278 , 362n7

  Hoenig, J., 355 – 358

  honest hunter, 275

  hormone replacement therapy (HRT), 20 – 21 , 230

  hot air balloon, 8 , 23 , 29 , 395 , 397

  Howson, C., 48 , 51n5 , 240 , 242 , 289 , 368 , 379 , 415 – 417 , 419

  Hsu, J., 162n11

  Hubbard, R., 175 , 179 , 183

  Huber, P., 302

  Huff, D., 3

  Hume, D., 61 , 75

  Hurlbert, S., 264

  IID/NIID, 81 , 141 , 184 , 191 , 248 , 299 , 303 , 309

  and bootstrap resampling, 305 – 307

  figure of typical realization, 309

  and GTR, 129

  M-S testing of, 309 – 313

  and omnibus tests, 154

  and runs test, 310

  implicationary assumption (i-assumption), 109 – 110 , 167 , 187 , 276 , 298 , 310 , 319 , 434

  improper posteriors, 412 – 413

  incompatibilism (between Fisher and N-P), 173 , 175 , 178 – 179 , 216n4 , 439

  getting beyond, 178 – 182

  incomplete statistic, 200

  induction

  faulty analogy with deduction, 64 – 66 , 380

  logical problem, 60 – 62

  Peirce’s rationale, 267

  solving it now, 107 – 115 , 296

  inductive behavior, 110 , 146 , 176 – 177 , 181 , 390 – 391

  and de Finetti, 146 ; see also performance

  inductive inference, 8 , 59 , 162

  deductively invalid, 61

  as ampliative, 64 , 307 , 399 , 409 , 427

  frequentist statistics as a theory of, 147

  vs. inductive behavior, 110

  as severe testing, 92 , 108 – 109 , 209 , 302

  as warranted, 10 , 50 , 65 , 87 , 301 , 443 ; see also induction

  inductive logic, 32 , 379

  evidential relationship, 275

  Fisherian leanings, 391 ; see also confirmation theory

  infant training study, 280 – 281

  inference, defined, 65

  insignificant results, 149 , 338 – 339 , 346

  in cancer clusters, 158

  in developmental economics, 292

  and Fisher, 176 – 177

  fortifying inferences: DNA, 282

  in GTR, 125 – 126

  in Higgs, 211 – 213

  and infant training, 280 – 281

  and power, 342 – 343 , 434

  instrumentalism, 79 , 297

  intentions, argument from, 48 – 50 , 268 – 270 , 283 , 286 , 320

  data other than observed, 273

  and stopping rules, 273 , 431 , 438 ; see also Likelihood Principle

  International Society of Bayesian Analysis (ISBA), 202

  inverse inference, 24 , 383 , 386 – 387 , 404 , 409

  Ioannidis, J., 4 , 20n2 , 92n2 , 293n12 , 361 – 362 , 364 – 366 , 368 – 370

  irrelevant conjunctions, paradox of (tacking paradox), 69 – 70

  and severity, 71 – 72

  Isaac, story of, 368 – 369

  Iyer, R., 7

  Jastrow, J., 288n9

  Jaynes, E., 403n4 , 419n9

  Jefferys, W., 74

  Jeffreys, H., 26 , 64 , 124 , 127 , 165 , 167 – 169 , 183 – 184 , 186 – 187 , 248 – 249 , 252 – 255 , 259n8 , 400 , 403n4 , 409 , 424

  Jeffreys–Lindley paradox (Bayes/Fisher or Jeffreys/Fisher Disagreement), 239 , 241 – 242 , 250 – 251 , 253 – 255 , 256n7 , 284

  Jeffreys’ tail area criticism, 168 – 170 , 206 , 332

  Jenkins, G., 47 , 302 , 313

  Johnson, V., 261 – 264 , 266 , 271 , 333

  Kadane, J., 25 , 50 – 51 , 166 – 167 , 180 , 207 , 231 , 240 , 287 , 289 , 400

  Kahneman, D., 97 , 99 , 422

  Kaku, M., 119

  Kalbfleisch, J., 180

  Kass, R., 7 – 8 , 11 , 28 , 381 , 400 – 401 , 411n6 , 424 – 430

  Kaye, D., 272

  Kempthorne, O., 174 , 325

  Kennefick, D.: re-analysis Sobral, 157

  Kerridge, D., 47n4

  Keynes, J., 90

  Kheifets, L., 158

  Kish, L., 280

  Klioner, S., 157n9

  known/old evidence problem, 51 , 405

  Kruschke, J., 273

  Kruse, M., 47 – 48


  Kuebler, R., 339n4

  Kuhn, T., xii , 88n1

  kuru, 81 – 82 , 109 , 238

  Kyburg, H., 64 , 232 , 415

  Lad, F., 227

  lady tasting tea, 16 – 19 , 31 , 169

  Lakatos, I., 83 , 90 , 92 – 93 , 235

  Lakens, D., 162n11 , 264

  Lambert, C., 293 – 295

  Lambert, P., 414

  large n problem, 240 – 245

  in confidence intervals, 245

  and P-values, 240

  and severe tester, 241 ; see also mountains out of molehills

  Laudan, L., 77 , 88 , 129 , 225n1

  Law of Large Numbers (LLN), 112 , 298 , 426

  Binomial, 112

  empirical law, 112 – 113

  theoretical (mathematical) law, 112

  Lazar, N., 17 , 215 – 216 , 395

  Lebesgue, H., 131

  Leek, J., 7

  Leeper, E., 158

  Lehmann, E., 120 , 131 , 137 , 139 , 146 – 147 , 172 – 173 , 175 – 177 , 182 , 191 , 368 , 372 – 375 , 385n5 , 388 , 440

  Letzter, R., 106

  Levelt Committee (on Diederik Stapel), 78

  Levi, I., 379

  Liddell, T., 273

  lift-off vs. drag-down, 15 , 66 , 86 , 217 , 307 , 380

  light deflection, three predictions, 120 ; see also eclipse tests

  likelihood, 10 , 30 – 31

  Likelihood, Law of (LL), 30 – 38 , 71

  vs. error statistical, 41

  excludes compound hypotheses, 35

  vs. minimal severity requirement, 30

  compared to significance test, 34 , 242

  Likelihood Principle (LP), 30 , 41 , 44 – 46 , 172 , 438 – 439

  and Bayesian inference, 398 , 431

  and Birnbaum’s “proof”, 173

  and irrelevance of predesignation, 269

  and logical empiricism, 90

  and P-values, 50 , 164 , 268 , 319

  post-LP inference, 173

  violated in bootstrap resampling, 305

  violated in model testing, 303 – 304

  weak, 148

  and weak conditionality principle (WCP), 172

  likelihood ratio (LR), 30 , 37 , 303 , 332

  and Bayes factors, 184 , 247

  exhaustive [LR], 68 , 70

  hypothesis most generous to the alternative, H max , 252 , 254 , 260

  maximizing data, 260

  with the Normal, 53n6

  using α/(1 – β), 337 , 366

  likelihood ratio tests (lambda criterion), 133 – 135 , 139n4

  limb-sawing logic, 167

  Linda paradox, 422

  Lindemann, F., 128

  Lindley, D., 47 , 202 , 204 , 213 – 214 , 228 , 239 , 241 – 242 , 250 , 253 – 256 , 259 , 284 , 288 – 289 , 304 – 305 , 397 – 398 , 400 – 401 , 405 , 410 , 413 , 417 – 418

  Lindley’s “Philosophy of Statistics”, 397

  criticism of, 398 – 400

  Lindman, H., 41 – 43 , 45 , 49 – 50 , 248 , 252 , 256 , 260 , 269

  linear regression model (LRM), 309 , 312

  dynamic (DLRM), 316

  Little, R., 431

  Liu, C., 392n10

  live exhibits

  drill prompt (how tail areas exaggerate), 335

  final, 444

  Macho Men, 104 – 105

  Revisiting Popper’s Demarcation of Science, 88

  severity when incalculable, 200

  Lodge, O., 127 , 128 , 230

  logical empiricism/positivism, 10 , 64 , 90

  and Big Data, 229

  naïve, 79

  and probabilism, 65

  curse of, 224 – 225 , 227

  and verificationism, 229

  Loken, E., 105

  Lombardi, C., 264

  Longino, H., 236

  look elsewhere effect (LEE), 210 – 211

  and Bayesian analysis, 54

  Louis, T., 43

  Lykken, D., 367

  Machery, E., 156

  Madigan, D., 305 , 405

  Maher, P., 71

  Marcus, G., 79

  Marewski, J., 136 , 182 , 224

  Markov dependency, 314 , 316

  Martin, R., 392n10

  Marx, K., 76 – 77 , 125

  Mason, J., 239 , 250

  maximum likelihood

  estimator, 133

  hypothesis, 271

  Mayo, D., 26 – 27 , 46 – 48 , 59 – 60 , 74 , 82 , 84 , 86 , 88n1 , 91 , 92n3 , 95 , 114 , 128 , 132 , 146n5 , 147 – 150 , 152 – 154 , 159 , 160n10 , 162 , 164 , 171 – 173 , 180 , 183 , 199 – 200 , 221 , 233 – 234 , 237 , 239 , 270 , 273 , 281 , 286 , 298 , 303n1 , 305 , 308 , 339 , 352n10 , 363 , 367 , 369n10 , 385 , 399n2 , 409 , 416 , 428 , 432 – 433

  McCloskey, D., 229 , 230 , 272n2 , 330 – 331

  mean heterogeneity, 313

  measurement, operationalist, 100 , 182

  in psychology, 103

  Meehl, P., 92 – 95 , 125 , 294 , 367 – 368

  Meng, X. L., 433 , 435n8

  meta-analysis, 279 , 292 , 365

  meta-methodology, 9 – 10 , 54

  meta-research, 9 , 98 , 106 – 107

  Meta-Research Innovation Center at Stanford (METRICS), 269n1

  methodological falsification, 82 – 83 , 95 , 377 ; see also Popper, K.

  methodological probability, 80 , 124 , 383 – 384

  and error probability, 79 – 80 , 170 , 216

  in Higgs episode, 207

  metric theory of gravity, 161

  Michell, J., 100 – 101

  microarray, 53 , 277 , 293

  Mignard, F., 157n9

  Mill, J. S., 89 , 296

  Miller, J., 236

  miserable passages

  statistical theater (“Les Miserables Citations”), 372 – 375

  inferential construal, 375 – 378

  interpreted by contemporaries, 378 – 381

  misspecification test/model testing (M-S)

  A. Spanos’ approach, 308 – 320 ; see also auditing

  error fixing in, 311

  in error statistics, 308

  independent of primary question, 319

  irreplication and violated assumption, 320

  and model building, 319

  vs. model selection, 317

  non-exhaustive, 154 , 312

  ordering in, 319

  and Peirce, 307

  and predesignation, 319

  residuals as key, 303 , 310 , 314

  role of significance tests, 298 , 301 , 309 , 433 , 441 ; see also probabilistic reduction

  models, all are false, 4 , 296 – 297

  modus tollens, 84

  unsound example, 126

  statistical, 94 , 351n9

  Molière, 208

  Morey, R., 363

  Morrison, D., 239 , 274 , 280

  mountains out of molehills fallacy

  1st form (large n problem), 240 , 243n2

  2nd form, 144 , 266 , 326 , 359 , 366

  multiverse analysis, 105

  Munafò, M., 106 – 107

  Museums of Statistics, 30 , 46 , 397

  Statistical Science and Philosophy of Science, 59 , 119 , 131 , 156 , 189

  on Power Peninsula, 323 – 324

  Musgrave, A., 90 , 129

  National Women’s Health Network, 21

  Nelder, J., 399 , 421

  Nelson, L., 43 , 237 , 270

  Nevins, J., 6 , 13 , 18 , 230

  new experimentalism, 85 , 100n7

  Newton, I., 91 , 120 – 124 , 126 – 128 , 156 – 157 , 160 , 224 , 229 – 230 , 296 , 369

  Neyman, J., 8 , 26 , 37 , 50 , 55 , 59 , 64 – 65 , 83 , 86 , 93 , 95 , 121 , 131 – 133 , 135 – 137 , 144 , 146 – 147 , 151 , 164 , 169 , 172 – 174 , 183 , 185n7 , 186 – 187 , 226 , 239 , 285 , 346 , 357 , 384 – 387 , 403 – 404 , 409 , 424

  1933 paper with Pearson, 371 – 378

  and behavioral performance, 126 – 127 , 146 , 165 , 173 , 176 – 180 , 379 – 381

  on CIs and Fisher’s fiducial intervals, 189 – 190 , 382 – 384 , 389 – 391

  criticizes Fisher, 341 – 342 , 389 – 390

  justification of statistical models, 111 – 113 , 298 – 300

  equates statistical tests and tests of significance, 174n1 , 176 , 386

  feud with R. A. Fisher, 139 – 141 , 165 , 179 , 181 – 182 , 325 , 382 , 387 – 391 , 440

  modeling pest control, 299 – 300 , 421

  power analysis, 339 – 342 , 355 – 356

  in prison, 120

  “The Problem of Inductive Inference” (1955), 110 , 341

  quarrel with Carnap, 108 , 110 – 111 , 114 , 147 , 341

  view of inference, 88 , 181 , 390 – 391 ; see also Neyman and Pearson

  Neyman, O. (Lola), 131 , 136 , 146n6

  Neyman and Pearson

  1933 paper, 372

  begin collaboration, 121 , 371

  and Birnbaum, 55

  importance of error control, 37 , 50

  rescue Fisher, 386 – 387

  revolutionized statistics, 139

  and stopping rules, 50 ; see also miserable passages

  Neyman and Pearson (N-P) tests, 132

  3 steps in, 131

  and balancing errors, 341

  behavioral vs. evidential interpretations, 65 , 181 , 378

  behavioristic, as verbal preference, 390

  criticisms of, 144 , 164 , 170 , 324 , 378 – 381

  development, 132 – 137

  and Fisher dovetail, 137 , 169 , 384 – 386

  gives rigor to Fisher, 132

  generic form of null and alternative hypotheses, 133

  Hacking on, 64 – 65 , 378

  ingredients, 129 – 130

  lambda criterion, 133

  lemma, 139n4

  meaning vs. application, 147 , 194

  and N-P-Wald behavioral-decision approach, 139 , 146 , 386 , 390

  and Popper’s methodological falsification, 83

  P-values in, 138 , 175 , 180

  as severe tests, 142 – 146

  usual formulation, 137

  worse than useless, 136 ; see also statistical tests

  Neyman–Fisher break-up, 1935, 387 ; see also Fisher and Neyman wars

  Neyman–Pearson disagreements, 180 , 390 – 391

  Neyman–Pearson–Fisher battle 1955–6 “triad”, 388 – 390

  Neymanian interpretation of Fisher’s fiducial distribution, 391

  nonsense and ludicrous, Senn on, 326 – 327 , 336 , 366

  Nordtvedt, K., 161

  Nordtvedt effect and Einstein equivalence principle, 160 – 162

  Normal distribution, 32 – 33 , 53n6 , 123 , 129 , 136 , 139 , 141 , 169 , 298 – 299 , 309

  CI estimation of mean of, 191 , 196 , 248 , 382

 
