Statistical Inference as Severe Testing
Fraser, D., 23 , 391 , 392n10 , 405 , 428 , 430
fraud busting, 21 , 54 , 284 – 285
role of error probabilities, 285
Frayn, M., 375
and Copenhagen, 375
Freedman, D., 272 , 305
Freiman, J., 339n4
frequentist matching priors, 254 – 256 , 264 , 400 , 411 , 428 – 430
frequentist evidence principle (FEV), 148 – 150 , 212
applications to Cox’s taxonomy, 152 – 158
FEV(i): evidence of discrepancy, 149 , 152 – 153
FEV(ii): evidence for the absence of a discrepancy, 150 , 152 – 153 , 157 , 343
and severity (FEV/SEV), 153 , 159 , 352n10
frequentist statistics, 13 , 53 – 55 , 110 – 111 , 407
as pessimistic, 298
as piecemeal, 85 , 392 ; see also error statistics
frequentist theory of induction, 111 , 115
Freud, S., 76 , 92 , 96 , 280 – 281
Freudian metaphor for significance tests (Gigerenzer), 178 – 180
Fukushima, 69
Gaifman, H., 419
Gaito, J., 239
Garcia, V., 79
Geller, U., 31 , 235
Gellerized hypothesis, 31 – 34 , 38 , 235
Gelman, A., 27 , 83 , 94n4 , 105 , 228 , 234 , 256 , 289 , 304 – 305 , 359 , 361 , 401 , 403 , 407 , 417 , 419 , 432 – 435
General Theory of Relativity (GTR), 69 , 119 – 129 , 156 , 160 – 162 , 217 , 226 , 238 , 300 ; see also eclipse tests
getting beyond I’m rubber and you’re glue, 247 , 440 – 441
getting beyond statistics wars, xii , 9 , 11 – 12 , 178 , 283 , 436 , 444
Ghosh, J., 51 , 124 , 305n2 , 402 – 403 , 405 , 431 , 434
Gianotti, F., 210
Gibbons, J., 175 , 339
Gibney, E., 214
Giere, R., 285 , 422
Gigerenzer, G., 22 , 93 , 96 , 136 , 178 – 180 , 182 , 224 , 422
Gill, R., 284
Gillies, D., 299 , 421
Glymour, C., 51 , 69 , 70 , 73 , 156 , 227
Goldacre, B., 20 , 40 , 276
Goldman, A., 235
Good, I. J., 31 , 46 – 47 , 223 , 282 , 389 , 418
Goodman, S., 264 , 269 , 332 – 337 , 365 – 366
Gopnik, A., 79
Gorroochurn, P., 390n7
Gosset, W. (aka Student), 132 – 133 , 141 , 384
Greenland, S., 26 , 252 , 256 – 257 , 264 , 353 – 354 , 356 , 365 , 406
Grünwald, P., 253
Guggenheim Museum, 59
Gurney, J., 158
GWAS, genome-wide association studies, 277 , 294
Hacking, I., 30 , 32 , 37 , 45 , 64 – 65 , 136 , 222 – 223 , 378 – 380 , 417
Haidt, J., 7
Haig, B., 264
Hand, D., 166
Hannig, J., 392n10
Hardie, J., 291 – 292 , 294
Harlow, H., 100
Hawking, S., 157
Hawthorne, J., 69 – 70
Heisey, D., 355 – 358
Hempel, C., 90
Hendry, D., 279
Henkel, R., 239 , 274 , 280
Hennig, C., 228 , 407 , 419 , 435
heuristics and biases literature, 422
Higgs discovery, 10 , 189 , 202 – 207 , 210 – 215 , 217 , 331 , 405
ATLAS, 203 , 206 , 209 , 217
ATLAS and CMS, 202 , 211 , 213
null hypotheses in, 203 , 211
severity principle, implicit, 208 – 210
statistical controversies in, 202 – 217
subjective Bayesians on, 202 – 203 , 213 ; see also statistical fluctuations
Higgs, P., 202
Hinkley, D., 45 , 53 , 153n7 , 172 , 174 – 175 , 194 , 198 , 250 , 252 , 440
Hitchcock, C., 92n3
HIV/AIDS, 292
Hjort, N., 195 , 392n10
Hochberg, Y., 277 – 278 , 362n7
Hoenig, J., 355 – 358
honest hunter, 275
hormone replacement therapy (HRT), 20 – 21 , 230
hot air balloon, 8 , 23 , 29 , 395 , 397
Howson, C., 48 , 51n5 , 240 , 242 , 289 , 368 , 379 , 415 – 417 , 419
Hsu, J., 162n11
Hubbard, R., 175 , 179 , 183
Huber, P., 302
Huff, D., 3
Hume, D., 61 , 75
Hurlbert, S., 264
IID/NIID, 81 , 141 , 184 , 191 , 248 , 299 , 303 , 309
and bootstrap resampling, 305 – 307
figure of typical realization, 309
and GTR, 129
M-S testing of, 309 – 313
and omnibus tests, 154
and runs test, 310
implicationary assumption (i-assumption), 109 – 110 , 167 , 187 , 276 , 298 , 310 , 319 , 434
improper posteriors, 412 – 413
incompatibilism (between Fisher and N-P), 173 , 175 , 178 – 179 , 216n4 , 439
getting beyond, 178 – 182
incomplete statistic, 200
induction
faulty analogy with deduction, 64 – 66 , 380
logical problem, 60 – 62
Peirce’s rationale, 267
solving it now, 107 – 115 , 296
inductive behavior, 110 , 146 , 176 – 177 , 181 , 390 – 391
and de Finetti, 146 ; see also performance
inductive inference, 8 , 59 , 162
deductively invalid, 61
as ampliative, 64 , 307 , 399 , 409 , 427
frequentist statistics as a theory of, 147
vs. inductive behavior, 110
as severe testing, 92 , 108 – 109 , 209 , 302
as warranted, 10 , 50 , 65 , 87 , 301 , 443 ; see also induction
inductive logic, 32 , 379
evidential relationship, 275
Fisherian leanings, 391 ; see also confirmation theory
infant training study, 280 – 281
inference, defined, 65
insignificant results, 149 , 338 – 339 , 346
in cancer clusters, 158
in developmental economics, 292
and Fisher, 176 – 177
fortifying inferences: DNA, 282
in GTR, 125 – 126
in Higgs, 211 – 213
and infant training, 280 – 281
and power, 342 – 343 , 434
instrumentalism, 79 , 297
intentions, argument from, 48 – 50 , 268 – 270 , 283 , 286 , 320
data other than observed, 273
and stopping rules, 273 , 431 , 438 ; see also Likelihood Principle
International Society of Bayesian Analysis (ISBA), 202
inverse inference, 24 , 383 , 386 – 387 , 404 , 409
Ioannidis, J., 4 , 20n2 , 92n2 , 293n12 , 361 – 362 , 364 – 366 , 368 – 370
irrelevant conjunctions, paradox of (tacking paradox), 69 – 70
and severity, 71 – 72
Isaac, story of, 368 – 369
Iyer, R., 7
Jastrow, J., 288n9
Jaynes, E., 403n4 , 419n9
Jefferys, W., 74
Jeffreys, H., 26 , 64 , 124 , 127 , 165 , 167 – 169 , 183 – 184 , 186 – 187 , 248 – 249 , 252 – 255 , 259n8 , 400 , 403n4 , 409 , 424
Jeffreys–Lindley paradox (Bayes/Fisher or Jeffreys/Fisher Disagreement), 239 , 241 – 242 , 250 – 251 , 253 – 255 , 256n7 , 284
Jeffreys’ tail area criticism, 168 – 170 , 206 , 332
Jenkins, G., 47 , 302 , 313
Johnson, V., 261 – 264 , 266 , 271 , 333
Kadane, J., 25 , 50 – 51 , 166 – 167 , 180 , 207 , 231 , 240 , 287 , 289 , 400
Kahneman, D., 97 , 99 , 422
Kaku, M., 119
Kalbfleisch, J., 180
Kass, R., 7 – 8 , 11 , 28 , 381 , 400 – 401 , 411n6 , 424 – 430
Kaye, D., 272
Kempthorne, O., 174 , 325
Kennefick, D.: re-analysis Sobral, 157
Kerridge, D., 47n4
Keynes, J., 90
Kheifets, L., 158
Kish, L., 280
Klioner, S., 157n9
known/old evidence problem, 51 , 405
Kruschke, J., 273
Kruse, M., 47 – 48
Kuebler, R., 339n4
Kuhn, T., xii, 88n1
kuru, 81 – 82 , 109 , 238
Kyburg, H., 64 , 232 , 415
Lad, F., 227
lady tasting tea, 16 – 19 , 31 , 169
Lakatos, I., 83 , 90 , 92 – 93 , 235
Lakens, D., 162n11 , 264
Lambert, C., 293 – 295
Lambert, P., 414
large n problem, 240 – 245
in confidence intervals, 245
and P-values, 240
and severe tester, 241 ; see also mountains out of molehills
Laudan, L., 77 , 88 , 129 , 225n1
Law of Large Numbers (LLN), 112 , 298 , 426
Binomial, 112
empirical law, 112 – 113
theoretical (mathematical) law, 112
Lazar, N., 17 , 215 – 216 , 395
Lebesgue, H., 131
Leek, J., 7
Leeper, E., 158
Lehmann, E., 120 , 131 , 137 , 139 , 146 – 147 , 172 – 173 , 175 – 177 , 182 , 191 , 368 , 372 – 375 , 385n5 , 388 , 440
Letzter, R., 106
Levelt Committee (on Diederik Stapel), 78
Levi, I., 379
Liddell, T., 273
lift-off vs. drag-down, 15 , 66 , 86 , 217 , 307 , 380
light deflection, three predictions, 120 ; see also eclipse tests
likelihood, 10 , 30 – 31
Likelihood, Law of (LL), 30 – 38 , 71
vs. error statistical, 41
excludes compound hypotheses, 35
vs. minimal severity requirement, 30
compared to significance test, 34 , 242
Likelihood Principle (LP), 30 , 41 , 44 – 46 , 172 , 438 – 439
and Bayesian inference, 398 , 431
and Birnbaum’s “proof”, 173
and irrelevance of predesignation, 269
and logical empiricism, 90
and P-values, 50 , 164 , 268 , 319
post-LP inference, 173
violated in bootstrap resampling, 305
violated in model testing, 303 – 304
weak, 148
and weak conditionality principle (WCP), 172
likelihood ratio (LR), 30 , 37 , 303 , 332
and Bayes factors, 184 , 247
exhaustive [LR], 68 , 70
hypothesis most generous to the alternative, H max , 252 , 254 , 260
maximizing data, 260
with the Normal, 53n6
using α/(1 – β), 337 , 366
likelihood ratio tests (lambda criterion), 133 – 135 , 139n4
limb-sawing logic, 167
Linda paradox, 422
Lindemann, F., 128
Lindley, D., 47 , 202 , 204 , 213 – 214 , 228 , 239 , 241 – 242 , 250 , 253 – 256 , 259 , 284 , 288 – 289 , 304 – 305 , 397 – 398 , 400 – 401 , 405 , 410 , 413 , 417 – 418
Lindley’s “Philosophy of Statistics”, 397
criticism of, 398 – 400
Lindman, H., 41 – 43 , 45 , 49 – 50 , 248 , 252 , 256 , 260 , 269
linear regression model (LRM), 309 , 312
dynamic (DLRM), 316
Little, R., 431
Liu, C., 392n10
live exhibits
drill prompt (how tail areas exaggerate), 335
final, 444
Macho Men, 104 – 105
Revisiting Popper’s Demarcation of Science, 88
severity when incalculable, 200
Lodge, O., 127 , 128 , 230
logical empiricism/positivism, 10 , 64 , 90
and Big Data, 229
naïve, 79
and probabilism, 65
curse of, 224 – 225 , 227
and verificationism, 229
Loken, E., 105
Lombardi, C., 264
Longino, H., 236
look elsewhere effect (LEE), 210 – 211
and Bayesian analysis, 54
Louis, T., 43
Lykken, D., 367
Machery, E., 156
Madigan, D., 305 , 405
Maher, P., 71
Marcus, G., 79
Marewski, J., 136 , 182 , 224
Markov dependency, 314 , 316
Martin, R., 392n10
Marx, K., 76 – 77 , 125
Mason, J., 239 , 250
maximum likelihood
estimator, 133
hypothesis, 271
Mayo, D., 26 – 27 , 46 – 48 , 59 – 60 , 74 , 82 , 84 , 86 , 88n1 , 91 , 92n3 , 95 , 114 , 128 , 132 , 146n5 , 147 – 150 , 152 – 154 , 159 , 160n10 , 162 , 164 , 171 – 173 , 180 , 183 , 199 – 200 , 221 , 233 – 234 , 237 , 239 , 270 , 273 , 281 , 286 , 298 , 303n1 , 305 , 308 , 339 , 352n10 , 363 , 367 , 369n10 , 385 , 399n2 , 409 , 416 , 428 , 432 – 433
McCloskey, D., 229 , 230 , 272n2 , 330 – 331
mean heterogeneity, 313
measurement, operationalist, 100 , 182
in psychology, 103
Meehl, P., 92 – 95 , 125 , 294 , 367 – 368
Meng, X. L., 433 , 435n8
meta-analysis, 279 , 292 , 365
meta-methodology, 9 – 10 , 54
meta-research, 9 , 98 , 106 – 107
Meta-Research Innovation Center at Stanford (METRICS), 269n1
methodological falsification, 82 – 83 , 95 , 377 ; see also Popper, K.
methodological probability, 80 , 124 , 383 – 384
and error probability, 79 – 80 , 170 , 216
in Higgs episode, 207
metric theory of gravity, 161
Michell, J., 100 – 101
microarray, 53 , 277 , 293
Mignard, F., 157n9
Mill, J. S., 89 , 296
Miller, J., 236
miserable passages
statistical theater (“Les Miserables Citations”), 372 – 375
inferential construal, 375 – 378
interpreted by contemporaries, 378 – 381
misspecification test/model testing (M-S)
A. Spanos’ approach, 308 – 320 ; see also auditing
error fixing in, 311
in error statistics, 308
independent of primary question, 319
irreplication and violated assumption, 320
and model building, 319
vs. model selection, 317
non-exhaustive, 154 , 312
ordering in, 319
and Peirce, 307
and predesignation, 319
residuals as key, 303 , 310 , 314
role of significance tests, 298 , 301 , 309 , 433 , 441 ; see also probabilistic reduction
models, all are false, 4 , 296 – 297
modus tollens, 84
unsound example, 126
statistical, 94 , 351n9
Molière, 208
Morey, R., 363
Morrison, D., 239 , 274 , 280
mountains out of molehills fallacy
1st form (large n problem), 240 , 243n2
2nd form, 144 , 266 , 326 , 359 , 366
multiverse analysis, 105
Munafò, M., 106 – 107
Museums of Statistics, 30 , 46 , 397
Statistical Science and Philosophy of Science, 59 , 119 , 131 , 156 , 189
on Power Peninsula, 323 – 324
Musgrave, A., 90 , 129
National Women’s Health Network, 21
Nelder, J., 399 , 421
Nelson, L., 43 , 237 , 270
Nevins, J., 6 , 13 , 18 , 230
new experimentalism, 85 , 100n7
Newton, I., 91 , 120 – 124 , 126 – 128 , 156 – 157 , 160 , 224 , 229 – 230 , 296 , 369
Neyman, J., 8 , 26 , 37 , 50 , 55 , 59 , 64 – 65 , 83 , 86 , 93 , 95 , 121 , 131 – 133 , 135 – 137 , 144 , 146 – 147 , 151 , 164 , 169 , 172 – 174 , 183 , 185n7 , 186 – 187 , 226 , 239 , 285 , 346 , 357 , 384 – 387 , 403 – 404 , 409 , 424
1933 paper with Pearson, 371 – 378
and behavioral performance, 126 – 127 , 146 , 165 , 173 , 176 – 180 , 379 – 381
on CIs and Fisher’s fiducial intervals, 189 – 190 , 382 – 384 , 389 – 391
criticizes Fisher, 341 – 342 , 389 – 390
justification of statistical models, 111 – 113 , 298 – 300
equates statistical tests and tests of significance, 174n1 , 176 , 386
feud with R. A. Fisher, 139 – 141 , 165 , 179 , 181 – 182 , 325 , 382 , 387 – 391 , 440
modeling pest control, 299 – 300 , 421
power analysis, 339 – 342 , 355 – 356
in prison, 120
“ The Problem of Inductive Inference” (1955), 110 , 341
quarrel with Carnap, 108 , 110 – 111 , 114 , 147 , 341
view of inference, 88 , 181 , 390 – 391 ; see also Neyman and Pearson
Neyman, O. (Lola), 131 , 136 , 146n6
Neyman and Pearson
1933 paper, 372
begin collaboration, 121 , 371
and Birnbaum, 55
importance of error control, 37 , 50
rescue Fisher, 386 – 387
revolutionized statistics, 139
and stopping rules, 50 ; see also miserable passages
Neyman and Pearson (N-P) tests, 132
3 steps in, 131
and balancing errors, 341
behavioral vs. evidential interpretations, 65 , 181 , 378
behavioristic, as verbal preference, 390
criticisms of, 144 , 164 , 170 , 324 , 378 – 381
development, 132 – 137
and Fisher dovetail, 137 , 169 , 384 – 386
gives rigor to Fisher, 132
generic form of null and alternative hypotheses, 133
Hacking on, 64 – 65 , 378
ingredients, 129 – 130
lambda criterion, 133
lemma, 139n4
meaning vs. application, 147 , 194
and N-P-Wald behavioral-decision approach, 139 , 146 , 386 , 390
and Popper’s methodological falsification, 83
P-values in, 138 , 175 , 180
as severe tests, 142 – 146
usual formulation, 137
worse than useless, 136 ; see also statistical tests
Neyman–Fisher break-up, 1935, 387 ; see also Fisher and Neyman wars
Neyman–Pearson disagreements, 180 , 390 – 391
Neyman–Pearson–Fisher battle 1955–6 “triad”, 388 – 390
Neymanian interpretation of Fisher’s fiducial distribution, 391
nonsense and ludicrous, Senn on, 326 – 327 , 336 , 366
Nordtvedt, K., 161
Nordtvedt effect and Einstein equivalence principle, 160 – 162
Normal distribution, 32 – 33 , 53n6 , 123 , 129 , 136 , 139 , 141 , 169 , 298 – 299 , 309
CI estimation of mean of, 191 , 196 , 248 , 382