Statistical Inference as Severe Testing
Page 67
Bayes factor, 37 , 184 , 253 – 255 , 261 – 264 , 305 , 320 , 335 – 336
aka post-experimental rejection ratio, 338
Bayes’ Theorem, 24 , 61
and the Likelihood Principle, 45 – 46 , 398
Bayes/Fisher disagreement, see Jeffreys–Lindley paradox
Bayesian foundations
Bayes/frequentist, 24 – 26
classical subjective Bayesians and criticisms, 397 – 400
current state of play, 23 , 395 – 397 , 400
Dutch book, 415
error statistical basis, 27
need new, 432
and wash-out theorems, 231 – 232
Bayesian incoherence and violating Bayes’ Rule, 411 , 415 , 419 , 421
betting incoherency, 417
empirical studies of, 422
replying to axiomatic proofs, 421
temporal, 423
Bayesian inference
as conditional on specific data, 51 , 183 , 188 , 431
role of probability flipped, 207 , 405 – 407
Bayesian model checking, 27 – 28 , 304 – 305 , 432 – 436
Bayesian priors
and background information, 413
changing priors, 417 – 419
data generating vs. state of knowledge, 403
vs. degree of plausibility/belief, 406
default/non-subjective, 25 , 400
empirical (frequentist) Bayes, 185 , 404
gallimaufry, 402 – 403
hierarchy of, 433
and objectivity, 232 – 233
subjective elicitations, 410 – 411
Bayesian probabilists
casual, 412
classical subjective, and criticisms of, 228 , 397 – 400
default/non-subjective Bayesians, 25 , 51 , 64 , 230 , 262 , 417
vs. error statistical/falsificationist Bayesian, 432 – 435
family feuds, 25 , 409
grace and amen, 413
ironic/bad faith, 412
pragmatic, 424 – 426
Bayesian statistics
betting, 257 , 397 , 405 – 408 , 415 , 421
Bayes hypothesis test, 183 , 261
and known data, 417
notions of probability, 396 , 402
Bayesian vs. frequentist, 24 – 26
empathy with frequentists, 401
orthogonal, 400
unifications, 25 – 28 , 183 – 187 , 396 , 409 – 415
Beall, G., 299
begging the question, 61 – 62 , 184 , 247 , 399n2 , 422
Bem, D., 283 – 284
Benjamin, D., 264 , 266 , 337 – 338
Benjamini, Y., 277 – 278 , 362n7
Berger, J., 23 , 26 , 49 , 54 , 74 , 144 , 175 – 176 , 183 – 187 , 199 , 230 – 231 , 248 – 252 , 258 – 260 , 305 , 337 – 338 , 400 , 409 – 415 , 417 – 418 , 426 , 430 – 431 , 434
Berger, R., 162n11 , 248 , 251 – 252 , 259 , 264 , 303 – 304
Bernardo, J., 198 , 255 – 256 , 259 , 400 , 402 – 403 , 411 , 431n5
Bernoulli model, 17
Bernoulli trials, 31 , 111 , 147 , 186 , 254 , 275 , 298 , 303
Berry, S., 287 , 289
Bertrand–Borel debate, 372 – 382
Bertrand, J., 372 – 374
Best Bet and Breakfast, 350
Beyond the Standard Model (BSM) physics, 212 – 214
bias, 20 – 21
biasing selection effects, see selection effects, biasing
Big Data/machine learning, 123 , 229 , 267 , 294 , 395 , 397
and experimental design, 293
and positivism, 229
big picture inference, 127 , 226 , 279 , 423
Binomial distribution, 111 – 114 , 139 , 147 , 285 , 298 , 320 , 386
Binomial vs. negative Binomial, 303 , 411n6
Birnbaum, A., 38 , 53 , 55 , 170 , 172 – 173
Black, L., 294 – 295
Bogen, J., 121
Bonferroni correction, 275 , 294
and false discovery rates, 277 – 278
Boole, G., 387
bootstrap resampling, 305 – 307
Borel, É., 131 , 372 – 375 , 377
Bowley, A., 189 – 190 , 387
Box, G., 27 , 297 , 301 – 304 , 313 , 405n5 , 421 , 433
Box, J. (Fisher), 120
Braithwaite, R., 65
Brans–Dicke theory, 161
adjustable parameters, 74
Breiman, L., 414
Bristol-Roach, M., 16 – 18 , 31
Brown, E., 7 – 8 , 381 , 427
Buchen, L., 124n2
Buehler, R., 390n8
Burgman, M., 358
Burnham, K., 159 , 162 , 318 – 319
Byrd, J., 358
Calibation (lounge), 350
calibration, 54 , 396
and pragmatic Bayesians, 305 , 424 ; see also error probabilities
cancer clusters, power lines, 157 – 158
capability and severity, 189 , 193 – 195
and confidence intervals, 191 – 192
of methods, 300
capitalizing on chance, 274 – 275
Carlin, B., 43
Carlin, J., 264 , 289 , 359 – 361
Carnap, R., 59 , 63 – 64 , 67 , 108 , 110 – 111 , 114 , 147 , 341 , 400 , 416
Cartlidge, E., 213
Cartwright, N., 291 – 292 , 294
Casella, G., 248 , 251 – 252 , 259 , 264 , 303 – 304
Castelvecchi, D., 214
catchall, hypothesis (~H) and factor (Pr(x|~H)), 68 , 84 , 213 , 302
and open-endedness, 399 , 420 – 421 , 443
Central Limit Theorem (CLT), 186n7 , 298
and water plant accident, 142
and relative frequencies, 299
CERN, 202
Center for Open Science, 97
Chalmers, A., 129
Chalmers, T., 339n4
Chandra Sekar, C., 269 , 376
Charpentier, E., 229
Cherkassky, V., 79
chestnuts and howlers
of CIs, 198 – 200
Jeffreys’ tail area, 168 – 170
and large n, 242 – 243
of power, 325
of selection effects, 276 – 277 , 282
of tests, 165 – 172
chi square, 139
chutzpah, 12 , 221
clinically relevant/irrelevant difference, 326 – 327
Cobb, A., 235 – 236
Cochrane collaboration, 292
Cohen, J., 323 – 324 , 338 – 339 , 340 , 356
Colquhoun, D., 277n4 , 289 , 365 – 366
comparativism/comparative accounts, 13 , 30 , 33 , 36 , 82 , 261 , 332 , 334 – 336 , 441
and falsification, 82 , 318
vs. significance test, 243 , 268 , 284 ; see also Likelihood, Law of
COMpare Team, 40
conditional (if–then) claim, 150 , 167
conditional probability, 24 , 35 , 37 , 46 , 405
vs. an error probability, 205
illicit transposing, 208 , 331 , 363
conditioning (D. R. Cox)
to achieve relevance, 173 , 200
to separate from nuisance parameters, 385
confidence concept (Conf)
Birnbaum, 55 , 170
Mayo extension, 55
Confidence Court Inn, 350
confidence distribution, 195 , 358 , 442
as fiducial, 382 , 391
and unifying Bayesian, fiducial, frequentist (BFF), 391 , 435n8
confidence intervals (CIs), 18 , 153 , 189 , 356 , 427 – 428
and Bayes–Frequentist agreement, 425
duality with tests, 190 – 193 , 198 , 244 , 357 , 442
and fiducial, 390
generic vs. specific (particular), 191 , 246
in the Higgs, 211
performance construal, 244 , 384
reforming, 193 , 244 – 246 , 358
severity interpretation, 198 , 346 , 429 , 442
vacuous (chestnut), 198 – 199
warranted by severity analysis, 358
confirmation theory, 59
Carnapian, 59 , 63 – 64
C-function (c † ), 63
incremental vs. absolute, 66 – 67
inductive intuitions, 64
as fit measures, 72 – 73
logical probabilities, 63
paradoxes of, 73
positive instance, 60
Popper against, 68 ; see also Carnap
Consolidated Standards of Reporting Trials (CONSORT), 40 , 49
Coombes, K., 6 , 18
counterfactual reasoning, 52 , 110 , 178
in array of models, 300
and possible worlds, 429n2
and randomization, 287
and severity reasoning, 195 – 196 , 245 , 342 , 429
Cousins, R., 203 , 210 – 212 , 216
Cox, D., 27 , 45 , 47 , 53 – 54 , 59 , 93 – 94 , 132 , 147 – 154 , 158 – 159 , 162 , 164 – 165 , 171 – 175 , 180 , 194 – 195 , 198 – 200 , 221 , 231 , 233 – 234 , 237 , 248 , 250 – 253 , 281 – 282 , 288 , 296 , 298 , 303n1 , 343 , 352n10 , 371 , 382 – 386 , 396 , 398 , 401 , 403 , 408 – 409 , 412 , 418 , 428 – 429 , 440
Cox’s taxonomy (of test hypotheses), 150 , 310 – 312
dividing nulls, 154 , 253
fully embedded null, 152 – 153
nested alternative, 153 , 312
omnibus vs. focused test, 154 , 310
substantively based hypotheses, 157 – 158
credible intervals, 426 , 428
CRISPR, 229
critical region (rejection region), 134 , 140 , 153n7
and Bayesian tests, 262
relevant for inference, 169 – 170
similar regions, 386n6
crud factor, 367
Crupi, V., 72
Cumming, G., 244 – 246 , 293n12 , 354
cα (critical value) for standard normal, 191
use of > or ≥, 138 , 143 , 192 , 197
Dawid, A. P., 282 , 363n8 , 399 , 412 – 413 , 419
de Finetti, B., 66 , 146 – 147 , 223 , 227 , 414
Deaton, A., 291
default/non-subjective priors, 25 , 184 – 187 , 402 , 431
non-informative priors, 400
relative to parameters of interest, 411 – 412 , 415
violate Likelihood Principle, 431
deep learning, 79
Delampady, M., 51 , 124 , 252n4 , 305n2 , 402 – 403 , 405 , 431 , 434
Demarcation Problem (between science and pseudoscience), 59 , 75 – 76 , 90 , 106 , 235
and GTR, 128
and Popper, 75 – 78
severe tester and, 88 – 89 , 222
Diaconis, P., 235
diagnostic screening (DS) view of tests, 185 , 361 – 363 , 408
and Bayes’ rule, 363
bias adjustments in, 364
dangers of, 369 – 370
false finding rate (FFR), 362 , 370
point against point, 366
positive predictive value (PPV), 363
and probabilistic instantiation fallacy, 367
sensitivity (SENS) in, 363
specificity (SPEC) in, 364
Dicke, R., 161
Dienes, Z., 319
dirty hands argument, 224
discrepancy, hypothesis as, 130 , 143 , 151 , 241
our convention on, 240 , 263
DNA match, 281 – 282
Dominus, S., 104
double counting, 92 , 269
Doudna, J., 229
Dr. Hack, court case, 267 – 268 , 271 – 272
Draper, D., 405
Duflo, E., 290 – 292
Duhem, P., 83 – 85
Duhem’s Problem, 83 – 89 , 107 , 154 , 311 , 385 , 435
Dupré, J., 78
Durante, K., 105
Durbin–Watson tests, 311 , 314
Dyson, E., 120 , 156
Earman, J., 69 , 74 , 128 , 156
eclecticism in statistics, 27 , 397 , 424
eclipse tests, 1919, 119 , 121 – 125
Barnard on, 126 , 139
Einstein effect, 122 , 125
H. Jeffreys on, 124 , 127
mirror distortion controversy, 125 , 156 – 157
Newton saving hypotheses, 127 – 128
Sobral and Principe results, 123 – 124 ; see also General Theory of Relativity (GTR)
ecumenism in statistics, 27 , 301
Eddington, A., 119 – 122 , 124 , 126 , 156 , 226 , 369
Edwards, A., 32
Edwards, Lindman, and Savage (E, L, & S), see individual authors
Edwards, W., 41 – 43 , 45 , 49 – 50 , 248 , 252 , 256 , 260 , 269
effect size (ES), population (discrepancy) and observed, 340
efficient tests of statistical hypotheses, 374 – 378 , 421
and power, 377 – 378
Efron, B., 6 , 24 , 85 , 279n6 , 298 , 305 – 306 , 391n9 , 395 , 397 , 399 – 400 , 413
Einstein, A., 119 – 124 , 127 – 128 , 224 , 226 , 229
Einstein’s Café, 154
Eisenhart, C., 404
either or question/horn, 270 , 282
Ellis, P., 356
empirical Bayes, 404
Lindley on, 405 ; see also Robbins, H., Efron, B.
en quelque sorte remarquable , 373 – 375
Englert, F., 202
enumerative induction (EI)/(straight rule of induction), 61
Popper on, 75
Carnap’s, 111
and Bayes’ Theorem (B-boost), 63
epistemic probability, 195
epistemology, 4 ; see also normative epistemology
equivalence principle (weak), 161
Einstein (self-gravitating bodies), 161
equivalence testing, 160n10
error probabilities, xii , 9 , 20 , 26
for discrediting cherry-picking, 283
epistemic interpretation, 26 , 429
inferential use of, 14 , 49 , 164
and Likelihoodism, 41 , 48
violate Likelihood Principle, 49 – 51 , 164 , 270 , 333 , 431
meaning vs. application, 147 , 194
performance construal, 13 , 38 , 140 , 429
and preregistration, 286 , 320 , 439
P-value as, 175 , 183
rubbing off construal, 194 , 390 , 429
and stringency, 382
for solving induction, 114 – 115
Type I and Type II, 9 , 137 – 140
understanding, 174
error probability₁ vs. error probability₂, 183 – 187 , 231 , 338
error statistics, xii , 9
error probes, 17
foundations for Bayesian tools, 27 – 28
vs. logics of statistical inference, 32 , 65 , 438
severe testing as a proper subset of, 55
blog, 418 ; see also Likelihood, Law of
ESP, 31 , 283 – 284
degenerating program/falsified, 235
explaining a known effect
lost key, 281 – 282
in eclipse results, 127 , 157
in testing assumptions, 320
fallacies of rejection
spurious P-value, isolated result, magnitude error, 3 , 22 , 94
high power = high hurdle, 332 ; see also mountains out of molehills
fallacy of non-rejection/insignificance, 3 , 152 , 339 , 353 ; see also SIN
false discovery rate (FDR) (Benjamini and Hochberg), 277 – 278 , 363n7
falsification
of alternatives in significance tests, 158 – 162
and anomaly, 83
asymmetry w/confirmation, 60 , 81 , 125
of Bayesian priors, 417 – 420
and Big Data, 294 – 295
of central dogma of biology, 81 – 82 , 85
confusion about, 125
and eclipse tests, 125 – 127
error statistics as direct, 436 , 443
falsifying hypotheses, 83 , 121
and Fisher, 132
in the Higgs, 212
and i-assumptions, 298
of inquiry, 101 – 102
methodological, 82 – 83
and non-replication in GWAS, 294
and Popper, 75 – 89
vs. probabilism, 27 , 82
as provisional, 235
of replication, 99 , 104
statistical, 152 ; see also Duhem’s Problem
family-wise error rate (FWER), 275 , 278 – 279
Feddersen, A., 390n8
Feyerabend, P., 228
Feynman, R., 3 , 10 , 23 , 89
fiducial
c percent limit, 382
frequency distribution, 383
inference, 382 – 384 , 389
vs. inverse inference, 383
Island(s), 371 , 382 , 391 ; see also Fisher
file drawer, 98 , 176 , 212 , 292
FIRST (Fairly Intimately Related to the Statistical Test) interpretations, 150 – 151 , 174n2 , 234
and Cox’s Taxonomy, 150 – 158
Fisher and Neyman wars, 141 , 165 , 386 – 388
contrast to early compatibility, 140 , 384 , 387
deconstructing, 371
pathologies of, 390 – 391
similarly behavioristic, 176 – 177 ; see also incompatibilism
Fisher, R. A., 4 – 5 , 8 , 16 , 26 , 46 , 59 , 75 , 93 , 95 , 120 , 126 , 130 – 131 , 134 – 137 , 147 , 151 , 159 , 161 , 166 – 170 , 173 – 190 , 200 , 250 , 266 , 281 , 286 , 290 , 303 , 323 – 325 , 331 , 335 , 339 , 368 , 371 , 374 – 391 , 398 , 409 , 424 , 440 – 441 , 444
and autopsy, 294
on background information, 233 – 234
challenged the old guard, 386 – 387
criticized by Neyman, 341 , 377
relations with Neyman, 139 , 146 , 165 , 181 , 190 , 386 – 387 ; see also Neyman, feud with Fisher
fiducial probability and Neyman’s performance, 382 – 384 , 384n4 , 389 – 391
and fraud busting, 54 , 284
and induction, 66 , 86
against isolated results, 4 , 22 , 83 , 204 , 362 , 438
and making theories elaborate, 237
political principle, 5
on priors, 232 , 383 , 402 , 404 – 405
shifting views on behavioristic performance, 176 , 182 , 390
statistical model, 132
tribe, 146 – 147 , 187
on UMP tests, 386 , 390
Fisher’s testing principle (against isolated results), 4 , 204
Fisherian test statistic criteria, 132 , 167 , 384
Fitelson, B., 68 – 71
flexibility, rewards, and bias hypothesis, 98
Folks, J., 325
forking paths, 105
formal epistemology, 59
and intuitive principles of evidence, 73
Forster, M., 318
Foucault, M., 228
foundations, decoupling from traditional, 164 , 401 – 402 ; see also Bayesian foundations