Statistical Inference as Severe Testing


by Deborah G Mayo


  Bayes factor, 37 , 184 , 253 – 255 , 261 – 264 , 305 , 320 , 335 – 336

  aka post-experimental rejection ratio, 338

  Bayes’ Theorem, 24 , 61

  and the Likelihood Principle, 45 – 46 , 398

  Bayes/Fisher disagreement, see Jeffreys–Lindley paradox

  Bayesian foundations

  Bayes/frequentist, 24 – 26

  classical subjective Bayesians and criticisms, 397 – 400

  current state of play, 23 , 395 – 397 , 400

  Dutch book, 415

  error statistical basis, 27

  need new, 432

  and wash-out theorems, 231 – 232

  Bayesian incoherence and violating Bayes’ Rule, 411 , 415 , 419 , 421

  betting incoherency, 417

  empirical studies of, 422

  replying to axiomatic proofs, 421

  temporal, 423

  Bayesian inference as conditional on specific data, 51 , 183 , 188 , 431

  role of probability flipped, 207 , 405 – 407

  Bayesian model checking, 27 – 28 , 304 – 305 , 432 – 436

  Bayesian priors

  and background information, 413

  changing priors, 417 – 419

  data generating vs. state of knowledge, 403

  vs. degree of plausibility/belief, 406

  default/non-subjective, 25 , 400

  empirical (frequentist) Bayes, 185 , 404

  gallimaufry, 402 – 403

  hierarchy of, 433

  and objectivity, 232 – 233

  subjective elicitations, 410 – 411

  Bayesian probabilists

  casual, 412

  classical subjective, and criticisms of, 228 , 397 – 400

  default/non-subjective Bayesians, 25 , 51 , 64 , 230 , 262 , 417

  vs. error statistical/falsificationist Bayesian, 432 – 435

  family feuds, 25 , 409

  grace and amen, 413

  ironic/bad faith, 412

  pragmatic, 424 – 426

  Bayesian statistics

  betting, 257 , 397 , 405 – 408 , 415 , 421

  Bayes hypothesis test, 183 , 261

  and known data, 417

  notions of probability, 396 , 402

  Bayesian vs. frequentist, 24 – 26

  empathy with frequentists, 401

  orthogonal, 400

  unifications, 25 – 28 , 183 – 187 , 396 , 409 – 415

  Beall, G., 299

  begging the question, 61 – 62 , 184 , 247 , 399n2 , 422

  Bem, D., 283 – 284

  Benjamin, D., 264 , 266 , 337 – 338

  Benjamini, Y., 277 – 278 , 362n7

  Berger, J., 23 , 26 , 49 , 54 , 74 , 144 , 175 – 176 , 183 – 187 , 199 , 230 – 231 , 248 – 252 , 258 – 260 , 305 , 337 – 338 , 400 , 409 – 415 , 417 – 418 , 426 , 430 – 431 , 434

  Berger, R., 162n11 , 248 , 251 – 252 , 259 , 264 , 303 – 304

  Bernardo, J., 198 , 255 – 256 , 259 , 400 , 402 – 403 , 411 , 431n5

  Bernoulli model, 17

  Bernoulli trials, 31 , 111 , 147 , 186 , 254 , 275 , 298 , 303

  Berry, S., 287 , 289

  Bertrand–Borel debate, 372 – 382

  Bertrand, J., 372 – 374

  Best Bet and Breakfast, 350

  Beyond the Standard Model (BSM) physics, 212 – 214

  bias, 20 – 21

  biasing selection effects, see selection effects, biasing

  Big Data/machine learning, 123 , 229 , 267 , 294 , 395 , 397

  and experimental design, 293

  and positivism, 229

  big picture inference, 127 , 226 , 279 , 423

  Binomial distribution, 111 – 114 , 139 , 147 , 285 , 298 , 320 , 386

  Binomial vs. negative Binomial, 303 , 411n6

  Birnbaum, A., 38 , 53 , 55 , 170 , 172 – 173

  Black, L., 294 – 295

  Bogen, J., 121

  Bonferroni correction, 275 , 294

  and false discovery rates, 277 – 278

  Boole, G., 387

  bootstrap resampling, 305 – 307

  Borel, É., 131 , 372 – 375 , 377

  Bowley, A., 189 – 190 , 387

  Box, G., 27 , 297 , 301 – 304 , 313 , 405n5 , 421 , 433

  Box, J. (Fisher), 120

  Braithwaite, R., 65

  Brans–Dicke theory, 161

  adjustable parameters, 74

  Breiman, L., 414

  Bristol-Roach, M., 16 – 18 , 31

  Brown, E., 7 – 8 , 381 , 427

  Buchen, L., 124n2

  Buehler, R., 390n8

  Burgman, M., 358

  Burnham, K., 159 , 162 , 318 – 319

  Byrd, J., 358

  Calibation (lounge), 350

  calibration, 54 , 396

  and pragmatic Bayesians, 305 , 424 ; see also error probabilities

  cancer clusters, power lines, 157 – 158

  capability and severity, 189 , 193 – 195

  and confidence intervals, 191 – 192

  of methods, 300

  capitalizing on chance, 274 – 275

  Carlin, B., 43

  Carlin, J., 264 , 289 , 359 – 361

  Carnap, R., 59 , 63 – 64 , 67 , 108 , 110 – 111 , 114 , 147 , 341 , 400 , 416

  Cartlidge, E., 213

  Cartwright, N., 291 – 292 , 294

  Casella, G., 248 , 251 – 252 , 259 , 264 , 303 – 304

  Castelvecchi, D., 214

  catchall, hypothesis (~H) and factor (Pr(x|~H)), 68 , 84 , 213 , 302

  and open-endedness, 399 , 420 – 421 , 443

  Central Limit Theorem (CLT), 186n7 , 298

  and water plant accident, 142

  and relative frequencies, 299

  CERN, 202

  Center for Open Science, 97

  Chalmers, A., 129

  Chalmers, T., 339n4

  Chandra Sekar, C., 269 , 376

  Charpentier, E., 229

  Cherkassky, V., 79

  chestnuts and howlers

  of CIs, 198 – 200

  Jeffreys’ tail area, 168 – 170

  and large n, 242 – 243

  of power, 325

  of selection effects, 276 – 277 , 282

  of tests, 165 – 172

  chi square, 139

  chutzpah, 12 , 221

  clinically relevant/irrelevant difference, 326 – 327

  Cobb, A., 235 – 236

  Cochrane collaboration, 292

  Cohen, J., 323 – 324 , 338 – 339 , 340 , 356

  Colquhoun, D., 277n4 , 289 , 365 – 366

  comparativism/comparative accounts, 13 , 30 , 33 , 36 , 82 , 261 , 332 , 334 – 336 , 441

  and falsification, 82 , 318

  vs. significance test, 243 , 268 , 284 ; see also Likelihood, Law of

  COMpare Team, 40

  conditional (if–then) claim, 150 , 167

  conditional probability, 24 , 35 , 37 , 46 , 405

  vs. an error probability, 205

  illicit transposing, 208 , 331 , 363

  conditioning (D. R. Cox)

  to achieve relevance, 173 , 200

  to separate from nuisance parameters, 385

  confidence concept (Conf)

  Birnbaum, 55 , 170

  Mayo extension, 55

  Confidence Court Inn, 350

  confidence distribution, 195 , 358 , 442

  as fiducial, 382 , 391

  and unifying Bayesian, fiducial, frequentist (BFF), 391 , 435n8

  confidence intervals (CIs), 18 , 153 , 189 , 356 , 427 – 428

  and Bayes–Frequentist agreement, 425

  duality with tests, 190 – 193 , 198 , 244 , 357 , 442

  and fiducial, 390

  generic vs. specific (particular), 191 , 246

  in the Higgs, 211

  performance construal, 244 , 384

  reforming, 193 , 244 – 246 , 358

  severity interpretation, 198 , 346 , 429 , 442

  vacuous (chestnut), 198 – 199

  warranted by severity analysis, 358

  confirmation theory, 59

  Carnapian, 59 , 63 – 64

  C-function (c†), 63

  incremental vs. absolute, 66 – 67

  inductive intuitions, 64

  as fit measures, 72 – 73

  logical probabilities, 63

  paradoxes of, 73

 
  positive instance, 60

  Popper against, 68 ; see also Carnap

  Consolidated Standards of Reporting Trials (CONSORT), 40 , 49

  Coombes, K., 6 , 18

  counterfactual reasoning, 52 , 110 , 178

  in array of models, 300

  and possible worlds, 429n2

  and randomization, 287

  and severity reasoning, 195 – 196 , 245 , 342 , 429

  Cousins, R., 203 , 210 – 212 , 216

  Cox, D., 27 , 45 , 47 , 53 – 54 , 59 , 93 – 94 , 132 , 147 – 154 , 158 – 159 , 162 , 164 – 165 , 171 – 175 , 180 , 194 – 195 , 198 – 200 , 221 , 231 , 233 – 234 , 237 , 248 , 250 – 253 , 281 – 282 , 288 , 296 , 298 , 303n1 , 343 , 352n10 , 371 , 382 – 386 , 396 , 398 , 401 , 403 , 408 – 409 , 412 , 418 , 428 – 429 , 440

  Cox’s taxonomy (of test hypotheses), 150 , 310 – 312

  dividing nulls, 154 , 253

  fully embedded null, 152 – 153

  nested alternative, 153 , 312

  omnibus vs. focused test, 154 , 310

  substantively based hypotheses, 157 – 158

  credible intervals, 426 , 428

  CRISPR, 229

  critical region (rejection region), 134 , 140 , 153n7

  and Bayesian tests, 262

  relevant for inference, 169 – 170

  similar regions, 386n6

  crud factor, 367

  Crupi, V., 72

  Cumming, G., 244 – 246 , 293n12 , 354

  cα (critical value) for standard normal, 191

  use of > or ≥, 138 , 143 , 192 , 197

  Dawid, A. P., 282 , 363n8 , 399 , 412 – 413 , 419

  de Finetti, B., 66 , 146 – 147 , 223 , 227 , 414

  Deaton, A., 291

  default/non-subjective priors, 25 , 184 – 187 , 402 , 431

  non-informative priors, 400

  relative to parameters of interest, 411 – 412 , 415

  violate Likelihood Principle, 431

  deep learning, 79

  Delampady, M., 51 , 124 , 252n4 , 305n2 , 402 – 403 , 405 , 431 , 434

  Demarcation Problem (between science and pseudoscience), 59 , 75 – 76 , 90 , 106 , 235

  and GTR, 128

  and Popper, 75 – 78

  severe tester and, 88 – 89 , 222

  Diaconis, P., 235

  diagnostic screening (DS) view of tests, 185 , 361 – 363 , 408

  and Bayes’ rule, 363

  bias adjustments in, 364

  dangers of, 369 – 370

  false finding rate (FFR), 362 , 370

  point against point, 366

  positive predictive value (PPV), 363

  and probabilistic instantiation fallacy, 367

  sensitivity (SENS) in, 363

  specificity (SPEC) in, 364

  Dicke, R., 161

  Dienes, Z., 319

  dirty hands argument, 224

  discrepancy, hypothesis as, 130 , 143 , 151 , 241

  our convention on, 240 , 263

  DNA match, 281 – 282

  Dominus, S., 104

  double counting, 92 , 269

  Doudna, J., 229

  Dr. Hack, court case, 267 – 268 , 271 – 272

  Draper, D., 405

  Duflo, E., 290 – 292

  Duhem, P., 83 – 85

  Duhem’s Problem, 83 – 89 , 107 , 154 , 311 , 385 , 435

  Dupré, J., 78

  Durante, K., 105

  Durbin–Watson tests, 311 , 314

  Dyson, E., 120 , 156

  Earman, J., 69 , 74 , 128 , 156

  eclecticism in statistics, 27 , 397 , 424

  eclipse tests, 1919, 119 , 121 – 125

  Barnard on, 126 , 139

  Einstein effect, 122 , 125

  H. Jeffreys on, 124 , 127

  mirror distortion controversy, 125 , 156 – 157

  Newton saving hypotheses, 127 – 128

  Sobral and Principe results, 123 – 124 ; see also General Theory of Relativity (GTR)

  ecumenism in statistics, 27 , 301

  Eddington, A., 119 – 122 , 124 , 126 , 156 , 226 , 369

  Edwards, A., 32

  Edwards, Lindman, and Savage (E, L, & S), see individual authors

  Edwards, W., 41 – 43 , 45 , 49 – 50 , 248 , 252 , 256 , 260 , 269

  effect size (ES), population (discrepancy) and observed, 340

  efficient tests of statistical hypotheses, 374 – 378 , 421

  and power, 377 – 378

  Efron, B., 6 , 24 , 85 , 279n6 , 298 , 305 – 306 , 391n9 , 395 , 397 , 399 – 400 , 413

  Einstein, A., 119 – 124 , 127 – 128 , 224 , 226 , 229

  Einstein’s Café, 154

  Eisenhart, C., 404

  either or question/horn, 270 , 282

  Ellis, P., 356

  empirical Bayes, 404

  Lindley on, 405 ; see also Robbins, H., Efron, B.

  en quelque sorte remarquable , 373 – 375

  Englert, F., 202

  enumerative induction (EI)/(straight rule of induction), 61

  Popper on, 75

  Carnap’s, 111

  and Bayes Theorem (B-boost), 63

  epistemic probability, 195

  epistemology, 4 ; see also normative epistemology

  equivalence principle (weak), 161

  Einstein (self-gravitating bodies), 161

  equivalence testing, 160n10

  error probabilities, xii , 9 , 20 , 26

  for discrediting cherry-picking, 283

  epistemic interpretation, 26 , 429

  inferential use of, 14 , 49 , 164

  and Likelihoodism, 41 , 48

  violate Likelihood Principle, 49 – 51 , 164 , 270 , 333 , 431

  meaning vs. application, 147 , 194

  performance construal, 13 , 38 , 140 , 429

  and preregistration, 286 , 320 , 439

  P-value as, 175 , 183

  rubbing off construal, 194 , 390 , 429

  and stringency, 382

  for solving induction, 114 – 115

  Type I and Type II, 9 , 137 – 140

  understanding, 174

  error probability₁ vs. error probability₂, 183 – 187 , 231 , 338

  error statistics, xii , 9

  error probes, 17

  foundations for Bayesian tools, 27 – 28

  vs. logics of statistical inference, 32 , 65 , 438

  severe testing as a proper subset of, 55

  blog, 418 ; see also Likelihood, Law of

  ESP, 31 , 283 – 284

  degenerating program/falsified, 235

  explaining a known effect

  lost key, 281 – 282

  in eclipse results, 127 , 157

  in testing assumptions, 320

  fallacies of rejection

  spurious P-value, isolated result, magnitude error, 3 , 22 , 94

  high power = high hurdle, 332 ; see also mountains out of molehills

  fallacy of non-rejection/insignificance, 3 , 152 , 339 , 353 ; see also SIN

  false discovery rate (FDR) (Benjamini and Hochberg), 277 – 278 , 363n7

  falsification

  of alternatives in significance tests, 158 – 162

  and anomaly, 83

  asymmetry w/confirmation, 60 , 81 , 125

  of Bayesian priors, 417 – 420

  and Big Data, 294 – 295

  of central dogma of biology, 81 – 82 , 85

  confusion about, 125

  and eclipse tests, 125 – 127

  error statistics as direct, 436 , 443

  falsifying hypotheses, 83 , 121

  and Fisher, 132

  in the Higgs, 212

  and i-assumptions, 298

  of inquiry, 101 – 102

  methodological, 82 – 83

  and non-replication in GWAS, 294

  and Popper, 75 – 89

  vs. probabilism, 27 , 82

  as provisional, 235

  of replication, 99 , 104

  statistical, 152 ; see also Duhem’s Problem

  family-wise error rate (FWER), 275 , 278 – 279

  Feddersen, A., 390n8

  Feyerabend, P., 228

  Feynman, R., 3 , 10 , 23 , 89

  fiducial

  c percent limit, 382

  frequency distribution, 383

  inference, 382 – 384 , 389

  vs. inverse inference, 383

  Island(s), 371 , 382 , 391 ; see also Fisher

  file drawer, 98 , 176 , 212 , 292

  FIRST (Fairly Intimately Related to the Statistical Test) interpretations, 150 – 151 , 174n2 , 234

  and Cox’s Taxonomy, 150 – 158

  Fisher and Neyman wars, 141 , 165 , 386 – 388

  contrast to early compatibility, 140 , 384 , 387

  deconstructing, 371

  pathologies of, 390 – 391

  similarly behavioristic, 176 – 177 ; see also incompatibilism

  Fisher, R. A., 4 – 5 , 8 , 16 , 26 , 46 , 59 , 75 , 93 , 95 , 120 , 126 , 130 – 131 , 134 – 137 , 147 , 151 , 159 , 161 , 166 – 170 , 173 – 190 , 200 , 250 , 266 , 281 , 286 , 290 , 303 , 323 – 325 , 331 , 335 , 339 , 368 , 371 , 374 – 391 , 398 , 409 , 424 , 440 – 441 , 444

  and autopsy, 294

  on background information, 233 – 234

  challenged the old guard, 386 – 387

  criticized by Neyman, 341 , 377

  relations with Neyman, 139 , 146 , 165 , 181 , 190 , 386 – 387 ; see also Neyman, feud with Fisher

  fiducial probability and Neyman’s performance, 382 – 384 , 384n4 , 389 – 391

  and fraud busting, 54 , 284

  and induction, 66 , 86

  against isolated results, 4 , 22 , 83 , 204 , 362 , 438

  and making theories elaborate, 237

  political principle, 5

  on priors, 232 , 383 , 402 , 404 – 405

  shifting views on behavioristic performance, 176 , 182 , 390

  statistical model, 132

  tribe, 146 – 147 , 187

  on UMP tests, 386 , 390

  Fisher’s testing principle (against isolated results), 4 , 204

  Fisherian test statistic criteria, 132 , 167 , 384

  Fitelson, B., 68 – 71

  flexibility, rewards, and bias hypothesis, 98

  Folks, J., 325

  forking paths, 105

  formal epistemology, 59

  and intuitive principles of evidence, 73

  Forster, M., 318

  Foucault, M., 228

  foundations, decoupling from traditional, 164 , 401 – 402 ; see also Bayesian foundations

 
