4.7 Exercises on Chapter 4
1. Show that if the prior probability π0 of a hypothesis is close to unity, then the posterior probability p0 satisfies 1 − p0 ≈ (1 − π0)/B and, more exactly, 1 − p0 ≈ (1 − π0)/B + (1 − π0)²(B − 1)/B².
2. Watkins (1986, Section 13.3) reports that theory predicted the existence of a Z particle of mass GeV, while first experimental results showed its mass to be GeV. Find the prior and posterior odds and the Bayes ratio for the hypothesis that its mass is less than 93.0 GeV.
3. An experimental station wishes to test whether a growth hormone will increase the yield of wheat above the average value of 100 units per plot produced under currently standard conditions. Twelve plots treated with the hormone give the yields:
Find the P-value for the hypothesis under consideration.
4. In a genetic experiment, theory predicts that if two genes are on different chromosomes, then the probability of a certain event will be 3/16. In an actual trial, the event occurs 56 times in 300. Use Lindley’s method to decide whether there is enough evidence to reject the hypothesis that the genes are on different chromosomes.
5. With the data in the example in Section 3.4 on ‘The Poisson distribution’, would it be appropriate to reject the hypothesis that the true mean equalled the prior mean? [Use Lindley’s method.]
6. Suppose that the standard test statistic takes the value z=2.5 and that the sample size is n = 100. How close to does a value of θ have to be for the value of the normal likelihood function at to be within 10% of its value at ?
7. Show that the Bayes factor for a test of a point null hypothesis for the normal distribution (where the prior under the alternative hypothesis is also normal) can be expanded in a power series in as
8. Suppose that x1, x2, …. Show that over the interval the likelihood varies by a factor of approximately
9. At the beginning of Section 4.5, we saw that under the alternative hypothesis the predictive density for x̄ was N(θ0, σ²/n + ψ²), so that
p(x̄ | H1) = {2π(σ²/n + ψ²)}^(−1/2) exp[−½(x̄ − θ0)²/(σ²/n + ψ²)].
Show that a maximum of this density, considered as a function of ψ, occurs when ψ² = (x̄ − θ0)² − σ²/n, which gives a possible value for ψ if |z| > 1, where z = (x̄ − θ0)/(σ/√n). Hence, show that if |z| > 1, then for any such alternative hypothesis, the Bayes factor satisfies
B ≥ √e |z| exp(−½z²),
and deduce a bound for p0 (depending on the value of π0).
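The bound in this exercise is easy to check numerically; the following minimal Python sketch minimizes the Bayes factor over a grid of values of ψ² for the illustrative choice z = 2, with σ²/n set to 1 for convenience (both choices are assumptions for the check, not part of the exercise).

import numpy as np

z = 2.0                     # illustrative standardized statistic, |z| > 1
v0 = 1.0                    # sigma^2/n, set to 1 without loss of generality
d2 = z**2 * v0              # (xbar - theta0)^2

psi2 = np.linspace(1e-6, 100.0, 100001)   # grid of values of psi^2
v1 = v0 + psi2                            # predictive variance under H1
B = np.sqrt(v1 / v0) * np.exp(-0.5 * d2 * (1 / v0 - 1 / v1))

print(B.min())                                     # smallest Bayes factor on the grid
print(np.sqrt(np.e) * abs(z) * np.exp(-z**2 / 2))  # the bound sqrt(e)|z|exp(-z^2/2)

Both printed values agree (about 0.446), the minimum being attained at ψ² = (x̄ − θ0)² − σ²/n as the exercise asserts.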
10. In the situation discussed in Section 4.5, for a given P-value (so equivalently for a given z), at what value of n is the posterior probability of the null hypothesis a minimum?
11. Mendel (1865) reported finding 1850 angular wrinkled seeds to 5474 round or roundish in an experiment in which his theory predicted a ratio of 1:3. Use the method employed for Weldon’s dice data in Section 4.5 to test whether his theory is confirmed by the data. [However, Fisher (1936) cast some doubt on the genuineness of the data.]
12. A window is broken in forcing entry to a house. The refractive index of a piece of glass found at the scene of the crime is x, which is supposed N(θ1, σ²). The refractive index of a piece of glass found on a suspect is y, which is supposed N(θ2, σ²). In the process of establishing the guilt or innocence of the suspect, we are interested in investigating whether H0: θ1 = θ2 is true or not. The prior distributions of θ1 and θ2 are both N(μ, τ²), where τ ≫ σ. Write
u = x − y and z = ½(x + y).
Show that, if H0 is true and θ1 = θ2 = θ, then x − θ, y − θ and θ − μ are independent and
x − θ ~ N(0, σ²), y − θ ~ N(0, σ²), θ − μ ~ N(0, τ²).
By writing u = (x − θ) − (y − θ) and z = μ + ½(x − θ) + ½(y − θ) + (θ − μ), go on to show that u has an N(0, 2σ²) distribution and that z has an N(μ, τ² + ½σ²), so approximately an N(μ, τ²), distribution. Conversely, show that if H0 is false and θ1 and θ2 are assumed independent, then x − θ1, y − θ2, θ1 − μ and θ2 − μ are all independent and
x − θ1 ~ N(0, σ²), y − θ2 ~ N(0, σ²), θ1 − μ ~ N(0, τ²), θ2 − μ ~ N(0, τ²).
By writing
u = (x − θ1) − (y − θ2) + (θ1 − μ) − (θ2 − μ)
and
z = μ + ½(x − θ1) + ½(y − θ2) + ½(θ1 − μ) + ½(θ2 − μ),
show that in this case u has an N(0, 2σ² + 2τ²), so approximately an N(0, 2τ²), distribution, while z has an N(μ, ½σ² + ½τ²), so approximately an N(μ, ½τ²), distribution. Conclude that the Bayes factor is approximately
B = {τ/(σ√2)} exp{−u²/4σ² + (z − μ)²/2τ²}.
Suppose that the ratio τ/σ of the standard deviations is 100 and that u = 2√2 σ, so that the difference between x and y represents two standard deviations (of u under H0), and that z = μ, so that both specimens are of commonly occurring glass. Show that a classical test would reject H0 at the 5% level, but that B = 9.57, so that the odds in favour of H0 are multiplied by a factor just below 10.
[This problem is due to Lindley (1977); see also Shafer (1982). Lindley comments that, ‘What the [classical] test fails to take into account is the extraordinary coincidence of x and y being so close together were the two pieces of glass truly different’.]
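The value B = 9.57 can be verified in a couple of lines; the sketch below assumes the approximate Bayes factor displayed in the exercise and takes σ = 1 as an illustrative unit.

import numpy as np

sigma = 1.0                 # measurement standard deviation (illustrative unit)
tau = 100.0 * sigma         # prior sd is 100 times the measurement sd
u = 2 * np.sqrt(2) * sigma  # |x - y| is two standard deviations of u under H0
z_minus_mu = 0.0            # commonly occurring glass: z = mu

B = tau / (np.sqrt(2) * sigma) * np.exp(-u**2 / (4 * sigma**2)
                                        + z_minus_mu**2 / (2 * tau**2))
print(round(B, 2))          # 9.57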
13. Lindley (1957) originally discussed his paradox under slightly different assumptions from those made in this book. Follow through the reasoning used in Section 4.5 with representing a uniform distribution on the interval to find the corresponding Bayes factor assuming that , so that an variable lies in this interval with very high probability. Check that your answers are unlikely to disagree with those found in Section 4.5 under the assumption that represents a normal density.
14. Express in your own words the arguments given by Jeffreys (1961, Section 5.2) in favour of a Cauchy distribution
in the problem discussed in the previous question.
15. Suppose that x has a binomial distribution of index n and parameter θ, and that it is desired to test H0: θ = θ0 against the alternative hypothesis H1: θ ≠ θ0.
a. Find lower bounds on the posterior probability of H0 and on the Bayes factor for H0 versus H1, bounds which are valid for any prior distribution of θ under the alternative hypothesis.
b. If n = 20, θ0 = ½ and x = 15 is observed, calculate the (two-tailed) P-value and the lower bound on the posterior probability when the prior probability of the null hypothesis is π0 = ½.
16. Twelve observations from a normal distribution of mean θ and variance φ are available, of which the sample mean is 1.2 and the sample variance is 1.1. Compare the Bayes factors in favour of the null hypothesis assuming (a) that the variance φ is unknown and (b) that it is known.
17. Suppose that in testing a point null hypothesis you find a value of the usual Student’s t statistic of 2.4 on 8 degrees of freedom. Would the methodology of Section 4.6 require you to ‘think again’?
18. Which entries in the table in Section 4.5 on ‘Point null hypotheses for the normal distribution’ would, according to the methodology of Section 4.6, cause you to ‘think again’?
5 Two-sample problems
5.1 Two-sample problems – both variances unknown
5.1.1 The problem of two normal samples
We now want to consider the situation in which we have independent samples from two normal distributions, namely,
x1, x2, …, xm ~ N(λ, φ) and y1, y2, …, yn ~ N(μ, ψ),
which are independent of each other, and the quantity really of interest is the posterior distribution of
δ = λ − μ.
This problem arises in comparative situations, for example, in comparing the achievement in geometry tests of boy and girl pupils.
5.1.2 Paired comparisons
Before proceeding further, you should be warned against a possible misapplication of the model. If m = n and each of the xs is in some sense paired with one of the ys, say xi with yi, you should define
wi = xi − yi
and then investigate the ws as a sample from a normal distribution of mean ω, for some ω. This is known as the method of paired comparisons. It might arise if, for example, the comparison of performance of boys and girls were restricted to pairs of twins of opposite sexes. The reason that such a situation is not to be treated as a two-sample problem in the sense described at the start is that there will be an effect common to any pair of twins, so that the observations on the boys and on the girls will not be fully independent. It is a very valuable technique which can often give a more precise measurement of an effect, but it is important to distinguish it from a case where the two samples are independent. There is no particular difficulty in analysing the results of a paired comparison experiment by the methods described in Chapter 2 for samples from a single normal distribution.
5.1.3 Example of a paired comparison problem
‘Student’ (1908) quotes data due to A. R. Cushny and A. R. Peebles on the extra hours of sleep gained by ten patients using laevo (L) and dextro (D) hyoscyamine hydrobromide, as follows:

Patient     1     2     3     4     5     6     7     8     9    10
L (x)     1.9   0.8   1.1   0.1  −0.1   4.4   5.5   1.6   4.6   3.4
D (y)     0.7  −1.6  −0.2  −1.2  −0.1   3.4   3.7   0.8   0.0   2.0
[In fact, he misidentifies the substances involved – see E. S. Pearson (1990, p. 54) – but the method is still well illustrated.] If we are interested in the difference between the effects of the two forms of the drug, we should work with the differences wi = xi − yi and find their mean w̄ = 1.58 and sample sum of squares S = 13.616, and hence the sample standard deviation s = 1.23. Assuming a standard reference prior for δ and a variance known to equal 1.23², the posterior distribution of the effect δ of using the L rather than the D form is N(1.58, 1.23²/10). We can then use this distribution, for example, to give an HDR for δ or to test a hypothesis about δ (such as δ ≤ 0 versus δ > 0) in the ways discussed in previous sections. On the other hand, if we are interested simply in the effect of the L form, then the data about the D form are irrelevant and we can use the same methods on the xi. It is straightforward to extend the analysis to allow for a non-trivial prior for δ or an unknown variance or both.
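For readers who want to reproduce the arithmetic, here is a minimal sketch in Python of the paired-comparison analysis just described; the 95% level for the interval and the use of scipy are illustrative choices.

import numpy as np
from scipy import stats

# Extra sleep (hours) with the L and D forms for the ten patients
x = np.array([1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4])    # laevo
y = np.array([0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0])  # dextro

w = x - y                      # paired differences
n = len(w)
wbar = w.mean()                # 1.58
S = ((w - wbar)**2).sum()      # 13.616
s = np.sqrt(S / (n - 1))       # 1.23

# Posterior for delta with a reference prior and the variance taken as
# known and equal to s^2: delta ~ N(wbar, s^2/n)
post_sd = s / np.sqrt(n)
print(wbar, s, post_sd)
print(stats.norm.interval(0.95, loc=wbar, scale=post_sd))   # 95% HDR for delta
print(1 - stats.norm.cdf(0.0, loc=wbar, scale=post_sd))     # P(delta > 0)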
5.1.4 The case where both variances are known
In the case of the two-sample problem proper, there are three cases that can arise:
(i) φ and ψ are known;
(ii) it is known that φ = ψ, but their common value is unknown;
(iii) φ and ψ are unknown.
For the rest of this section, we shall restrict ourselves to case (i). It should, however, be noted that it is not really likely that you would know the variances exactly (although you might have some idea from past experience). The main reason for discussing this case, as in the problem of a single sample from a normal distribution, is that it involves fewer complexities than the case where the variances are unknown.
If λ and μ have independent reference priors, then it follows from Section 2.3 on ‘Several normal observations with a normal prior’ that the posterior for λ is N(x̄, φ/m), and similarly the posterior for μ is N(ȳ, ψ/n), independently of λ. It follows that
δ = λ − μ ~ N(x̄ − ȳ, φ/m + ψ/n).
5.1.5 Example
The weight gains (in grammes) between the 28th and 84th days of age of m = 12 rats receiving a high-protein diet were as follows:
134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123,
while the weight gains for n = 7 rats on a low-protein diet were
70, 118, 101, 85, 107, 132, 94
(cf. Armitage et al., 2001, Section 4.4). The sample mean and sum of squared deviations about the mean for the high-protein group are x̄ = 120 and Sx = 5032, implying a sample variance of 5032/11 = 457. For the low-protein group, the mean and sum of squared deviations about the mean are ȳ = 101 and Sy = 2552, implying a sample variance of 2552/6 = 425. Although the values for the variances are derived from the samples, the method will be illustrated by proceeding as if they were known (perhaps from past experience). Then
λ ~ N(120, 457/12) and μ ~ N(101, 425/7),
from which it follows that the posterior distribution of the parameter δ that measures the effect of using a high-protein rather than a low-protein diet is N(120 − 101, 457/12 + 425/7), that is, N(19, 99).
It is now possible to deduce, for example, that a 90% HDR for δ is 19 ± 1.645√99, that is, (3, 35). Also, the posterior probability that δ > 0 is Φ(19/√99) = Φ(1.91), or about 97%. Furthermore, it is possible to conduct a test of the point null hypothesis that δ = 0. If the variance of δ under the alternative hypothesis is denoted ω (rather than ψ as in Section 4.4 on ‘Point null hypotheses for the normal distribution’, since ψ now has another meaning), then the Bayes factor is
B = √{1 + ω/(φ/m + ψ/n)} exp[−½z²/{1 + (φ/m + ψ/n)/ω}],
where z is the standardized normal variable (under the null hypothesis), namely, z = (x̄ − ȳ)/√(φ/m + ψ/n) = 19/√99 = 1.91. It is not wholly clear what value should be used for ω. One possibility might be to take ω = φ + ψ = 457 + 425 = 882, and if this is done, then
B = √(1 + 882/99) exp[−½(1.91)²/(1 + 99/882)] = 0.61.
If the prior probability of the null hypothesis is taken as π0 = ½, then this gives a posterior probability of p0 = {1 + (0.61)⁻¹}⁻¹ = 0.38, so that it has dropped, but not dropped very much.
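The whole of this example can be reproduced in a few lines; the Python sketch below recomputes the HDR, the posterior probability that δ > 0 and the Bayes factor, with ω = φ + ψ as suggested above.

import numpy as np
from scipy import stats

x = np.array([134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123])  # high protein
y = np.array([70, 118, 101, 85, 107, 132, 94])                            # low protein
m, n = len(x), len(y)

phi = ((x - x.mean())**2).sum() / (m - 1)   # sample variance, about 457, treated as known
psi = ((y - y.mean())**2).sum() / (n - 1)   # sample variance, about 425, treated as known

delta_mean = x.mean() - y.mean()            # 19
delta_var = phi / m + psi / n               # about 99

# 90% HDR for delta and posterior probability that delta > 0
print(stats.norm.interval(0.90, loc=delta_mean, scale=np.sqrt(delta_var)))
print(1 - stats.norm.cdf(0.0, loc=delta_mean, scale=np.sqrt(delta_var)))

# Bayes factor for the point null delta = 0, taking omega = phi + psi
z = delta_mean / np.sqrt(delta_var)
omega = phi + psi
B = np.sqrt(1 + omega / delta_var) * np.exp(-0.5 * z**2 / (1 + delta_var / omega))
p0 = 1 / (1 + 1 / B)     # posterior probability of the null when pi0 = 1/2
print(B, p0)             # about 0.61 and 0.38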
5.1.6 Non-trivial prior information
The method is easily generalized to the case where substantial prior information is available. If the prior for λ is N(λ0, φ0), then the posterior is N(λ1, φ1), where (as was shown in Section 2.3 on ‘Several normal observations with a normal prior’)
φ1 = {φ0⁻¹ + (φ/m)⁻¹}⁻¹ and λ1 = φ1{λ0/φ0 + x̄/(φ/m)}.
Similarly, if the prior for μ is N(μ0, ψ0), then the posterior for μ is N(μ1, ψ1), where μ1 and ψ1 are similarly defined. It follows that
δ = λ − μ ~ N(λ1 − μ1, φ1 + ψ1),
and inferences can proceed much as before.
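In code, this update is just the familiar precision-weighted average applied to each sample separately; in the Python sketch below, the prior means and variances are illustrative placeholders, not values taken from the text.

def posterior_normal(prior_mean, prior_var, data_mean, data_var):
    """Combine a normal prior with a normal likelihood for a mean,
    weighting by precisions (reciprocals of variances)."""
    post_var = 1 / (1 / prior_var + 1 / data_var)
    post_mean = post_var * (prior_mean / prior_var + data_mean / data_var)
    return post_mean, post_var

# x-sample: illustrative prior N(110, 100); data xbar = 120 with phi/m = 457/12
lam1, phi1 = posterior_normal(110, 100, 120, 457 / 12)
# y-sample: illustrative prior N(100, 100); data ybar = 101 with psi/n = 425/7
mu1, psi1 = posterior_normal(100, 100, 101, 425 / 7)

# delta = lambda - mu ~ N(lam1 - mu1, phi1 + psi1)
print(lam1 - mu1, phi1 + psi1)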
5.2 Variances unknown but equal
5.2.1 Solution using reference priors
We shall now consider the case where we are interested in δ = λ − μ and we have independent vectors x = (x1, x2, …, xm) and y = (y1, y2, …, yn) such that
xi ~ N(λ, φ) and yj ~ N(μ, φ),
so that the two samples have a common variance φ.
We can proceed much as we did in Section 2.12 on ‘Normal mean and variance both unknown’. Begin by defining
x̄ = Σxi/m, ȳ = Σyj/n, Sx = Σ(xi − x̄)² and Sy = Σ(yj − ȳ)².
For the moment, take independent priors uniform in λ, μ and log φ, that is,
p(λ, μ, φ) ∝ 1/φ.
With this prior, the posterior is
p(λ, μ, φ | x, y) ∝ φ^(−(m+n)/2−1) exp[−{S + m(λ − x̄)² + n(μ − ȳ)²}/2φ],
where
S = Sx + Sy.
It follows that, for given φ, the parameters λ and μ have independent normal N(x̄, φ/m) and N(ȳ, φ/n) distributions, and hence that the joint density of φ and δ = λ − μ is
p(δ, φ | x, y) = p(δ | φ, x, y) p(φ | x, y),
where p(δ | φ, x, y) is an N(x̄ − ȳ, φ(m⁻¹ + n⁻¹)) density and p(φ | x, y) is an Sχ⁻²(ν) density with ν = m + n − 2. The variance φ can now be integrated out just as in Section 2.12, when we considered a single sample from a normal distribution of unknown variance, giving a very similar conclusion, that is, that if
t = {δ − (x̄ − ȳ)}/√{s²(m⁻¹ + n⁻¹)},
where s² = S/ν, then t ~ t(ν). Note that the variance estimator s² is found by adding the sums of squares Sx and Sy about the observed means and dividing by the sum of the corresponding numbers of degrees of freedom, νx = m − 1 and νy = n − 1, and that this latter sum ν = νx + νy gives the number of degrees of freedom of the resulting Student’s t variable. Another way of looking at it is that s² is a weighted mean of the variance estimators sx² = Sx/νx and sy² = Sy/νy given by the two samples, with weights proportional to the corresponding degrees of freedom.
5.2.2 Example
This section can be illustrated by using the data considered in the last section on the weight growth of rats, this time supposing (more realistically) that the variances are equal but unknown. We found that Sx = 5032, Sy = 2552, x̄ = 120 and ȳ = 101, so that S = 7584, ν = 12 + 7 − 2 = 17, s² = 7584/17 = 446 and
√{s²(m⁻¹ + n⁻¹)} = √{446 × (12⁻¹ + 7⁻¹)} = 10.05.
Since x̄ − ȳ = 19 and ν = 17, the posterior distribution of δ is given by
(δ − 19)/10.05 ~ t(17).
From tables of the t distribution it follows, for example, that a 90% HDR for δ is 19 ± 1.740 × 10.05, that is, (2, 36). This is not very different from the result in Section 5.1, and indeed it will not usually make a great deal of difference to assume that variances are known unless the samples are very small.
It would also be possible to do other things with this posterior distribution, for example, to find the probability that or to test the point null hypothesis that , but this should be enough to give the idea.
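Because the posterior of δ is a location–scale t distribution, these computations are one-liners with a statistics library; a minimal Python sketch:

import numpy as np
from scipy import stats

m, n = 12, 7
xbar, ybar = 120, 101
Sx, Sy = 5032, 2552

nu = (m - 1) + (n - 1)                  # 17 degrees of freedom
s2 = (Sx + Sy) / nu                     # pooled variance estimate, about 446
scale = np.sqrt(s2 * (1 / m + 1 / n))   # about 10.05

# 90% HDR for delta: about (2, 36)
print(stats.t.interval(0.90, df=nu, loc=xbar - ybar, scale=scale))
# Posterior probability that delta > 0
print(1 - stats.t.cdf(0.0, df=nu, loc=xbar - ybar, scale=scale))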
5.2.3 Non-trivial prior information
A simple analysis is possible if we have prior information which, at least approximately, is such that the prior for φ is S0χ⁻²(ν0) and, conditional on φ, the priors for λ and μ are such that
λ ~ N(λ0, φ/m0) and μ ~ N(μ0, φ/n0),
independently of one another. This means that
p(λ, μ, φ) ∝ φ^(−ν0/2−2) exp[−{S0 + m0(λ − λ0)² + n0(μ − μ0)²}/2φ].
Of course, as in any case where conjugate priors provide a nice mathematical theory, it is a question that has to be faced up to in any particular case whether or not a prior of this form is a reasonable approximation to your prior beliefs, and if it is not, then a more untidy analysis involving numerical integration will be necessary. The reference prior used earlier is of this form, though it results from the slightly strange choice of values ν0 = −2 and S0 = m0 = n0 = 0. With such a prior, the posterior is
p(λ, μ, φ | x, y) ∝ φ^(−ν1/2−2) exp[−{S1 + m1(λ − λ1)² + n1(μ − μ1)²}/2φ],
where
m1 = m0 + m,  n1 = n0 + n,  ν1 = ν0 + m + n,
λ1 = (m0λ0 + mx̄)/m1,  μ1 = (n0μ0 + nȳ)/n1,
S1 = S0 + Sx + Sy + (m0⁻¹ + m⁻¹)⁻¹(λ0 − x̄)² + (n0⁻¹ + n⁻¹)⁻¹(μ0 − ȳ)².
(The formula for S1 takes a little manipulation.) It is now possible to proceed as in the reference prior case, and so, for given φ, the parameters λ and μ have independent normal distributions, so that the joint density of φ and δ = λ − μ can be written as
p(δ, φ | x, y) = p(δ | φ, x, y) p(φ | x, y),
where p(δ | φ, x, y) is an N(λ1 − μ1, φ(m1⁻¹ + n1⁻¹)) density and p(φ | x, y) is an S1χ⁻²(ν1) density. The variance φ can now be integrated out as before, giving a very similar result, namely, that if
t = {δ − (λ1 − μ1)}/√{s1²(m1⁻¹ + n1⁻¹)},
where s1² = S1/ν1, then t ~ t(ν1).
The methodology is sufficiently similar to the case where a reference prior is used that it does not seem necessary to give a numerical example. Of course, the difficulty in using it, in practice, lies in finding appropriate values of the parameters of the prior distribution .
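A sketch of the update formulas in code may nevertheless be useful; the prior hyperparameters used below are illustrative placeholders (note that the reference prior corresponds to the limit S0 = m0 = n0 = 0, ν0 = −2, which would need separate handling here because of the divisions by m0 and n0).

import numpy as np
from scipy import stats

def two_sample_conjugate(xbar, ybar, Sx, Sy, m, n,
                         lam0, mu0, m0, n0, S0, nu0):
    """Posterior hyperparameters for the normal/inverse-chi-squared prior."""
    m1, n1 = m0 + m, n0 + n
    nu1 = nu0 + m + n
    lam1 = (m0 * lam0 + m * xbar) / m1
    mu1 = (n0 * mu0 + n * ybar) / n1
    S1 = (S0 + Sx + Sy
          + (lam0 - xbar)**2 / (1 / m0 + 1 / m)
          + (mu0 - ybar)**2 / (1 / n0 + 1 / n))
    return lam1, mu1, m1, n1, S1, nu1

# Rat-diet data with a weak, purely illustrative prior
lam1, mu1, m1, n1, S1, nu1 = two_sample_conjugate(
    120, 101, 5032, 2552, 12, 7,
    lam0=110, mu0=100, m0=1, n0=1, S0=100, nu0=1)

s1 = np.sqrt(S1 / nu1)
scale = s1 * np.sqrt(1 / m1 + 1 / n1)
print(stats.t.interval(0.90, df=nu1, loc=lam1 - mu1, scale=scale))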
5.3 Variances unknown and unequal (Behrens–Fisher problem)
5.3.1 Formulation of the problem
In this section, we are concerned with the most general case of the problem of two normal samples, where neither the means nor the variances are assumed equal. Consequently, we have independent vectors x = (x1, x2, …, xm) and y = (y1, y2, …, yn) such that
xi ~ N(λ, φ) and yj ~ N(μ, ψ),
and we are interested in the posterior distribution of δ = λ − μ. This is known as the Behrens–Fisher problem (or sometimes as the Behrens problem or the Fisher–Behrens problem).
It is convenient to use the notation of the previous section, except that, to avoid using sub-subscripts, we write sx² and sy² for the sample variances Sx/νx and Sy/νy. In addition, it is useful to define
Tx = (λ − x̄)/(sx/√m) and Ty = (μ − ȳ)/(sy/√n).
For the moment, we shall assume independent reference priors uniform in λ, μ, log φ and log ψ. Then, just as in Section 2.12 on ‘Normal mean and variance both unknown’, it follows that the posterior distributions of λ and μ are independent and are such that
Tx ~ t(νx) and Ty ~ t(νy).
It is now useful to define T and θ by
T = {δ − (x̄ − ȳ)}/√(sx²/m + sy²/n) and tan θ = (sx/√m)/(sy/√n)
(θ can be taken in the first quadrant). It is then easy to check that
T = Tx sin θ − Ty cos θ.
Since θ is known (from the data) and the distributions of Tx and Ty are known, it follows that the distribution of T can be evaluated. This distribution is tabulated and is called Behrens’ (or the Behrens–Fisher or Fisher–Behrens) distribution; it will be denoted
T ~ BF(νx, νy, θ).
It was first referred to in Behrens (1929).
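Although the density of Behrens’ distribution is awkward analytically (see the next subsection), percentage points are easy to approximate by simulation, since T is just the stated combination of two independent t variables. A minimal Python sketch, in which the degrees of freedom and the angle are illustrative choices:

import numpy as np

rng = np.random.default_rng(0)

def behrens_sample(nu_x, nu_y, theta, size=1_000_000):
    """Draw from Behrens' distribution of T = Tx sin(theta) - Ty cos(theta)."""
    Tx = rng.standard_t(nu_x, size)
    Ty = rng.standard_t(nu_y, size)
    return Tx * np.sin(theta) - Ty * np.cos(theta)

# Upper 5% point of BF(11, 6, theta) for, say, tan(theta) = 1 (theta = 45 degrees)
T = behrens_sample(11, 6, np.pi / 4)
print(np.quantile(T, 0.95))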
5.3.2 Patil’s approximation
Behrens’ distribution turns out to have a rather nasty form, in that the density at any one point can only be found by evaluating a complicated integral, although a reasonable approximation was given by Patil (1965). To use this approximation, you need to find the second and fourth moments of T, which follow from those of Tx and Ty, and then to choose a scale factor a and a number b of degrees of freedom by matching these moments to those of a scaled t distribution. Then, approximately,
T/a ~ t(b).
Because b is not necessarily an integer, use of this approximation may necessitate interpolation in tables of the t distribution.
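Patil’s published formulas for a and b are not reproduced here; the Python sketch below implements the same moment-matching idea (equating the second and fourth moments of T to those of a·t(b)), which should be close to, though not necessarily identical with, Patil’s exact expressions.

import numpy as np
from scipy import stats

def patil_t_approximation(nu_x, nu_y, theta):
    """Match the 2nd and 4th moments of T = Tx sin(theta) - Ty cos(theta)
    to those of a scaled t distribution a * t(b); both nu's must exceed 4."""
    s2, c2 = np.sin(theta)**2, np.cos(theta)**2
    vx = nu_x / (nu_x - 2)                        # Var(Tx)
    vy = nu_y / (nu_y - 2)                        # Var(Ty)
    m2 = s2 * vx + c2 * vy                        # E[T^2]
    kx = 3 * nu_x**2 / ((nu_x - 2) * (nu_x - 4))  # E[Tx^4]
    ky = 3 * nu_y**2 / ((nu_y - 2) * (nu_y - 4))  # E[Ty^4]
    m4 = s2**2 * kx + 6 * s2 * c2 * vx * vy + c2**2 * ky  # E[T^4]
    r = m4 / m2**2
    b = 4 + 6 / (r - 3)                # matches the kurtosis 3(b-2)/(b-4) of t(b)
    a = np.sqrt(m2 * (b - 2) / b)      # matches the variance a^2 b/(b-2)
    return a, b

a, b = patil_t_approximation(11, 6, np.pi / 4)
print(a, b)                        # b is typically not an integer
print(a * stats.t.ppf(0.95, b))    # approximate upper 5% point of Behrens' T

The value printed in the last line can be compared with the simulated quantile from the sketch in the previous subsection.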
A rather limited table of percentage points of the Behrens distribution based on this approximation is to be found in the tables at the end of the book, but this will often be enough to give some idea as to what is going on. If more percentage points are required or the tables are not available, Patil’s approximation or something like the program in Appendix C has to be used.