so that
\[
p(\pi \mid k) \propto \pi^k (1 - \pi)^{n-k} \qquad (0 \leqslant \pi \leqslant 1).
\]
The constant can be found by integration if it is required. Alternatively, a glance at Appendix A will show that, given k, π has a beta distribution
\[
\pi \mid k \sim \mathrm{Be}(k + 1,\; n - k + 1)
\]
and that the constant of proportionality is the reciprocal of the beta function \(\mathrm{B}(k + 1, n - k + 1)\). Thus, this beta distribution should represent your beliefs about π after you have observed k successes in n trials. This example has a special importance in that it is the one which Bayes himself discussed.
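As a quick numerical illustration, the following Python sketch (using SciPy, with illustrative values n = 20 and k = 6) checks that normalizing \(\pi^k(1-\pi)^{n-k}\) by \(\mathrm{B}(k+1, n-k+1)\) reproduces the Be(k + 1, n − k + 1) density of Appendix A.

```python
# Sketch: posterior for pi after k successes in n trials, assuming the
# uniform prior of Bayes' own example.  The values of n and k are illustrative.
import numpy as np
from scipy.stats import beta
from scipy.special import beta as beta_fn

n, k = 20, 6
grid = np.linspace(0.001, 0.999, 999)

unnormalized = grid**k * (1 - grid)**(n - k)          # pi^k (1 - pi)^(n - k)
normalized = unnormalized / beta_fn(k + 1, n - k + 1)  # divide by B(k+1, n-k+1)

# Agrees with the Be(k + 1, n - k + 1) density of Appendix A.
assert np.allclose(normalized, beta.pdf(grid, k + 1, n - k + 1))
print("posterior mean of pi:", beta.mean(k + 1, n - k + 1))   # (k + 1)/(n + 2)
```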
1.4.6 Independent random variables
The idea of independence extends from independence of events to independence of random variables. The basic idea is that y is independent of x if being told that x has any particular value does not affect your beliefs about the value of y. Because of complications involving events of probability zero, it is best to adopt the formal definition that x and y are independent if
\[
p(x, y) = p(x)\,p(y)
\]
for all values x and y. This definition works equally well in the discrete and the continuous cases (and indeed in the case where one random variable is continuous and the other is discrete). It trivially suffices that p(x, y) be a product of a function of x and a function of y.
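A minimal numerical sketch of this factorization, using a hypothetical discrete joint distribution that is independent by construction:

```python
# Sketch: a discrete joint pmf p(x, y) factorizes as p(x)p(y) exactly when
# x and y are independent.  The marginal tables below are made up.
import numpy as np

p_x = np.array([0.2, 0.5, 0.3])        # marginal pmf of x
p_y = np.array([0.6, 0.4])             # marginal pmf of y
joint = np.outer(p_x, p_y)             # independent joint distribution

# Recover the marginals and check p(x, y) = p(x) p(y) for all x and y.
marg_x = joint.sum(axis=1)
marg_y = joint.sum(axis=0)
print(np.allclose(joint, np.outer(marg_x, marg_y)))   # True: independent
```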
All the above generalizes in a fairly obvious way to the case of more than two random variables, and the notions of pairwise and mutual independence go through from events to random variables easily enough. However, we will find that we do not often need such generalizations.
1.5 Means and variances
1.5.1 Expectations
Suppose that m is a discrete random variable and that the series
\[
\sum_m m\,p(m)
\]
is absolutely convergent, that is such that
\[
\sum_m |m|\,p(m) < \infty.
\]
Then the sum of the original series is called the mean or expectation of the random variable, and we denote it
\[
\mathrm{E}m = \sum_m m\,p(m).
\]
A motivation for this definition is as follows. In a large number N of trials, we would expect the value m to occur about p(m)N times, so that the sum total of the values that would occur in these N trials (counted according to their multiplicity) would be about
\[
\sum_m m\,p(m)N
\]
so that the average value should be about
\[
\sum_m m\,p(m) = \mathrm{E}m.
\]
Thus, we can think of expectation as being, at least in some circumstances, a form of very long term average. On the other hand, there are circumstances in which it is difficult to believe in the possibility of arbitrarily large numbers of trials, so this interpretation is not always available. It can also be thought of as giving the position of the ‘centre of gravity’ of the distribution imagined as a distribution of mass spread along the x-axis.
More generally, if g(m) is a function of the random variable and the series \(\sum_m g(m)\,p(m)\) is absolutely convergent, then its sum is the expectation \(\mathrm{E}\,g(m)\) of g(m). Similarly, if h(m, n) is a function of two random variables m and n and the series \(\sum_m \sum_n h(m, n)\,p(m, n)\) is absolutely convergent, then its sum is the expectation \(\mathrm{E}\,h(m, n)\) of h(m, n). These definitions are consistent in that if we consider g(m) and h(m, n) as random variables with densities of their own, then it is easily shown that we get these values for their expectations.
In the continuous case, we define the expectation of a random variable x by
\[
\mathrm{E}x = \int x\,p(x)\,\mathrm{d}x
\]
provided that the integral is absolutely convergent, and more generally define the expectation of a function g(x) of x by
\[
\mathrm{E}\,g(x) = \int g(x)\,p(x)\,\mathrm{d}x
\]
provided that the integral is absolutely convergent, and similarly for the expectation of a function h(x, y) of two random variables. Note that the formulae in the discrete and continuous cases are, as usual, identical except for the use of summation in the one case and integration in the other.
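The following Python sketch (distributions and parameter values chosen purely for illustration) evaluates both forms of the definition numerically: a sum in the discrete case and an integral in the continuous case.

```python
# Sketch: E m = sum of m p(m) in the discrete case, E x = integral of
# x p(x) dx in the continuous case.  The examples anticipate Section 1.5.4.
import numpy as np
from scipy import integrate
from scipy.stats import binom, norm

# Discrete: m ~ B(10, 0.3), so E m should be 10 * 0.3 = 3.
n, pi = 10, 0.3
m = np.arange(n + 1)
print(np.sum(m * binom.pmf(m, n, pi)))          # 3.0

# Continuous: x ~ N(1, 2^2), so E x should be 1.
mean, _ = integrate.quad(lambda x: x * norm.pdf(x, 1, 2), -np.inf, np.inf)
print(mean)                                     # approximately 1.0
```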
1.5.2 The expectation of a sum and of a product
If x and y are any two random variables, independent or not, and a, b and c are constants, then in the continuous case
\[
\mathrm{E}(ax + by + c) = \iint (ax + by + c)\,p(x, y)\,\mathrm{d}x\,\mathrm{d}y
= a\,\mathrm{E}x + b\,\mathrm{E}y + c
\]
and similarly in the discrete case. Yet more generally, if g(x) is a function of x and h(y) a function of y, then
\[
\mathrm{E}\{a\,g(x) + b\,h(y) + c\} = a\,\mathrm{E}\,g(x) + b\,\mathrm{E}\,h(y) + c.
\]
We have already noted that the idea of independence is closely tied up with multiplication, and this is true when it comes to expectations as well. Thus, if x and y are independent, then
\[
\mathrm{E}(xy) = (\mathrm{E}x)(\mathrm{E}y)
\]
and more generally if g(x) and h(y) are functions of independent random variables x and y, then
\[
\mathrm{E}\{g(x)\,h(y)\} = \{\mathrm{E}\,g(x)\}\{\mathrm{E}\,h(y)\}.
\]
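A Monte Carlo sketch of these two facts, with illustrative distributions for the independent variables x and y:

```python
# Sketch: check E(ax + by + c) = a E x + b E y + c and, for independent
# x and y, E(xy) = (E x)(E y).  Distributions and constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=1_000_000)        # independent of y
y = rng.exponential(3.0, size=1_000_000)
a, b, c = 2.0, -1.0, 5.0

print(np.mean(a * x + b * y + c), a * x.mean() + b * y.mean() + c)
print(np.mean(x * y), x.mean() * y.mean())      # close, since x, y independent
```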
1.5.3 Variance, precision and standard deviation
We often need a measure of how spread out a distribution is, and for most purposes the most useful such measure is the variance of x, defined by
\[
\operatorname{Var} x = \mathrm{E}(x - \mathrm{E}x)^2.
\]
Clearly if the distribution is very little spread out, then most values are close to one another and so close to their mean, so that \((x - \mathrm{E}x)^2\) is small with high probability and hence \(\operatorname{Var} x\) is small. Conversely, if the distribution is well spread out then \(\operatorname{Var} x\) is large. It is sometimes useful to refer to the reciprocal of the variance, which is called the precision. Further, because the variance is essentially quadratic, we sometimes work in terms of its positive square root, the standard deviation, especially in numerical work. It is often useful that
\[
\operatorname{Var} x = \mathrm{E}(x - \mathrm{E}x)^2 = \mathrm{E}x^2 - (\mathrm{E}x)^2.
\]
The notion of a variance is analogous to that of a moment of inertia in mechanics, and this formula corresponds to the parallel axes theorem in mechanics. This analogy seldom carries much weight nowadays, because so many of those studying statistics took it up with the purpose of avoiding mechanics.
In discrete cases, it is sometimes useful that
\[
\operatorname{Var} m = \mathrm{E}\,m(m - 1) + \mathrm{E}m - (\mathrm{E}m)^2.
\]
1.5.4 Examples
As an example, suppose that \(k \sim \mathrm{B}(n, \pi)\). Then
\[
\mathrm{E}k = \sum_{k=0}^{n} k \binom{n}{k} \pi^k (1 - \pi)^{n-k}.
\]
After a little manipulation, this can be expressed as
\[
n\pi \sum_{j=0}^{n-1} \binom{n-1}{j} \pi^j (1 - \pi)^{n-1-j}.
\]
Because the sum is a sum of binomial probabilities, this expression reduces to \(n\pi\), and so
\[
\mathrm{E}k = n\pi.
\]
Similarly,
\[
\mathrm{E}\,k(k - 1) = n(n - 1)\pi^2
\]
and so
\[
\operatorname{Var} k = \mathrm{E}\,k(k - 1) + \mathrm{E}k - (\mathrm{E}k)^2
= n(n - 1)\pi^2 + n\pi - n^2\pi^2 = n\pi(1 - \pi).
\]
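The following sketch confirms these binomial results numerically for illustrative values of n and π:

```python
# Sketch: check E k = n*pi, E k(k-1) = n(n-1)*pi^2 and Var k = n*pi*(1 - pi)
# for a binomial distribution (parameter values illustrative).
import numpy as np
from scipy.stats import binom

n, pi = 12, 0.35
k = np.arange(n + 1)
p = binom.pmf(k, n, pi)

print(np.sum(k * p), n * pi)                              # E k
print(np.sum(k * (k - 1) * p), n * (n - 1) * pi**2)       # E k(k - 1)
print(np.sum(k * (k - 1) * p) + n * pi - (n * pi)**2,     # Var k via the
      n * pi * (1 - pi))                                  #   identity above
```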
For a second example, suppose \(x \sim \mathrm{N}(\mu, \sigma^2)\). Then, on substituting \(z = (x - \mu)/\sigma\),
\[
\mathrm{E}x = \int x\,(2\pi\sigma^2)^{-1/2} \exp\{-\tfrac{1}{2}(x - \mu)^2/\sigma^2\}\,\mathrm{d}x
= \mu + \int \sigma z\,(2\pi)^{-1/2} \exp(-\tfrac{1}{2}z^2)\,\mathrm{d}z.
\]
The integrand in the last expression is an odd function of z and so vanishes, so that
\[
\mathrm{E}x = \mu.
\]
Moreover,
\[
\operatorname{Var} x = \mathrm{E}(x - \mu)^2
= \int (x - \mu)^2\,(2\pi\sigma^2)^{-1/2} \exp\{-\tfrac{1}{2}(x - \mu)^2/\sigma^2\}\,\mathrm{d}x
\]
so that on writing \(z = (x - \mu)/\sigma\)
\[
\operatorname{Var} x = \sigma^2 \int z^2\,(2\pi)^{-1/2} \exp(-\tfrac{1}{2}z^2)\,\mathrm{d}z.
\]
Integrating by parts (using z as the part to differentiate), we get
\[
\operatorname{Var} x = \sigma^2 \int (2\pi)^{-1/2} \exp(-\tfrac{1}{2}z^2)\,\mathrm{d}z = \sigma^2.
\]
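A numerical check of these two integrals, with illustrative values of μ and σ:

```python
# Sketch: evaluate the integrals for E x and Var x of an N(mu, sigma^2)
# density numerically (mu and sigma are illustrative).
import numpy as np
from scipy import integrate
from scipy.stats import norm

mu, sigma = 1.5, 2.0
mean, _ = integrate.quad(lambda x: x * norm.pdf(x, mu, sigma),
                         -np.inf, np.inf)
var, _ = integrate.quad(lambda x: (x - mu)**2 * norm.pdf(x, mu, sigma),
                        -np.inf, np.inf)
print(mean, var)            # approximately mu and sigma^2
```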
1.5.5 Variance of a sum; covariance and correlation
Sometimes we need to find the variance of a sum of random variables. To do this, note that
\[
\operatorname{Var}(x + y) = \mathrm{E}\{(x + y) - \mathrm{E}(x + y)\}^2
= \mathrm{E}\{(x - \mathrm{E}x) + (y - \mathrm{E}y)\}^2
= \operatorname{Var} x + 2\operatorname{Cov}(x, y) + \operatorname{Var} y
\]
where the covariance of x and y is defined by
\[
\operatorname{Cov}(x, y) = \mathrm{E}(x - \mathrm{E}x)(y - \mathrm{E}y) = \mathrm{E}(xy) - (\mathrm{E}x)(\mathrm{E}y).
\]
More generally,
\[
\operatorname{Var}(ax + by + c) = a^2\operatorname{Var} x + 2ab\operatorname{Cov}(x, y) + b^2\operatorname{Var} y
\]
for any constants a, b and c. By considering this expression as a quadratic in a for fixed b or vice versa and noting that (because its value is a variance and so never negative) this quadratic cannot have two unequal real roots, we see that
\[
\{\operatorname{Cov}(x, y)\}^2 \leqslant (\operatorname{Var} x)(\operatorname{Var} y).
\]
We define the correlation coefficient between x and y by
\[
\rho(x, y) = \frac{\operatorname{Cov}(x, y)}{\sqrt{(\operatorname{Var} x)(\operatorname{Var} y)}}.
\]
It follows that
\[
-1 \leqslant \rho(x, y) \leqslant 1
\]
and indeed a little further thought shows that \(\rho(x, y) = 1\) if and only if
\[
ax + by + c = 0
\]
with probability 1 for some constants a, b and c with a and b having opposite signs, while \(\rho(x, y) = -1\) if and only if the same thing happens except that a and b have the same sign. If \(\rho(x, y) = 0\) we say that x and y are uncorrelated.
It is easily seen that if x and y are independent then
\[
\operatorname{Cov}(x, y) = \mathrm{E}(xy) - (\mathrm{E}x)(\mathrm{E}y) = 0
\]
from which it follows that independent random variables are uncorrelated.
The converse is not in general true, but it can be shown that if x and y have a bivariate normal distribution (as described in Appendix A), then they are independent if and only if they are uncorrelated.
It should be noted that if x and y are uncorrelated, and in particular if they are independent, then
\[
\operatorname{Var}(x \pm y) = \operatorname{Var} x + \operatorname{Var} y
\]
(observe that there is a plus sign on the right-hand side even if there is a minus sign on the left).
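The following sketch (simulated data, illustrative coefficients) computes the covariance and correlation of a correlated pair and checks the variance of their sum and difference:

```python
# Sketch: covariance, correlation, and the variance of a sum and a difference,
# using simulated correlated data (coefficients chosen for illustration).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500_000)
y = 0.6 * x + 0.8 * rng.normal(size=500_000)    # correlated with x, rho = 0.6

cov = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov / np.sqrt(x.var() * y.var())
print(rho)                                      # about 0.6

print(np.var(x + y), x.var() + y.var() + 2 * cov)   # Var(x + y)
print(np.var(x - y), x.var() + y.var() - 2 * cov)   # Var(x - y)
```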
1.5.6 Approximations to the mean and variance of a function of a random variable
Very occasionally, it will be useful to have an approximation to the mean and variance of a function of a random variable. Suppose that
\[
z = g(x).
\]
Then if g is a reasonably smooth function and x is not too far from its expectation, Taylor's theorem implies that
\[
z \cong g(\mathrm{E}x) + (x - \mathrm{E}x)\,g'(\mathrm{E}x) + \tfrac{1}{2}(x - \mathrm{E}x)^2\,g''(\mathrm{E}x).
\]
It therefore seems reasonable that a fair approximation to the expectation of z is given by
\[
\mathrm{E}z \cong g(\mathrm{E}x) + \tfrac{1}{2}\operatorname{Var} x\;g''(\mathrm{E}x)
\]
and if this is so, then a reasonable approximation to \(\operatorname{Var} z\) may well be given by
\[
\operatorname{Var} z \cong \{g'(\mathrm{E}x)\}^2 \operatorname{Var} x.
\]
As an example, suppose that
\[
x \sim \mathrm{B}(n, \pi)
\]
and that z = g(x), where
\[
g(x) = \sin^{-1}\sqrt{x/n}
\]
so that
\[
g'(x) = \frac{1}{2\sqrt{x(n - x)}}
\]
and thus, since \(\mathrm{E}x = n\pi\) and \(\operatorname{Var} x = n\pi(1 - \pi)\), \(g'(\mathrm{E}x) = 1/\{2n\sqrt{\pi(1 - \pi)}\}\). The aforementioned argument then implies that
\[
\mathrm{E}z \cong \sin^{-1}\sqrt{\pi}, \qquad \operatorname{Var} z \cong \frac{1}{4n}.
\]
The interesting thing about this transformation, which has a long history [see Eisenhart et al. (1947, Chapter 16) and Fisher (1954)], is that, to the extent to which the approximation is valid, the variance of z does not depend on the parameter π. It is accordingly known as a variance-stabilizing transformation. We will return to this transformation in Section 3.2 on
the ‘Reference Prior for the Binomial Distribution’.
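A simulation sketch of the variance-stabilizing property (the sample size and values of π are chosen purely for illustration): whatever π is, the sample variance of z stays close to 1/(4n).

```python
# Sketch: simulate the arc-sine square-root transformation
# z = arcsin(sqrt(x/n)) for x ~ B(n, pi) and compare Var z with 1/(4n).
import numpy as np

rng = np.random.default_rng(2)
n = 100
for pi in (0.1, 0.3, 0.5, 0.8):
    x = rng.binomial(n, pi, size=200_000)
    z = np.arcsin(np.sqrt(x / n))
    print(pi, z.var(), 1 / (4 * n))     # Var z is close to 1/(4n) for each pi
```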
1.5.7 Conditional expectations and variances
If the reader wishes, the following may be omitted on a first reading and then returned to as needed.
We define the conditional expectation of y given x by
\[
\mathrm{E}(y \mid x) = \int y\,p(y \mid x)\,\mathrm{d}y
\]
in the continuous case and by the corresponding sum in the discrete case. If we wish to be pedantic, it can occasionally be useful to indicate what we are averaging over by writing
\[
\mathrm{E}_{y \mid x}(y \mid x)
\]
just as we can write \(p_{y \mid x}(y \mid x)\), but this is rarely necessary (though it can slightly clarify a proof on occasion). More generally, the conditional expectation of a function g(y) of y given x is
\[
\mathrm{E}\{g(y) \mid x\} = \int g(y)\,p(y \mid x)\,\mathrm{d}y.
\]
We can also define a conditional variance as
\[
\operatorname{Var}(y \mid x) = \mathrm{E}[\{y - \mathrm{E}(y \mid x)\}^2 \mid x]
= \mathrm{E}(y^2 \mid x) - \{\mathrm{E}(y \mid x)\}^2.
\]
Despite some notational complexity, this is easy enough to find since after all a conditional distribution is just a particular case of a probability distribution. If we are really pedantic, then \(\mathrm{E}(y \mid x)\) is a real number which is a function of the real number x, while \(\mathrm{E}(y \mid \tilde{x})\) is a random variable which is a function of the random variable \(\tilde{x}\), which takes the value \(\mathrm{E}(y \mid x)\) when \(\tilde{x}\) takes the value x. However, the distinction, which is hard to grasp in the first place, is usually unimportant.
We may note that the formula
\[
p(y) = \int p(y \mid x)\,p(x)\,\mathrm{d}x
\]
could be written as
\[
p(y) = \mathrm{E}\,p(y \mid x)
\]
but we must be careful that it is an expectation over values of x (i.e. \(\mathrm{E}_x\)) that occurs here.
Very occasionally we make use of results like
\[
\mathrm{E}y = \mathrm{E}\{\mathrm{E}(y \mid x)\}, \qquad
\operatorname{Var} y = \mathrm{E}\{\operatorname{Var}(y \mid x)\} + \operatorname{Var}\{\mathrm{E}(y \mid x)\}.
\]
The proofs are possibly more confusing than helpful. They run as follows:
\[
\mathrm{E}\{\mathrm{E}(y \mid x)\} = \int \mathrm{E}(y \mid x)\,p(x)\,\mathrm{d}x
= \iint y\,p(y \mid x)\,p(x)\,\mathrm{d}y\,\mathrm{d}x
= \iint y\,p(x, y)\,\mathrm{d}y\,\mathrm{d}x = \mathrm{E}y.
\]
Similarly, we get the generalization
\[
\mathrm{E}\{\mathrm{E}(g(y) \mid x)\} = \mathrm{E}\,g(y)
\]
and in particular
\[
\mathrm{E}\{\mathrm{E}(y^2 \mid x)\} = \mathrm{E}y^2,
\]
hence
\[
\mathrm{E}\{\operatorname{Var}(y \mid x)\}
= \mathrm{E}\{\mathrm{E}(y^2 \mid x)\} - \mathrm{E}[\{\mathrm{E}(y \mid x)\}^2]
= \mathrm{E}y^2 - \mathrm{E}[\{\mathrm{E}(y \mid x)\}^2],
\]
while
\[
\operatorname{Var}\{\mathrm{E}(y \mid x)\}
= \mathrm{E}[\{\mathrm{E}(y \mid x)\}^2] - [\mathrm{E}\{\mathrm{E}(y \mid x)\}]^2
= \mathrm{E}[\{\mathrm{E}(y \mid x)\}^2] - (\mathrm{E}y)^2,
\]
from which it follows that
\[
\mathrm{E}\{\operatorname{Var}(y \mid x)\} + \operatorname{Var}\{\mathrm{E}(y \mid x)\}
= \mathrm{E}y^2 - (\mathrm{E}y)^2 = \operatorname{Var} y.
\]
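The following simulation sketch checks the two results for an illustrative hierarchy, x ∼ P(λ) with y | x ∼ B(x, π) (the setting of exercise 18 below):

```python
# Sketch: check E y = E{E(y|x)} and Var y = E{Var(y|x)} + Var{E(y|x)}
# by simulation, for x ~ Poisson(lam) and y | x ~ Binomial(x, pi).
# The values of lam and pi are illustrative.
import numpy as np

rng = np.random.default_rng(3)
lam, pi = 4.0, 0.3
x = rng.poisson(lam, size=1_000_000)
y = rng.binomial(x, pi)

cond_mean = pi * x                 # E(y | x) = x * pi
cond_var = x * pi * (1 - pi)       # Var(y | x) = x * pi * (1 - pi)

print(y.mean(), cond_mean.mean())                   # both near lam * pi
print(y.var(), cond_var.mean() + cond_mean.var())   # both near lam * pi
```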
1.5.8 Medians and modes
The mean is not the only measure of the centre of a distribution. We also need to consider the median from time to time, which is defined as any value \(x_0\) such that
\[
\mathrm{P}(x \leqslant x_0) \geqslant \tfrac{1}{2} \quad\text{and}\quad \mathrm{P}(x \geqslant x_0) \geqslant \tfrac{1}{2}.
\]
In the case of most continuous random variables there is a unique median \(x_0\) such that
\[
\mathrm{P}(x \leqslant x_0) = \mathrm{P}(x \geqslant x_0) = \tfrac{1}{2}.
\]
We occasionally refer also to the mode, defined as that value at which the pdf is a maximum. One important use we shall have for the mode will be in methods for finding the median based on the approximation
\[
\text{mean} - \text{mode} \cong 3\,(\text{mean} - \text{median})
\]
or equivalently
\[
\text{median} \cong \frac{2 \times \text{mean} + \text{mode}}{3}
\]
(see the preliminary remarks in Appendix A).
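A small numerical sketch of this approximation, using a moderately skew gamma distribution (the shape parameter is chosen purely for illustration):

```python
# Sketch: check mean - mode ~ 3(mean - median) for a gamma distribution
# with shape a = 8 and scale 1 (values illustrative).
from scipy.stats import gamma

a = 8.0
mean = gamma.mean(a)                       # equals a
mode = a - 1                               # mode of a gamma(a) density, a > 1
median = gamma.median(a)

print(mean - mode, 3 * (mean - median))    # roughly equal
```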
1.6 Exercises on Chapter 1
1. A card game is played with 52 cards divided equally between four players, North, South, East and West, all arrangements being equally likely. Thirteen of the cards are referred to as trumps. If you know that North and South have ten trumps between them, what is the probability that all three remaining trumps are in the same hand? If it is known that the king of trumps is included among the other three, what is the probability that one player has the king and the other the remaining two trumps?
2. a. Under what circumstances is an event A independent of itself?
b. By considering events concerned with independent tosses of a red die and a blue die, or otherwise, give examples of events A, B and C which are not independent, but nevertheless are such that every pair of them is independent.
c. By considering events concerned with three independent tosses of a coin and supposing that A and B both represent tossing a head on the first trial, give examples of events A, B and C which are such that \(\mathrm{P}(A \cap B \cap C) = \mathrm{P}(A)\,\mathrm{P}(B)\,\mathrm{P}(C)\) although no pair of them is independent.
3. Whether certain mice are black or brown depends on a pair of genes, each of which is either B or b. If both members of the pair are alike, the mouse is said to be homozygous, and if they are different it is said to be heterozygous. The mouse is brown only if it is homozygous bb. The offspring of a pair of mice have two such genes, one from each parent, and if the parent is heterozygous, the inherited gene is equally likely to be B or b. Suppose that a black mouse results from a mating between two heterozygotes.
a. What are the probabilities that this mouse is homozygous and that it is heterozygous?
Now suppose that this mouse is mated with a brown mouse, resulting in seven offspring, all of which turn out to be black.
b. Use Bayes’ Theorem to find the probability that the black mouse was homozygous BB.
c. Recalculate the same probability by regarding the seven offspring as seven observations made sequentially, treating the posterior after each observation as the prior for the next (cf. Fisher, 1959, Section II.2).
4. The example on Bayes’ Theorem in Section 1.2 concerning the biology of twins was based on the assumption that births of boys and girls occur equally frequently, and yet it has been known for a very long time that fewer girls are born than boys (cf. Arbuthnot, 1710). Suppose that the probability of a girl is p, so that
\[
\mathrm{P}(GG \mid M) = p, \quad \mathrm{P}(BB \mid M) = 1 - p, \quad
\mathrm{P}(GG \mid D) = p^2, \quad \mathrm{P}(BB \mid D) = (1 - p)^2, \quad \mathrm{P}(GB \mid D) = 2p(1 - p).
\]
Find the proportion of monozygotic twins in the whole population of twins in terms of p and the sex distribution among all twins.
5. Suppose a red and a blue die are tossed. Let x be the sum of the number showing on the red die and twice the number showing on the blue die. Find the density function and the distribution function of x.
6. Suppose that \(k \sim \mathrm{B}(n, \pi)\) where n is large and π is small but \(n\pi = \lambda\) has an intermediate value. Use the exponential limit \((1 - x/n)^n \to \mathrm{e}^{-x}\) to show that \(\mathrm{P}(k = 0) \cong \mathrm{e}^{-\lambda}\) and \(\mathrm{P}(k = 1) \cong \lambda\mathrm{e}^{-\lambda}\). Extend this result to show that k is such that
\[
p(k) \cong \frac{\lambda^k}{k!}\exp(-\lambda),
\]
that is, k is approximately distributed as a Poisson variable of mean λ (cf. Appendix A).
7. Suppose that m and n have independent Poisson distributions of means λ and μ, respectively (see question 6), and that k = m + n.
a. Show that \(\mathrm{P}(k = 0) = \mathrm{e}^{-(\lambda + \mu)}\) and \(\mathrm{P}(k = 1) = (\lambda + \mu)\,\mathrm{e}^{-(\lambda + \mu)}\).
b. Generalize by showing that k has a Poisson distribution of mean λ + μ.
c. Show that, conditional on k, the distribution of m is binomial of index k and parameter \(\lambda/(\lambda + \mu)\).
8. Modify the formula for the density of a one-to-one function g(x) of a random variable x to find an expression for the density of x2 in terms of that of x, in both the continuous and discrete case. Hence, show that the square of a standard normal variate has a chi-squared distribution on one degree of freedom as defined in Appendix A.
9. Suppose that \(x_1, x_2, \dots, x_n\) are independently distributed and all have the same continuous distribution, with density f(x) and distribution function F(x). Find the distribution functions of
\[
M = \max(x_1, x_2, \dots, x_n) \quad\text{and}\quad m = \min(x_1, x_2, \dots, x_n)
\]
in terms of F(x), and so find expressions for the density functions of M and m.
10. Suppose that u and v are independently uniformly distributed on the interval [0, 1], so that they divide the interval into three sub-intervals. Find the joint density function of the lengths of the first two sub-intervals.
11. Show that two continuous random variables x and y are independent (i.e. p(x, y)=p(x)p(y) for all x and y) if and only if their joint distribution function F(x, y) satisfies F(x, y)=F(x)F(y) for all x and y. Prove that the same thing is true for discrete random variables. [This is an example of a result which is easier to prove in the continuous case.]
12. Suppose that the random variable x has a negative binomial distribution of index n and parameter π, so that
\[
p(x) = \binom{n + x - 1}{x}\,\pi^n (1 - \pi)^x \qquad (x = 0, 1, 2, \dots).
\]
Find the mean and variance of x and check that your answer agrees with that given in Appendix A.
13. A random variable X is said to have a chi-squared distribution on ν degrees of freedom if it has the same distribution as
\[
Z_1^2 + Z_2^2 + \dots + Z_\nu^2
\]
where \(Z_1, Z_2, \dots, Z_\nu\) are independent standard normal variates. Use the facts that \(\mathrm{E}Z_i = 0\), \(\mathrm{E}Z_i^2 = 1\) and \(\mathrm{E}Z_i^4 = 3\) to find the mean and variance of X. Confirm these values using the probability density of X, which is
\[
p(X) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\,X^{\nu/2 - 1}\,\mathrm{e}^{-X/2} \qquad (X > 0)
\]
(see Appendix A).
14. The skewness of a random variable x is defined as \(\gamma_1 = \mu_3/(\mu_2)^{3/2}\), where
\[
\mu_k = \mathrm{E}\,(x - \mathrm{E}x)^k
\]
(but note that some authors work in terms of \(\beta_1 = \gamma_1^2\)). Find the skewness of a random variable X with a binomial distribution of index n and parameter π.
15. Suppose that a continuous random variable X has mean μ and variance \(\sigma^2\). By writing
\[
\sigma^2 = \int (x - \mu)^2 p(x)\,\mathrm{d}x
= \int_{|x - \mu| < c} (x - \mu)^2 p(x)\,\mathrm{d}x + \int_{|x - \mu| \geqslant c} (x - \mu)^2 p(x)\,\mathrm{d}x
\]
and using a lower bound for the integrand in the latter integral, prove that
\[
\mathrm{P}(|X - \mu| \geqslant c) \leqslant \frac{\sigma^2}{c^2}.
\]
Show that the result also holds for discrete random variables. [This result is known as Čebyšev’s Inequality (the name is spelt in many other ways, including Chebyshev and Tchebycheff).]
16. Suppose that x and y are such that
\[
\mathrm{P}(x = 0, y = 1) = \mathrm{P}(x = 0, y = -1) = \mathrm{P}(x = 1, y = 0) = \mathrm{P}(x = -1, y = 0) = \tfrac{1}{4}.
\]
Show that x and y are uncorrelated but that they are not independent.
17. Let x and y have a bivariate normal distribution and suppose that x and y both have mean 0 and variance 1, so that their marginal distributions are standard normal and their joint density is
\[
p(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}}\exp\left\{-\frac{x^2 - 2\rho xy + y^2}{2(1 - \rho^2)}\right\}.
\]
Show that if the correlation coefficient between x and y is ρ, then that between \(x^2\) and \(y^2\) is \(\rho^2\).
18. Suppose that x has a Poisson distribution (see question 6) P(λ) of mean λ and that, for given x, y has a binomial distribution B(x, π) of index x and parameter π.
a. Show that the unconditional distribution of y is Poisson of mean \(\lambda\pi\).
b. Verify that the formula
\[
\operatorname{Var} y = \mathrm{E}\{\operatorname{Var}(y \mid x)\} + \operatorname{Var}\{\mathrm{E}(y \mid x)\}
\]
derived in Section 1.5 holds in this case.
19. Define
\[
I = \int_0^\infty \exp(-\tfrac{1}{2}z^2)\,\mathrm{d}z
\]
and show (by setting z = xy and then substituting z for y) that
\[
I = \int_0^\infty z\exp(-\tfrac{1}{2}x^2z^2)\,\mathrm{d}x.
\]
Deduce that
\[
I^2 = \int_0^\infty\!\!\int_0^\infty z\exp\{-\tfrac{1}{2}(1 + x^2)z^2\}\,\mathrm{d}z\,\mathrm{d}x.
\]
By substituting \((1 + x^2)z^2 = 2t\), so that \(z\,\mathrm{d}z = \mathrm{d}t/(1 + x^2)\), show that \(I = \sqrt{\pi/2}\), so that the density of the standard normal distribution as defined in Section 1.3 does integrate to unity and so is indeed a density. (This method is due to Laplace, 1812, Section 24.)
2 Bayesian inference for the normal distribution
2.1 Nature of Bayesian inference