Lectures on Statistics
William G. Faris
December 1, 2003

Contents
1 Expectation
  1.1 Random variables and expectation
  1.2 The sample mean
  1.3 The sample variance
  1.4 The central limit theorem
  1.5 Joint distributions of random variables
  1.6 Problems
2 Probability
  2.1 Events and probability
  2.2 The sample proportion
  2.3 The central limit theorem
  2.4 Problems
3 Estimation
  3.1 Estimating means
  3.2 Two population means
  3.3 Estimating population proportions
  3.4 Two population proportions
  3.5 Supplement: Confidence intervals
  3.6 Problems
4 Hypothesis testing
  4.1 Null and alternative hypothesis
  4.2 Hypothesis on a mean
  4.3 Two means
  4.4 Hypothesis on a proportion
  4.5 Two proportions
  4.6 Independence
  4.7 Power
  4.8 Loss
  4.9 Supplement: P-values
  4.10 Problems
5 Order statistics
  5.1 Sample median and population median
  5.2 Comparison of sample mean and sample median
  5.3 The Kolmogorov-Smirnov statistic
  5.4 Other goodness of fit statistics
  5.5 Comparison with a fitted distribution
  5.6 Supplement: Uniform order statistics
  5.7 Problems
6 The bootstrap
  6.1 Bootstrap samples
  6.2 The ideal bootstrap estimator
  6.3 The Monte Carlo bootstrap estimator
  6.4 Supplement: Sampling from a finite population
  6.5 Problems
7 Variance and bias in estimation
  7.1 Risk
  7.2 Unbiased estimators
  7.3 The Cramér-Rao bound
  7.4 Functional invariance
  7.5 Problems
8 Maximum likelihood estimation
  8.1 The likelihood function
  8.2 The maximum likelihood estimator
  8.3 Asymptotic behavior of the maximum likelihood estimator
  8.4 Asymptotic theory
  8.5 Maximum likelihood as a fundamental principle
  8.6 Problems
9 Bayesian theory
  9.1 The Bayesian framework
  9.2 Bayesian estimation for the mean of a normal population
  9.3 Probability distributions
  9.4 Prior and posterior distributions
  9.5 Bayesian estimation of a population proportion
  9.6 Problems
10 Decision theory and Bayesian theory
  10.1 Decision theory
  10.2 Bayesian decisions
  10.3 Bayesian decisions and risk
  10.4 Problems
11 Testing hypotheses
  11.1 Null and alternative hypothesis
  11.2 Simple null and alternative hypotheses
  11.3 Minimax risk
  11.4 One-sided tests
  11.5 Bayes tests for simple hypotheses
  11.6 One-sided Bayes tests
  11.7 p values
  11.8 Two-sided Bayes tests
  11.9 Lessons for hypothesis testing
  11.10 Problems
12 Bayes and likelihood procedures
  12.1 Bayes decisions
  12.2 Estimation
  12.3 Testing
  12.4 Problems
13 Regression and Correlation
  13.1 Regression
  13.2 Correlation
  13.3 Principal component analysis
14 Linear models: Estimation
  14.1 Estimation
  14.2 Regression
  14.3 Analysis of variance: one way
  14.4 Analysis of variance: two way
  14.5 Problems
15 Linear models: Hypothesis testing
  15.1 Hypothesis testing
  15.2 Chi-squared and F
  15.3 Regression
  15.4 Analysis of variance: one way
  15.5 Analysis of variance: two way
  15.6 One way versus two way
  15.7 Problems
A Linear algebra review
  A.1 Vector spaces
  A.2 Matrix multiplication
  A.3 The transpose
  A.4 The theorem of Pythagoras
  A.5 The projection theorem
  A.6 Problems

Chapter 1  Expectation

1.1 Random variables and expectation

This chapter is a brief review of probability. We consider an experiment with a set $\Omega$ of outcomes. A random variable is a function from $\Omega$ to $\mathbb{R}$. Thus for each outcome $\omega$ in $\Omega$ there is a corresponding experimental number $X(\omega)$.

A probability model assigns to each positive random variable $X \ge 0$ an expectation (or mean) $E[X]$ with $0 \le E[X] \le \infty$. If $X$ is a random variable that is not positive, then it is possible that the expectation is not defined. However, if $E[|X|] < \infty$, then $E[X]$ is defined, and $|E[X]| \le E[|X|]$. In some circumstances the expectation will be called the population mean.
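To make these definitions concrete, here is a minimal Python sketch for a finite experiment. The fair six-sided die and the names `outcomes`, `prob`, `expectation`, `X`, and `centered` are assumptions introduced only for this illustration. A random variable is represented as an ordinary function on the set of outcomes, and the expectation is a probability-weighted sum, so in this finite setting it is always defined.

```python
from fractions import Fraction

# Hypothetical finite experiment: one throw of a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in outcomes}

def expectation(X):
    """E[X] for a random variable X, given as a function on the outcomes."""
    return sum(X(w) * prob[w] for w in outcomes)

def X(w):
    """The face value of the throw (a positive random variable)."""
    return w

def centered(w):
    """X - 7/2, a random variable that takes both signs."""
    return w - Fraction(7, 2)

mean_X = expectation(X)
mean_centered = expectation(centered)
mean_abs = expectation(lambda w: abs(centered(w)))

print(mean_X)         # 7/2
print(mean_centered)  # 0
print(mean_abs)       # 3/2
```

The last two values illustrate the bound $|E[X]| \le E[|X|]$ for a random variable that is not positive: here $0 \le 3/2$.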
The expectation satisfies the following properties.

1. $E[aX] = aE[X]$.
2. $E[X + Y] = E[X] + E[Y]$.
3. $X \le Y$ implies $E[X] \le E[Y]$.
4. $E[c] = c$ for every constant $c$.

The first two properties are called linearity. The third property is the order property, and the fourth property is normalization.

One more special but very useful class consists of the random variables for which $E[X^2] < \infty$. We shall see in a moment that for every such random variable $E[|X|]^2 \le E[X^2]$, so this is included in the class of random variables for which $E[X]$ is defined. There is a fundamental inequality that is used over and over, the Schwarz inequality.

Theorem 1.1
\[ |E[XY]| \le \sqrt{E[X^2]}\,\sqrt{E[Y^2]}. \tag{1.1} \]

Proof: Use the elementary inequality
\[ \pm XY \le \frac{1}{2} a^2 X^2 + \frac{1}{2} \frac{1}{a^2} Y^2. \tag{1.2} \]
By the order property and linearity,
\[ \pm E[XY] \le \frac{1}{2} a^2 E[X^2] + \frac{1}{2} \frac{1}{a^2} E[Y^2]. \tag{1.3} \]
If $E[X^2] > 0$, then choose $a^2 = \sqrt{E[Y^2]} / \sqrt{E[X^2]}$, which makes each term on the right equal to $\frac{1}{2}\sqrt{E[X^2]}\,\sqrt{E[Y^2]}$. If $E[X^2] = 0$, then by taking $a$ sufficiently large one sees that $\pm E[XY] \le 0$, so $E[XY] = 0$.

Corollary 1.1
\[ |E[X]| \le \sqrt{E[X^2]}. \tag{1.4} \]

In probability it is common to use the centered random variable $X - E[X]$. This is the random variable that measures deviations from the expected value. There is a special terminology in this case. The variance of $X$ is
\[ \operatorname{Var}(X) = E[(X - E[X])^2]. \tag{1.5} \]
In the following we shall sometimes call this the population variance. Note the important identity
\[ \operatorname{Var}(X) = E[X^2] - E[X]^2. \tag{1.6} \]

There is a special notation that is in standard use. The mean of $X$ is written
\[ \mu_X = E[X]. \tag{1.7} \]
The Greek mu reminds us that this is a mean. The variance of $X$ is written
\[ \sigma_X^2 = \operatorname{Var}(X) = E[(X - \mu_X)^2]. \tag{1.8} \]
The square root of the variance is the standard deviation of $X$. This is
\[ \sigma_X = \sqrt{E[(X - \mu_X)^2]}. \tag{1.9} \]
The Greek sigma reminds us that this is a standard deviation. If we center the random variable and divide by its standard deviation, we get the standardized random variable
\[ Z = \frac{X - \mu_X}{\sigma_X}. \tag{1.10} \]

The covariance of $X$ and $Y$ is
\[ \operatorname{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]. \tag{1.11} \]
Note the important identity
\[ \operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y]. \tag{1.12} \]
From the Schwarz inequality we have the following important theorem.

Theorem 1.2
\[ |\operatorname{Cov}(X, Y)| \le \sqrt{\operatorname{Var}(X)}\,\sqrt{\operatorname{Var}(Y)}. \tag{1.13} \]

Sometimes this is stated in terms of the correlation coefficient
\[ \rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)}\,\sqrt{\operatorname{Var}(Y)}}, \tag{1.14} \]
which is the covariance of the standardized random variables. In the following we shall sometimes call this the population correlation coefficient. The result is the following.

Corollary 1.2
\[ |\rho(X, Y)| \le 1. \tag{1.15} \]

Perhaps the most important theorem in probability is the following. It is a trivial consequence of linearity, but it is the key to the law of large numbers.

Theorem 1.3
\[ \operatorname{Var}(X + Y) = \operatorname{Var}(X) + 2\operatorname{Cov}(X, Y) + \operatorname{Var}(Y). \tag{1.16} \]

Random variables $X$ and $Y$ are said to be uncorrelated if $\operatorname{Cov}(X, Y) = 0$. Note that this is equivalent to the identity $E[XY] = E[X]E[Y]$.

Corollary 1.3 If $X$ and $Y$ are uncorrelated, then the variances add:
\[ \operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y). \tag{1.17} \]

1.2 The sample mean

In statistics the sample mean is used to estimate the population mean.

Theorem 1.4 Let $X_1, X_2, X_3, \ldots, X_n$ be random variables, each with mean $\mu$. Let
\[ \bar{X}_n = \frac{X_1 + \cdots + X_n}{n} \tag{1.18} \]
be their sample mean. Then the expectation of $\bar{X}_n$ is
\[ E[\bar{X}_n] = \mu. \tag{1.19} \]

Proof: The expectation of the sample mean $\bar{X}_n$ is
\[ E[\bar{X}_n] = E\Big[\frac{1}{n}\sum_{i=1}^n X_i\Big] = \frac{1}{n} E\Big[\sum_{i=1}^n X_i\Big] = \frac{1}{n}\sum_{i=1}^n E[X_i] = \frac{1}{n}\, n\mu = \mu. \tag{1.20} \]

Theorem 1.5 Let $X_1, X_2, X_3, \ldots, X_n$ be random variables, each with mean $\mu$ and standard deviation $\sigma$. Assume that each pair $X_i, X_j$ of random variables with $i \ne j$ is uncorrelated. Let
\[ \bar{X}_n = \frac{X_1 + \cdots + X_n}{n} \tag{1.21} \]
be their sample mean. Then the standard deviation of $\bar{X}_n$ is
\[ \sigma_{\bar{X}_n} = \frac{\sigma}{\sqrt{n}}. \tag{1.22} \]

Proof: The variance of the sample mean $\bar{X}_n$ is
\[ \operatorname{Var}(\bar{X}_n) = \operatorname{Var}\Big(\frac{1}{n}\sum_{i=1}^n X_i\Big) = \frac{1}{n^2}\operatorname{Var}\Big(\sum_{i=1}^n X_i\Big) = \frac{1}{n^2}\sum_{i=1}^n \operatorname{Var}(X_i) = \frac{1}{n^2}\, n\sigma^2 = \frac{1}{n}\sigma^2. \tag{1.23} \]

We can think of these two results as a form of the weak law of large numbers. The law of large numbers is the law of averages that says that averaging uncorrelated random variables gives a result that is approximately constant. In this case the sample mean has expectation $\mu$ and standard deviation $\sigma/\sqrt{n}$. Thus if $n$ is large enough, it is a random variable with expectation $\mu$ and with little variability.

The factor $1/\sqrt{n}$ is both the blessing and the curse of statistics. It is a wonderful fact, since it says that averaging reduces variability. The problem, of course, is that while $1/\sqrt{n}$ goes to zero as $n$ gets larger, it does so rather
slowly. So one must somehow obtain a quite large sample in order to ensure rather moderate variability.

The reason the law is called the weak law is that it gives a statement about a fixed large sample size $n$. There is another law called the strong law that gives a corresponding statement about what happens for all sample sizes $n$ that are sufficiently large. Since in statistics one usually has a sample of a fixed size $n$ and only looks at the sample mean for this $n$, it is the more elementary weak law that is relevant to most statistical situations.

1.3 The sample variance

The sample mean
\[ \bar{X}_n = \frac{\sum_{i=1}^n X_i}{n} \tag{1.24} \]
is a random variable that may be used to estimate an unknown population mean $\mu$. In the same way, the sample variance
\[ s^2 = \frac{\sum_{i=1}^n (X_i - \bar{X}_n)^2}{n - 1} \tag{1.25} \]
may be used to estimate an unknown population variance $\sigma^2$.

The $n - 1$ in the denominator seems strange. However it is due to the fact that while there are $n$ observations $X_i$, their deviations from the sample mean $X_i - \bar{X}_n$ sum to zero, so there are only $n - 1$ quantities that can vary independently. The following theorem shows how this choice of denominator makes the calculation of the expectation give a simple answer.

Theorem 1.6 Let $X_1, X_2, X_3, \ldots, X_n$ be random variables, each with mean $\mu$ and standard deviation $\sigma$. Assume that each pair $X_i, X_j$ of random variables with $i \ne j$ is uncorrelated. Let $s^2$ be the sample variance. Then the expectation of $s^2$ is
\[ E[s^2] = \sigma^2. \tag{1.26} \]

Proof: Compute
\[ \sum_{i=1}^n (X_i - \mu)^2 = \sum_{i=1}^n (X_i - \bar{X}_n + \bar{X}_n - \mu)^2 = \sum_{i=1}^n (X_i - \bar{X}_n)^2 + \sum_{i=1}^n (\bar{X}_n - \mu)^2. \tag{1.27} \]
Notice that the cross terms sum to zero. Take expectations. This gives
\[ n\sigma^2 = E\Big[\sum_{i=1}^n (X_i - \bar{X}_n)^2\Big] + n\,\frac{\sigma^2}{n}. \tag{1.28} \]
The result then follows from elementary algebra.

1.4 The central limit theorem

Random variables $X, Y$ are called independent if for all functions $g$ and $h$ we have
\[ E[g(X)h(Y)] = E[g(X)]\,E[h(Y)]. \tag{1.29} \]
Clearly independent random variables are uncorrelated. The notion of independence has an obvious generalization to more than two random variables.

Two random variables $X$ and $Y$ are said to have the same distribution if
for all functions $g$ we have $E[g(X)] = E[g(Y)]$. Thus all probability predictions about the two random variables, taken individually, are the same.

From now on we deal with a standard situation. We consider a sequence of random variables $X_1, X_2, X_3, \ldots, X_n$. They are assumed to be produced by repeating an experiment $n$ times. The number $n$ is called the sample size. Typically we shall assume that these random variables are independent. Further, we shall assume that they all have the same distribution, that is, they are identically distributed.

We say that a random variable $Z$ has a standard normal distribution if for all bounded functions $g$ we have
\[ E[g(Z)] = \int_{-\infty}^{\infty} g(z)\, \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{z^2}{2}\Big)\, dz. \tag{1.30} \]
It is called standard because it has mean zero and variance 1.

Theorem 1.7 Let $X_1, \ldots, X_n$ be independent random variables, all with the same mean $\mu$ and standard deviation $\sigma$. Assume that they are identically distributed. Let $\bar{X}_n$ be the sample mean. The standardized sample mean is
\[ Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}. \tag{1.31} \]
Let $g$ be a bounded piecewise continuous function. Then
\[ E[g(Z_n)] \to E[g(Z)] \tag{1.32} \]
as $n \to \infty$, where $Z$ is standard normal.

1.5 Joint distributions of random variables

Let $X_1, \ldots, X_n$ be random variables. Then the joint distribution of these random variables is specified by giving either a continuous density or a discrete density.

If for all functions $g$ for which the expectation exists we have
\[ E[g(X_1, \ldots, X_n)] = \int \cdots \int g(x_1, \ldots, x_n)\, f(x_1, \ldots, x_n \mid \theta)\, dx_1 \cdots dx_n, \tag{1.33} \]
then the random variables have a continuous probability density $f(x_1, \ldots, x_n \mid \theta)$ with parameter $\theta$.

If for all functions $g$ for which the expectation exists we have
\[ E[g(X_1, \ldots, X_n)] = \sum_{x_1} \cdots \sum_{x_n} g(x_1, \ldots, x_n)\, f(x_1, \ldots, x_n \mid \theta), \tag{1.34} \]
then the random variables have a discrete probability density $f(x_1, \ldots, x_n \mid \theta)$ with parameter $\theta$.

In the case of independent identically distributed random variables the joint probability density factors:
\[ f(x_1, \ldots, x_n \mid \theta) = f(x_1 \mid \theta) \cdots f(x_n \mid \theta). \tag{1.35} \]

Example: The exponential distribution is defined by the continuous density
\[ f(x \mid \lambda) = \lambda e^{-\lambda x} \tag{1.36} \]
for $x \ge 0$ and $f(x \mid \lambda) = 0$ for $x < 0$. Here $\lambda > 0$ is related to the mean $\mu$ by $\mu = 1/\lambda$. It is a typical distribution for a waiting time (in continuous time)
for the next jump. The distribution of $X_1 + \cdots + X_n$ is then Gamma with parameters $n, \lambda$. It is the waiting time for the $n$th jump.

Example: The normal distribution is defined by the continuous density
\[ f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}. \tag{1.37} \]
The distribution of $X_1 + \cdots + X_n$ is then normal with mean $n\mu$ and variance $n\sigma^2$.

Example: The Bernoulli distribution is defined by the discrete density
\[ f(x \mid p) = p^x (1 - p)^{1 - x} \tag{1.38} \]
for $x = 0, 1$. Here $\mu = p$. This counts the occurrence of a single success or failure. The distribution of $X_1 + \cdots + X_n$ is then binomial. It counts the number of successes in $n$ independent trials.

Example: The geometric distribution is defined by the discrete density
\[ f(x \mid p) = p (1 - p)^x \tag{1.39} \]
for $x = 0, 1, 2, 3, \ldots$. Here $\mu = \frac{1}{p} - 1$. This is the distribution of the number of failures before the first success. The distribution of $X_1 + \cdots + X_n$ is then negative binomial. It is the number of failures before the $n$th success.

Example: The Poisson distribution with mean $\mu > 0$ is defined by the discrete density
\[ f(x \mid \mu) = \frac{\mu^x}{x!}\, e^{-\mu} \tag{1.40} \]
for $x = 0, 1, 2, 3, \ldots$. It is a typical distribution of the number of successes in a fixed interval of continuous time. The distribution of $X_1 + \cdots + X_n$ is Poisson with mean $n\mu$. It counts the number of successes in $n$ disjoint intervals.

1.6 Problems

1. Consider the experiment of throwing a die $n$ times. The results are $X_1, \ldots, X_n$. Then $E[f(X_i)] = \frac{1}{6} \sum_{k=1}^{6} f(k)$, and the $X_i$ are independent. Find the mean $\mu$ and standard deviation $\sigma$ of each $X_i$.

2. Consider the dice experiment. Take $n = 25$. Find the mean $\mu_{\bar{X}}$ of the sample mean $\bar{X}$. Find the standard deviation $\sigma_{\bar{X}}$ of the sample mean $\bar{X}$.

3. Perform the dice experiment with $n = 25$ and get an outcome $\omega$. Record the 25 numbers. Report the sample mean $\bar{X}(\omega)$. Report the sample standard deviation $s(\omega)$.

4. Consider independent random variables $X_1, \ldots, X_n$. For notational convenience, consider the centered random variables $Y_i = X_i - \mu$, so that $E[Y_i] = 0$. Let $\sigma^2 = E[Y_i^2]$ and $q^4 = E[Y_i^4]$. Prove that
\[ E[\bar{Y}_n^4] = \frac{1}{n^4}\big[ n q^4 + 3n(n - 1)\sigma^4 \big]. \tag{1.41} \]

5. In the preceding problem, show that
\[ E\Big[\sum_{n=k}^{\infty} \bar{Y}_n^4\Big] \le \frac{1}{2} q^4 \frac{1}{(k - 1)^2} + 3\sigma^4 \frac{1}{k - 1}. \tag{1.42} \]
In terms of the original $X_i$ this says that there is a constant $C$ such that, for all $k \ge 2$,
\[ E\Big[\sum_{n=k}^{\infty} (\bar{X}_n - \mu)^4\Big] \le C\, \frac{1}{k}. \tag{1.43} \]
Thus if $k$ is large, then all the sample means $\bar{X}_n$ for $n \ge k$ are likely to be close to $\mu$, in some average sense. This is a form of the strong law of large numbers. Compare with the weak law
\[ E[(\bar{X}_k - \mu)^2] = \sigma^2\, \frac{1}{k}, \tag{1.44} \]
which only shows that, for each fixed $k$, the sample mean $\bar{X}_k$ is very likely to be close to $\mu$.

Chapter 2  Probability

2.1 Events and probability

Probability is a special case of expectation.

We consider an experiment with a set $\Omega$ of outcomes. An event is a subset of $\Omega$. For each event $A$, there is a random variable $1_A$ called the indicator of this event. The value $1_A(\omega) = 1$ if the outcome $\omega$ belongs to $A$, and the value $1_A(\omega) = 0$ if the outcome $\omega$ does not belong to $A$. Thus one scores 1 if the outcome belongs to $A$, and one scores zero if the outcome does not belong to $A$. The probability of the event is defined by
\[ P[A] = E[1_A]. \tag{2.1} \]
In the following we shall sometimes call the probability of an event the population proportion. Probability satisfies the following properties.

1. $P[\Omega] = 1$.
2. $P[A \cup B] + P[A \cap B] = P[A] + P[B]$.
3. $A \subset B$ implies $P[A] \le P[B]$.
4. $P[\emptyset] = 0$.

The second property is additivity. The third property is the order property. The first and fourth properties are normalizations. They say that the probability of the sure event $\Omega$ is one, while the probability of the impossible event $\emptyset$ is zero.

The additivity property is often used in the following form. Say that $A, B$ are exclusive events if $A \cap B = \emptyset$. If $A, B$ are exclusive, then $P[A \cup B] = P[A] + P[B]$.

Another useful concept is that of complementary event. The event $A^c$ is defined to consist of all of the outcomes that are not in $A$. Then by additivity $P[A] + P[A^c] = 1$.

Events $A, B$ are said to be independent if $P[A \cap B] = P[A]\,P[B]$.
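The definitions of this section can be checked by direct enumeration on a small example. The Python sketch below is only an illustration: the fair six-sided die, the two events, and the names `omega`, `indicator`, `expectation`, and `P` are assumptions made here, not notation from the notes. It computes probabilities as expectations of indicators and then checks additivity, the complement rule, and independence with exact fractions.

```python
from fractions import Fraction

# Hypothetical experiment: one throw of a fair six-sided die.
omega = frozenset(range(1, 7))

def indicator(A):
    """Return the indicator random variable 1_A as a function on outcomes."""
    return lambda w: 1 if w in A else 0

def expectation(X):
    """E[X] under the uniform probability model on omega."""
    return sum(Fraction(X(w), len(omega)) for w in omega)

def P(A):
    """Probability as the expectation of the indicator: P[A] = E[1_A]."""
    return expectation(indicator(A))

A = frozenset({2, 4, 6})   # the throw is even
B = frozenset({1, 2})      # the throw is at most 2

p_A, p_B = P(A), P(B)
p_union, p_both = P(A | B), P(A & B)
p_complement = P(omega - A)

print(p_A, p_B)                        # 1/2 1/3
print(p_union + p_both == p_A + p_B)   # True (additivity)
print(p_A + p_complement == 1)         # True (complementary event)
print(p_both == p_A * p_B)             # True (A and B are independent)
```

With these particular events the intersection is the single outcome 2, so $P[A \cap B] = 1/6 = P[A]\,P[B]$. Replacing `B` by `frozenset({1, 2, 3})` gives a pair of events that is not independent.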