Chapter 10 Introduction to Estimation
Statistical Inference… • Statistical inference is the process by which we acquire information and draw conclusions about populations from samples. • In order to do inference, we require the skills and knowledge of descriptive statistics, probability distributions, and sampling distributions.
Estimation… • There are two types of inference: estimation and hypothesis testing; estimation is introduced first. • The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic. • E.g., the sample mean (x̄) is employed to estimate the population mean (μ).
Estimation… • The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic. • There are two types of estimators: • Point Estimator • Interval Estimator
Point Estimator… • A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value or point. • We saw earlier that point probabilities in continuous distributions were virtually zero. Likewise, we’d expect that the point estimator gets closer to the parameter value with an increased sample size, but point estimators don’t reflect the effects of larger sample sizes. Hence we will employ the interval estimator to estimate population parameters…
Point Estimator • A point estimator draws inference about a population by estimating the value of an unknown parameter using a single value or point. [Figure: the population distribution with its unknown parameter, and the sampling distribution with the point estimator marked as a single value.]
Interval Estimator… • An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval. • That is we say (with some ___% certainty) that the population parameter of interest is between some lower and upper bounds.
Interval Estimator • An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval. [Figure: the population distribution with its unknown parameter, and the sample distribution with the interval estimator marked as a range.]
Point & Interval Estimation… • For example, suppose we want to estimate the mean summer income of a class of business students. For n=25 students, x̄ is calculated to be 400 $/week. • Point estimate: the mean income is 400 $/week. • Interval estimate (an alternative statement): the mean income is between 380 and 420 $/week.
Qualities of Estimators… • Qualities desirable in estimators include unbiasedness, consistency, and relative efficiency: • An unbiased estimator of a population parameter is an estimator whose expected value is equal to that parameter. • An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size grows larger. • If there are two unbiased estimators of a parameter, the one whose variance is smaller is said to be relatively efficient.
Unbiased Estimators… • An unbiased estimator of a population parameter is an estimator whose expected value is equal to that parameter. • E.g. the sample mean X̄ is an unbiased estimator of the population mean μ, since: • E(X̄) = μ
Consistency… • An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size grows larger. • E.g. X̄ is a consistent estimator of μ because: • V(X̄) is σ²/n • That is, as n grows larger, the variance of X̄ grows smaller.
Relative Efficiency… • If there are two unbiased estimators of a parameter, the one whose variance is smaller is said to be relatively efficient. • E.g. both the sample median and the sample mean are unbiased estimators of the population mean; however, the sample median has a greater variance than the sample mean, so we choose the sample mean X̄, since it is relatively efficient when compared to the sample median.
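The relative efficiency claim can be checked with a small simulation. This is a sketch under stated assumptions, not part of the original slides: it assumes a normal population (mean 0, standard deviation 1), samples of size 25, and 2000 repetitions. The function name `estimator_variances` is made up for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def estimator_variances(n=25, trials=2000, mu=0.0, sigma=1.0):
    """Return (variance of the sample means, variance of the sample medians).

    Assumption: the population is normal with mean mu and std dev sigma.
    """
    means, medians = [], []
    for _ in range(trials):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)

var_mean, var_median = estimator_variances()
print(f"variance of sample mean:   {var_mean:.4f}")
print(f"variance of sample median: {var_median:.4f}")
```

Under these assumptions the variance of the sample mean should land near σ²/n = 1/25 = 0.04, and the variance of the sample median should come out larger.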
Estimating μ when σ is known… • We can calculate an interval estimator from a sampling distribution, by: • Drawing a sample of size n from the population • Calculating its mean, x̄ • And, by the central limit theorem, we know that X̄ is normally (or approximately normally) distributed so… • $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ …will have a standard normal (or approximately normal) distribution.
Estimating μ when σ is known… • Looking at this in more detail, in $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$: • x̄: known, i.e. the sample mean • Z: known, i.e. the standard normal distribution • μ: unknown, i.e. we want to estimate the population mean • σ: known, i.e. it’s assumed we know the population standard deviation… • n: known, i.e. the number of items sampled
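Estimating μ when σ is known… the confidence interval • We established in Chapter 9: $P\left(-z_{\alpha/2} < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$ • Thus, the probability that the interval: $\left(\bar{X} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right)$ contains the population mean μ is 1 – α. This is a confidence interval estimator for μ. • The sample mean is in the center of the interval…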
Estimation of a Population Mean • The normal curve rule tells us that the sampling distribution of x̄ will have almost all values between $\mu - 3\sigma_{\bar{x}}$ and $\mu + 3\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ is the standard error of the mean. • Since the sampling distribution of the sample mean should be centered at the population mean μ, this tells us that x̄ should be no further away from μ than $3\sigma_{\bar{x}}$. • The range $\bar{x} \pm 3\sigma_{\bar{x}}$ will therefore almost certainly contain μ.
An Example A large hospital wants to estimate the average length of time that surgical patients remain in the hospital. To accomplish this, a random sample of 100 patient records is obtained from the records of all patients who have had surgery in recent years. For these 100 patients, the sample mean is found to be 7.8 days. The standard deviation of hospital stay for all surgical patients is 9.4 days. Estimate the mean length of stay, by forming an interval that almost certainly contains the value of the mean, μ.
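Solution • x̄ = 7.8 days, σ = 9.4 days, n = 100 • $3\sigma_{\bar{x}} = 3 \times \dfrac{\sigma}{\sqrt{n}} = 3 \times \dfrac{9.4}{\sqrt{100}} = 2.82$ • The smallest and largest plausible values for μ are 7.8 – 2.82 = 4.98 and 7.8 + 2.82 = 10.62 days.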
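The arithmetic in this solution can be reproduced in a few lines. This is only an illustrative sketch; the variable names are chosen here and do not come from the slides.

```python
import math

x_bar = 7.8   # sample mean, in days
sigma = 9.4   # population standard deviation, in days
n = 100       # number of patient records sampled

standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n) = 0.94
half_width = 3 * standard_error         # normal curve rule: 3 standard errors
lower = x_bar - half_width
upper = x_bar + half_width

print(round(lower, 2), round(upper, 2))  # prints: 4.98 10.62
```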
Estimation of a Population Mean • The Normal Curve Rule provides a quick method to estimate μ, but we wish to refine the solution. • In general, our confidence interval to estimate μ will be: $\bar{x} - z\,\sigma_{\bar{x}}$ to $\bar{x} + z\,\sigma_{\bar{x}}$ • The value of “z” will depend on the level of confidence. The values are given below, but we need to understand how they are derived.
Desired Confidence | 90% | 95% | 99%
z-value | 1.645 | 1.96 | 2.575
Confidence Interval Formula Let’s pick the 95% level of confidence and derive the formula for the interval. What proportion of z-scores lie between –1.96 and 1.96? The area between 0 and 1.96 is 0.475. The area between –1.96 and 0 is also 0.475. So total area is 0.475 + 0.475 = 0.950. So 95% of z-scores lie between –1.96 and 1.96.
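The area calculation above, and the z-values for other confidence levels, can be checked with Python's standard library. This is a sketch: `central_area` and `z_for_confidence` are helper names introduced here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def central_area(z):
    """Proportion of z-scores lying between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

def z_for_confidence(level):
    """z-value that leaves alpha/2 in each tail, where alpha = 1 - level."""
    alpha = 1 - level
    return std_normal.inv_cdf(1 - alpha / 2)

print(round(central_area(1.96), 3))  # area between -1.96 and 1.96
for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
```

The computed 99% value is about 2.576; printed normal tables often list it as 2.575, which is the value used in these slides.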
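Confidence Interval Estimator for μ: • $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ • The probability 1 – α is called the confidence level. • The interval is usually represented with a “plus/minus” (±) sign: • lower confidence limit (LCL) = $\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ • upper confidence limit (UCL) = $\bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$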
Margin of error • Sometimes, in reporting the results of surveys, the media use the term “margin of error”. This is roughly equivalent to the $z_{\alpha/2}\,\sigma/\sqrt{n}$ term that is added to and subtracted from the sample mean in the confidence interval equation. • It shows how far away from the sample mean (x̄) the upper and lower confidence limits are.
Confidence Interval Formula - Continued • Recall that if the sampling distribution of the mean is normally distributed, that implies the sample mean, “X-bar”, can be changed into a z-score using $z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$ or $z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}$, since the mean of the sampling distribution of the sample mean is equal to the population mean, μ, which is what we wish to estimate.
Confidence Interval Formula - Continued • Hence, if our z-scores can be from –1.96 to +1.96, then we expect the sample mean to be within 1.96 standard errors of the true population mean, μ. • In other words, if x̄ is within $1.96\,\sigma_{\bar{x}}$ of μ, then the interval from $\bar{x} - 1.96\,\sigma_{\bar{x}}$ to $\bar{x} + 1.96\,\sigma_{\bar{x}}$ will contain μ. Otherwise the interval will not contain μ.
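Graphically… • …here is the confidence interval for μ: [Figure: an interval centered at x̄, running from $\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n}$ to $\bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$, with width $2z_{\alpha/2}\,\sigma/\sqrt{n}$.]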
Graphically… • …the actual location of the population mean μ… …may be here… …or here… …or possibly even here… • The population mean is a fixed but unknown quantity. It’s incorrect to interpret the confidence interval estimate as a probability statement about μ. The interval acts as the lower and upper limits of the interval estimate of the population mean.
Four commonly used confidence levels… • [Table 10.1: four commonly used confidence levels; cut & keep handy!]
Example 10.1 (p. 309)… • A computer company samples demand during lead time over 25 time periods (data in file Xm10-01). • It is known that the standard deviation of demand over lead time is 75 computers. We want to estimate the mean demand over lead time with 95% confidence in order to set inventory levels…
Example 10.1… IDENTIFY • “We want to estimate the mean demand over lead time with 95% confidence in order to set inventory levels…” • Thus, the parameter to be estimated is the population mean: μ • And so our confidence interval estimator will be: $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
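Example 10.1… CALCULATE • In order to use our confidence interval estimator, we need the following pieces of data: • x̄ = 370.16 (calculated from the data…) • $z_{\alpha/2} = z_{.025} = 1.96$, σ = 75, n = 25 (given) • therefore: $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96 \times \dfrac{75}{\sqrt{25}} = 370.16 \pm 29.40$ • The lower and upper confidence limits are 340.76 and 399.56.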
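The calculation can be sketched in Python as follows. The sample mean 370.16 is not read from the data file here; it is taken as the midpoint of the stated limits 340.76 and 399.56, which is an assumption of this sketch. The helper name `z_interval` is also made up for illustration.

```python
import math

def z_interval(x_bar, sigma, n, z):
    """Confidence interval for the mean when sigma is known: x_bar ± z*sigma/sqrt(n)."""
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# x_bar = 370.16 is inferred as the midpoint of the limits quoted in the
# example (an assumption); sigma, n and z = 1.96 are the given values.
lcl, ucl = z_interval(x_bar=370.16, sigma=75, n=25, z=1.96)
print(round(lcl, 2), round(ucl, 2))  # prints: 340.76 399.56
```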
Using Excel… CALCULATE • By using the Data Analysis Plus™ toolset, on the Xm10-01 spreadsheet, we get the same answer with less effort… Tools > Data Analysis Plus > Z-Estimate: Mean
Example 10.1… INTERPRET • The estimate of the mean demand during lead time lies between 340.76 and 399.56; we can use this as input in developing an inventory policy. • That is, we estimated that the mean demand during lead time falls between 340.76 and 399.56, and this type of estimator is correct 95% of the time. That also means that 5% of the time the estimator will be incorrect. • Incidentally, the media often refer to the 95% figure as “19 times out of 20,” which emphasizes the long-run aspect of the confidence level.
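The long-run meaning of "correct 95% of the time" can be illustrated by simulation. This is a sketch under stated assumptions: the true mean of 370 is a hypothetical value chosen for the simulation (in practice μ is unknown), demand is taken to be normal with σ = 75, and the function name `coverage` is invented here.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

def coverage(mu=370.0, sigma=75.0, n=25, z=1.96, trials=4000):
    """Fraction of simulated intervals x_bar ± z*sigma/sqrt(n) that contain mu.

    mu is a hypothetical "true" mean used only for this simulation.
    """
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        x_bar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

rate = coverage()
print(f"share of intervals containing the true mean: {rate:.3f}")
```

The printed share should be close to 0.95. Any single interval either contains μ or it does not; the 95% describes the procedure over many repeated samples.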
Interval Width… • A wide interval provides little information. • For example, suppose we estimate with 95% confidence that an accountant’s average starting salary is between $15,000 and $100,000. • Contrast this with: a 95% confidence interval estimate of starting salaries between $42,000 and $45,000. • The second estimate is much narrower, providing accounting students more precise information about starting salaries.
Interval Width… • The width of the confidence interval estimate is a function of the confidence level, the population standard deviation, and the sample size…
Interval Width… • The width of the confidence interval estimate is a function of the confidence level, the population standard deviation, and the sample size… • A larger confidence level produces a wider confidence interval:
The Effects of Changing the Confidence Level • Let us increase the confidence level from 90% to 95%. • A larger confidence level produces a wider confidence interval. [Figure: two normal curves; at the 90% confidence level each tail has area α/2 = 5%, and at the 95% confidence level each tail has area α/2 = 2.5%.]
Interval Width… • The width of the confidence interval estimate is a function of the confidence level, the population standard deviation, and the sample size… • Larger values of σ produce wider confidence intervals
The Effects of σ on the Interval Width • Suppose the standard deviation has increased by 50%. • To maintain a certain level of confidence, a larger standard deviation requires a wider confidence interval. [Figure: sampling distributions at the 90% confidence level, each tail with area α/2 = .05.]
Interval Width… • The width of the confidence interval estimate is a function of the confidence level, the population standard deviation, and the sample size… • Increasing the sample size decreases the width of the confidence interval while the confidence level can remain unchanged. • Note: this also increases the cost of obtaining additional data
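The three effects on interval width can be compared numerically. This is a sketch: the baseline values (90% confidence, σ = 75, n = 25) are borrowed from the chapter's examples for illustration, and `interval_width` is a helper name introduced here.

```python
import math
from statistics import NormalDist

def interval_width(confidence, sigma, n):
    """Full width of the interval x_bar ± z*sigma/sqrt(n), i.e. 2*z*sigma/sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / math.sqrt(n)

base = interval_width(0.90, sigma=75, n=25)
higher_confidence = interval_width(0.95, sigma=75, n=25)  # 90% -> 95%
larger_sigma = interval_width(0.90, sigma=112.5, n=25)    # sigma up by 50%
larger_sample = interval_width(0.90, sigma=75, n=100)     # n multiplied by 4

print(f"baseline:          {base:.2f}")
print(f"higher confidence: {higher_confidence:.2f}")
print(f"larger sigma:      {larger_sigma:.2f}")
print(f"larger sample:     {larger_sample:.2f}")
```

Raising σ by 50% widens the interval by 50%, while quadrupling n halves it, because the width is proportional to σ/√n.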
Confidence interval for μ: Format • Describe μ in words. • Give the formula to be used to calculate the interval. • Calculate the interval. • Interpret the interval in terms of the problem. This will include the variable, units, etc. of the problem; what is estimated to be in the interval; and the amount of confidence. • When asked, you should also be able to do the following: • State when it is valid to use this procedure. • Explain what “confidence” means.
The Confidence Interval for μ (σ is known) • Doll Computer Company delivers computers directly to its customers who order via the Internet. • To reduce inventory costs in its warehouses, Doll employs an inventory model that requires an estimate of the mean demand during lead time. • It is found that lead time demand is normally distributed with a standard deviation of 75 computers per lead time. • Estimate the mean lead time demand with 95% confidence.
The Confidence Interval for μ (σ is known) • Solution: • The parameter to be estimated is μ, the mean demand during lead time. • We need to compute the interval estimate for μ. • From the data provided in file Xm10-01, the sample mean is x̄ = 370.16. • Since 1 – α = .95, α = .05. Thus α/2 = .025 and $z_{.025} = 1.96$. • $\bar{x} \pm z_{.025}\dfrac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96 \times \dfrac{75}{\sqrt{25}} = 370.16 \pm 29.40$, i.e. from 340.76 to 399.56. • Conclusion: We are 95% confident that the true mean demand during lead time is between 340 and 400 computers.
Interpretation • We are 95% confident that the true mean demand during lead time is between 340 and 400 computers.
Selecting the Sample Size… • We can control the width of the interval by determining the sample size necessary to produce narrow intervals. • Suppose we want to estimate the mean demand “to within 5 units”; i.e. we want the interval estimate to be: $\bar{x} \pm 5$ • Since the interval estimator is: $\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ • It follows that $z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} = 5$. Solve for n to get the requisite sample size!
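Selecting the Sample Size… • Solving the equation… $n = \left(\dfrac{z_{\alpha/2}\,\sigma}{5}\right)^2 = \left(\dfrac{1.96 \times 75}{5}\right)^2 = 864.36$, which rounds up to 865 • that is, to produce a 95% confidence interval estimate of the mean (±5 units), we need to sample 865 lead time periods (vs. the 25 data points we have currently).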
Sample Size to Estimate a Mean… • The general formula for the sample size needed to estimate a population mean with an interval estimate of: $\bar{x} \pm W$ • Requires a sample size of at least this large: $n = \left(\dfrac{z_{\alpha/2}\,\sigma}{W}\right)^2$
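The sample-size rule can be written as a small function. This is a sketch; the name `sample_size` and its parameter names are chosen here for illustration, and the result is rounded up because n must be a whole number at least as large as the formula's value.

```python
import math

def sample_size(z, sigma, w):
    """Smallest whole n with z*sigma/sqrt(n) <= w, i.e. n >= (z*sigma/w)**2."""
    return math.ceil((z * sigma / w) ** 2)

# Lead-time demand example: 95% confidence (z = 1.96), sigma = 75, within 5 units.
print(sample_size(z=1.96, sigma=75, w=5))  # prints: 865
```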
Example 10.2 (p 322)… • A lumber company must estimate the mean diameter of trees to determine whether or not there is sufficient lumber to harvest an area of forest. They need to estimate this to within 1 inch at a confidence level of 99%. The tree diameters are normally distributed with a standard deviation of 6 inches. • How many trees need to be sampled?
Example 10.2… • Things we know: • Confidence level = 99%, therefore α = .01 • We want $\bar{x} \pm 1$, hence W = 1. • We are given that σ = 6.
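Example 10.2… • We compute… $n = \left(\dfrac{z_{\alpha/2}\,\sigma}{W}\right)^2 = \left(\dfrac{2.575 \times 6}{1}\right)^2 = 238.70$, which rounds up to 239 • That is, we will need to sample at least 239 trees to have a 99% confidence interval of $\bar{x} \pm 1$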
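A quick check shows why the result is rounded up rather than to the nearest integer. This is a sketch using the values stated in the example (z = 2.575, σ = 6, W = 1); the helper name `margin` is made up for illustration.

```python
import math

z, sigma, w = 2.575, 6, 1  # 99% confidence, sigma = 6 inches, within 1 inch

def margin(n):
    """Half-width of the interval, z*sigma/sqrt(n), for a sample of n trees."""
    return z * sigma / math.sqrt(n)

n_required = math.ceil((z * sigma / w) ** 2)

print(n_required)
print(f"margin with 238 trees: {margin(238):.4f}")
print(f"margin with 239 trees: {margin(239):.4f}")
```

With 238 trees the margin is still slightly above 1 inch; 239 is the first sample size that brings it to 1 inch or less.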