Statistical Decision Making

Almost all problems in statistics can be formulated as a problem of making a decision.
• That is, given some data observed from some phenomenon, a decision has to be made about that phenomenon.

Decisions are generally broken into two types:
• Estimation decisions
• Hypothesis testing decisions

Probability theory plays a very important role in these decisions and in the assessment of the errors made by these decisions.

Definition: A random variable X is a numerical quantity that is determined by the outcome of a random experiment

Example: An individual is selected at random from a population and X = the weight of the individual

The probability distribution of a continuous random variable is described by its probability density curve f(x),

i.e. a curve which has the following properties:
1. f(x) is never negative.
2. The total area under the curve f(x) is one.
3. The area under the curve f(x) between a and b is the probability that X lies between a and b.
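The properties of a density curve can be checked numerically. The sketch below is only an illustration: it assumes a normal density with mean 70 and standard deviation 20 (an arbitrary choice), and the helper name `area_under_density` is hypothetical. It approximates areas with the trapezoid rule and compares one of them with the exact probability from the cumulative distribution function.

```python
from statistics import NormalDist

def area_under_density(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h

# Assumed example distribution: normal with mean 70 and standard deviation 20.
X = NormalDist(mu=70, sigma=20)

# Property 2: the total area is one (integrating over +/- 10 standard deviations).
total_area = area_under_density(X.pdf, 70 - 200, 70 + 200)

# Property 3: the area between a = 60 and b = 90 is P(60 < X < 90).
prob_60_90 = area_under_density(X.pdf, 60, 90)
exact_60_90 = X.cdf(90) - X.cdf(60)

print(total_area, prob_60_90, exact_60_90)
```

The numerical area and the difference of the two cumulative probabilities should agree to several decimal places.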

Examples of some important Univariate distributions

1. The Normal distribution
A common probability density curve is the “Normal” density curve, which is symmetric and bell shaped.
[Figure: Normal density curves with $\mu = 50$, $\sigma = 15$ and with $\mu = 70$, $\sigma = 20$]
Comment: If $\mu = 0$ and $\sigma = 1$ the distribution is called the standard normal distribution.

2. The Chi-squared distribution with $\nu$ degrees of freedom

Comment: If $z_1, z_2, \ldots, z_\nu$ are independent random variables each having a standard normal distribution then $U = z_1^2 + z_2^2 + \cdots + z_\nu^2$ has a chi-squared distribution with $\nu$ degrees of freedom.
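The construction of a chi-squared variable as a sum of squared standard normal variables can be simulated directly. This is a minimal sketch: the degrees of freedom (5), the seed and the number of draws are arbitrary choices, and `chi_squared_draw` is a hypothetical helper name. A chi-squared distribution with $\nu$ degrees of freedom has mean $\nu$ and variance $2\nu$, so the simulated values should land close to those.

```python
import random
from statistics import fmean, variance

def chi_squared_draw(nu, rng):
    """One draw of U = z_1^2 + ... + z_nu^2 with independent standard normal z_i."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

rng = random.Random(0)   # fixed seed so the run is repeatable
nu = 5                   # assumed degrees of freedom for the illustration
draws = [chi_squared_draw(nu, rng) for _ in range(50_000)]

mean_u = fmean(draws)    # theory: nu
var_u = variance(draws)  # theory: 2 * nu
print(mean_u, var_u)
```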

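3. The F distribution with $\nu_1$ degrees of freedom in the numerator and $\nu_2$ degrees of freedom in the denominator
$f(x) = K\, x^{\nu_1/2 - 1} \left(1 + \frac{\nu_1}{\nu_2}\, x\right)^{-(\nu_1 + \nu_2)/2}$ if $x \ge 0$, where $K = \frac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2}$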

Comment: If $U_1$ and $U_2$ are independent random variables having chi-squared distributions with $\nu_1$ and $\nu_2$ degrees of freedom respectively then $F = \frac{U_1/\nu_1}{U_2/\nu_2}$ has an F distribution with $\nu_1$ degrees of freedom in the numerator and $\nu_2$ degrees of freedom in the denominator.

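4. The t distribution with $\nu$ degrees of freedom
$f(x) = K \left(1 + \frac{x^2}{\nu}\right)^{-(\nu + 1)/2}$, where $K = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\left(\frac{\nu}{2}\right)}$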

Comment: If $z$ and $U$ are independent random variables, and $z$ has a standard normal distribution while $U$ has a chi-squared distribution with $\nu$ degrees of freedom, then $t = \frac{z}{\sqrt{U/\nu}}$ has a t distribution with $\nu$ degrees of freedom.
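The comments on the F and t distributions describe both as ratios built from independent normal and chi-squared variables. A rough simulation sketch of those constructions follows. The degrees of freedom (4 and 12 for F, 10 for t), the seed and the helper names are assumptions made for the illustration. It uses two standard facts as a check: an F variable with more than 2 denominator degrees of freedom has mean $\nu_2/(\nu_2 - 2)$, and a t variable with more than 2 degrees of freedom has variance $\nu/(\nu - 2)$.

```python
import math
import random
from statistics import fmean, variance

rng = random.Random(1)  # fixed seed so the run is repeatable

def chi_squared_draw(nu):
    """Sum of nu squared independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

def f_draw(nu1, nu2):
    """F = (U1 / nu1) / (U2 / nu2) with independent chi-squared U1 and U2."""
    return (chi_squared_draw(nu1) / nu1) / (chi_squared_draw(nu2) / nu2)

def t_draw(nu):
    """t = z / sqrt(U / nu) with z standard normal and U chi-squared, independent."""
    z = rng.gauss(0.0, 1.0)
    return z / math.sqrt(chi_squared_draw(nu) / nu)

f_draws = [f_draw(4, 12) for _ in range(40_000)]
t_draws = [t_draw(10) for _ in range(40_000)]

f_mean = fmean(f_draws)    # theory: 12 / (12 - 2) = 1.2
t_var = variance(t_draws)  # theory: 10 / (10 - 2) = 1.25
print(f_mean, t_var)
```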

The Sampling distribution of a statistic

A random sample from a probability distribution with density function f(x) is a collection of n independent random variables, $x_1, x_2, \ldots, x_n$, each with the probability distribution described by f(x).

If, for example, we collect a random sample of individuals from a population and measure some variable X for each of those individuals, the n measurements $x_1, x_2, \ldots, x_n$ will form a set of n independent random variables with a probability distribution equivalent to the distribution of X across the population.

A statistic T is any quantity computed from the random observations $x_1, x_2, \ldots, x_n$.

Any statistic is necessarily also a random variable and therefore has a probability distribution described by some probability density function $f_T(t)$.
• This distribution is called the sampling distribution of the statistic T.

This distribution is very important if one is using this statistic in a statistical analysis. • It is used to assess the accuracy of a statistic if it is used as an estimator. • It is used to determine thresholds for acceptance and rejection if it is used for Hypothesis testing.

Some examples of Sampling distributions of statistics

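Distribution of the sample mean for a sample from a Normal population
Let $x_1, x_2, \ldots, x_n$ be a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Let
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Then $\bar{x}$ has a normal sampling distribution with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.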
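The sampling distribution of the sample mean can be approximated by drawing many samples and recording the mean of each. In this sketch the population values ($\mu = 50$, $\sigma = 15$), the sample size $n = 25$ and the seed are assumptions chosen for illustration. The simulated means should centre on $\mu$ with a spread close to $\sigma/\sqrt{n} = 3$.

```python
import math
import random
from statistics import fmean, stdev

rng = random.Random(2)           # fixed seed so the run is repeatable
mu, sigma, n = 50.0, 15.0, 25    # assumed population parameters and sample size

# Each entry is the mean of one random sample of size n.
sample_means = [
    fmean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(20_000)
]

mean_of_means = fmean(sample_means)     # theory: mu
sd_of_means = stdev(sample_means)       # theory: sigma / sqrt(n)
theoretical_sd = sigma / math.sqrt(n)
print(mean_of_means, sd_of_means, theoretical_sd)
```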

Distribution of the z statistic
Let $x_1, x_2, \ldots, x_n$ be a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Let
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
Then $z$ has a standard normal distribution.

Comment: Many statistics T have a normal distribution with mean $\mu_T$ and standard deviation $\sigma_T$. Then $z = \frac{T - \mu_T}{\sigma_T}$ will have a standard normal distribution.

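Distribution of the $\chi^2$ statistic for sample variance
Let $x_1, x_2, \ldots, x_n$ be a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Let
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$ = sample variance and $s = \sqrt{s^2}$ = sample standard deviation.

Let $\chi^2 = \frac{(n-1)\, s^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sigma^2}$. Then $\chi^2$ has a chi-squared distribution with $\nu = n - 1$ degrees of freedom.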
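As a rough check of this result, the statistic $(n-1)s^2/\sigma^2$ can be computed for many simulated normal samples. The population values ($\mu = 70$, $\sigma = 20$), the sample size $n = 8$ and the helper name `chi2_statistic` are assumptions for the illustration. If the statistic is chi-squared with $n - 1 = 7$ degrees of freedom, its simulated mean should be near 7 and its variance near 14.

```python
import random
from statistics import fmean, variance

rng = random.Random(3)          # fixed seed so the run is repeatable
mu, sigma, n = 70.0, 20.0, 8    # assumed population parameters and sample size

def chi2_statistic():
    """(n - 1) * s^2 / sigma^2 for one simulated normal sample."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    s_squared = variance(sample)        # sample variance with divisor n - 1
    return (n - 1) * s_squared / sigma ** 2

values = [chi2_statistic() for _ in range(30_000)]

mean_stat = fmean(values)    # theory: n - 1
var_stat = variance(values)  # theory: 2 * (n - 1)
print(mean_stat, var_stat)
```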

[Figure: the chi-squared distribution]

Distribution of the t statistic
Let $x_1, x_2, \ldots, x_n$ be a sample from a normal population with mean $\mu$ and standard deviation $\sigma$. Let
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
Then $t$ has Student’s t distribution with $\nu = n - 1$ degrees of freedom.

Comment: If an estimator T has a normal distribution with mean $\mu_T$ and standard deviation $\sigma_T$, and $s_T$ is an estimator of $\sigma_T$ based on $\nu$ degrees of freedom, then $t = \frac{T - \mu_T}{s_T}$ will have Student’s t distribution with $\nu$ degrees of freedom.

[Figure: the t distribution compared with the standard normal distribution]

Point estimation
• A statistic T is called an estimator of the parameter $\theta$ if its value is used as an estimate of the parameter $\theta$.
• The performance of an estimator T is determined by how “close” the sampling distribution of T is to the parameter $\theta$ being estimated.

An estimator T is called an unbiased estimator of $\theta$ if $\mu_T$, the mean of the sampling distribution of T, satisfies $\mu_T = \theta$.
• This implies that in the long run the average value of T is $\theta$.
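Unbiasedness can be illustrated by averaging an estimator over many simulated samples. The sketch below is an illustration under assumed values ($\sigma = 2$, so $\sigma^2 = 4$, and $n = 5$). It compares two estimators of the variance $\sigma^2$ of a normal population: the sum of squared deviations divided by $n - 1$, which is unbiased, and the same sum divided by $n$, whose long-run average is $\frac{n-1}{n}\sigma^2$. The helper name `both_estimates` is hypothetical.

```python
import random
from statistics import fmean

rng = random.Random(4)        # fixed seed so the run is repeatable
mu, sigma, n = 0.0, 2.0, 5    # assumed population parameters and sample size

def both_estimates():
    """Return (divisor n - 1, divisor n) variance estimates for one sample."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = fmean(sample)
    sum_sq = sum((x - xbar) ** 2 for x in sample)
    return sum_sq / (n - 1), sum_sq / n

pairs = [both_estimates() for _ in range(40_000)]

avg_unbiased = fmean(p[0] for p in pairs)  # long-run average near sigma^2 = 4
avg_biased = fmean(p[1] for p in pairs)    # long-run average near (n-1)/n * 4 = 3.2
print(avg_unbiased, avg_biased)
```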

An estimator T is called the Minimum Variance Unbiased estimator of $\theta$ if T is an unbiased estimator and it has the smallest standard error $\sigma_T$ amongst all unbiased estimators of $\theta$.
• If the sampling distribution of T is normal, the standard error of T is extremely important. It completely describes the variability of the estimator T.

Interval Estimation
• Point estimators give only single values as an estimate. There is no indication of the accuracy of the estimate.
• The accuracy can sometimes be measured and shown by displaying the standard error of the estimate.

There is, however, a better way: using the idea of confidence interval estimates.
• The unknown parameter is estimated with a range of values that has a given probability of capturing the parameter being estimated.

The interval $T_L$ to $T_U$ is called a $(1 - \alpha) \times 100\%$ confidence interval for the parameter $\theta$ if the probability that the interval from $T_L$ to $T_U$ captures $\theta$ is equal to $1 - \alpha$.
• Here $T_L$ and $T_U$ are statistics, i.e. random numerical quantities calculated from the data.

Examples
Confidence interval for the mean of a Normal population (based on the z statistic).
$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ is a $(1 - \alpha) \times 100\%$ confidence interval for $\mu$, the mean of a normal population. Here $z_{\alpha/2}$ is the upper $\alpha/2 \times 100\%$ percentage point of the standard normal distribution.

More generally, if T is an unbiased estimator of the parameter $\theta$ and has a normal sampling distribution with known standard error $\sigma_T$, then $T \pm z_{\alpha/2}\, \sigma_T$ is a $(1 - \alpha) \times 100\%$ confidence interval for $\theta$.
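A worked example of the z-based interval $\bar{x} \pm z_{\alpha/2}\, \sigma/\sqrt{n}$ is sketched below. The data are hypothetical weights invented for the illustration, $\sigma = 15$ is assumed to be known, and the function name `z_confidence_interval` is not from the original material. The percentage point $z_{\alpha/2}$ is taken from the standard library's `NormalDist.inv_cdf`.

```python
import math
from statistics import NormalDist, fmean

def z_confidence_interval(sample, sigma, alpha=0.05):
    """(1 - alpha) * 100% interval for a normal mean when sigma is known."""
    n = len(sample)
    xbar = fmean(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 percentage point
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical data: nine weights, with sigma = 15 assumed known.
weights = [62.0, 71.5, 58.3, 80.1, 66.7, 74.2, 69.9, 55.4, 77.8]
lower, upper = z_confidence_interval(weights, sigma=15.0)
print(lower, upper)
```

The interval is centred on the sample mean, and its half-width shrinks in proportion to $1/\sqrt{n}$ as the sample size grows.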

Confidence interval for the mean of a Normal population (based on the t statistic).
$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$ is a $(1 - \alpha) \times 100\%$ confidence interval for $\mu$, the mean of a normal population. Here $t_{\alpha/2}$ is the upper $\alpha/2 \times 100\%$ percentage point of the Student’s t distribution with $\nu = n - 1$ degrees of freedom.

More generally, if T is an unbiased estimator of the parameter $\theta$ and has a normal sampling distribution with estimated standard error $s_T$, based on $\nu$ degrees of freedom, then $T \pm t_{\alpha/2}\, s_T$ is a $(1 - \alpha) \times 100\%$ confidence interval for $\theta$.
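The reason for using t percentage points when the standard error is estimated can be seen in a coverage simulation. This sketch assumes $\mu = 50$, $\sigma = 15$ and $n = 10$, and hard-codes 2.262, the tabulated upper 2.5% point of the t distribution with 9 degrees of freedom, because the Python standard library has no t quantile function. It compares how often the interval $\bar{x} \pm c\, s/\sqrt{n}$ captures $\mu$ when $c$ is the t value and when $c$ is the normal value 1.960.

```python
import math
import random
from statistics import fmean, stdev

rng = random.Random(5)           # fixed seed so the run is repeatable
mu, sigma, n = 50.0, 15.0, 10    # assumed population parameters and sample size
T_CRIT = 2.262   # tabulated upper 2.5% point of t with n - 1 = 9 degrees of freedom
Z_CRIT = 1.960   # upper 2.5% point of the standard normal distribution

def covers(crit, xbar, s):
    """True if xbar +/- crit * s / sqrt(n) contains the true mean mu."""
    half_width = crit * s / math.sqrt(n)
    return xbar - half_width <= mu <= xbar + half_width

reps = 20_000
t_hits = 0
z_hits = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar, s = fmean(sample), stdev(sample)
    t_hits += covers(T_CRIT, xbar, s)
    z_hits += covers(Z_CRIT, xbar, s)

t_coverage = t_hits / reps   # should be close to 0.95
z_coverage = z_hits / reps   # falls short of 0.95 because s is only an estimate
print(t_coverage, z_coverage)
```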

Multiple Confidence intervals
In many situations one is interested in estimating not only a single parameter, $\theta$, but a collection of parameters, $\theta_1, \theta_2, \theta_3, \ldots$. A collection of intervals, $T_{L1}$ to $T_{U1}$, $T_{L2}$ to $T_{U2}$, $T_{L3}$ to $T_{U3}$, ..., is called a set of $(1 - \alpha) \times 100\%$ multiple confidence intervals if the probability that all the intervals capture their respective parameters is $1 - \alpha$.
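The distinction between individual and multiple confidence intervals can be illustrated by simulation. The sketch below assumes three independent normal samples with known $\sigma = 1$, $n = 20$ and arbitrary true means, and builds an ordinary 95% z interval for each. Each interval captures its own parameter about 95% of the time, but all three capture their parameters together only about $0.95^3 \approx 0.857$ of the time, so these intervals do not form a set of 95% multiple confidence intervals.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(6)            # fixed seed so the run is repeatable
alpha, sigma, n = 0.05, 1.0, 20   # assumed level, known sigma and sample size
thetas = [0.0, 5.0, -3.0]         # assumed true parameter values
half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / math.sqrt(n)

def interval_covers(theta):
    """True if the individual z interval from a fresh sample captures theta."""
    xbar = fmean(rng.gauss(theta, sigma) for _ in range(n))
    return abs(xbar - theta) <= half_width

reps = 20_000
first_hits = 0
all_hits = 0
for _ in range(reps):
    hits = [interval_covers(theta) for theta in thetas]
    first_hits += hits[0]
    all_hits += all(hits)

individual_coverage = first_hits / reps   # close to 1 - alpha = 0.95
joint_coverage = all_hits / reps          # close to 0.95 ** 3
print(individual_coverage, joint_coverage)
```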
