8.1 A Single Population Mean using the Normal Distribution
recommended:
1-37, 95-103
A confidence interval is one way of expressing your expectation of a population
parameter, for example population mean \( \mu \), using a statistic, such as
sample mean \( \bar{x} \), calculated from a sample.
For a first, basic example of how this works, we can use the population standard
deviation \( \sigma \), and a sample mean \( \bar{x} \), to estimate the
population mean \( \mu \), thanks to the Central Limit Theorem.
Terminology:
\( \sigma \), the population standard deviation
\( \bar{x} \), the sample mean of a sample of size \( n \),
often called a point estimate
\( \alpha \), the percent level of error that we do not wish to
exceed, expressed as a decimal
\( 1-\alpha \), what we call the confidence level,
again a percentage expressed as a decimal
\( EBM \), the Error Bound for the population Mean
Procedure:
calculate the \( z \)-value for which the area under the
standard normal distribution to the right is
\( \alpha/2 \); equivalently, the area to the left is
\( 1-\alpha/2 \). This value of \( z \) is called a
critical value and is denoted \( z_{\alpha/2} \)
\( \displaystyle{EBM=z_{\alpha/2}\frac{\sigma}{\sqrt{n}}}\)
using the Central Limit Theorem
the confidence interval with confidence level \( (1-\alpha) \)
(percent) for the population mean \( \mu \) is the interval
\( (\bar{x}-EBM,\bar{x}+EBM) \)
this is often expressed as "the \( (1-\alpha) \) (percent)
confidence interval for the population mean \( \mu \)"
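As a minimal sketch of this procedure in Python, using only the standard library's `statistics.NormalDist` (the sample values \( n=36 \), \( \bar{x}=68 \), \( \sigma=3 \) are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample: n = 36, sample mean 68, known population sigma = 3,
# confidence level 1 - alpha = 0.95 (so alpha = 0.05).
n, xbar, sigma, alpha = 36, 68.0, 3.0, 0.05

# Critical value z_{alpha/2}: area alpha/2 to the right,
# so area 1 - alpha/2 to the left.  About 1.96 for alpha = 0.05.
z = NormalDist().inv_cdf(1 - alpha / 2)

ebm = z * sigma / sqrt(n)            # error bound for the mean
interval = (xbar - ebm, xbar + ebm)  # (xbar - EBM, xbar + EBM)

print(round(z, 4), round(ebm, 4))
print(tuple(round(v, 4) for v in interval))
```

The same three steps — critical value, error bound, interval — carry over to the later confidence intervals with only the critical value and standard error changing.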
Interpretation:
we are not saying
"We are \( (1-\alpha) \) (percent) certain that the
population mean \( \mu \) is in this interval."
we are saying
"\( (1-\alpha) \) (percent) of the confidence intervals
constructed in this fashion should contain the
population mean \( \mu \)."
Notice that in the expression
\( \displaystyle{EBM=z_{\alpha/2}\frac{\sigma}{\sqrt{n}}}\), \( \sigma \) is
known, and if we know any two of the remaining three quantities, we can
determine the third
setting \( EBM \) and \( z_{\alpha/2} \) allows us to determine an
appropriate sample size \( n \)
setting \( n \) and \( z_{\alpha/2} \) allows us to determine the
\( EBM \)
setting \( EBM \) and \( n \) allows us to know the confidence level
through \( z_{\alpha/2} \)
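The first of these uses, determining a sample size, can be sketched by solving \( EBM=z_{\alpha/2}\,\sigma/\sqrt{n} \) for \( n \) and rounding up (the planning values \( \sigma=15 \), \( EBM=2 \), \( \alpha=0.10 \) are hypothetical):

```python
from math import ceil, sqrt
from statistics import NormalDist

# Hypothetical planning problem: known sigma = 15, desired EBM = 2,
# 90% confidence level (alpha = 0.10).
sigma, ebm, alpha = 15.0, 2.0, 0.10

z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}

# Solve EBM = z * sigma / sqrt(n) for n, then round UP,
# since a fractional subject cannot be sampled.
n = ceil((z * sigma / ebm) ** 2)
print(n)
```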
8.2 A Single Population Mean using the Student t Distribution
recommended:
38-61, 104-116
For our second demonstration of confidence intervals, we can estimate the
population mean \( \mu \) using a sample mean \( \bar{x} \) and standard
deviation \( s \), and a special distribution known as "Student's
\(t\)-distribution".
To gain this extra flexibility of using the sample standard deviation in place
of the population standard deviation, we do need to know that our underlying
population is normally distributed. That condition was not required when we
knew the population standard deviation.
Student's \(t\)-distribution with \(n-1\) degrees of freedom is a
symmetric, bell-shaped probability distribution, created by drawing a simple
random sample of size \( n \) from a normally distributed population with mean
\( \mu \) and standard deviation \( \sigma \), and calculating the \(t\)-score
\( \displaystyle{t=\frac{\bar{x}-\mu}{s/\sqrt{n}}} \).
These \(t\)-scores form Student's \(t\)-distribution with \(n-1\) degrees of
freedom, which we denote by \(t_{n-1}\), and this distribution has these
properties:
it is a symmetric, bell-shaped distribution with mean \(0\).
it is somewhat shorter at the peak and broader, with heavier tails, than the standard normal distribution.
as \(n\) increases, the \(t\)-distribution is more and more like the
standard normal distribution.
Terminology:
\( \bar{x} \) and \( s \), a sample mean and standard deviation from
a sample of size \( n \), taken from a normally distributed population
\( \alpha \), the percent level of error that we do not wish to
exceed, expressed as a decimal
\( 1-\alpha \), what we call the confidence level,
again a percentage expressed as a decimal
\( EBM \), the Error Bound for the population Mean
Procedure:
calculate the \( t \)-value for which the area under the
\(t\)-distribution with \(n-1\) degrees of freedom to the right is
\( \alpha/2 \); equivalently, the area to the left is \( 1-\alpha/2 \). This
value of \( t \) is called a critical value and is
denoted \( t_{\alpha/2} \)
\( \displaystyle{EBM=t_{\alpha/2}\frac{s}{\sqrt{n}}}\)
the confidence interval with confidence level \( (1-\alpha) \)
(percent) for the population mean \( \mu \) is the interval
\( (\bar{x}-EBM,\bar{x}+EBM) \)
this is often expressed as "the \( (1-\alpha) \) (percent)
confidence interval for the population mean \( \mu \)"
Interpretation:
again the interpretation of the confidence interval is
"\( (1-\alpha) \) (percent) of the confidence intervals
constructed in this fashion should contain the
population mean \( \mu \)."
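A minimal sketch of the \(t\)-interval in Python, assuming SciPy is available for the \(t\)-distribution's critical values (the sample values \( n=15 \), \( \bar{x}=8.2 \), \( s=1.7 \) are hypothetical):

```python
from math import sqrt
from scipy.stats import t

# Hypothetical sample from a normally distributed population:
# n = 15, sample mean 8.2, sample standard deviation s = 1.7,
# 95% confidence level (alpha = 0.05).
n, xbar, s, alpha = 15, 8.2, 1.7, 0.05

# Critical value t_{alpha/2} with n - 1 degrees of freedom.
t_crit = t.ppf(1 - alpha / 2, df=n - 1)

ebm = t_crit * s / sqrt(n)  # EBM = t_{alpha/2} * s / sqrt(n)
print(round(t_crit, 4), (round(xbar - ebm, 4), round(xbar + ebm, 4)))
```

Note that `t_crit` here (about 2.14 for 14 degrees of freedom) is larger than the corresponding \(z\)-value of about 1.96, reflecting the \(t\)-distribution's heavier tails at small \( n \).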
8.3 A Population Proportion
For our third demonstration of confidence intervals, we will create a confidence
interval for the proportion of a population with a given characteristic.
Estimating the proportion of a population with a given characteristic is a
binomial question. Each member of the population either has the characteristic
("success") or they do not ("failure").
We will convert a random variable \(X \sim B(n,p) \), described with a binomial
distribution, to a distribution of proportions by dividing the number of
successes in a sample by the size of the sample \( n \), \( P' = X/n \).
Remember, for \( B(n,p) \), \( \mu=np \) and \( \sigma=\sqrt{npq}=\sqrt{np(1-p)} \).
If \( n \) is large and \( p \) is not close to \( 0 \) or \( 1 \), we can
use the normal distribution \( N(np,\sqrt{npq}) \) to approximate the binomial
distribution, and then the distribution of proportions can be represented by the
normal distribution \( N(p,\sqrt{pq/n}) \).
This normal distribution representing the distribution of proportions allows us
to use a \(z\)-score to create a \((1-\alpha)\)-confidence interval for an
unknown population proportion from a sample proportion.
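A minimal sketch of the proportion interval in Python, again using only the standard library (the poll counts \( x=300 \) successes out of \( n=500 \) are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical poll: x = 300 successes in a sample of n = 500,
# 95% confidence level (alpha = 0.05).
x, n, alpha = 300, 500, 0.05

p_prime = x / n       # sample proportion, the point estimate of p
q_prime = 1 - p_prime

z = NormalDist().inv_cdf(1 - alpha / 2)
ebp = z * sqrt(p_prime * q_prime / n)  # error bound for the proportion

print(round(p_prime, 3), round(ebp, 4))
print((round(p_prime - ebp, 4), round(p_prime + ebp, 4)))
```

The structure mirrors the mean case: a critical value times a standard error, here \( \sqrt{p'q'/n} \), where the unknown \( p \) and \( q \) are estimated by \( p' \) and \( q' \).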
9.1 Null and Alternative Hypotheses
Hypothesis testing is another way of expressing estimates of population
parameters
Hypothesis testing is a procedure based on sample evidence and
probability used to test a hypothesis
make a statement regarding the nature of a population
collect evidence (sample data) to test the statement
analyze the data to assess the plausibility of the statement
since we cannot be \( 100\% \) certain, we can only say whether the data
support the statement or not
the null hypothesis, \( H_{0} \), is a statement of no change, no effect,
or no difference and is assumed to be true until evidence indicates otherwise
the alternative hypothesis, \( H_{a} \), is a statement contradictory
to \( H_{0} \)
Common structures for null and alternative hypotheses
two-tailed tests
\( H_{0} \): (parameter)\( = \)(value)
\( H_{a} \): (parameter)\( \neq \)(value)
left-tailed tests
\( H_{0} \): (parameter)\( = \)(value)
\( H_{a} \): (parameter)\( < \)(value)
right-tailed tests
\( H_{0} \): (parameter)\( = \)(value)
\( H_{a} \): (parameter)\( > \)(value)
at least tests
\( H_{0} \): (parameter)\( \ge \)(value)
\( H_{a} \): (parameter)\( < \)(value)
at most tests
\( H_{0} \): (parameter)\( \le \)(value)
\( H_{a} \): (parameter)\( > \)(value)
stating conclusions to hypothesis testing
we either "reject \( H_{0} \)" or "do not reject \( H_{0} \)"
we are never "accepting the null hypothesis"
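A minimal sketch of a two-tailed test in Python, using the standard library for the normal distribution (the values \( \mu_0=50 \), \( n=40 \), \( \bar{x}=51.3 \), \( \sigma=4 \), \( \alpha=0.05 \) are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical two-tailed test about a mean with known sigma:
# H0: mu = 50, Ha: mu != 50; n = 40, xbar = 51.3, sigma = 4, alpha = 0.05.
mu0, n, xbar, sigma, alpha = 50.0, 40, 51.3, 4.0, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))          # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value

# The conclusion is phrased as reject / do not reject -- never "accept H0".
conclusion = "reject H0" if p_value < alpha else "do not reject H0"
print(round(z, 4), round(p_value, 4), conclusion)
```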
9.2 Outcomes and the Type I and Type II Errors
recommended:
11-20, 66-71
There are four possible outcomes when you perform a hypothesis test
when you do not reject the null hypothesis and the null hypothesis is
true, that is a correct outcome
when you reject the null hypothesis and the null hypothesis is
false, that is a correct outcome
a Type I error is when you reject the null hypothesis and the
null hypothesis is true
a Type II error is when you do not reject the null hypothesis
and the null hypothesis is false
let \( \alpha =P(\mbox{Type I error})=P(\mbox{rejecting $H_{0}$ when $H_{0}$ is true}) \)
and let
\( \beta =P(\mbox{Type II error})=P(\mbox{not rejecting $H_{0}$ when $H_{0}$ is false})\)
\( \alpha \) is also called the level of significance
the setting of the level of significance depends on the consequences of
making a Type I error
the probability of rejecting \( H_{0} \) when \( H_{0} \) is false,
\( 1-\beta \), is called the power of the test
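These quantities can be sketched numerically for a right-tailed \(z\)-test with known \( \sigma \); the values \( \mu_0=100 \), true mean \( 105 \), \( \sigma=12 \), \( n=36 \), \( \alpha=0.05 \) are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical right-tailed test, H0: mu = 100 vs Ha: mu > 100,
# known sigma = 12, n = 36, alpha = 0.05, and a true mean of 105.
mu0, mu_true, sigma, n, alpha = 100.0, 105.0, 12.0, 36, 0.05

se = sigma / sqrt(n)  # standard error of the sample mean
# Reject H0 whenever xbar exceeds this cutoff; P(Type I error) = alpha.
crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se

# beta = P(not rejecting H0 when the true mean is mu_true)
beta = NormalDist(mu_true, se).cdf(crit)
power = 1 - beta  # P(rejecting H0 when H0 is false)
print(round(beta, 4), round(power, 4))
```

Rerunning with a larger \( n \) shrinks the standard error, which moves the cutoff closer to \( \mu_0 \) and raises the power, which is one reason sample size matters.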