Particular distributions are associated with particular hypothesis tests
Tests of a Population Mean (\(z\)-tests and \(t\)-tests)
\(\bar{X} \sim N(\mu_{X},\sigma_{X}/\sqrt{n})\) - a normal
distribution when the population standard deviation is known
\(\displaystyle{\frac{\bar{X}-\mu_{X}}{s/\sqrt{n}}} \sim t_{df}\) - the Student's
\(t\)-distribution (with \(df = n-1\)) when the population standard
deviation is not known and the distribution of the sample mean is
approximately normal
Tests of a Single Population Proportion
\(P' \sim N(p,\sqrt{pq/n})\) - a normal distribution when \(n\)
is large
Additional assumptions
for a \(z\)-test
you are taking a simple random sample from the population
the population is normally distributed or the sample size is
sufficiently large (\(n \ge 30\))
you know the population standard deviation
for a \(t\)-test
you are taking a simple random sample from the population
the population is normally distributed or the sample size is
sufficiently large (\(n \ge 30\))
you are using a sample standard deviation to approximate the
population standard deviation
for a test of a single population proportion
you are taking a simple random sample from the population
you meet the conditions for a binomial distribution
the shape of the binomial distribution is approximately normal,
which is ensured if \(np, nq > 5\) (or \(npq \ge 10\)) and the
sample size is less than \(5\%\) of the population
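As a minimal illustration (a sketch in Python with made-up numbers; scipy is
assumed available), each case above pairs a test statistic with the
distribution used to compute its \(p\)-value:

```python
# Sketch: test statistics for a mean (sigma known / unknown) and a proportion.
# All numbers below are hypothetical.
from math import sqrt
from scipy import stats

# z-test for a mean: population standard deviation sigma is known
xbar, mu0, sigma, n = 52.0, 50.0, 8.0, 36
z = (xbar - mu0) / (sigma / sqrt(n))            # standard normal under H0

# t-test for a mean: sigma unknown, sample standard deviation s used instead
s = 7.5
t = (xbar - mu0) / (s / sqrt(n))                # t-distribution, df = n - 1

# z-test for a single proportion: n large, np and nq both > 5
p_hat, p0, m = 0.56, 0.50, 200
z_prop = (p_hat - p0) / sqrt(p0 * (1 - p0) / m)

# two-tailed p-values from the corresponding distributions
print(2 * stats.norm.sf(abs(z)))
print(2 * stats.t.sf(abs(t), df=n - 1))
print(2 * stats.norm.sf(abs(z_prop)))
```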
9.4 Rare Events, the Sample, Decision and Conclusion
recommended:
32-48, 73
The basics of a hypothesis test
you are making an assumption about a population
you perform a test on a random sample from the population
if the test reveals that the sample has a property that would be very
unlikely if your assumption were true, then you are forced to reject
your assumption
Using a sample to test the null hypothesis
preset a level of significance \(\alpha\), the probability of a Type I
Error
calculate the probability, called the \(p\)-value, that, if the null
hypothesis is true, the results from another randomly selected sample
will be as extreme as, or more extreme than, the results obtained from
the given sample
if \(p\)-value \( < \alpha\), reject the null hypothesis
"the results of the sample data are significant"
"there is sufficient evidence to conclude that the null hypothesis
is an incorrect belief and that the alternative hypothesis may be
correct"
if \(p\)-value \( \ge \alpha\), do not reject the null hypothesis
"the results of the sample data are not significant"
"there is not sufficient evidence to conclude that the alternative
hypothesis may be correct"
Note: when you "do not reject the null hypothesis" it does not mean that
the null hypothesis is "true"; it means that the sample data have failed to
provide sufficient evidence that the null hypothesis is incorrect
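A minimal sketch of this decision rule (hypothetical sample data; scipy's
one-sample \(t\)-test is assumed):

```python
# Sketch: preset alpha, compute the p-value, reject H0 only if p-value < alpha.
from scipy import stats

alpha = 0.05                                            # preset level of significance
sample = [9.8, 10.4, 10.1, 9.6, 10.7, 10.3, 9.9, 10.5]  # hypothetical data

# H0: mu = 10  vs  Ha: mu != 10 (two-tailed)
t_stat, p_value = stats.ttest_1samp(sample, popmean=10)

if p_value < alpha:
    print(f"p-value {p_value:.3f} < {alpha}: reject H0 (results are significant)")
else:
    print(f"p-value {p_value:.3f} >= {alpha}: do not reject H0 (insufficient evidence)")
```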
9.5 Additional Information and Full Hypothesis Test Examples
recommended:
49-61, 74-87
The "level of significance", \(\alpha\), is the probability of a Type I Error, the
probability of rejecting the null hypothesis when the null hypothesis is true.
The level of significance must be preset; a common choice is
\(\alpha=0.05\).
Sketching the distribution, the value of \(\alpha\), and the \(p\)-value can
help you visualize the hypothesis test and its result.
The alternative hypothesis determines whether the hypothesis test is "left-tailed",
"right-tailed", or "two-tailed".
The alternative hypothesis never refers to equality.
Whether or not you reject the null hypothesis, \(p\)-values further from
\(\alpha\) will generate more "confidence", in the mind of the investigator, than
\(p\)-values closer to \(\alpha\).
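For example (a sketch with a hypothetical standardized test statistic
\(z = 1.8\)), the tail or tails used for the \(p\)-value follow directly from
the alternative hypothesis:

```python
# Sketch: the alternative hypothesis determines which tail(s) give the p-value.
from scipy import stats

z = 1.8                              # hypothetical standardized test statistic

p_right = stats.norm.sf(z)           # Ha: mu > mu0   (right-tailed)
p_left = stats.norm.cdf(z)           # Ha: mu < mu0   (left-tailed)
p_two = 2 * stats.norm.sf(abs(z))    # Ha: mu != mu0  (two-tailed)

print(p_right, p_left, p_two)
```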
10.1 Two Population Means with Unknown Standard Deviations
recommended:
1-30, 78-92
We can use a \(t\)-test to compare two population means, but we need a test
statistic for the distribution of the difference in sample means
\(\bar{X}_{1}-\bar{X}_{2}\), estimating its standard deviation with a
combination of the sample standard deviations.
Given two samples from two independent, normally distributed populations:
sample means: \(\bar{x}_{1}\), \(\bar{x}_{2}\)
sample standard deviations: \(s_{1}\), \(s_{2}\)
sample sizes: \(n_{1}\), \(n_{2}\)
We define
the standard error (estimated standard deviation):
\(\displaystyle{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}\)
the test statistic:
\(t=\displaystyle{\frac{(\bar{x}_{1}-\bar{x}_{2})-(\mu_{1}-\mu_{2})}{\displaystyle{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}}}\)
degrees of freedom:
\(df=\displaystyle{\frac{\displaystyle{\left(\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}\right)^{2}}}{\displaystyle{\left(\frac{1}{n_{1}-1}\right)\left(\frac{s_{1}^{2}}{n_{1}}\right)^{2}+\left(\frac{1}{n_{2}-1}\right)\left(\frac{s_{2}^{2}}{n_{2}}\right)^{2}}}}\)
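A minimal sketch (hypothetical samples; scipy assumed) that evaluates the
standard error, test statistic, and degrees of freedom above and cross-checks
them against scipy's unpooled two-sample \(t\)-test:

```python
# Sketch: two-sample t statistic with the unpooled (Welch) degrees of freedom.
from math import sqrt
from scipy import stats

x1 = [14.2, 15.1, 13.8, 14.9, 15.4, 14.7]        # hypothetical sample 1
x2 = [13.1, 13.9, 12.8, 13.5, 14.0, 13.3, 13.7]  # hypothetical sample 2

n1, n2 = len(x1), len(x2)
xbar1, xbar2 = sum(x1) / n1, sum(x2) / n2
s1, s2 = stats.tstd(x1), stats.tstd(x2)          # sample standard deviations

se = sqrt(s1**2 / n1 + s2**2 / n2)               # standard error of xbar1 - xbar2
t = (xbar1 - xbar2) / se                         # test statistic under H0: mu1 - mu2 = 0

df = (s1**2 / n1 + s2**2 / n2) ** 2 / (
    (s1**2 / n1) ** 2 / (n1 - 1) + (s2**2 / n2) ** 2 / (n2 - 1)
)

p_value = 2 * stats.t.sf(abs(t), df)             # two-tailed p-value
print(t, df, p_value)
print(stats.ttest_ind(x1, x2, equal_var=False))  # should agree with t and p_value
```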