In a "goodness-of-fit" test, we determine whether a set of data (observed) fits a
particular distribution (expected).
test statistic: \( \displaystyle{\sum_{1 \le k \le n} \frac{(O_{k}-E_{k})^{2}}{E_{k}}} \)
\( n \) is the number of classes or categories in the data
\( O_{1}, O_{2}, \ldots, O_{n} \) are the observed values in each
category
\( E_{1}, E_{2}, \ldots, E_{n} \) are the expected values in each
category
the expected value in each category must be at least \(5\) to perform
this test
the test statistic follows a \( \chi^{2}_{n-1} \) distribution, with \(df=n-1\), and the
\(p\)-value is the area under the distribution to the right of the test
statistic
the null and alternative hypotheses are generally of the form
\(H_{0}:\) "the observed data fits the distribution of the expected data"
\(H_{1}:\) "the observed data does not fit the distribution of the expected data"
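The computation above can be sketched in a few lines of Python. This is a minimal illustration, not a full test procedure: the function name `chi_square_gof` and the die-rolling counts are made up for the example, and the expected counts assume a fair die rolled 60 times.

```python
def chi_square_gof(observed, expected):
    """Return the goodness-of-fit test statistic and its degrees of freedom."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    if min(expected) < 5:
        raise ValueError("each expected value must be at least 5")
    # Sum of (O_k - E_k)^2 / E_k over all n categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df


# Hypothetical data: 60 rolls of a die, tested against a fair die (10 per face).
observed = [8, 9, 12, 11, 6, 14]
expected = [10] * 6
statistic, df = chi_square_gof(observed, expected)
print(statistic, df)  # statistic is approximately 4.2, df is 5
```

The \(p\)-value is then the right-tail area of \( \chi^{2}_{5} \) beyond the statistic; if SciPy is available, `scipy.stats.chi2.sf(statistic, df)` computes that area.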
11.3 Test of Independence
recommended:
23-42, 86-100
Tests of independence involve contingency tables of observed data. We are
determining whether the categories described by the rows and the categories described
by the columns are independent, in the probabilistic sense.
test statistic:
\( \displaystyle{\sum_{1 \le i \le m,1 \le j \le n} \frac{(O_{ij}-E_{ij})^{2}}{E_{ij}}} \)
\( m \) and \( n \) are the number of rows and columns in the contingency
table, respectively
\( O_{ij} \) is the observed data value in the \(i^{th}\) row and the
\(j^{th}\) column of the contingency table
\( E_{ij} \) is the expected data value formed by the total of the
\(i^{th}\) row times the total of the \(j^{th}\) column divided by the
total number of observed data values
each expected value \( E_{ij} \) must be at least \(5\) to perform
this test
the test statistic follows a \( \chi^{2}_{(m-1)(n-1)} \) distribution, with \(df=(m-1)(n-1)\),
and the \(p\)-value is the area under the distribution to the right of the
test statistic
the null and alternative hypotheses are generally of the form
\(H_{0}:\) "the row and column variables are independent"
\(H_{1}:\) "the row and column variables are not independent"
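As a rough sketch of how the expected values and the test statistic come out of a contingency table, the following Python computes each \( E_{ij} \) as (row total)(column total)/(grand total) and then sums over all cells. The function name `chi_square_independence` and the 2-by-3 table are hypothetical, chosen only to keep the arithmetic easy to check by hand.

```python
def chi_square_independence(table):
    """Return (statistic, df, expected) for an m-by-n table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E_ij = (total of row i) * (total of column j) / (total number of observations)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    if any(e < 5 for row in expected for e in row):
        raise ValueError("each expected value must be at least 5")
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical observed counts: 2 rows, 3 columns.
table = [
    [10, 20, 30],
    [20, 20, 20],
]
statistic, df, expected = chi_square_independence(table)
print(expected)       # each row of expected counts is [15.0, 20.0, 25.0]
print(statistic, df)  # statistic is approximately 5.33, df is 2
```

Here both row totals are 60 and the column totals are 30, 40, and 50, so every row has the same expected counts; the statistic is \( 2(25/15 + 0 + 25/25) = 16/3 \).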
12.1 Linear Equations
recommended:
1-16, 57-58
A linear equation is an equation of the form \(y=a+bx\) where \(a\) and \(b\) are
constants and \(x\) and \(y\) are variables. \(x\) is called the independent
variable and \(y\) is called the dependent variable.
Linear equations model linear relationships where constant changes in one
variable result in, or are the result of, constant changes in the other variable.
The constant \(b\) is called the slope of the line and the constant
\(a\) is called the \(y\)-intercept.
the \(y\)-intercept, \(a\), is the value of \(y\) when \(x=0\)
the slope, \(b\), is the common ratio of the change in \(y\) values to the
change in \(x\) values between any two points \((x_{1},y_{1})\) and
\((x_{2},y_{2})\) in the relationship: \( b=\dfrac{y_{2}-y_{1}}{x_{2}-x_{1}} \)
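To make the slope and intercept concrete, here is a small Python sketch that recovers \(a\) and \(b\) from two points on a line. The helper name `line_through` and the two sample points are assumptions for illustration only.

```python
def line_through(p1, p2):
    """Return (a, b) such that y = a + b*x passes through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line cannot be written as y = a + bx")
    b = (y2 - y1) / (x2 - x1)  # slope: change in y over change in x
    a = y1 - b * x1            # y-intercept: the value of y when x = 0
    return a, b


# Hypothetical points (1, 5) and (3, 9).
a, b = line_through((1, 5), (3, 9))
print(a, b)  # 3.0 2.0, i.e. y = 3 + 2x
```

Any other pair of points on the same line, such as \((0,3)\) and \((2,7)\), gives the same \(a\) and \(b\), which is what "common ratio" means here.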
12.2 Scatter Plots
recommended:
17-19, 59-63
A scatter plot is a common way to display the relationship between two variables
\(x\) and \(y\).
A scatter plot shows the direction of a relationship between variables.
A clear direction emerges when either
high values of one variable occur with high values of the other variable
(or low values of one variable occur with low values of the other variable), or
high values of one variable occur with low values of the other variable.
You can judge the strength of the relationship by observing how close
the points come to resembling a line.
When we examine a scatter plot, we want to notice the overall pattern and any
deviations from the pattern.
A random pattern or a horizontal pattern indicates no relationship.