Given a set of points
\(\{(x_{1},y_{1}), (x_{2},y_{2}), \ldots, (x_{n},y_{n})\}\) and a line
\(\hat{y}=a+bx\), we calculate:
\(\bar{x}\) and \(s_{x}\) for \(\{x_{1}, x_{2}, \ldots, x_{n}\}\), and
\(\bar{y}\) and \(s_{y}\) for \(\{y_{1}, y_{2}, \ldots, y_{n}\}\)
the expected values \(\hat{y}_{i}=a+bx_{i}\), for each \(x_{i}\),
\(1 \le i \le n\)
the \(i^{th}\) residual (the \(i^{th}\) error):
\(\epsilon_{i}=y_{i}-\hat{y}_{i}\), \(1 \le i \le n\)
the Sum of the Squared Errors (\(SSE\)): \(\sum_{i=1}^{n} \epsilon_{i}^{2}\)
The values of \(a\) and \(b\) that minimize the \(SSE\) are
\(\displaystyle{b=\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}\)
and \(a=\bar{y}-b\bar{x}\)
The line \(\hat{y}=a+bx\) with these values of \(a\) and \(b\), which minimizes
the Sum of the Squared Errors, is called the line of best fit or the
least-squares regression line.
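A minimal computational sketch of these formulas, assuming Python with NumPy;
the data points are made up purely for illustration:
\begin{verbatim}
import numpy as np

# made-up example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_bar, y_bar = x.mean(), y.mean()

# b = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
a = y_bar - b * x_bar

y_hat = a + b * x              # expected values
residuals = y - y_hat          # epsilon_i
sse = np.sum(residuals ** 2)   # Sum of the Squared Errors

print(f"a = {a:.4f}, b = {b:.4f}, SSE = {sse:.4f}")
\end{verbatim}
For these made-up points the sketch gives \(b=1.99\) and \(a=0.05\).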
A measure of how well the least-squares regression line fits the set of points
is the correlation coefficient, \(r\)
values of \(r\) closer to \(-1\) or \(1\) indicate a stronger linear
relationship (called correlation)
\(r=0\) would indicate no linear relationship
\(r=1\) or \(r=-1\) would indicate that all data points lie on the same
line
\(r<0\) or \(r>0\) indicate that the least-squares regression line has
a negative or positive slope, respectively
The square of the correlation coefficient, \(r^{2}\), is called the
coefficient of determination
\(r^{2}\) (expressed as a percentage or a proportion) represents that
portion of variation in the dependent variable, \(y\), that is due to
variation in the independent variable, \(x\)
\(1-r^{2}\) (expressed as a percentage or a proportion) represents that
portion of variation in the dependent variable, \(y\), that is not due to
variation in the independent variable, \(x\)
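A short sketch of \(r\) and \(r^{2}\) on the same made-up data; np.corrcoef
returns the correlation matrix, and the off-diagonal entry is \(r\):
\begin{verbatim}
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r = np.corrcoef(x, y)[0, 1]   # correlation coefficient
r_sq = r ** 2                 # coefficient of determination

print(f"r = {r:.4f}")
print(f"r^2 = {r_sq:.4f}    (proportion of variation in y due to x)")
print(f"1 - r^2 = {1 - r_sq:.4f}  (proportion not due to x)")
\end{verbatim}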
12.4 Testing the Significance of the Correlation Coefficient
recommended:
28-30
We can perform a hypothesis test of the significance of the correlation
coefficient, \(r\), to decide whether the linear relationship in the sample data
is strong enough to model the relationship in the population.
let \(\rho\) be the (unknown) correlation coefficient of the population and set
up a two-tailed test (\(t\) test) with level of significance \(\alpha=0.05\)
\(H_{0}: \rho=0\)
\(H_{1}: \rho \neq 0\)
test statistic (\(t\)-score)
\(\displaystyle{t=\frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}}\)
\(df=n-2\)
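A sketch of this test on the same made-up data, assuming SciPy is available;
stats.t.sf gives the upper-tail probability of the \(t\) distribution, so
doubling it yields the two-tailed \(p\)-value:
\begin{verbatim}
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)
alpha = 0.05

r = np.corrcoef(x, y)[0, 1]
t_score = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
df = n - 2
p_value = 2 * stats.t.sf(abs(t_score), df)  # two-tailed

if p_value < alpha:
    print(f"t = {t_score:.3f}, p = {p_value:.4g}: reject H0; r is significant")
else:
    print(f"t = {t_score:.3f}, p = {p_value:.4g}: do not reject H0")
\end{verbatim}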
If we "reject \(H_{0}\)", then we sah that "\(r\) is significant".
If \(r\) is significant and the scatterplot shows a linear trend, then the
least-squares regression line can be used to predict values of \(y\) for
values of \(x\) that lie within the observed values of \(x\); otherwise, the
least-squares regression line should not be used to predict values of \(y\)
the least-squares regression line may not be appropriate or reliable for
prediction outside the observed values of \(x\), even if \(r\) is significant
and the scatterplot shows a linear trend
Assumptions in testing the significance of the correlation coefficient:
there is a linear relationship in the population that models the average
value of \(y\) in terms of the value of \(x\)
the \(y\)-values for any given \(x\) value are normally distributed about
the value of the least-squares regression line at \(x\)
the standard deviations of these distributions are equal for each value of
\(x\)
the residual errors are mutually independent
the data come from a random sample or a well-designed randomized experiment
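A rough visual check of these assumptions on the made-up data from the earlier
sketches, assuming Matplotlib is available; a residual plot with no pattern and
roughly constant spread is consistent with the linearity and
equal-standard-deviation assumptions:
\begin{verbatim}
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
residuals = y - (a + b * x)

plt.scatter(x, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("x")
plt.ylabel("residual")
plt.title("Residuals vs. x")
plt.show()
\end{verbatim}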
12.5 Prediction
recommended:
31-50, 67-71
If we decide that the correlation coefficient is significant, we can make
predictions with the least-squares regression line.
Making predictions within the range of the observed values of \(x\) is called
interpolation; making predictions outside the observed values of \(x\) is
called extrapolation
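A small sketch distinguishing the two, reusing the fitted line from the earlier
sketches; the predict helper is a hypothetical convenience that warns when a
requested \(x\) falls outside the observed range:
\begin{verbatim}
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

def predict(x_new):
    """Predict y at x_new, warning when this is extrapolation."""
    if not (x.min() <= x_new <= x.max()):
        print(f"warning: x = {x_new} is outside "
              f"[{x.min()}, {x.max()}]: extrapolation")
    return a + b * x_new

print(predict(3.5))    # interpolation
print(predict(10.0))   # extrapolation: may be unreliable
\end{verbatim}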
12.6 Outliers
recommended:
51-56, 72-77
In linear regression, outliers are observed data points that are far from the
least-squares regression line. ``Far'' in this context means more than two
standard deviations from the least-squares regression line.
Influential points are observed data points that are far from the other
observed data points in the horizontal direction. These data points may have a
large effect on the slope of the regression line.
You can test whether a point is an influential point by removing it from the
data set and checking whether the slope of the least-squares regression line
changes significantly.
To check for potential outliers, we use the standard deviation of the
residuals, \(s=\sqrt{SSE/(n-2)}\), with \(n-2\) degrees of freedom.
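A sketch of this check, computing \(s\) and flagging residuals larger than
\(2s\); the data are made up, with the point at \(x=6\) deliberately placed off
the trend:
\begin{verbatim}
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 4.0, 6.1, 8.0, 9.9, 18.0, 14.1, 16.0])
n = len(x)

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
residuals = y - (a + b * x)

s = np.sqrt(np.sum(residuals ** 2) / (n - 2))  # std dev of residuals
flagged = np.abs(residuals) > 2 * s            # more than 2s from line

for xi, yi, is_far in zip(x, y, flagged):
    if is_far:
        print(f"potential outlier: ({xi}, {yi})")
\end{verbatim}
Running this flags the point \((6.0, 18.0)\), which was placed off the trend on
purpose.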
If the potential outlier reflects an error in the data, then we can either
correct the error or remove the potential outlier. If the potential outlier
is correct, then it remains in the data.
Either way, a researcher should document the inquiry, their findings, and the
outcome, as part of the record.