(h) Inferential Statistics: Regression and Correlation Part C
1 There were 62 values of Y analyzed, so n = 62. The total sum of squares degrees of freedom (df) are n - 1 = 61. The regression of Y on X has 1 degree of freedom. The residual (unexplained) degrees of freedom are found by subtracting the regression df (1) from the total df (61), which gives 60.
2 MS (mean square) is calculated as SS / df.
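The two footnotes amount to a short calculation. The sketch below is a minimal illustration, assuming the sums of squares quoted in this section (Regression SS = 2.1115, Total SS = 2.7826). The residual SS is not stated directly, so it is derived here as Total SS minus Regression SS. The function name `anova_table` is hypothetical.

```python
def anova_table(n, regression_ss, total_ss):
    """Degrees of freedom, sums of squares and mean squares for a one-predictor regression."""
    total_df = n - 1                            # footnote 1: n - 1
    regression_df = 1                           # one explanatory variable (X)
    residual_df = total_df - regression_df      # unexplained df
    residual_ss = total_ss - regression_ss      # assumes Total SS = Regression SS + Residual SS
    return {
        "regression": {"df": regression_df, "SS": regression_ss,
                       "MS": regression_ss / regression_df},  # footnote 2: MS = SS / df
        "residual": {"df": residual_df, "SS": residual_ss,
                     "MS": residual_ss / residual_df},
        "total": {"df": total_df, "SS": total_ss},
    }

table = anova_table(n=62, regression_ss=2.1115, total_ss=2.7826)
for source, row in table.items():
    print(source, row)
```

With these inputs the residual df come out at 60 and the residual MS at about 0.0112, the denominator used in the F ratio.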
Using the Analysis of Variance (ANOVA) procedure, the regression is tested by calculating the F statistic:
F = (Regression MS) / (Residual MS) = (2.1115) / (0.0112) = 188.86
To test this statistic, we use an F table to find the critical test value for a probability of 0.01 (1%) with 1 and 60 degrees of freedom. A probability of 0.01 means that, if there were no real relationship between X and Y, an F value larger than the critical value would occur by chance in only 1 out of 100 cases. According to the table, the critical test value is 7.1. The relationship is deemed significant if the calculated F statistic is greater than the critical test value. This regression is therefore statistically significant at the 0.01 level, because 188.86 is greater than 7.1.
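The F test can be sketched in a few lines. This is a minimal sketch under stated assumptions: the mean squares are recomputed from the sums of squares quoted in this section, and the critical value (7.1) is taken from the F table rather than computed. The function name `f_test` is hypothetical. Recomputing from the rounded sums of squares gives F of about 188.8 rather than exactly 188.86; the small gap appears to come from rounding of the residual MS (2.1115 / 0.01118 is about 188.86) and does not change the conclusion.

```python
def f_test(regression_ms, residual_ms, critical_value):
    """Return the calculated F ratio and whether it exceeds the tabled critical value."""
    f_stat = regression_ms / residual_ms
    return f_stat, f_stat > critical_value

# Mean squares recomputed from the sums of squares quoted in this section (MS = SS / df).
regression_ms = 2.1115 / 1
residual_ms = (2.7826 - 2.1115) / 60
# Critical value for a 0.01 probability with (1, 60) df, read from an F table.
f_stat, significant = f_test(regression_ms, residual_ms, critical_value=7.1)
print(f"F = {f_stat:.1f}, significant at the 0.01 level: {significant}")
```

If SciPy is available, `scipy.stats.f.ppf(0.99, 1, 60)` gives the same critical value (about 7.08) without a printed table.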
Caution is needed when interpreting the results of regression. In our example, we found a significant relationship between precipitation and cucumber yield. However, this association does not necessarily mean that precipitation causes the changes in yield. A third variable associated with both precipitation and cucumber yield may be confounding the interpretation of the analysis. A causal relationship between variables can only be confirmed through experimental manipulation.
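A small simulation can make the confounding warning concrete. It uses synthetic, hypothetical data (not the precipitation and cucumber yield observations): a hidden variable Z drives both X and Y, and X has no direct effect on Y. X and Y still come out strongly correlated. Removing the linear effect of Z from both variables before correlating them (a partial correlation, a technique not covered in this section) makes the association largely vanish. The helper names `pearson_r` and `residuals` are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(ys, zs):
    """Residuals of a least-squares fit of ys on zs (removes the linear effect of zs)."""
    n = len(ys)
    my, mz = sum(ys) / n, sum(zs) / n
    slope = (sum((z - mz) * (y - my) for y, z in zip(ys, zs))
             / sum((z - mz) ** 2 for z in zs))
    intercept = my - slope * mz
    return [y - (intercept + slope * z) for y, z in zip(ys, zs)]

rng = random.Random(42)
# Synthetic data: hidden Z drives both X and Y; X has no direct effect on Y.
z = [rng.gauss(0, 1) for _ in range(1000)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

print(f"r(X, Y) = {pearson_r(x, y):.2f}")
print(f"r(X, Y) after removing Z = {pearson_r(residuals(x, z), residuals(y, z)):.2f}")
```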
Coefficient of Determination
To measure how strong the correlation is between the two variables, we can determine the proportion of the total variation in Y that is associated with the regression model. This ratio is called the coefficient of determination and is represented by the symbol r². Its value ranges from 0.00 to 1.00. The coefficient of determination calculated from the data set above is 0.76, or 76% (as calculated below). This value indicates that 76% of the variation in Y was associated with variation in X in the data set observations.
Coefficient of determination = (Regression SS) / (Total SS)
= (2.1115) / (2.7826) = 0.7588
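The ratio above is a one-line computation. This minimal sketch plugs in the two sums of squares from the worked example and also checks the equivalent form 1 - Residual SS / Total SS, which is not stated in the original but follows from Total SS = Regression SS + Residual SS. The function name is hypothetical.

```python
def coefficient_of_determination(regression_ss, total_ss):
    """Proportion of the total variation in Y associated with the regression (r squared)."""
    return regression_ss / total_ss

regression_ss, total_ss = 2.1115, 2.7826
r_squared = coefficient_of_determination(regression_ss, total_ss)
# Equivalent form: 1 - Residual SS / Total SS, since Total SS = Regression SS + Residual SS.
r_squared_alt = 1 - (total_ss - regression_ss) / total_ss
print(f"r^2 = {r_squared:.4f} ({r_squared:.0%} of the variation in Y)")
# prints: r^2 = 0.7588 (76% of the variation in Y)
```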
Correlation Coefficient
Another useful regression statistic that measures the strength of the correlation between two variables is the correlation coefficient. This statistic is often represented by the symbol "r" and is determined by taking the square root of the coefficient of determination, with its sign (positive or negative) matching the direction of the regression slope. The value of the correlation coefficient ranges from -1.00 to 1.00. A value of 0.0 indicates that there is no linear relationship between the X and Y variables.
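The link between r and r² can be sketched as follows. A square root is never negative, so the sign of r has to come from elsewhere: it is the sign of the fitted regression slope (a standard result). Because the raw observations are not reproduced in this section, this minimal sketch uses a small, hypothetical data set (`xs`, `ys`); the helper names are also hypothetical. For this section's data set only r² = 0.7588 is available, which gives a magnitude of about 0.87.

```python
import math

def fit(xs, ys):
    """Least-squares slope of Y on X and r^2 = Regression SS / Total SS."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    total_ss = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    regression_ss = slope * sxy            # equals sxy**2 / sxx
    return slope, regression_ss / total_ss

def correlation_coefficient(xs, ys):
    """r = square root of r^2, carrying the sign of the regression slope."""
    slope, r_squared = fit(xs, ys)
    return math.copysign(math.sqrt(r_squared), slope)

# Magnitude of r for this section's data set (the sign depends on the fitted slope).
print(f"|r| = {math.sqrt(0.7588):.4f}")

# Hypothetical data with a downward trend: r comes out negative.
xs = [1, 2, 3, 4, 5]
ys = [9.0, 7.5, 6.8, 4.9, 3.1]
print(f"r = {correlation_coefficient(xs, ys):.3f}")
```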
The strength of the relationship between the X and Y variables increases as the value of r approaches 1.00 or -1.00. Perfect correlation occurs if r equals either 1.00 (perfect positive) or -1.00 (perfect negative). A positive correlation coefficient indicates that an increase in the value of the X variable is associated with an increase in the value of the Y variable. A negative correlation coefficient indicates that an increase in the value of the X variable is associated with a decrease in the value of the Y variable.
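The sign and strength rules can be checked on tiny, hypothetical series. In this sketch (a plain Pearson formula; the helper name `pearson_r` is hypothetical), Y that rises exactly in step with X gives r = 1.0, Y that falls exactly in step gives r = -1.0, and a scattered series gives a value near zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * xi + 1 for xi in x]))    # perfect positive: 1.0
print(pearson_r(x, [10 - 3 * xi for xi in x]))   # perfect negative: -1.0
print(pearson_r(x, [4, 1, 5, 2, 3]))             # weak negative: -0.1
```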
CITATION
Pidwirny, M. (2006). "Inferential Statistics: Regression and Correlation". Fundamentals of Physical Geography, 2nd Edition. 29/11/2011. http://www.physicalgeography.net/fundamentals/3h.html