(h) Inferential Statistics: Regression and Correlation

Regression and correlation analysis are statistical techniques used extensively in physical geography to examine relationships between variables. The two techniques measure the degree of relationship between two or more variables in different but related ways. In regression analysis, a single dependent variable, Y, is treated as a function of one or more independent variables, X1, X2, and so on.

The values of both the dependent and independent variables are assumed to have been obtained by random sampling and measured without error. Further, parametric forms of regression analysis assume that, for any given value of the independent variable, the values of the dependent variable are normally distributed about some mean. Applying this procedure to a set of paired observations produces an equation that "best" approximates the functional relationship between the variables.

Correlation analysis measures the degree of association between two or more variables. Parametric methods of correlation analysis assume that, for any pair or set of values taken under a given set of conditions, variation in each of the variables is random and follows a normal distribution. Applying correlation analysis to dependent and independent variables produces a statistic called the correlation coefficient (r). The square of this statistic (the coefficient of determination, r²) describes the proportion of the variation in the dependent variable that is associated with the regression on an independent variable.
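As a worked definition, r and r² for two variables can be written as follows. This is a sketch that assumes the usual Pearson product-moment form, which the text does not name explicitly. The symbols n (number of paired observations), the barred means, and the fitted value Ŷ from the regression line are notation introduced here for illustration.

```latex
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}},
\qquad
r^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}
```

The second expression is the ratio of the variation explained by the regression to the total variation in Y, which is why r² is read as a proportion.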

Analysis of variance is used to test the significance of the variation in the dependent variable that can be attributed to the regression on one or more independent variables. This statistical procedure produces a calculated F-value that is compared with a critical F-value for a particular level of statistical probability. A significant calculated F-value indicates that the relationship described by the regression and correlation is unlikely to be the consequence of chance alone.
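The calculation of the F-value can be sketched in a few lines of Python for the case of a single independent variable. This is a minimal illustration, not the author's procedure: the function name `regression_f`, the least-squares fit, and the five data points are all assumptions made for the example.

```python
def regression_f(x, y):
    """Return (F, df_regression, df_residual) for a one-predictor linear regression."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Least-squares slope and intercept (assumed fitting criterion).
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    b = sxy / sxx
    a = mean_y - b * mean_x
    y_hat = [a + b * xi for xi in x]
    # Partition the variation in Y into regression and residual parts.
    ss_reg = sum((yh - mean_y) ** 2 for yh in y_hat)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    df_reg, df_res = 1, n - 2
    if ss_res == 0:
        return float("inf"), df_reg, df_res  # perfect fit: no residual variation
    f_value = (ss_reg / df_reg) / (ss_res / df_res)
    return f_value, df_reg, df_res


# Illustrative values only; these are not taken from the text.
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
f_value, df_reg, df_res = regression_f(x_demo, y_demo)
print(f"F = {f_value:.2f} with ({df_reg}, {df_res}) degrees of freedom")
```

The calculated F is then compared with the critical F for the same degrees of freedom. For (1, 3) degrees of freedom at the 0.05 level, standard tables give about 10.13, so a regression on these five illustrative points would not be judged significant.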

Simple Linear Regression

In a simple regression analysis, one dependent variable is examined in relation to only one independent variable. The analysis is designed to derive an equation for the line that best models the relationship between the dependent and independent variables. This equation has the mathematical form:

Y = a + bX

where Y is the value of the dependent variable, X is the value of the independent variable, a is the intercept of the regression line on the Y axis (the value of Y when X = 0), and b is the slope of the regression line.
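One common way to obtain a and b is ordinary least squares, which picks the line that minimizes the sum of squared vertical distances between the observations and the line. The text does not name its fitting criterion, so treat the sketch below as an assumption. The function name `fit_line` and the demonstration values are hypothetical.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b


# Illustrative values only: four points lying exactly on Y = 1 + 2X.
x_demo = [0, 1, 2, 3]
y_demo = [1, 3, 5, 7]
a, b = fit_line(x_demo, y_demo)
print(f"Y = {a} + {b}X")
```

Because the demonstration points lie exactly on a line, the fit recovers the intercept and slope used to generate them. Real observations scatter about the line, and a and b then describe the best-fitting compromise.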

The following table contains randomly collected data on growing season precipitation and cucumber yield (Table 3h-1). It is reasonable to suggest that the amount of water received on a field during the growing season will influence the yield of cucumbers growing on it. We can use these data to illustrate how regression analysis is carried out. In this table, precipitation is our independent variable (X) and is not affected by variation in cucumber yield. Cucumber yield, however, is influenced by precipitation and is therefore designated as the Y variable in the analysis.

Table 3h-1: Cucumber yield vs precipitation data for 62 observations.

| Precipitation, mm (X) | Cucumbers, kilograms per m² (Y) | Precipitation, mm (X) | Cucumbers, kilograms per m² (Y) |
| --- | --- | --- | --- |
| 22 | 0.36 | 103 | 0.74 |
| 6 | 0.09 | 43 | 0.64 |
| 93 | 0.67 | 22 | 0.50 |
| 62 | 0.44 | 75 | 0.39 |
| 84 | 0.72 | 29 | 0.30 |
| 14 | 0.24 | 76 | 0.61 |
| 52 | 0.33 | 20 | 0.29 |
| 69 | 0.61 | 29 | 0.38 |
| 104 | 0.66 | 50 | 0.53 |
| 100 | 0.80 | 59 | 0.58 |
| 41 | 0.47 | 70 | 0.62 |
| 85 | 0.60 | 81 | 0.66 |
| 90 | 0.51 | 93 | 0.69 |
| 27 | 0.14 | 99 | 0.71 |
| 18 | 0.32 | 14 | 0.14 |
| 48 | 0.21 | 51 | 0.41 |
| 37 | 0.54 | 75 | 0.66 |
| 67 | 0.70 | 6 | 0.18 |
| 56 | 0.67 | 20 | 0.21 |
| 31 | 0.42 | 36 | 0.29 |
| 17 | 0.39 | 50 | 0.56 |
| 7 | 0.25 | 9 | 0.13 |
| 2 | 0.06 | 2 | 0.10 |
| 53 | 0.47 | 21 | 0.18 |
| 70 | 0.55 | 17 | 0.17 |
| 6 | 0.07 | 87 | 0.63 |
| 90 | 0.69 | 97 | 0.66 |
| 46 | 0.42 | 33 | 0.18 |
| 36 | 0.39 | 20 | 0.06 |
| 14 | 0.09 | 96 | 0.58 |
| 60 | 0.54 | 61 | 0.42 |

**CITATION**

Pidwirny, M. (2006). Fundamentals of Physical Geography, 2nd Edition. 29/12/2011. http://www.physicalgeography.net/fundamentals/1b.html
