After completion of this topic, you should be able to:
Explain the concept of multiple regression
Justify the need to take into consideration the assumptions of multiple regression
Compute a multiple regression and justify its use
Interpret the SPSS output for multiple regression
What is Multiple Regression?
Linear Regression is used when you want to predict the value of one variable (y) based on the information you have about another variable (x). For example, using linear regression you are able to predict performance in mathematics based on information you have about attitudes towards mathematics. In this case, you are studying the relationship between a single dependent variable and one independent variable. Multiple Regression is used to study the relationship between a single dependent variable (DV) and several independent variables (IVs).
For example, you may want to know how father's level of education and mother's level of education relate to academic performance. When using Multiple Regression, researchers use the term “independent variables” to identify those variables that they think will influence some other “dependent variable”. In the simple case of one independent variable and one dependent variable, if the two variables are correlated, then knowing the score on one variable will allow you to predict the score on the other. The stronger the correlation, the closer the scores will fall to the regression line and therefore the more accurate the prediction.
Multiple regression is simply an extension of this principle, where we predict one variable on the basis of several other variables. Having more than one Independent Variable (or predictor variable) is useful when predicting human behaviour (which we do in education). Our actions, thoughts and emotions are all likely to be influenced by some combination of several factors. Using multiple regression we can test theories (or models) about precisely which set of variables is influencing our behaviour.
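To make the idea of predicting one variable from several others concrete, here is a minimal sketch in plain Python of an ordinary least squares fit with two independent variables. It is not the SPSS procedure; the function name fit_multiple_regression and all of the data are hypothetical. The scores were constructed to follow performance = 10 + 2 x father + 3 x mother exactly, so that the recovered coefficients are easy to check.

```python
def fit_multiple_regression(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: one list of independent-variable scores per subject
    y:    the dependent-variable score for each subject
    Returns [intercept, b1, b2, ...].
    """
    # Prepend a 1 to every row so the first coefficient is the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])

    # Build X'X (k x k) and X'y (length k).
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]

    # Solve the system by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Hypothetical data: years of education of father and mother (IVs)
# and an academic performance score (DV) for six students.
education = [[12, 10], [16, 12], [10, 14], [14, 16], [18, 11], [11, 9]]
performance = [64, 78, 72, 86, 79, 59]

coefficients = fit_multiple_regression(education, performance)
print(coefficients)  # intercept, then one slope per independent variable
```

Each slope estimates how much the dependent variable changes for a one-unit change in that independent variable while the other is held constant. Real data will not fit exactly, which is why the SPSS output also reports how well the model fits.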
Conditions for Using Multiple Regression
Continuous Variable
The Dependent Variable should be a continuous variable, which means it should take scores such as 1, 20, 30 or 99 (for example, scores in a mathematics test or GPA). If your Dependent Variable is categorical, such as 1 = low, 2 = average and 3 = high, then a different regression method called Logistic Regression should be used [which is not discussed here].
The independent variables should as far as possible be continuous variables with scores such as 1, 20, 30 or 99. However, if you do have to use a categorical variable such as 1 = male and 2 = female, you have to create a dummy variable [which is not discussed here].
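Although dummy variables are not covered in this topic, a brief sketch may help show what "creating a dummy variable" means. The helper dummy_code below is hypothetical and is not an SPSS function. It assumes the usual convention of choosing one category as the reference (coded 0 throughout) and creating one 0/1 column for each remaining category.

```python
def dummy_code(values, reference):
    """Return one 0/1 column per category other than the reference."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Gender coded as in the text: 1 = male, 2 = female (hypothetical sample).
gender = [1, 2, 2, 1]

# Treat male (1) as the reference category; the new column marks female (2).
dummies = dummy_code(gender, reference=1)
print(dummies)
```

With two categories this gives a single 0/1 column. A variable with three categories would give two columns.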
Large Number of Cases
Multiple regression requires a large number of cases. How many is enough? As a guide, you should have 40 times as many subjects as independent variables. That is, if you intend to use 2 independent variables as predictors, then you should have at least 2 x 40 = 80 subjects.
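The guide above is simple arithmetic, sketched here so you can apply it to any number of independent variables. The function name minimum_cases is hypothetical, and the ratio of 40 is the guide given in this topic rather than a fixed rule.

```python
def minimum_cases(n_independent_variables, cases_per_variable=40):
    """Smallest recommended sample size under the 40-to-1 guide."""
    return n_independent_variables * cases_per_variable


print(minimum_cases(2))  # two independent variables, as in the example
```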
Multicollinearity
To what extent are the Independent Variables and the Dependent Variable correlated? Generally, you want the Dependent Variable to be correlated with each Independent Variable. For example, you surely would not choose head size as a predictor of academic performance!
On the other hand, each Independent Variable (such as self-esteem) should not be strongly correlated with the other Independent Variables (such as motivation or IQ). However, when dealing with human behaviour it is common for the Independent Variables to be correlated.
What do you think will happen when there is a high correlation between the different Independent Variables? Such high correlations cause problems when trying to draw inferences about the relative contribution of each independent variable to the dependent variable. Is it attitude or motivation that contributed to academic performance? [Fortunately, SPSS provides a method for checking for multicollinearity.]
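As a rough illustration of what a multicollinearity check measures, the sketch below computes the correlation between two independent variables and, from it, the Tolerance and Variance Inflation Factor (VIF), the two statistics that SPSS reports in its collinearity diagnostics. The function names and the attitude and motivation scores are hypothetical. The shortcut Tolerance = 1 - r squared applies only when there are exactly two independent variables; with more, each variable is regressed on all of the others.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def tolerance_and_vif(x1, x2):
    """Tolerance and VIF for a model with exactly two independent variables."""
    r = pearson_r(x1, x2)
    tolerance = 1 - r ** 2   # share of variance NOT explained by the other IV
    vif = 1 / tolerance      # undefined if the two IVs are perfectly correlated
    return tolerance, vif


# Hypothetical scores for two independent variables.
attitude = [3, 5, 4, 6, 8, 7]
motivation = [4, 5, 5, 7, 9, 7]

r = pearson_r(attitude, motivation)
tolerance, vif = tolerance_and_vif(attitude, motivation)
print(r, tolerance, vif)
```

A Tolerance close to 0 (equivalently, a large VIF) signals that one independent variable is almost entirely predictable from the other, which is the situation described above where the separate contributions of attitude and motivation cannot be told apart.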