another stats question help please


Hi again, keenbean, rick and fellow stats gurus,
If I have a high value of adjusted r (such as .94), does it make sense to say that 94% of the variance has been accounted for when variable X1 has been found to be the most significant predictor of Y?

Also, when there is a problem of multicollinearity, can I say this about those variables:

"Variables X3 and X4 have been excluded from the model because they have multicollinearity"

thanks satchi


Hi Satchi,

I am sorry, but your questions go over my head. Too complicated for me.



Hi Satchi...afraid you have lost me too this time. But one thing I do seem to remember is that the variance that has been accounted for is actually r squared, i.e. .94 X .94 in your case. However, I could completely be making that up, so probably best to check! With regard to multicollinearity I have no idea, sorry :( It's been quite a while since my last stats course, and I tend to run to one of the statisticians in the department when I am struggling! KB


Hi, I forgot to type "squared" after "adjusted r" :-)
So the .94 was already the squared value...
but thanks anyway
I will have to hit the books :-)


Keenbean's right, it's r squared that gives you proportion of variance explained.

And yeah, that's what you say about multicollinearity, but I think you should say which variable they are multicollinear with, e.g.
"Variables X3 and X4 have been excluded from the model because they have multicollinearity with variable X1"


thanks Melsie!


r squared gives you the proportion of variance explained by the model.

Adjusted r squared estimates the proportion of variance the model would explain WHEN APPLYING IT TO THE WHOLE POPULATION (rather than just the sample); it shrinks r squared to allow for the number of predictors and the sample size. So if your r squared and adjusted r squared are close, it suggests the model would generalise well beyond the sample you used.
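
In case it helps to see the arithmetic, here is a minimal sketch of the usual adjustment formula, 1 - (1 - R²)(n - 1)/(n - k - 1). The sample sizes and the number of predictors below are made up for illustration; they are not from satchi's data.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted to n cases."""
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical example: R^2 = .94 with 4 predictors, at two sample sizes.
small = adjusted_r_squared(0.94, n=20, k=4)
large = adjusted_r_squared(0.94, n=200, k=4)
print(round(small, 3), round(large, 3))  # 0.924 0.939
```

The gap between r squared and adjusted r squared gets smaller as the sample grows relative to the number of predictors.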

I wouldn't exclude variables because of multicollinearity, just be careful about your interpretation because of potential suppression effects and stick them in a hierarchical regression if necessary.


Hi there,

You might find it helpful to read Pallant 2007 and Brace 2009 (or earlier versions). After telling you how to use SPSS and how to interpret the output (in a nice clear and uncomplicated way) they show you how to make sense of it all and how to write it up.


hi buttercup and sneaks
thanks for your replies
sneaks, what do you mean by suppression effects? Can you explain briefly?
thanks a lot


If variables are highly correlated, then when put into a regression together they will often account for the same variance.

So variable x and variable y could be put into a regression and only variable x comes out as significant. It could be that variable y would be significant on its own, but it isn't here because variable x is 'eating up' the variance they share, because they are correlated.

So when variables are highly correlated you can sometimes miss a significant effect, because the other variable steals the variance - let me know if that doesn't make sense!
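
One common way to put a number on how much two predictors overlap is the variance inflation factor (VIF). With only two predictors it reduces to 1 / (1 - r²), where r is the correlation between them. The sketch below uses made-up data; `pearson_r` and `vif_two_predictors` are illustrative helper names, not SPSS functions.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(r12: float) -> float:
    """VIF when the model has exactly two predictors correlated at r12."""
    return 1.0 / (1.0 - r12 ** 2)


# Hypothetical predictors that move almost in lockstep.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3, 6.8, 8.2]

r12 = pearson_r(x1, x2)
vif = vif_two_predictors(r12)
print(f"r12 = {r12:.3f}, VIF = {vif:.1f}")
```

The standard error of each coefficient is inflated by roughly the square root of the VIF, which is why a predictor that matters can still come out non-significant.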


Hi sneaks, thanks a lot, I understand better now.
So what do we do if two variables are highly correlated? How do we regress them then?
Another question: what if we run a two-block (hierarchical) regression and, for the 2nd block, SPSS tells us that variables F, G and H are excluded? What do we do then? Is it up to us what we put in each block? If we ran that two-block regression and then found excluded variables, is that the final result?

thanks a lot


Hi satchi,

If it were me, I would probably play around with it, putting both highly correlated variables in and seeing what happens! You just have to be careful about interpreting the results (as I say, you might miss a potentially significant effect).

If I were you I would use a hierarchical regression (the one with the different blocks) but use the 'entry' method, found on a drop down menu in the regression dialogue box if using SPSS. Usually you would put things you want to control for in the earlier blocks, so often:
Block 1: Demographics (age, gender etc.)
Block 2: Variable(s) that the literature already says should be significant
Block 3: My variables, to see whether they are significant when controlling for e.g. age and variable x

Don't use 'stepwise' methods - they are rubbish, the entry method is usually the best - see Andy Field for an explanation on the different types.

So if you have a multicollinearity issue, you could put one of those variables in a separate block after the other variable and see if it comes out significant. Then swap them around, so it goes in the block before, and see if anything changes. That way you can establish whether each has a significant effect when controlling for the other.
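
To illustrate the swapping idea with numbers: for a model with just two predictors, R² can be worked out from the three correlations alone, so you can see how much each variable adds when it goes in second. The correlations below are invented for the example, and the function name is only illustrative.

```python
def r2_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """R^2 of y regressed on x1 and x2, from the pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Hypothetical correlations: both predictors relate to y and to each other.
r_y1, r_y2, r_12 = 0.80, 0.75, 0.85

full = r2_two_predictors(r_y1, r_y2, r_12)

# Order A: x1 in Block 1, x2 added in Block 2.
gain_x2_after_x1 = full - r_y1 ** 2
# Order B: x2 in Block 1, x1 added in Block 2.
gain_x1_after_x2 = full - r_y2 ** 2

print(f"full model R^2      = {full:.3f}")
print(f"x2 adds (after x1)  = {gain_x2_after_x1:.3f}")
print(f"x1 adds (after x2)  = {gain_x1_after_x2:.3f}")
```

The R² of the final model is the same whichever order you use; what changes is the R² change credited to the variable entered second.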


hi sneaks
thanks a lot for your advice; I'm going to run the regression tomorrow.
:-) satchi


Hi Satchi, just read the last couple of comments from Sneaks. I am not entirely sure of the nature of the variables you are 'regressing', but if they are latent variables...take a look at the Average Variance Extracted (AVE) procedure by Fornell and Larcker (1981). When factors are highly correlated we may sometimes argue that they are measuring the same construct/phenomenon. Obviously, if you are using observed items only, this advice is fairly academic.
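
As a rough sketch of how that check is usually described: the AVE of a factor is the mean of its squared standardised loadings, and two factors are treated as distinct only if each AVE is larger than the squared correlation between them. The loadings and the factor correlation below are invented for illustration, and the function name is hypothetical.

```python
def average_variance_extracted(loadings):
    """Mean of the squared standardised loadings for one factor."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# Hypothetical standardised loadings for two latent factors.
loadings_a = [0.80, 0.75, 0.70]
loadings_b = [0.85, 0.80, 0.78]
factor_correlation = 0.82  # hypothetical correlation between the factors

ave_a = average_variance_extracted(loadings_a)
ave_b = average_variance_extracted(loadings_b)
shared = factor_correlation ** 2

# The factors count as distinct only if both AVEs exceed the shared variance.
distinct = ave_a > shared and ave_b > shared
print(f"AVE A = {ave_a:.3f}, AVE B = {ave_b:.3f}, shared = {shared:.3f}")
print("distinct constructs" if distinct else "possibly the same construct")
```

With these made-up numbers neither AVE beats the shared variance, which is the situation where you might argue the two factors measure the same thing.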


Hi rj24
Thanks for your advice. What is the title of the paper/book by Fornell and Larcker?
Now I'm a bit confused about the term latent variable. Are latent variables only used in factor analysis, or can a variable in a regression also be a latent variable? I haven't done factor analysis before.

I ran the regression again, switched the blocks and the adjusted r2 is almost the same, like .65 and .63.