
another stats question help please

S

Hi again keenbean, rick and fellow stats gurus,
If I have a high value of adjusted r (such as .94), does it make sense to say that 94% of the variance has been accounted for when variable X1 has been found to be the most significant predictor of Y?

Also, when there is a problem of multicollinearity, can I say about those variables:

"Variables X3 and X4 have been excluded from the model because they have multicollinearity"?

thanks satchi

R

Hi Satchi,

I am sorry, but your questions go over my head. Too complicated for me.

:-)

K

Hi Satchi... afraid you have lost me too this time. But one thing I do seem to remember is that the variance that has been accounted for is actually r squared, i.e. .94 x .94 in your case. However, I could completely be making that up, so probably best to check! With regards to multicollinearity I have no idea, sorry :( It's been quite a while since my last stats course, and I tend to run to one of the statisticians in the department when I am struggling! KB

S

Hi, I forgot to type "squared" after the adjusted r :-)
so it was the squared value...
but thanks anyway
I will have to hit the books :-)
satchi

M

Keenbean's right, it's r squared that gives you the proportion of variance explained.

And yeah, that's what you say about multicollinearity, but I think you should say which variable they are multicollinear with, e.g.
"Variables X3 and X4 have been excluded from the model because they are multicollinear with variable X1"

S

thanks Melsie!
satchi

r squared gives you the proportion of variance explained by the model.

Adjusted r squared estimates the proportion of variance the model would explain in the WHOLE POPULATION (rather than just the sample) - it adjusts for the number of predictors you've put in. So if your r squared and adjusted r squared are close, it means the model should generalise to the population rather than just fitting the sample you used.
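If it helps to see how the two relate outside SPSS, here is a rough Python sketch (everything in it - the 50 cases, the 3 predictors, the data - is made up purely for illustration, not anything from your study):

# Quick sketch of r squared vs adjusted r squared on made-up data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 50, 3                                     # assumed sample size and number of predictors
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(X)).fit()
r2 = fit.rsquared
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)    # the usual adjusted R^2 formula

print(round(r2, 3), round(adj_r2, 3), round(fit.rsquared_adj, 3))  # last two agree

The hand-computed value and the one statsmodels reports should match, and both shrink relative to r squared as you add predictors or use fewer cases.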

I wouldn't exclude variables because of multicollinearity, just be careful about your interpretation because of potential suppression effects and stick them in a hierarchical regression if necessary.

B

Hi there,

You might find it helpful to read Pallant 2007 and Brace 2009 (or earlier versions). After telling you how to use SPSS and how to interpret the output (in a nice clear and uncomplicated way) they show you how to make sense of it all and how to write it up.

S

hi buttercup and sneaks
thanks for your replies
sneaks, what do you mean by suppression effects? can you explain it to me briefly
thanks a lot
satchi

If variables are highly correlated, when put into a regression they will often use up similar variance.

So predictors X1 and X2 could be put into a regression and only X1 comes out as significant, BUT it could be that X2 would be significant on its own - it isn't here because X1 is 'eating up' the shared variance, since they are correlated.

So when variables are highly correlated you can sometimes miss a significant effect, because the other variable steals the variance - let me know if that doesn't make sense!
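Here's a rough simulation of that idea in Python - just a sketch; the sample size, the 0.9 weighting and the variable names below are all made up to illustrate the point, not taken from your data:

# Two correlated predictors: on its own x2 predicts y, but next to x1 it can look non-significant.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)   # x2 is highly correlated with x1
y = x1 + rng.normal(size=n)                # y is really driven by x1

both = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
alone = sm.OLS(y, sm.add_constant(x2)).fit()

print(both.pvalues)    # x2's p-value is usually non-significant when x1 is also in
print(alone.pvalues)   # on its own, x2 IS significant (it acts as a proxy for x1)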

S

hi sneaks, thanks a lot, I understand better now.
So what do we do if two variables are highly correlated? How do we regress them then?
Another question: what if we do a two-block (hierarchical) regression and in the 2nd block SPSS tells us that variables F, G and H are excluded? What then? Is it up to us what we put in each block? If we run that two-block regression and find excluded variables, is that the end of the result?

thanks a lot
satchi

Hi satchi,

If it were me, I would probably play around with it - putting both highly correlated variables in and seeing what happens! - you just have to be careful about interpreting the results (as I say, you might miss a potentially significant effect).

If I were you I would use a hierarchical regression (the one with the different blocks) but use the 'Enter' method - found on a drop-down menu in the regression dialogue box if using SPSS. Usually you put the things you want to control for in the earlier blocks, so often:
Block 1: demographics (age, gender etc.)
Block 2: variable(s) that the literature already says should be significant
Block 3: the variables I actually want to test, to see whether they are significant when controlling for e.g. age and variable X.

Don't use 'stepwise' methods - they are rubbish; the Enter method is usually the best - see Andy Field for an explanation of the different types.

So if you have a multicollinearity issue, you could put one of those variables in a separate block after the other variable and see if it comes out significant, then swap them around so it comes in the block before, and see if that changes anything - that way you can establish whether there is a significant effect when controlling for the other variable.
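If you ever want to try that blocked 'Enter' approach outside SPSS, here is roughly what it looks like in Python - only a sketch; the column names (age, gender, x1, x2) and the data frame df are placeholders, not your actual variables:

# Fit the blocks cumulatively and watch how R^2 / adjusted R^2 change at each step.
import statsmodels.formula.api as smf

def blocked_regression(df, blocks, dv="y"):
    """blocks is a list of lists of predictor names, entered cumulatively."""
    terms = []
    fit = None
    for i, block in enumerate(blocks, start=1):
        terms += block
        fit = smf.ols(f"{dv} ~ " + " + ".join(terms), data=df).fit()
        print(f"Block {i}: R2 = {fit.rsquared:.3f}, adj R2 = {fit.rsquared_adj:.3f}")
    return fit

# e.g. controls first, then each of the correlated predictors in its own block:
# blocked_regression(df, [["age", "gender"], ["x1"], ["x2"]])
# then swap x1 and x2 to see whether the later one still adds anything:
# blocked_regression(df, [["age", "gender"], ["x2"], ["x1"]])

The jump in R^2 from one block to the next plays the same role as the "R square change" column in the SPSS output.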

S

hi sneaks
thanks a lot for your advice; I'm going to run the regression tomorrow.
:-) satchi

R

Hi Satchi, just read the last couple of comments from Sneaks. I am not entirely sure of the nature of the variables you are 'regressing', but if they are latent variables... take a look at the Average Variance Extracted (AVE) procedure by Fornell and Larcker (1981). When factors are highly correlated we may sometimes argue that they are measuring the same construct/phenomenon. Obviously, if you are using observed items only, this advice is fairly academic.

S

Hi rj24,
Thanks for your advice. What is the title of the paper/book by Fornell and Larcker?
Now I'm a bit confused about the term latent variable. Is a latent variable only for factor analysis, or can a variable in a regression also be a latent variable? I haven't done factor analysis before.

I ran the regression again, switched the blocks, and the adjusted r squared is almost the same, .65 and .63.
thanks
satchi

