i think i know roughly how correlations work - if u plot the values, u can get a trendline (the line that minimizes the squared vertical distance from the line to all the points). if u then take this line and compare the squared distances from line to points against how spread out the points are overall... something something correlation
like, if all the points are real far from the line, it's low correlation, if all the points are close to the line, it's high correlation
but i don't understand the relationship of this to the slope of the line itself?
like what does it *mean* for there to be a really steep trendline slope with an r of 0.02? Or a shallow one with an r of 0.9? is that even possible? what does this even mean about the data?
i conceive of correlations as about predictive power - given knowledge of X, how can we predict Y?
im getting confused cause i have a data finding where i had ppl select how much they liked... let's call it ice cream, on a 0-3 scale. I also asked... let's say, how much butter did you eat growing up?
and found that ppl who liked ice cream 3/3 reported double the amount of childhood butter consumption compared to ppl who liked ice cream at 0/3.
This is a *trendline* I'm finding here, right? a consistent increase in avg childhood butter consumption per step of ice cream preference
i feel like i should be able to make strongish predictions off of this. If you tell me you're into ice cream 3/3, I should 2x my estimate of how much butter consumption was in your childhood, right? This *feels* like a strong thing to me.
But my actual correlation for this data is r=0.08, which is very tiny! Is it that correlations aren't supposed to tell me the dramaticness of my prediction, but rather the reliability of it? Should I 2x my estimate of butter consumption but i'm only ~1% more likely to be right?
I'm pretty sure it's this. If the slope is large but the correlation is small, it basically means that whatever factor you're looking at makes a large difference on average, but not necessarily in any particular case.
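To put rough numbers on the r=0.08 case, here's a minimal Python sketch. It assumes a plain linear fit, where r squared is the share of variance in one variable that the other accounts for, and sqrt(1 - r^2) is how big the leftover scatter is relative to the original scatter. The function name is made up for illustration.

```python
import math

def explained_and_residual(r):
    """For a simple linear fit with correlation r, return
    (share of variance explained, residual SD as a fraction of the original SD)."""
    r_squared = r ** 2
    residual_sd_fraction = math.sqrt(1 - r_squared)
    return r_squared, residual_sd_fraction

# r = 0.08 is the value reported in the thread
r_squared, residual_sd_fraction = explained_and_residual(0.08)
print(f"variance explained: {r_squared:.4f}")
print(f"leftover scatter vs. original: {residual_sd_fraction:.4f}")
```

So the group averages can differ a lot while an individual guess barely gets tighter: under these assumptions the typical prediction error shrinks by well under 1%.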
oh this phrasing made it feel more intuitive for me, like group A can score 10, and group B can score 20, which is a 2x increase, but the variance can be huge or small?
I'm not sure how to 'fit' this into thinking about data yet though
Yes. After looking up the definitions (it's been a while since I did stats): correlation is a statistical measure that expresses the extent to which two variables are linearly related. The trendline works off of the assumption that the variables *are* linearly related.
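Yes! correlation is actually just a standardized covariance: the covariance between the two variables divided by the product of their standard deviations.
Averages can be wildly different from each other across the two groups but you can still have a near 0 correlation, as long as the spread within each group is big compared to the gap between the averages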
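A small Python sketch of how slope and correlation relate, in case it helps. The data here is invented purely for illustration, and the helper names are mine. The idea: the least-squares slope is cov(x, y) / var(x), the correlation is cov(x, y) / (sd(x) * sd(y)), so slope = r * sd(y) / sd(x). The slope carries the units of the data, and r doesn't.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance (n - 1 in the denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def std(values):
    return math.sqrt(covariance(values, values))

def correlation(xs, ys):
    """Pearson r: covariance standardized by both standard deviations."""
    return covariance(xs, ys) / (std(xs) * std(ys))

def slope(xs, ys):
    """Least-squares slope for predicting y from x."""
    return covariance(xs, ys) / covariance(xs, xs)

# Made-up data: x on a 0-3 scale, y in arbitrary units
xs = [0, 1, 2, 3, 0, 1, 2, 3]
ys = [5.0, 2.0, 9.0, 4.0, 1.0, 8.0, 3.0, 10.0]
ys_rescaled = [100 * y for y in ys]  # same data, y measured in smaller units

print("slope:", slope(xs, ys), "r:", correlation(xs, ys))
print("slope:", slope(xs, ys_rescaled), "r:", correlation(xs, ys_rescaled))
```

Rescaling y multiplies the slope by 100 and leaves r where it was, which is why a steep slope with a tiny r (or the reverse) is entirely possible.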
I just simulated this in R
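For anyone who wants to try it, here's a rough Python stand-in for that kind of simulation. It is not the original R code; the group means of 10 and 20 come from the example earlier in the thread, and the spreads, sample size, seed, and function names are my assumptions. Group membership is coded 0/1, so the fitted slope is just the gap between the group averages.

```python
import math
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ols_slope(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate(spread, n_per_group=20000, seed=0):
    """Group A averages 10, group B averages 20; `spread` is the SD within each group."""
    rng = random.Random(seed)
    group = [0] * n_per_group + [1] * n_per_group
    score = [rng.gauss(10, spread) for _ in range(n_per_group)]
    score += [rng.gauss(20, spread) for _ in range(n_per_group)]
    return ols_slope(group, score), pearson_r(group, score)

for spread in (1, 60):
    slope, r = simulate(spread)
    print(f"within-group SD {spread}: slope = {slope:.2f}, r = {r:.3f}")
```

With equal group sizes the population correlation works out to 5 / sqrt(spread^2 + 25), so a spread of 1 should give an r near 1 and a spread of 60 should give an r around 0.08, while the slope stays around 10 in both cases (with sampling noise).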