Conversation

My intuitions around lopsided samples in correlations get a lil fuzzy. If 5 out of 100k people have gone to jail, but these jailbirds make up 50% of all people over 7'0" tall (there's 10 of them), then this is still treated with the full power of 100k, right?
Replying to
The Pearson correlation assumes that both variables are normally distributed. Minor deviations from this assumption are "mostly ok", but if you make the deviations extreme enough, the Pearson correlation can be set to any value by adding just a single data point. 1/n
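
A minimal sketch of the single-data-point effect described above, using only the Python standard library. The data are hypothetical (1,000 independent standard-normal pairs), and the outlier at (1000, 1000) is chosen purely for illustration.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, standard library only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

random.seed(0)  # hypothetical data: two unrelated standard-normal variables
xs = [random.gauss(0, 1) for _ in range(1000)]
ys = [random.gauss(0, 1) for _ in range(1000)]

r_before = pearson(xs, ys)
# Append one extreme point far from everything else.
r_after = pearson(xs + [1000.0], ys + [1000.0])

print(f"r without the outlier: {r_before:.3f}")
print(f"r with one outlier:    {r_after:.3f}")
```

With the outlier added, nearly all of the variance in both variables comes from that one point, so the coefficient is pulled close to 1 even though the other 1,000 pairs are unrelated.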
Replying to and
This is part of the practical reason that people remove outliers: to make their data more normal (though this is hacky). There are other types of correlations that are less affected by outliers, such as Spearman's correlation. 3/n
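
A rough sketch of why a rank-based correlation is less sensitive to an outlier, again with the standard library only. Spearman's correlation is computed here as the Pearson correlation of the ranks; the helper names (`average_ranks`, `spearman`) and the data are hypothetical.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

random.seed(0)  # hypothetical data: unrelated normals plus one extreme point
xs = [random.gauss(0, 1) for _ in range(1000)]
ys = [random.gauss(0, 1) for _ in range(1000)]
xs_out, ys_out = xs + [1000.0], ys + [1000.0]

rho_before = spearman(xs, ys)
rho_after = spearman(xs_out, ys_out)
r_after = pearson(xs_out, ys_out)

print(f"Pearson with outlier:     {r_after:.3f}")
print(f"Spearman without outlier: {rho_before:.3f}")
print(f"Spearman with outlier:    {rho_after:.3f}")
```

In rank space the extreme point is simply "the largest value", so it carries about as much weight as any other observation.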
Replying to and
The Pearson correlation is not the best thing to use when you have binary response options, particularly if they are unbalanced. E.g., a variable where 99% of people are A and 1% are B just looks very different from a normal distribution. 4/n
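
To make the unbalanced binary case concrete, here is a sketch using the numbers from the opening question. It assumes one reading of that example: 100,000 people, 5 jailed, 10 over 7'0", and all 5 jailed people among the 10 tall ones. Treating height as a yes/no variable is a simplification made only for this illustration. For two 0/1 variables the Pearson correlation reduces to the phi coefficient of the 2x2 table.

```python
from math import sqrt

def phi(a, b, c, d):
    """Pearson correlation of two 0/1 variables from a 2x2 table.

    a: jailed and tall,      b: jailed and not tall,
    c: not jailed and tall,  d: not jailed and not tall.
    """
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Assumed reading of the example: every jailed person is over 7'0".
r_example = phi(5, 0, 5, 99_990)

# Same totals (5 jailed, 10 tall), but one jailed person is not tall
# and one more non-jailed person is tall instead.
r_one_moved = phi(4, 1, 6, 99_989)

print(f"r for the example as stated:  {r_example:.3f}")
print(f"r after moving two people:    {r_one_moved:.3f}")
```

Under these assumptions the coefficient is roughly 0.71, and reclassifying just two of the 100,000 people drops it to roughly 0.57. The estimate rests almost entirely on the handful of rare cases, not on the other 99,990 rows.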
Replying to and
If I wanted to test if people who had gone to jail had different heights than people who had not, I would use a t-test (which also assumes normality, but is very robust to violation of assumptions in practice), though the most valid choice is probably ANOVA. 5/5
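
A minimal sketch of the two-group comparison described above, with the standard library only. It uses Welch's unequal-variance form of the t-test, which is a choice made for this illustration (the thread only says "t-test"). The heights are simulated and hypothetical, and `welch_t` is a helper written for this sketch. It returns the statistic and approximate degrees of freedom but no p-value.

```python
import random
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

random.seed(1)
# Hypothetical heights in cm; both groups come from the same distribution.
never_jailed = [random.gauss(170, 10) for _ in range(100_000)]
jailed = [random.gauss(170, 10) for _ in range(5)]

t, df = welch_t(jailed, never_jailed)
print(f"t = {t:.2f}, approximate df = {df:.1f}")
```

The approximate degrees of freedom come out close to 4, the size of the small group minus one, so the comparison is about as precise as a sample of 5 allows, regardless of the 100,000 people in the other group.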
Replying to
ty; i don't deeeeply understand/remember t-tests yet but I'm familiar with the basic concepts here. But re: my og example, what if it *is* normally distributed, just small? I happened to give a binary example, but what if it's not?