My intuitions around lopsided samples in correlations get a lil fuzzy. If 5 out of 100k people have gone to jail, but these jailbirds make up 50% of all people over 7'0" tall (there are 10 of them), then this is still treated with the full power of 100k, right?
That's just a different stratification of the data. You wouldn't have the full 100k power if you're using "over 7 feet" as a classification, because the comparison is limited by the 10 people in that group. At some point that gets into the "curse of dimensionality": the more you subset the data, the less power you have in each subset.
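To make that concrete, here is a rough sketch in Python using the counts from the question. It assumes the 5 jailed people are the only ones jailed out of the 100,000 and that all 5 are in the over-7'0" group. The helper names `phi_coefficient` and `wilson_interval` are made up for illustration. The idea is that the correlation is computed over everyone, but the uncertainty about the jail rate among tall people depends on n = 10, not 100,000.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi (Pearson correlation of two binary variables) for the 2x2 table [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    p = successes / n
    scale = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / scale
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / scale
    return center - half_width, center + half_width

# Counts from the thread; assumption: nobody under 7'0" has been jailed.
tall_jail, tall_free = 5, 5
short_jail, short_free = 0, 99_990

phi = phi_coefficient(tall_jail, tall_free, short_jail, short_free)
low, high = wilson_interval(tall_jail, tall_jail + tall_free)

print(f"phi over all 100,000 people: {phi:.3f}")          # about 0.707
print(f"jail rate among the 10 tall people: 0.50, "
      f"95% Wilson interval: ({low:.2f}, {high:.2f})")    # about (0.24, 0.76)
```

The correlation looks strong, yet the interval for the tall group runs from roughly a quarter to three quarters, which is the "less power in the subset" point in numbers.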
I wouldn't say it doesn't matter. You might need more power. Also, the relationship may not be perfectly linear. You could look for a quadratic or cubic relationship as well, or fit some sort of spline if you're after a better fit.
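As a small illustration of the non-linear point, the sketch below fits a line and a quadratic to made-up data (y = x² on a symmetric grid, not real data) and compares R². The functions `polyfit` and `r_squared` are hypothetical helpers written for this example in plain Python. In practice a library routine for polynomial or spline fitting would be the more sensible choice.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Fine for a tiny, low-degree demo; does not guard against singular systems.
    Returns coefficients with the lowest power first.
    """
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the Vandermonde design matrix A.
    m = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(xs, ys, coeffs):
    """Share of the variance in ys explained by the fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    predicted = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up data with a purely quadratic relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

linear = polyfit(xs, ys, 1)
quadratic = polyfit(xs, ys, 2)
r2_linear = r_squared(xs, ys, linear)
r2_quadratic = r_squared(xs, ys, quadratic)

print(f"linear R^2:    {r2_linear:.3f}")
print(f"quadratic R^2: {r2_quadratic:.3f}")
```

On this toy data the straight line explains essentially none of the variance while the quadratic fits it exactly, so a near-zero linear correlation does not by itself mean there is no relationship.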