My intuitions around lopsided samples in correlations get a lil fuzzy: if 5 out of 100k people have gone to jail, but these jailbirds make up 50% of all people over 7'0" tall (there are 10 of them), then this is still treated with the full power of 100k, right?
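Here is a quick way to check the intuition. It assumes exactly the numbers in the question: 100,000 people, 10 over 7'0", and 5 jailed, with all 5 of the jailed among the tall. The sketch builds the 2×2 table and computes two things: the phi coefficient (the Pearson correlation of two 0/1 variables) and a one-sided Fisher exact p-value. The function names are illustrative. The exact test is a swapped-in choice, used because the tall-and-jailed cell is so small that chi-square approximations get unreliable.

```python
import math


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Pearson correlation of two 0/1 variables from the 2x2 table
    [[a, b], [c, d]]: rows = over 7'0 / not, columns = jailed / not."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


def fisher_one_sided_p(a: int, b: int, c: int, d: int) -> float:
    """P(at least `a` jailed people fall in the tall row) if jail status
    were assigned at random, i.e. the upper tail of a hypergeometric."""
    tall, jailed, n = a + b, a + c, a + b + c + d
    tail = sum(
        math.comb(tall, k) * math.comb(n - tall, jailed - k)
        for k in range(a, min(tall, jailed) + 1)
    )
    return tail / math.comb(n, jailed)


# Hypothetical table from the question's numbers:
# 5 tall & jailed, 5 tall & free, 0 short & jailed, 99,990 short & free
table = (5, 5, 0, 99_990)
phi = phi_coefficient(*table)
print(f"phi = {phi:.4f}")
print(f"chi-square = n * phi^2 = {100_000 * phi ** 2:.0f}")
print(f"one-sided Fisher p = {fisher_one_sided_p(*table):.2e}")
```

The "shorter than 7'0"" row matters here. The 99,990 shorter people with no jail record are what make five-out-of-ten among the tall so surprising, and that is the sense in which all 100k rows contribute. The signal itself still rests on 10 people.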
Reply:
I mean, sure, but it also means that nobody shorter than 7’ has ever gone to jail
Reply:
What you're referring to seems like the difference between a conditional distribution (height given they went to jail) and the marginal distribution (i.e. just the distribution of height), which aren't necessarily the same. In a lot of cases, Pr(X | Y) ≠ Pr(X).
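The gap between the conditional and the marginal can be made concrete with the question's numbers. This sketch uses a hypothetical population of 100,000 records. Each record is reduced to two yes/no fields (over 7'0", jailed) rather than a full height. Exact fractions keep the printed probabilities clean.

```python
from fractions import Fraction

# Hypothetical population matching the question: (over_7ft, jailed) per person.
population = (
    [(True, True)] * 5            # tall and jailed
    + [(True, False)] * 5         # tall, never jailed
    + [(False, False)] * 99_990   # everyone else
)


def is_tall(person):
    return person[0]


def is_jailed(person):
    return person[1]


def everyone(person):
    return True


def prob(event, given=everyone):
    """Empirical Pr(event | given) over the population."""
    subset = [p for p in population if given(p)]
    return Fraction(sum(1 for p in subset if event(p)), len(subset))


print("Pr(tall)        =", prob(is_tall))                   # marginal
print("Pr(tall | jail) =", prob(is_tall, given=is_jailed))  # conditional
print("Pr(jail)        =", prob(is_jailed))                 # marginal
print("Pr(jail | tall) =", prob(is_jailed, given=is_tall))  # conditional
```

With these counts, Pr(tall) is 1/10000 but Pr(tall | jail) is 1. Likewise, Pr(jail) is 1/20000 while Pr(jail | tall) is 1/2.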
Reply:
That's just a different stratification of the data. You wouldn't have the power of the full 100k if you are using "over 7 feet" as some sort of classification.
That gets into the "curse of dimensionality" at some point. The more you subset the data, the less power.
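The loss of power from subsetting shows up in the standard error of whatever you estimate inside the subset. The rough sketch below uses the normal-approximation standard error of a proportion, which is itself shaky at n = 10. Purely for illustration, it also assumes that each extra yes/no variable splits the data evenly in half.

```python
import math


def proportion_se(successes: int, n: int) -> float:
    """Standard error of a sample proportion (normal approximation)."""
    p = successes / n
    return math.sqrt(p * (1 - p) / n)


# Jail rate estimated only among the 10 people over 7'0" (5 jailed)
print("SE within tall subset:", round(proportion_se(5, 10), 3))
# Jail rate estimated over all 100,000 people (5 jailed)
print("SE over everyone:     ", proportion_se(5, 100_000))

# Curse of dimensionality: average people per cell after k even binary splits
for k in (0, 5, 10, 15, 20):
    print(f"{k:2d} splits -> {100_000 / 2 ** k:,.1f} people per cell")
```

After 15 even splits the average cell holds about 3 people, and after 20 it holds fewer than one.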
Reply:
The statistical analysis of survey data is fraught with issues like this. There are some excellent tutorials out there.
Reply:
You should probably ask the Telegram group to teach you how to do simulation studies, they are great at investigating this sort of question.
Reply:
The question is a bit vague, but I think the answer is yes. I made a sample of 10,000 with roughly normally distributed heights, 10 people over 7 feet, and 5 of those 10 going to jail. These are the regression results:
Reply:
The usual significance test for the Pearson correlation assumes that both variables are (jointly) normally distributed. Minor deviations from this assumption are "mostly ok", but if the deviations are extreme enough, the Pearson correlation can be set to almost any value by adding just a single data point.
Reply:
Depends on how you spin it. Do you see why showing your work as a scientist/researcher/journalist/anchor/correspondent is crucial to trust?
They have lost a great deal of it in 2 years, if not 6. A study of 100 is not exactly conclusive for 330 million.
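For a sense of scale on "a study of 100", here is a back-of-the-envelope sketch of the 95% margin of error for a proportion. It assumes a simple random sample and the worst case p = 0.5; real surveys with weighting or clustering usually do worse. The finite population correction shows that the 330 million barely enters, and the sample size does almost all the work.

```python
import math


def margin_of_error(n: int, population: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a simple random
    sample of size n, including the finite population correction."""
    se = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * se * fpc


for n in (100, 1_000, 10_000):
    moe = margin_of_error(n, 330_000_000)
    print(f"n = {n:>6,}: +/- {moe:.1%}")
```

With n = 100, the margin comes out near ±10 percentage points. Raising the population from 330 thousand to 330 million hardly changes it.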