What sort of R^2 do you get with all the n-grams? Also, could use 'p.adjust' to do a non-Bonferroni multiple-testing correction.
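For readers unfamiliar with what a non-Bonferroni correction looks like: R's 'p.adjust' offers, among others, the "holm" and "BH" (Benjamini-Hochberg) methods. The following is a minimal stdlib-only Python sketch of those two adjustments next to Bonferroni. It is not the R implementation, and the function names are made up for illustration.

```python
def bonferroni(pvals):
    """Multiply every p value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjustment (controls the family-wise error rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, value)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending p
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank in ascending order
        value = min(1.0, pvals[i] * m / rank)
        running_min = min(running_min, value)  # enforce monotonicity
        adjusted[i] = running_min
    return adjusted
```

With a few hundred n-gram tests, both alternatives are never more conservative than Bonferroni, which is the point of the suggestion.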
Replying to @gwern
Like in multiple regression? Currently have about 280 n-grams and ~1900 names, so could use OLS multiple regression and get cross-validated R2.
Replying to @KirkegaardEmil @gwern
The p_cor is the corrected p value. glmnet not suitable for categorical predictors. Need a good function for LASSO with GLM.
Replying to @KirkegaardEmil @gwern
Huh. Quite good R2 even with OLS. R2=50%, R2-CV=29%. (10 fold CV, 5 runs)
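The gap between in-sample R2 and cross-validated R2 comes from scoring each name with a model that never saw it. Below is a rough stdlib-only Python sketch of repeated k-fold CV for an OLS model, assuming the predictors are 0/1 n-gram indicators. All names (`fit_ols`, `cv_r_squared`, and so on) are hypothetical, and this is not the code used in the thread.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(X, y):
    """OLS with an intercept via the normal equations (assumes no collinear columns)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]


def r_squared(y, yhat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def cv_r_squared(X, y, folds=10, runs=5, seed=0):
    """Average, over `runs` random shufflings, of R2 computed on out-of-fold predictions."""
    rng = random.Random(seed)
    n = len(y)
    scores = []
    for _ in range(runs):
        idx = list(range(n))
        rng.shuffle(idx)
        preds = [0.0] * n
        for f in range(folds):
            test = idx[f::folds]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            beta = fit_ols([X[i] for i in train], [y[i] for i in train])
            for i, p in zip(test, predict(beta, [X[i] for i in test])):
                preds[i] = p
        scores.append(r_squared(y, preds))
    return sum(scores) / len(scores)
```

The defaults mirror the "10 fold CV, 5 runs" setup mentioned above. With ~280 predictors and ~1900 names, a real analysis would use a numerical library rather than this hand-rolled solver.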
Replying to @KirkegaardEmil
That *is* surprising. Do even surnames, marking family history/class/genetics, achieve CV performance like 29%?
Replying to @gwern @KirkegaardEmil
I guess there's some sort of very strong name-preferences->parental quality going on there. Wonder what they're preferring.
Replying to @gwern
From my skimming, seems a lot of it is Muslim related. They tend to use a's a lot. The d value for "a" anywhere is -.50.
Replying to @KirkegaardEmil
Hah! But how much of population R^2 does Muslim country descent explain in other datasets? Sets a lower bound then.
Replying to @gwern
The other analyses are aggregated at country level, not name level. Comparable? r is .78 for Islam%. https://openpsych.net/paper/21
Replying to @KirkegaardEmil
I think you would have to weight by their share of overall population. These first names are population, not group-level.
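One way to read the weighting suggestion: when comparing a group-level correlation with a population-level one, each group counts in proportion to its population share instead of equally. A small Python sketch of a weighted Pearson correlation follows. The function name is hypothetical, and whether this is the exact adjustment intended here is an assumption.

```python
import math


def weighted_pearson(x, y, w):
    """Pearson correlation in which observation i counts with weight w[i]
    (for example, a group's share of the overall population)."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / total
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / total
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / total
    return cov / math.sqrt(vx * vy)
```

Only the relative sizes of the weights matter, so raw population counts and shares give the same result, and a group with weight zero drops out entirely.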
LASSO picked 81/281 predictors. Fit that model with OLS. R2=43%, R2-CV=38%. Wow!
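The "LASSO selects, OLS refits" procedure can be sketched roughly as below. This is a stdlib-only Python illustration using plain coordinate descent on the objective (1/2n)·||y − Xb||² + lam·||b||₁. It is not glmnet and not the code behind the numbers above, and `lasso_fit` and `select_then_refit` are made-up names. Refitting with lam=0 gives the unpenalised least-squares coefficients, provided the selected columns are not collinear.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam, returning exactly 0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, sweeps=500, tol=1e-10):
    """Coordinate-descent LASSO on centred data; returns slope coefficients only."""
    n, p = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(p)]
    Xc = [[r[j] - means[j] for j in range(p)] for r in X]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual for beta = 0
    norms = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(sweeps):
        biggest = 0.0
        for j in range(p):
            if norms[j] == 0.0:
                continue  # constant column carries no information
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + norms[j] * beta[j]
            new = soft_threshold(rho, lam) / norms[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
            biggest = max(biggest, abs(delta))
        if biggest < tol:
            break
    return beta


def select_then_refit(X, y, lam):
    """Keep predictors with nonzero LASSO coefficients, then refit them unpenalised."""
    selected = [j for j, b in enumerate(lasso_fit(X, y, lam)) if b != 0.0]
    X_sel = [[r[j] for j in selected] for r in X]
    refit = lasso_fit(X_sel, y, lam=0.0) if selected else []
    return selected, refit
```

If the selection step is run once on the full data, a later CV of the refit does not account for it; rerunning the selection inside each fold is the usual way to check how much of the CV R2 survives.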