@KirkegaardEmil https://quid.com/feed/how-quid-uses-deep-learning-with-small-data … example of the sort of approach I mean for handling your surnames by regression on n-grams.
Huh. Quite good R2 even with OLS. R2=50%, R2-CV=29%. (10 fold CV, 5 runs)
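(Aside: a minimal sketch of what "OLS on n-grams with 10-fold CV, 5 runs" could look like. The names and outcome below are synthetic, and helpers such as `ngram_features` and `repeated_cv_r2` are hypothetical, not the code used in the thread. It assumes only NumPy and presence/absence features for 1- and 2-letter n-grams.)

```python
import numpy as np

def ngram_features(names, n_values=(1, 2)):
    """Binary matrix: does each name contain each character n-gram anywhere?"""
    grams = sorted({name[i:i + n]
                    for name in names for n in n_values
                    for i in range(len(name) - n + 1)})
    X = np.array([[1.0 if g in name else 0.0 for g in grams] for name in names])
    return X, grams

def ols_fit(X, y):
    # Least-squares fit with an intercept column prepended.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def ols_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def r2(y, pred):
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def repeated_cv_r2(X, y, k=10, runs=5, seed=0):
    """Mean out-of-fold R2 over `runs` independent k-fold splits."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(runs):
        idx = rng.permutation(len(y))
        pred = np.empty(len(y))
        for fold in np.array_split(idx, k):
            train = np.setdiff1d(idx, fold)
            pred[fold] = ols_predict(ols_fit(X[train], y[train]), X[fold])
        scores.append(r2(y, pred))
    return float(np.mean(scores))

# Synthetic stand-in data (assumption): random "names", outcome lowered by "a".
rng = np.random.default_rng(1)
letters = list("abcdeilmnorst")
names = ["".join(rng.choice(letters, size=rng.integers(4, 8))) for _ in range(400)]
y = np.array([-0.5 * ("a" in name) for name in names]) + rng.normal(size=400)

X, grams = ngram_features(names)
r2_in = r2(y, ols_predict(ols_fit(X, y), X))
r2_cv = repeated_cv_r2(X, y)
print(f"predictors={len(grams)}  R2={r2_in:.2f}  R2-CV={r2_cv:.2f}")
```

With many n-gram predictors relative to the number of names, the in-sample R2 is expected to sit well above the cross-validated one, which is the gap the 50% vs 29% figures show.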
-
That *is* surprising. Do even surnames, marking family history/class/genetics, achieve CV performance like 29%?
-
I guess there's some sort of very strong name-preferences->parental quality going on there. Wonder what they're preferring.
-
From my skimming, seems a lot of it is Muslim related. They tend to use a's a lot. The d value for "a" anywhere is -.50.
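(Aside: the "d value" here reads as a standardized mean difference between names that contain the letter and names that do not. A minimal sketch, assuming Cohen's d with a pooled standard deviation; the names and scores are made up for illustration and `cohens_d_for_letter` is a hypothetical helper.)

```python
from statistics import mean, variance

def cohens_d_for_letter(names, scores, letter="a"):
    """Cohen's d: (mean with letter - mean without) / pooled SD."""
    with_l = [s for n, s in zip(names, scores) if letter in n.lower()]
    without = [s for n, s in zip(names, scores) if letter not in n.lower()]
    n1, n2 = len(with_l), len(without)
    # Pooled variance from the two sample variances (n - 1 denominators).
    pooled_var = ((n1 - 1) * variance(with_l) + (n2 - 1) * variance(without)) / (n1 + n2 - 2)
    return (mean(with_l) - mean(without)) / pooled_var ** 0.5

# Made-up example data (assumption), three names per group.
names = ["adam", "sara", "hanna", "peter", "lone", "joy"]
scores = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
d = cohens_d_for_letter(names, scores)
print(d)  # -1.0: group means are 2 and 3, pooled SD is 1
```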
-
Hah! But how much of population R^2 does Muslim country descent explain in other datasets? Sets a lower bound then.
-
The other analyses are aggregated at country level, not name level. Comparable? r is .78 for Islam%. https://openpsych.net/paper/21
-
I think you would have to weight by their share of overall population. These first names are population, not group-level.
-
LASSO picked 81/281 predictors. Fit that model with OLS. R2=43%, R2-CV=38%. Wow!
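(Aside: a minimal sketch of "LASSO picks predictors, then refit with OLS", on synthetic data. The thread does not say whether the selection step was repeated inside each CV fold; this sketch does repeat it, since selecting once on the full data and cross-validating only the OLS refit tends to give an optimistic CV R2. The coordinate-descent `lasso_cd`, the penalty `lam=0.08`, and the data are all assumptions for illustration.)

```python
import numpy as np

def ols_fit(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def ols_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def r2(y, pred):
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - b0 - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # centering absorbs the intercept
    resid = y - y.mean()             # residual for b = 0
    col_sq = (Xc ** 2).sum(axis=0) / n
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:     # constant column carries no signal
                continue
            rho = Xc[:, j] @ resid / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += Xc[:, j] * (b[j] - new)
            b[j] = new
    return b

def select_then_ols_cv(X, y, lam, k=10, seed=0):
    """k-fold CV R2 with LASSO selection redone inside every training fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    pred = np.empty(len(y))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        keep = np.flatnonzero(lasso_cd(X[train], y[train], lam))
        beta = ols_fit(X[train][:, keep], y[train])
        pred[fold] = ols_predict(beta, X[fold][:, keep])
    return r2(y, pred)

# Synthetic data (assumption): 60 binary predictors, only the first 5 matter.
rng = np.random.default_rng(2)
n, p = 300, 60
X = rng.integers(0, 2, size=(n, p)).astype(float)
true_idx = [0, 1, 2, 3, 4]
y = X[:, true_idx].sum(axis=1) + rng.normal(size=n)

selected = np.flatnonzero(lasso_cd(X, y, lam=0.08))
r2_in = r2(y, ols_predict(ols_fit(X[:, selected], y), X[:, selected]))
r2_cv = select_then_ols_cv(X, y, lam=0.08)
print(f"LASSO kept {len(selected)}/{p}  R2={r2_in:.2f}  R2-CV={r2_cv:.2f}")
```

Dropping the unselected predictors removes much of the overfitting, which would be consistent with the in-sample R2 falling while the CV R2 rises relative to the full OLS model.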