@KirkegaardEmil https://quid.com/feed/how-quid-uses-deep-learning-with-small-data … example of the sort of approach I mean for handling your surnames by regression on n-grams.
From my skimming, seems a lot of it is Muslim related. They tend to use a's a lot. The d value for "a" anywhere is -.50.
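A rough sketch of how a per-n-gram d value like the one above could be computed: split names by whether they contain a given character n-gram, then take Cohen's d between the two groups' outcome scores. The names and scores are made-up toy values, and `char_ngrams` and `cohens_d` are hypothetical helpers, not the code actually used in this thread.

```python
from statistics import mean, variance

def char_ngrams(name, n):
    """Return the set of character n-grams in a lower-cased name."""
    s = name.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (group_a minus group_b)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) +
                  (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Toy data (assumption): invented strings with invented outcome scores.
names_scores = [("ada", 1.0), ("bala", 2.0), ("cara", 3.0),
                ("bob", 2.0), ("otto", 3.0), ("tim", 4.0)]

# Split on presence of the unigram "a" anywhere in the name.
with_a = [s for name, s in names_scores if "a" in char_ngrams(name, 1)]
without_a = [s for name, s in names_scores if "a" not in char_ngrams(name, 1)]

d_a = cohens_d(with_a, without_a)
print(f"d for 'a' anywhere: {d_a:.2f}")
```

The same split can be repeated for every n-gram to build the predictor matrix for a regression.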
-
-
Hah! But how much of population R^2 does Muslim country descent explain in other datasets? Sets a lower bound then.
-
The other analyses are aggregated at country level, not name level. Comparable? r is .78 for Islam%. https://openpsych.net/paper/21
-
I think you would have to weight by their share of the overall population. These first-name data are population-level, not group-level.
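One way to read "weight by share" is a weighted Pearson correlation, where each group counts in proportion to its population share. This is only a sketch of that reading: `weighted_pearson` is a hypothetical helper and every number below is an invented toy value.

```python
def weighted_pearson(x, y, w):
    """Pearson correlation with observation weights w (e.g. population shares)."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / total
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / total
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / total
    return cov / (vx * vy) ** 0.5

# Toy group-level data (assumption): predictor, outcome, population share.
predictor = [0.0, 0.1, 0.5, 0.9, 1.0]
outcome = [0.3, 0.1, -0.2, -0.6, -0.4]
share = [0.70, 0.15, 0.08, 0.05, 0.02]

r_unweighted = weighted_pearson(predictor, outcome, [1.0] * len(share))
r_weighted = weighted_pearson(predictor, outcome, share)
print(f"unweighted r = {r_unweighted:.2f}, share-weighted r = {r_weighted:.2f}")
```

With equal weights this reduces to the ordinary Pearson r, so the gap between the two numbers shows how much the weighting matters.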
-
LASSO picked 81 of 281 predictors. Refit that model with OLS: R2 = 43%, cross-validated R2 = 38%. Wow!
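For anyone wanting to try the LASSO-then-OLS step, here is a minimal pure-Python sketch on synthetic data. It makes several assumptions: binary n-gram indicators as predictors, a hand-picked penalty `alpha`, 5-fold CV with the selection step repeated inside each fold, and a small coordinate-descent solver standing in for whatever software was actually used. All function names are hypothetical.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t (the LASSO proximal step)."""
    return z - t if z > t else z + t if z < -t else 0.0

def fit_cd(X, y, alpha, n_iter=100):
    """Coordinate descent on (1/2n)*RSS + alpha*|b|_1; alpha=0 gives OLS."""
    n, p = len(X), len(X[0])
    b0, b = sum(y) / n, [0.0] * p
    resid = [yi - b0 for yi in y]
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        shift = sum(resid) / n  # unpenalised intercept update
        b0 += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(row[j] * r for row, r in zip(X, resid)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, alpha) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                resid = [r - delta * row[j] for row, r in zip(X, resid)]
                b[j] = new
    return b0, b

def lasso_then_ols(X, y, alpha):
    """Select predictors with LASSO, then refit only those without a penalty."""
    _, b = fit_cd(X, y, alpha)
    keep = [j for j, bj in enumerate(b) if bj != 0.0]
    b0, bs = fit_cd([[row[j] for j in keep] for row in X], y, 0.0)
    return keep, b0, bs

def predict(X, keep, b0, bs):
    return [b0 + sum(row[j] * bj for j, bj in zip(keep, bs)) for row in X]

def r2(y, yhat):
    m = sum(y) / len(y)
    ss_res = sum((a - c) ** 2 for a, c in zip(y, yhat))
    return 1.0 - ss_res / sum((a - m) ** 2 for a in y)

def cv_r2(X, y, alpha, k=5, seed=0):
    """R2 of pooled out-of-fold predictions; selection is redone in each fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    oof = [0.0] * len(y)
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        keep, b0, bs = lasso_then_ols([X[i] for i in train],
                                      [y[i] for i in train], alpha)
        for i, pred in zip(test, predict([X[i] for i in test], keep, b0, bs)):
            oof[i] = pred
    return r2(y, oof)

# Synthetic data (assumption): 12 binary indicators, only the first 3 matter.
rng = random.Random(1)
n, p = 200, 12
X = [[float(rng.random() < 0.5) for _ in range(p)] for _ in range(n)]
true_b = [2.0, -2.0, 1.5] + [0.0] * (p - 3)
y = [sum(xj * bj for xj, bj in zip(row, true_b)) + rng.gauss(0, 1) for row in X]

keep, b0, bs = lasso_then_ols(X, y, alpha=0.1)
r2_in = r2(y, predict(X, keep, b0, bs))
r2_cv = cv_r2(X, y, alpha=0.1)
print(f"kept {len(keep)}/{p} predictors, R2 = {r2_in:.2f}, R2CV = {r2_cv:.2f}")
```

Redoing the selection inside each fold keeps the cross-validated R2 honest. If the predictors are chosen once on the full data and only the OLS refit is cross-validated, the CV figure will tend to be optimistic.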