@KirkegaardEmil https://quid.com/feed/how-quid-uses-deep-learning-with-small-data … example of the sort of approach I mean for handling your surnames by regression on n-grams.
Replying to @gwern
It's first name data. :) Data here with calculated S scores. https://osf.io/xf8py/ But yes, I should give it a go. Maybe now.
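The "regression on n-grams" idea starts from a character-n-gram featurization of each name. A minimal Python sketch (the thread's analysis is in R; the `^`/`$` boundary markers and the 1-3-gram range here are my own choices, not anything specified in the thread):

```python
from collections import Counter

def char_ngrams(name, n_min=1, n_max=3):
    """Count character n-grams of a name, with ^/$ marking word
    boundaries so prefixes and suffixes become distinct features."""
    s = "^" + name.lower() + "$"
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(s) - n + 1):
            grams[s[i:i + n]] += 1
    return grams

print(char_ngrams("Ali"))
```

Stacking these counts over all names gives the design matrix used in the regressions discussed below.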
Replying to @KirkegaardEmil @gwern
Here's some results. http://rpubs.com/EmilOWK/228495
Replying to @KirkegaardEmil
What sort of R^2 do you get with all the n-grams? Also, you could use 'p.adjust' to do non-Bonferroni multiple correction.
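For reference, R's 'p.adjust' defaults to the Holm step-down method, which controls the family-wise error rate like Bonferroni but less conservatively. A minimal Python sketch of that adjustment (the example p-values are made up):

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values: sort ascending, multiply the
    k-th smallest by (m - k + 1), then enforce monotonicity and cap at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        running = max(running, (m - rank) * pvals[i])
        adj[i] = min(1.0, running)
    return adj

print(holm_adjust([0.01, 0.04, 0.03, 0.005]))
```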
Replying to @gwern
Like in multiple regression? I currently have about 280 n-grams and ~1,900 names, so I could use OLS multiple regression and get a cross-validated R2.
Replying to @KirkegaardEmil @gwern
The p_cor is the corrected p value. glmnet is not suitable for categorical predictors. I need a good function for LASSO with GLMs.
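For what it's worth, glmnet fits the LASSO by cyclic coordinate descent. A bare-bones Python/numpy sketch of that algorithm for the Gaussian case (synthetic toy data, no standardization, penalty paths, or warm starts, so purely illustrative):

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=100):
    """Gaussian LASSO via cyclic coordinate descent.

    Minimizes 0.5*||y - X @ beta||^2 + lam*||beta||_1 by
    soft-thresholding one coefficient at a time against the
    partial residual that excludes that feature."""
    beta = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            r = y - X @ beta + X[:, j] * beta[j]  # residual ignoring feature j
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

# Toy check: one real signal among ten noise features; the soft
# threshold should zero out most of the noise coefficients exactly.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] + rng.normal(size=100)
beta = lasso_cd(X, y, lam=50.0)
```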
Replying to @KirkegaardEmil @gwern
Huh. Quite good R2 even with OLS: R2 = 50%, CV R2 = 29% (10-fold CV, 5 runs).
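The 10-fold CV R2 computation can be sketched as follows (Python rather than the R used in the thread; the data are simulated stand-ins for the ~1,900-name x 280-n-gram matrix, with sizes and signal strength chosen by me):

```python
import numpy as np

def cv_r2(X, y, k=10, seed=0):
    """Cross-validated R^2 for OLS: refit on each training split,
    pool squared errors on the held-out folds, compare to total SS."""
    idx = np.random.default_rng(seed).permutation(len(y))
    press = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        press += np.sum((y[fold] - X[fold] @ coef) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Simulated stand-in: 1900 "names" x 280 n-gram counts, 20 of which matter.
rng = np.random.default_rng(0)
X = rng.poisson(0.2, size=(1900, 280)).astype(float)
X = np.hstack([np.ones((1900, 1)), X])      # intercept column
beta = np.zeros(281)
beta[1:21] = rng.normal(size=20)
y = X @ beta + rng.normal(scale=2.0, size=1900)
print(round(cv_r2(X, y), 2))
```

Pooling held-out errors this way is one common convention; averaging per-fold R2 values is another and gives slightly different numbers.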
Replying to @KirkegaardEmil
That *is* surprising. Do even surnames, marking family history/class/genetics, achieve CV performance like 29%?
Unfortunately, I don't have surnames. However, it's possible to estimate them by scraping the counts for all first name x last name combinations.