Thread about gender and personality. How separable are men and women based on their answers to the mini-IPIP-20 questionnaire? TLDR: Prediction accuracy goes from 50% -> 65% with 20 questions. pic.twitter.com/VeETziRQR3
Data is from myPersonality; there are ~2.5 million users who answered all 20 questions. Using Lasso, here are the paths the coefficients take for each feature as the regularization decreases: pic.twitter.com/zDZ1uLI8a0
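A minimal sketch of how such a coefficient path can be computed, assuming scikit-learn's `lasso_path` and synthetic Likert-style data. The myPersonality data is not reproduced here, the effect sizes below are made up, and the thread does not say which Lasso implementation was actually used.

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)

# Synthetic stand-in for the questionnaire: 5000 respondents, 20 items scored 1-5.
n_respondents, n_questions = 5000, 20
X = rng.integers(1, 6, size=(n_respondents, n_questions)).astype(float)

# Hypothetical ground truth: only q1 and q9 carry signal about the label.
logit = 0.8 * (X[:, 0] - 3) - 0.4 * (X[:, 8] - 3)
y = (rng.random(n_respondents) < 1 / (1 + np.exp(-logit))).astype(float)

# lasso_path fits no intercept, so center both sides first.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# alphas come back in decreasing order; coefs has shape (n_features, n_alphas).
alphas, coefs, _ = lasso_path(Xc, yc)

# Index of the first alpha at which each coefficient becomes nonzero.
nonzero = coefs != 0
entry = np.where(nonzero.any(axis=1), nonzero.argmax(axis=1), len(alphas))
entry_order = np.argsort(entry, kind="stable")

print("first questions selected:", [f"q{i + 1}" for i in entry_order[:2]])
```

Plotting each row of `coefs` against `alphas` gives a path plot like the one in the image, and `entry_order` is the order in which the features are selected as the regularization weakens.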
First six questions selected by the model, in order:
q1 I am the life of the party
q9 I am relaxed most of the time
q5 I have a vivid imagination
q18 I make a mess of things
q17 I am not really interested in others
q4 I have frequent mood swings
As one would expect when enforcing sparsity, the model selects one question from each of the Big Five before fleshing out the important factors (starting with Neuroticism). The order of the first five is E, N, O, C, A.
The dataset is 64% female, so it is almost always best to predict female. I therefore weighted the male samples so that, from the model's perspective, a random guess is as good as guessing the mode. (As n approaches infinity, this is equivalent to duplicating some male samples.)
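The arithmetic behind that weighting can be sketched as follows. The counts are illustrative (64 women and 36 men per 100 respondents), and the helper function is hypothetical rather than the author's code.

```python
# Illustrative counts only: 64 women and 36 men per 100 respondents.
n_female, n_male = 64, 36

# Upweight men so that both classes carry the same total weight.
w_female = 1.0
w_male = n_female / n_male  # about 1.78


def weighted_accuracy_always_female(n_f, n_m, w_f, w_m):
    """Weighted accuracy of the constant prediction 'female'."""
    correct = n_f * w_f
    total = n_f * w_f + n_m * w_m
    return correct / total


unweighted = weighted_accuracy_always_female(n_female, n_male, 1.0, 1.0)
balanced = weighted_accuracy_always_female(n_female, n_male, w_female, w_male)

print(f"unweighted: {unweighted:.2f}")  # 0.64
print(f"balanced:   {balanced:.2f}")    # 0.50
```

With the weights applied, always guessing the mode scores 50%, which is why the baseline in this thread is 50% rather than 64%.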
I then fit a logistic regression model to each set of features suggested by the lasso transitions above (27 sets total). This yields test set accuracy of: pic.twitter.com/w2T4VRsNlx
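A rough sketch of that step, assuming scikit-learn and synthetic data. The nested feature sets are hypothetical placeholders for the sets read off the lasso path. `class_weight="balanced"` and `balanced_accuracy_score` are one way to get a 50% baseline, and not necessarily what was used in the thread.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: 4000 respondents, 20 items scored 1-5, imbalanced label.
n_respondents, n_questions = 4000, 20
X = rng.integers(1, 6, size=(n_respondents, n_questions)).astype(float)
logit = 0.6 + 0.8 * (X[:, 0] - 3) - 0.4 * (X[:, 8] - 3)  # made-up effects
y = (rng.random(n_respondents) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical nested feature sets (0-based columns), e.g. q1, then q1+q9, ...
feature_sets = [[0], [0, 8], [0, 8, 4]]

accuracies = []
for cols in feature_sets:
    # "balanced" reweights classes inversely to their frequency in y_tr.
    clf = LogisticRegression(class_weight="balanced", max_iter=1000)
    clf.fit(X_tr[:, cols], y_tr)
    pred = clf.predict(X_te[:, cols])
    accuracies.append(balanced_accuracy_score(y_te, pred))

for cols, acc in zip(feature_sets, accuracies):
    print([f"q{c + 1}" for c in cols], round(acc, 3))
```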
Notice how q1 "I am the life of the party" (extroversion) and q9 "I am relaxed most of the time" (neuroticism) are doing a lot of the work. The next big jump in accuracy comes when adding q17 "I am not really interested in others" (agreeableness).
With all features, the largest coefficients from logistic regression are:
q17 I am not really interested in others
q4 I have frequent mood swings
q7 I am not interested in other people’s problems
Overall, I was surprised at only 65% accuracy with 20 questions.
Personality/gender may be interesting to @KirkegaardEmil, @phl43, @sentientist
Thanks, this is interesting, I'll share it so more people can read it!
Try random forest instead of lasso. I expect there to be some strange interactions in the data that are good at detecting sex.
w/ an ExtraTreesClassifier (100 trees) I got 68.7%. Maybe a deep model can find those interactions.
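To illustrate the point about interactions, here is a small sketch on synthetic data, assuming scikit-learn. The label depends only on a made-up interaction between two items, so it says nothing about which interactions exist in the real data. It compares a plain logistic regression with an `ExtraTreesClassifier` using 100 trees.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: 8000 respondents, 20 items scored 1-5.
n_respondents, n_questions = 8000, 20
X = rng.integers(1, 6, size=(n_respondents, n_questions)).astype(float)

# Hypothetical pure interaction: the label is 1 when q1 and q2 fall on the
# same side of the midpoint; respondents with a midpoint answer get a coin flip.
product = (X[:, 0] - 3) * (X[:, 1] - 3)
coin = rng.integers(0, 2, size=n_respondents)
y = np.where(product > 0, 1, np.where(product < 0, 0, coin))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

linear = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
trees = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

linear_acc = linear.score(X_te, y_te)
tree_acc = trees.score(X_te, y_te)

print(f"logistic regression: {linear_acc:.3f}")
print(f"extra trees:         {tree_acc:.3f}")
```

Because neither item has a marginal effect on the label here, the linear model has little to work with, while the tree ensemble can split on one item and then the other.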