this is a beautiful example of what decision tree overfitting looks like (SALARY is the variable we're trying to predict) pic.twitter.com/r4WVJXMIHs
@avibryant @slendrmeans oh sure! I guess I was assuming there's no cheating
@avibryant @b0rk exactly. I actually think this is a benefit--it exposes dumb features via overfitting and how you "read" results.
@slendrmeans @b0rk do you think trees are especially prone to this vs regressions or NN etc?
@avibryant @b0rk E.g., predicting engagement on a site using behaviors as features. It sounds reasonable but is rife with reverse causality.
@avibryant @b0rk I think there are subtler ways of "cheating" inadvertently. Not always easy to spot.
@avibryant @b0rk I don't think so--the problem is the features, not the algo. But I've noticed I spot the "duh" features easier in trees.
@avibryant @b0rk May just be a byproduct of them being easier to run quickly and read.
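A rough sketch of the "duh feature" point from the thread, assuming a made-up dataset: `years_experience` is a legitimate predictor of salary, while `tax_paid` is a hypothetical feature computed from salary itself (the "cheating" case). The data, feature names, and helper functions are all invented for illustration, and the code only scores the single best root split per feature rather than growing a full tree.

```python
import random

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split_sse(xs, ys):
    """Lowest total SSE achievable by one threshold split on feature xs."""
    pairs = sorted(zip(xs, ys))
    best = sse(ys)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        best = min(best, sse(left) + sse(right))
    return best

def rank_root_splits(features, target):
    """Order feature names by the quality of their best single split."""
    scores = {name: best_split_sse(xs, target) for name, xs in features.items()}
    return sorted(scores, key=scores.get), scores

# Synthetic data (an assumption, not from the thread).
random.seed(0)
n = 200
years = [random.uniform(0, 20) for _ in range(n)]
salary = [40000 + 2000 * y + random.gauss(0, 10000) for y in years]

features = {
    "years_experience": years,               # legitimate, noisy predictor
    "tax_paid": [0.3 * s for s in salary],   # hypothetical leaked feature
}

ranking, scores = rank_root_splits(features, salary)
print("root split preference:", ranking)
```

The leaked feature ranks first here because it orders the rows exactly as the target does, so no other feature can offer a better split. Reading the top of a tree and seeing something like `tax_paid` there is the kind of quick "duh" check the thread describes.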