One other thing - when thinking about these 3 phases (ETL/features, training, deployment), resist the temptation to bikeshed the modeling; spend your resources on feature development & domain understanding instead.
-
IMO Avi covered the most important points. For experiment and provenance tracking, consider Pachyderm. For managing repeatable pipelines (DAGs), Airflow is worth looking at. Training may become more converged in Spark with Project Hydrogen, but that is speculative.
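To make the "repeatable pipelines (DAGs)" point concrete, here is a minimal sketch using only Python's standard library. It is not Airflow's API; the task names (`etl`, `features`, `train`, `deploy`), the placeholder task bodies, and the `run_pipeline` helper are all hypothetical and only illustrate the idea of running the phases in dependency order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "etl": set(),
    "features": {"etl"},
    "train": {"features"},
    "deploy": {"train"},
}

# Placeholder task bodies standing in for real ETL/training/deployment code.
TASKS = {
    "etl": lambda up: [1.0, 2.0, 3.0],
    "features": lambda up: [x * 2 for x in up["etl"]],
    "train": lambda up: sum(up["features"]) / len(up["features"]),
    "deploy": lambda up: f"serving model with mean={up['train']}",
}


def run_pipeline(dag, tasks):
    """Run every task once, in an order that respects the dependencies."""
    order = list(TopologicalSorter(dag).static_order())
    results = {}
    for name in order:
        # Each task only sees the outputs of the tasks it depends on.
        upstream = {dep: results[dep] for dep in dag[name]}
        results[name] = tasks[name](upstream)
    return order, results


order, results = run_pipeline(PIPELINE, TASKS)
print(order)
```

A real scheduler adds retries, scheduling, and logging on top of this; the sketch only shows why declaring the dependencies explicitly makes a run repeatable.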
-
I’m obviously biased :) but that all seems like great advice. I’d also add model serialization / deserialization vs separate services as an extension to your point about scoring / operability.
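As a rough illustration of the serialization / deserialization option: the training side writes the model out as data, and the scoring side loads it back without sharing any training code. This is a sketch under assumptions, not anyone's production setup; `LinearModel`, its fields, and the JSON format are hypothetical stand-ins.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class LinearModel:
    """Hypothetical stand-in for a trained model."""
    weights: list
    bias: float
    version: str  # carried along so the scorer knows which model it loaded

    def score(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


def serialize(model):
    """Training side: turn the model into a portable JSON string."""
    return json.dumps(asdict(model))


def deserialize(payload):
    """Scoring side: rebuild the model from the JSON string."""
    return LinearModel(**json.loads(payload))


trained = LinearModel(weights=[0.5, -1.0], bias=0.25, version="v1")
payload = serialize(trained)
restored = deserialize(payload)
print(restored.version, restored.score([2.0, 1.0]))
```

The alternative is to keep the model behind its own service and call it over the network, which trades this file-format contract for an operational one.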
-
I’d love to hear what you’re currently using to serialize/deserialize models.
-
Stripe has a few things open sourced for tree models! See: https://github.com/stripe/bonsai and https://github.com/stripe/brushfire. I assume you weren’t asking about my current gig, but contrary to all advice we run an R service.
-
Thanks! That’s very neat (not the R service).
-
Fairly reasonable; however, I would also give some thought to model versioning and maintenance.
-
So you train in TF or something like it, then need to reimplement the model in a JVM language? That doesn’t sound fun.
-
TensorFlow does have a (small) Java API that’s workable for scoring.
-
I'd largely agree with your sentiments, Avi. I also agree that these are major infrastructure projects, more than R&D-type work.
-
sounds like a pretty reasonable take to me