Tweets
-
Many of our group's recent papers (DualDICE, ValueDICE, GenDICE, AlgaeDICE) can be framed as applications of this duality. Still, there are lots of potential remaining applications for others to explore, and lingering questions about how these formulations interplay with stochastic optimization methods. https://twitter.com/ofirnachum/status/1214938316316893184 …
-
Fenchel-Rockafellar duality is a powerful tool that more people should be aware of, especially for RL! Straightforward applications of it enable off-policy evaluation and off-policy policy gradient/imitation learning, among others. https://arxiv.org/abs/2001.01866 @daibond_alpha
-
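For readers unfamiliar with the Fenchel-Rockafellar duality mentioned in the tweets above, here is one common way to state it. This is a sketch in assumed notation (the symbols f, g, A, x, y are not taken from the tweets), and the equality only holds under suitable regularity conditions on the convex functions f and g:

```latex
% f, g: convex, lower semicontinuous functions; A: a linear operator with adjoint A^{*}.
% f^{*} denotes the convex (Fenchel) conjugate of f.
\min_{x} \; f(x) + g(Ax) \;=\; \max_{y} \; -f^{*}(-A^{*}y) - g^{*}(y),
\qquad
f^{*}(y) \;=\; \sup_{x} \; \langle x, y \rangle - f(x).
```
-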
Code is here: https://github.com/google-research/google-research/tree/master/value_dice … Hope this will serve as a strong baseline for future work to improve on!
-
https://arxiv.org/abs/1912.05032 <-- my most recent paper with @ikostrikov & J. Tompson introducing *ValueDICE*, an off-policy imitation learning algorithm. We set a new SOTA for online imitation learning and, for the first time (afaik), beat behavior cloning in the totally offline regime. pic.twitter.com/YlKjCRTLd5
-
Very excited about my new paper! https://arxiv.org/abs/1912.02074 We formulate the on-policy max-return RL objective w.r.t. *arbitrary* offline data and without *any* explicit importance correction. Amazingly, the gradient of the objective w.r.t. pi is exactly the on-policy policy gradient! pic.twitter.com/R3g7tOjxNJ
-
https://arxiv.org/abs/1911.11361 Offline RL: what do you need to know about this notoriously difficult regime? Although recent papers propose a variety of algorithmic novelties, we find many of these unnecessary in practice. Extensive studies will hopefully guide future research & practice. pic.twitter.com/RKMIKdeP9l
-
[new paper] Adding a simple linear constraint to a loss function (commonly used in ML fairness) can have strange and unintuitive effects on the resulting model. While such constraints seem natural/harmless, we don't fully understand their consequences yet https://arxiv.org/abs/1910.02097
-
New paper! https://arxiv.org/abs/1909.10618 We investigate the underlying reasons for the success of hierarchical RL, finding that (surprisingly) much of it is due to exploration, and that this benefit can be achieved *without* explicit hierarchies of policies. @svlevine @shaneguML @honglaklee
-
A few people had asked for this so we open-sourced our code for "Identifying and Correcting Label Bias in Machine Learning". Hope people can find it useful and build on top of it! https://arxiv.org/abs/1901.04966 https://github.com/google-research/google-research/tree/master/label_bias …
-
I'm excited to announce the release of my most recent work: applying HRL to robotic "manipulation via locomotion" tasks with impressive real-world results! https://sites.google.com/view/manipulation-via-locomotion/dkitty-avoid-videos … https://arxiv.org/abs/1908.05224 w/ amazing collaborators @GoogleAI - @Vikashplus @shaneguML and more! pic.twitter.com/NYWVUh27kL
-
New paper out! An advancement in properly estimating off-policy occupancy ratios. We apply it to off-policy policy evaluation with great results, but we believe it should be useful in many more off-policy settings! https://arxiv.org/abs/1906.04733 pic.twitter.com/nD4UaNj9T1
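As a rough illustration of why occupancy ratios matter for off-policy policy evaluation: if the ratio between the target policy's and the data distribution's occupancies were known, the target policy's average reward could be estimated by reweighting offline samples. The sketch below is not the paper's estimator; it assumes the true ratio is given, and the one-state toy problem, its numbers, and the helper name `ratio_weighted_value` are all hypothetical.

```python
import numpy as np


def ratio_weighted_value(ratios, rewards):
    """Estimate the target policy's average reward from offline samples.

    ratios[i]  : assumed-known occupancy ratio d_target / d_data at sample i
    rewards[i] : reward observed at sample i
    """
    ratios = np.asarray(ratios, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    return float(np.mean(ratios * rewards))


# Hypothetical one-state problem with two actions.
d_data = np.array([0.8, 0.2])    # how often the offline data takes each action
d_target = np.array([0.3, 0.7])  # how often the target policy would take it
reward = np.array([1.0, 2.0])    # deterministic reward per action
true_ratio = d_target / d_data   # in practice this ratio must be estimated

# Draw offline samples from the data distribution only.
rng = np.random.default_rng(0)
actions = rng.choice(2, size=200_000, p=d_data)

estimate = ratio_weighted_value(true_ratio[actions], reward[actions])
exact = float(d_target @ reward)  # ground truth, available only in this toy setup
print(f"estimate={estimate:.3f} exact={exact:.3f}")
```

In realistic settings the ratio is unknown, which is the estimation problem the tweet refers to.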
-
Ofir Nachum retweeted
Great thanks to @reworkdl for the honor to kick off the Deep Reinforcement Learning Summit https://www.re-work.co/events/deep-reinforcement-learning-summit-san-francisco-2019 … with the important topic of "Secure Deep Reinforcement Learning"! Amazing talks from fellow speakers @jacobandreas @ofirnachum @jeffclune @marcgbellemare and others! pic.twitter.com/Y2EEuPjsrv
-
Our submission won best paper at the RL4RL workshop at ICML! https://arxiv.org/abs/1901.10031 pic.twitter.com/o8N1Jpaj5p
-
Nice work with great collaborators - we'll be presenting this at ICML! Tue Jun 11th 03:05PM @ Room 104 and Tuesday poster #108. https://twitter.com/carlesgelada/status/1136803145110216704 …
-
Ofir Nachum retweeted
Fast & Simple Resource-Constrained Learning of Deep Network Structure #cvpr2018 By @EladEban @ofirnachum. Suppose you have a working CNN: MorphNet will adjust the number of channels in each layer to satisfy memory/latency constraints. https://github.com/google-research/morph-net … https://arxiv.org/abs/1711.06798 pic.twitter.com/4tLjfoN9qZ
-
Open source library for "MorphNets: Fast & Simple Resource-Constrained Learning of Deep Network Structure" has been released: https://github.com/google-research/morph-net … Based on work (CVPR 2018) with @EladEban and others at @GoogleAI (https://arxiv.org/abs/1711.06798).
-
Great summary of our recent work about safe RL, a collaboration between researchers at Google, DeepMind, and FAIR: https://ai.facebook.com/blog/lyapunov-based-safe-reinforcement-learning …
-
Ofir Nachum retweeted
Hey, a really interesting and elegant paper! I wrote a summary of the main ideas, aiming to make them accessible to a broad audience. I hope you like it! https://www.lyrn.ai/2019/01/29/identifying-and-correcting-label-bias-in-machine-learning/ …
-
My new paper on learning fair machine learning classifiers: https://arxiv.org/abs/1901.04966 We frame the problem as trying to learn with respect to unknown (and true) labels despite only having access to observed (and biased) labels. We find a surprisingly simple solution for doing so!