Yuxin Wu
@ppwwyyxx
Joined June 2010

Yuxin Wu’s Tweets

Tips to avoid a >40x waste of RAM in PyTorch dataloaders - an issue that many have encountered.
Quote Tweet
Ever faced a "memory leak" in your @PyTorch data loader? (I did🥲) @ppwwyyxx has an amazing blog post on demystifying RAM usage in multi-process data loaders. Check out his post: ppwwyyxx.com/blog/2022/Demy🚀
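The RAM blow-up discussed in the blog post comes from storing a dataset as a large Python object (e.g. a list of dicts): every forked dataloader worker that merely *reads* an object touches its CPython refcount, which dirties the copy-on-write memory pages and eventually duplicates the whole list in every worker. A common mitigation is to serialize the samples into a single flat byte buffer up front, so workers read plain memory that refcounting never touches. The class below is a minimal sketch of that idea (the name `SerializedList` and the exact layout are illustrative, not the blog's actual code):

```python
import pickle

import numpy as np


class SerializedList:
    """Store a list of Python objects as one flat numpy byte buffer plus an
    offset index. Forked dataloader workers then read raw bytes instead of
    Python objects, so no per-object refcounts are touched and copy-on-write
    pages stay shared. (Illustrative sketch, not the blog post's real code.)
    """

    def __init__(self, samples):
        # Pickle each sample into bytes, then pack everything into one array.
        blobs = [
            np.frombuffer(pickle.dumps(s, protocol=-1), dtype=np.uint8)
            for s in samples
        ]
        self._offsets = np.cumsum([len(b) for b in blobs])
        self._buffer = np.concatenate(blobs)

    def __len__(self):
        return len(self._offsets)

    def __getitem__(self, i):
        start = 0 if i == 0 else int(self._offsets[i - 1])
        end = int(self._offsets[i])
        # Deserialize on demand; only this one sample becomes a Python object.
        return pickle.loads(self._buffer[start:end].tobytes())
```

Wrapping the sample list in such a structure before handing it to a `Dataset` keeps each worker's resident memory roughly equal to the (shared) buffer instead of a full private copy.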
Tips for everyone: whenever you do ANYTHING slightly fancy at test time, always try recomputing the population statistics using a correct (non-moving) average. This tweet is one example, and Sec 3.3 of arxiv.org/abs/2105.07576 lists a few more. github.com/facebookresear is a good implementation.
Quote Tweet
The solution: After interpolating between BatchNorm networks, we need to reset the normalization statistics (running_mean and running_var). Once this is done, the barrier is immediately even smaller than that of the LayerNorm networks used in the paper. 7/14
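The "recompute population statistics" tip above can be sketched in PyTorch directly: reset each BatchNorm layer's running stats, then run forward passes with `momentum=None`, which PyTorch documents as switching the running estimates from an exponential moving average to a true cumulative average. The helper below is a simplified illustration (the real "PreciseBN" implementation referenced in the tweet lives in fvcore/detectron2); it assumes the loader yields input tensors only:

```python
import torch
import torch.nn as nn


@torch.no_grad()
def recompute_bn_stats(model, loader, device="cpu"):
    """Reset BatchNorm running stats and recompute them as a true
    (cumulative, non-moving) average over the given loader.

    Simplified sketch: assumes `loader` yields plain input tensors.
    """
    bn_layers = [
        m for m in model.modules()
        if isinstance(m, nn.modules.batchnorm._BatchNorm)
    ]
    saved_momentum = [bn.momentum for bn in bn_layers]
    for bn in bn_layers:
        bn.reset_running_stats()
        bn.momentum = None  # momentum=None -> cumulative average in PyTorch

    was_training = model.training
    model.train()  # BN only updates running stats in train mode
    for x in loader:
        model(x.to(device))
    model.train(was_training)

    for bn, m in zip(bn_layers, saved_momentum):
        bn.momentum = m
    return model
```

With equal-sized batches, the cumulative average of per-batch means equals the dataset mean exactly, which is the "correct (non-moving) average" the tweet asks for.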
Recognized for “selfless contribution to other members of the computer vision community” — WELL DONE to Ross Girshick, Ilija Radosavovic, Alexander Kirillov, Georgia Gkioxari, Francisco Massa, Wan-Yen Lo, Piotr Dollár and Kaiming He
Quote Tweet
Congratulations to the Detectron team Ross Girshick, Yuxin Wu, Ilija Radosavovic, Alexander Kirillov, Georgia Gkioxari, Francisco Massa, Wan-Yen Lo, Piotr Dollár, & Kaiming He for winning the PAMI Everingham Prize at #ICCV2021. Learn about the platform: ai.facebook.com/tools/detectro
[Embedded video, 1:00]
This paper gives me anxiety. BatchNorm is the most deviously, subtly complex layer in deep learning. Many issues (silently) root-cause to it. Yet it is ubiquitous because it works well (it helps both optimization and regularization) and can be fused into affine layers at inference time.
Quote Tweet
Rethinking “Batch” in BatchNorm pdf: arxiv.org/pdf/2105.07576 abs: arxiv.org/abs/2105.07576
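The "fused to affines at inference time" remark refers to folding a frozen BatchNorm into the preceding convolution: since eval-mode BN is just `y = gamma * (x - mean) / sqrt(var + eps) + beta`, an affine map per channel, it can be absorbed into the conv's weight and bias. A minimal sketch (assumes `groups=1` and default dilation; the function name is illustrative):

```python
import torch
import torch.nn as nn


@torch.no_grad()
def fuse_conv_bn(conv, bn):
    """Fold an eval-mode BatchNorm2d into the preceding Conv2d.

    y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta
    becomes a single conv with rescaled weight and shifted bias.
    Sketch only: assumes groups=1 and default dilation.
    """
    # Per-channel scale applied by BN at inference time.
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)

    fused = nn.Conv2d(
        conv.in_channels, conv.out_channels, conv.kernel_size,
        stride=conv.stride, padding=conv.padding, bias=True,
    )
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused
```

The fused module computes the same outputs as `bn(conv(x))` while removing one layer (and the running-stats subtleties) from the inference graph.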
The paper uses Horovod, which means the TensorFlow part of it runs in a single process and does nothing about scaling. It's Horovod that scales well.
Quote Tweet
TensorFlow scales pretty well. Work done on the Summit supercomputer at Oak Ridge. 27,600 V100 GPUs, near-linear scaling. twitter.com/NandoDF/status…
Damn, I just installed Wandoujia out of boredom, and it actually notified me that Google Play Services has an update, with a version number even higher than the insiders' beta! All four app updates it flagged came from who-knows-which third-party markets. Is anyone at Wandoujia going to deal with this?
Before writing a blog post, I discuss the topic with him. I ask in advance whether he knows about something; if he doesn't, I figure most people don't either, so I decide to write it. He also gives me his problems to solve first; if I solve them all, he figures everyone can, and decides not to write. // Sad, but at least I was useful.