The multi-layer perceptron (MLP) has become a staple of the modern AI diet. It’s everywhere: ConvNets, Transformers, RL, and more. Small MLPs are especially important for neural rendering.
How do you make MLPs lightning fast? CUDA kung-fu mastery is all you need! github.com/NVlabs/tiny-cu 🧵
Replying to
My colleagues developed tiny-cuda-nn, a self-contained framework written in CUDA for training and deploying "fully fused" MLPs. It can dramatically speed up NeRF-style research and applications. Here's an example of training a 2D rendering function: (x, y) -> (R, G, B)
2/
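To make that image-fitting task concrete, here is a minimal, framework-agnostic NumPy sketch of a small MLP learning a 2D color function (x, y) -> (R, G, B). It only illustrates the task and network size; it uses none of tiny-cuda-nn's actual API, and the target function is a made-up synthetic "image":

```python
# Illustrative sketch (NOT the tiny-cuda-nn API): a tiny NumPy MLP that
# learns a 2D color function (x, y) -> (R, G, B), the same shape of task
# as the framework's image-fitting example, just on CPU.
import numpy as np

rng = np.random.default_rng(0)

# Toy target: a smooth synthetic "image" so the example is self-contained.
def target(xy):
    x, y = xy[:, 0], xy[:, 1]
    return np.stack([np.sin(3 * x), np.cos(3 * y), x * y], axis=1) * 0.5 + 0.5

# One hidden layer, 64 units wide -- "tiny" by design, like the fused MLPs.
W1 = rng.normal(0, 0.5, (2, 64)); b1 = np.zeros(64)
W2 = rng.normal(0, 0.5, (64, 3)); b2 = np.zeros(3)

lr = 0.05
losses = []
for step in range(500):
    xy = rng.uniform(0, 1, (256, 2))      # random pixel coordinates in [0,1]^2
    h = np.maximum(xy @ W1 + b1, 0.0)     # ReLU hidden layer
    pred = h @ W2 + b2                    # linear RGB output
    err = pred - target(xy)
    losses.append(float((err ** 2).mean()))
    # Backprop by hand (mean-squared-error gradients).
    g_pred = 2 * err / err.size
    g_W2 = h.T @ g_pred; g_b2 = g_pred.sum(0)
    g_h = (g_pred @ W2.T) * (h > 0)
    g_W1 = xy.T @ g_h; g_b1 = g_h.sum(0)
    W2 -= lr * g_W2; b2 -= lr * g_b2
    W1 -= lr * g_W1; b1 -= lr * g_b1
```

The point of the fully fused version is that a network this small fits entirely in fast on-chip GPU memory, so all layers can be evaluated in a single kernel launch instead of one kernel per layer.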
The framework was created by Müller et al. Its killer app is Instant-NGP, a neural rendering algorithm that deserves its own spotlight thread!
Paper: tom94.net/data/publicati
Github: github.com/NVlabs/tiny-cu
Follow me for more ;)
END/🧵
Replying to
Slightly clickbaity title: this only works for small MLPs, and on GPUs with large shared memory.
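The reply's caveat can be checked with back-of-envelope arithmetic: a fully fused MLP must keep its weight matrices resident in on-chip shared memory. The figures below are illustrative assumptions (64-wide layers, FP16 weights, roughly 100 KB of shared memory per SM on recent NVIDIA GPUs), not exact tiny-cuda-nn numbers:

```python
# Rough check of the "small MLPs only" point. All figures are
# illustrative assumptions, not exact tiny-cuda-nn or hardware specs.
width = 64              # neurons per hidden layer ("tiny" MLP)
n_hidden = 4            # number of hidden layers
bytes_per_weight = 2    # FP16

per_layer = width * width * bytes_per_weight   # bytes for one 64x64 weight matrix
total = n_hidden * per_layer                   # all hidden-layer weights
print(per_layer, total)                        # prints: 8192 32768

shared_mem = 100 * 1024  # ~100 KB usable shared memory per SM (assumption)
fits = total <= shared_mem
print(fits)              # prints: True
```

A 64-wide network needs only ~32 KB for its hidden weights, comfortably inside shared memory; a 1024-wide layer would need 2 MiB per weight matrix and could never be fused this way, which is exactly the limitation the reply points out.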
Replying to
Are you aware of any cases of tiny-cuda-nn being used within Omniverse extensions? I'd like to know whether that's possible.
Replying to
It would be interesting to replace the ALU with a simpler approximate compute unit (analog?), trading some accuracy for extra chip area and faster execution. But that would rule out a lot of other use cases.