12X faster transformer model, possible?
Yes, with Triton kernels!
We release Kernl, a lib to speed up inference of transformer models. It's very fast (sometimes SOTA), 1 LoC to use, and hackable to match most transformer architectures.
github.com/ELS-RD/kernl
🧵