Occasionally my instinct to make all buffers a nice power of two is counterproductive in random C++ code due to cache usage and way collisions, but CUDA really wants you to Align All The Things.
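A rough sketch of the way-collision effect mentioned above, assuming a hypothetical cache geometry of 64-byte lines and 64 sets (roughly a 32 KiB 8-way L1). These numbers and the helper names `cache_set` and `distinct_sets` are illustrative assumptions, not anything stated in the thread.

```cpp
#include <cstddef>

// Hypothetical cache geometry (an assumption, not from the thread):
// 64-byte lines and 64 sets, as in a 32 KiB 8-way L1 cache.
constexpr std::size_t kLineBytes = 64;
constexpr std::size_t kNumSets = 64;

// Set index that a given byte offset maps to in this simple model.
constexpr std::size_t cache_set(std::size_t byte_offset) {
    return (byte_offset / kLineBytes) % kNumSets;
}

// Number of distinct sets touched by the first byte of `rows` consecutive
// rows that are `stride_bytes` apart.
constexpr std::size_t distinct_sets(std::size_t stride_bytes, std::size_t rows) {
    bool seen[kNumSets] = {};
    std::size_t count = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        const std::size_t s = cache_set(r * stride_bytes);
        if (!seen[s]) {
            seen[s] = true;
            ++count;
        }
    }
    return count;
}
```

In this model, walking 16 rows with a 4096-byte stride lands every row start in the same set, so `distinct_sets(4096, 16)` is 1. Padding the stride by one line, `distinct_sets(4096 + 64, 16)`, spreads the same rows over 16 sets.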
Replying to @ID_AA_Carmack
Power-of-two-sized buffers can also cause internal heap fragmentation for malloc implementations that hide allocation metadata before the buffer. A 1024-byte buffer might need 1024+N bytes, which might be rounded up to a 2048-byte memory block inside malloc.
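The arithmetic in that reply can be sketched with a toy allocator model. It assumes a hypothetical allocator that stores a fixed-size header just before the buffer and rounds every block up to a power of two. The 16-byte header used in the note below and the names `round_up_pow2`, `block_size`, and `wasted_bytes` are illustrative assumptions, and real malloc implementations differ.

```cpp
#include <cstddef>

// Smallest power of two that is >= n (returns 1 for n == 0).
constexpr std::size_t round_up_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// Toy model (an assumption): the allocator hides `header_bytes` of metadata
// in front of the user buffer and rounds the whole block up to a power of two.
constexpr std::size_t block_size(std::size_t request, std::size_t header_bytes) {
    return round_up_pow2(request + header_bytes);
}

// Bytes in the block that hold neither user data nor the header.
constexpr std::size_t wasted_bytes(std::size_t request, std::size_t header_bytes) {
    return block_size(request, header_bytes) - request - header_bytes;
}
```

With an assumed 16-byte header, `block_size(1024, 16)` is 2048 and `wasted_bytes(1024, 16)` is 1008, while a 1008-byte request fits a 1024-byte block exactly.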
Replying to @cpeterso @ID_AA_Carmack
jemalloc has done a lot of experimentation on optimum allocation size classes. They have a nice table: http://jemalloc.net/jemalloc.3.html#size_classes … It’s nearly powers of two, but not quite.
9:48 AM - 13 Feb 2020