Unexpected side effect of using __shfl on SM 3.0 instead of shared memory: the compiler seems to spill less. More reordering freedom?
ah those were happy days! Still, implementing bitonic sort with only __shfl_xor() for data movement is also pretty neat!
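
For anyone curious what that looks like, here is a minimal sketch of a one-warp bitonic sort where the only data movement is an XOR shuffle. It is an illustration under stated assumptions, not the code from this thread: it uses `__shfl_xor_sync` (the CUDA 9+ form of the original `__shfl_xor`), assumes a full 32-lane warp with one `int` per lane, and the names `warpBitonicSort`, `sortOneWarp`, and `sortWarpOnGpu` are made up for the example.

```cuda
#include <cuda_runtime.h>

// Sorts 32 ints held one per lane, ascending by lane index.
// Assumes all 32 lanes of the warp are active (full mask).
__device__ int warpBitonicSort(int v) {
    const unsigned lane = threadIdx.x & 31u;
    for (unsigned k = 2; k <= 32; k <<= 1) {          // size of the sequences being merged
        for (unsigned j = k >> 1; j > 0; j >>= 1) {   // distance to the partner lane
            // Exchange values with lane (lane ^ j); no shared memory involved.
            const int other = __shfl_xor_sync(0xffffffffu, v, j);
            const bool ascending = (lane & k) == 0;   // sort direction of this block
            const bool lower = (lane & j) == 0;       // this lane has the lower index of the pair
            // The lower lane keeps the min in an ascending block, the max in a descending one.
            v = (lower == ascending) ? min(v, other) : max(v, other);
        }
    }
    return v;
}

// Hypothetical test kernel: sorts 32 values in place using a single warp.
__global__ void sortOneWarp(int* data) {
    data[threadIdx.x] = warpBitonicSort(data[threadIdx.x]);
}

// Hypothetical host helper: copies 32 ints to the GPU, sorts them, copies them back.
// Returns false if any CUDA call reports an error.
bool sortWarpOnGpu(const int* hostIn, int* hostOut) {
    int* dev = nullptr;
    if (cudaMalloc(&dev, 32 * sizeof(int)) != cudaSuccess) return false;
    cudaError_t err = cudaMemcpy(dev, hostIn, 32 * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        sortOneWarp<<<1, 32>>>(dev);
        // This blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(hostOut, dev, 32 * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
    return err == cudaSuccess;
}
```

Each lane keeps its value in a register the whole time, so there is no shared-memory array and no `__syncthreads()` between stages.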

