Unexpected side-effect of using __shfl on SM3.0 instead of shared mem: compiler seems to spill less. More reordering freedom?
ah those were happy days! Still, implementing bitonic sort with only __shfl_xor() for data movement is also pretty neat!
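For anyone curious, here is a rough sketch of what a warp-wide bitonic sort using only an XOR shuffle for data movement could look like. It is not the code from this thread. It assumes CUDA 9 or later, where the intrinsic is spelled `__shfl_xor_sync` (on older SM3.0-era toolkits it would be `__shfl_xor(v, j)`), one value per lane, and a full 32-thread warp. The names `warp_bitonic_sort`, `sort_warp_kernel` and `sort32_on_gpu` are made up for illustration.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Sorts 32 ints held one per lane, ascending by lane index.
// Hypothetical helper: the only cross-lane data movement is __shfl_xor_sync.
__device__ int warp_bitonic_sort(int v) {
    const unsigned lane = threadIdx.x & 31u;
    for (unsigned k = 2; k <= 32; k <<= 1) {          // size of bitonic blocks
        for (unsigned j = k >> 1; j > 0; j >>= 1) {   // compare distance
            // Fetch the value held by lane (lane ^ j).
            const int partner = __shfl_xor_sync(0xffffffffu, v, j);
            const bool ascending = (lane & k) == 0;   // block sort direction
            const bool lower = (lane & j) == 0;       // lower lane of the pair
            // Lower lane keeps the min in ascending blocks, the max otherwise;
            // the upper lane keeps the opposite.
            v = (lower == ascending) ? min(v, partner) : max(v, partner);
        }
    }
    return v;
}

// Launch with exactly one warp: <<<1, 32>>>.
__global__ void sort_warp_kernel(int* data) {
    data[threadIdx.x] = warp_bitonic_sort(data[threadIdx.x]);
}

// Host wrapper (hypothetical): sorts 32 ints in place via the kernel above.
// Error checking is omitted to keep the sketch short.
void sort32_on_gpu(int* h) {
    int* d = nullptr;
    cudaMalloc(&d, 32 * sizeof(int));
    cudaMemcpy(d, h, 32 * sizeof(int), cudaMemcpyHostToDevice);
    sort_warp_kernel<<<1, 32>>>(d);
    cudaMemcpy(h, d, 32 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
}

int main() {
    int h[32];
    for (int i = 0; i < 32; ++i) h[i] = (i * 37 + 11) % 101;  // arbitrary input
    sort32_on_gpu(h);
    for (int i = 1; i < 32; ++i) assert(h[i - 1] <= h[i]);
    for (int i = 0; i < 32; ++i) printf("%d ", h[i]);
    printf("\n");
    return 0;
}
```

No shared memory and no `__syncthreads()` here: every compare-exchange step is a single shuffle plus a min/max, and the values never leave registers.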
Sad to say I know more CUDA than PS4 shader programming. I dabble in the former, and have little to do with the latter.
actually your office does look thoroughly awesome, do you work in the "secret treehouse"? :)


