When using Metal for compute, am I going to be limited by the replaceRegion/getBytes calls, even on a device with unified memory?
Yeah, it’s common for I/O to be the bottleneck when streaming resources, especially when the compute workload isn’t massive.
And the I/O is significantly slower than something like memcpy, meaning that for a simple matrix multiply the CPU will always be faster?
Many factors, but you need to be doing enough work on enough data to overcome the cost of I/O and the overhead of dispatching to the GPU.
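One way to make "enough work on enough data" concrete is a back-of-the-envelope cost model: the GPU path pays a fixed dispatch overhead plus copy time plus compute time, while the CPU path pays only compute time. The sketch below applies that to an n x n float32 matrix multiply. It is a rough model under stated assumptions, not how Metal actually schedules work: the `Machine` fields and every number you plug in are hypothetical placeholders, not measurements of any real device, and the model ignores overlapping transfers with compute, cache effects, and memory-bandwidth limits. On a unified-memory device, writing directly into a shared buffer can shrink the copy term, which you could approximate by raising `copy_bandwidth`.

```python
"""Rough break-even model for offloading an n x n matmul to a GPU.

All constants are hypothetical placeholders; substitute numbers you have
benchmarked on your own hardware.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    cpu_flops: float          # sustained CPU throughput in FLOP/s (hypothetical)
    gpu_flops: float          # sustained GPU throughput in FLOP/s (hypothetical)
    copy_bandwidth: float     # bytes/s moving data to/from GPU-visible memory
    dispatch_overhead: float  # fixed seconds per GPU submission (encode, commit, wait)


def matmul_flops(n: int) -> float:
    # n^3 multiply-adds, counted as 2 FLOPs each
    return 2.0 * n ** 3


def matmul_bytes(n: int, elem_size: int = 4) -> float:
    # two n x n inputs copied in, one n x n result copied out
    return 3.0 * n * n * elem_size


def cpu_time(m: Machine, n: int) -> float:
    return matmul_flops(n) / m.cpu_flops


def gpu_time(m: Machine, n: int) -> float:
    # fixed overhead + data movement + compute, with no overlap assumed
    return (m.dispatch_overhead
            + matmul_bytes(n) / m.copy_bandwidth
            + matmul_flops(n) / m.gpu_flops)


def break_even_n(m: Machine, max_n: int = 8192) -> Optional[int]:
    """Smallest n at which the modeled GPU path beats the CPU, or None."""
    for n in range(1, max_n + 1):
        if gpu_time(m, n) < cpu_time(m, n):
            return n
    return None


if __name__ == "__main__":
    # Placeholder numbers chosen only to illustrate the shape of the trade-off.
    m = Machine(cpu_flops=50e9, gpu_flops=2e12,
                copy_bandwidth=10e9, dispatch_overhead=100e-6)
    print("modeled break-even n:", break_even_n(m))
```

Because compute grows as n^3 while copies grow as n^2 and dispatch overhead is constant, small problems tend to favor the CPU and large ones the GPU in this model; where the crossover lands depends entirely on the numbers you measure.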
Right, I'm trying to understand those factors and what “enough work” is.
Feel free to email warren_moore@apple.com. I can help you with particular use cases or refer to someone who can.
Thanks. I’m just trying to get a good understanding of where Metal would be useful in general; I have no particular use case in mind.
I would expect shared resource copies to be proportionately faster on integrated GPUs, though.