Specifically, a call to objc_msgSend would have this fast path:
    cbz self, cache_miss    ; nil check
    ldp type/targetcache    ; compiler generates a slot
    ldr self->isa
    cmp typecache, isa
    bne cache_miss
    br targetcache
This is especially easy to do on ARM because you have a link register. Anyway, I think you could get the same effect on x86 without increasing memory usage by mmap-ing objc_msgSend in at several different addresses and having the compiler emit calls to a random one at each site… VIPT abuse ;)
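A rough C sketch of the per-call-site cache idea from the fast path above, in case it helps to see it outside of assembly. Everything here is a toy model: `Object`, `Class`, `CallSiteCache`, `send_describe` and `cache_miss` are hypothetical names, not the real Objective-C runtime, and the nil case is simplified to return 0 instead of branching to the miss path.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object model; none of these names come from the real runtime. */
typedef struct Class Class;
typedef struct Object { const Class *isa; } Object;
typedef int (*Imp)(Object *self);

struct Class {
    Imp describe;                 /* the single "method" in this toy model */
};

/* One slot per call site, as if emitted by the compiler. */
typedef struct CallSiteCache {
    const Class *type_cache;      /* class seen last time at this site */
    Imp target_cache;             /* implementation it resolved to */
} CallSiteCache;

int slow_lookups = 0;             /* counts cache misses, for illustration */

/* Slow path: do the full lookup and refill this site's slot. */
Imp cache_miss(Object *self, CallSiteCache *site) {
    slow_lookups++;
    site->type_cache = self->isa;
    site->target_cache = self->isa->describe;
    return site->target_cache;
}

/* Fast path, mirroring the instruction sequence above. */
int send_describe(Object *self, CallSiteCache *site) {
    assert(site != NULL);
    if (self == NULL) return 0;               /* cbz self (simplified nil case) */
    Imp target = site->target_cache;          /* ldp type/target cache */
    if (site->type_cache != self->isa)        /* ldr isa; cmp; bne cache_miss */
        target = cache_miss(self, site);
    return target(self);                      /* br targetcache */
}

int describe_a(Object *self) { (void)self; return 1; }
int describe_b(Object *self) { (void)self; return 2; }
static const Class class_a = { describe_a };
static const Class class_b = { describe_b };
```

Sending to the same class twice through one `CallSiteCache` only takes the slow path once; switching to an object of a different class at that site refills the slot.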
-
The lower-tech trick is to split up lookup and dispatch, so the branch itself always has a unique address. Even that had marginal or inconsistent benefit on x86 in tests, from what I hear.
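For illustration, splitting lookup from dispatch might look roughly like this in C. This is a sketch with made-up names (`msg_lookup`, `Method`, `site_one`, `site_two`), not the actual runtime API: the lookup function only returns a function pointer, and each call site performs its own indirect call, so each indirect branch sits at its own address instead of all of them sharing one branch inside a common dispatcher.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Obj Obj;
typedef int (*Imp)(Obj *self);

/* Hypothetical method table entry: selector name plus implementation. */
typedef struct Method { const char *selector; Imp imp; } Method;
struct Obj { const Method *methods; int method_count; int value; };

/* Lookup only: returns the implementation and does not call it. */
Imp msg_lookup(const Obj *self, const char *selector) {
    assert(selector != NULL);
    if (self == NULL) return NULL;
    for (int i = 0; i < self->method_count; i++)
        if (strcmp(self->methods[i].selector, selector) == 0)
            return self->methods[i].imp;
    return NULL;
}

int get_value(Obj *self)  { return self->value; }
int get_double(Obj *self) { return 2 * self->value; }

const Method toy_methods[] = { {"value", get_value}, {"double", get_double} };

/* Two call sites: the indirect call happens here, in the caller,
   so each site gets its own branch instruction. */
int site_one(Obj *o) {
    Imp imp = msg_lookup(o, "value");
    return imp ? imp(o) : 0;
}

int site_two(Obj *o) {
    Imp imp = msg_lookup(o, "double");
    return imp ? imp(o) : 0;
}
```

Whether the predictor actually benefits depends on the hardware and on what the compiler does with the call sites; the sketch only shows the shape of the split.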
-
Huh, surprising that that isn’t a significant improvement. It hits the C++-like paths that Intel optimizes for…
-
Matt Godbolt’s tests showed that Haswell and Ivy Bridge have a 4096-entry 4-way BTB, so there’s your hard cap on how many indirect branches you can do before you start getting punished.
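To put numbers on that: 4096 entries at 4 ways is 1024 sets, so a fifth hot branch mapping to the same set has to evict one of the other four. The C sketch below works this out under a loudly stated assumption: it indexes sets by the low bits of the branch address (`btb_set` is a hypothetical helper), which is almost certainly simpler than what the real hardware does.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted above: 4096 entries, 4-way set associative. */
enum { BTB_ENTRIES = 4096, BTB_WAYS = 4, BTB_SETS = BTB_ENTRIES / BTB_WAYS };

/* ASSUMPTION: the set index is just the branch address modulo the
   number of sets. Real predictors use different (undocumented) hashing. */
unsigned btb_set(uint64_t branch_addr) {
    return (unsigned)(branch_addr % BTB_SETS);
}

/* Count how many of the n addresses fall in the same set as addrs[0]. */
int same_set_count(const uint64_t *addrs, int n) {
    assert(addrs != 0 && n > 0);
    int count = 0;
    for (int i = 0; i < n; i++)
        if (btb_set(addrs[i]) == btb_set(addrs[0]))
            count++;
    return count;
}
```

Under this toy indexing, five branches spaced 1024 bytes apart all land in one set, one more than the 4 ways can hold at once.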
-
And note that there are other indirect branches in play, such as when calling objc_msgSend itself.