If you put me in charge of objc_msgSend, I think I’d have the compiler scatter a couple of polymorphic inline caches into the binary at each call site (using indirect jumps to satisfy W^X). I think it’d cut the instruction count from 11 to ~6, and it would make better use of the BTB.
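For readers unfamiliar with the idea, here is a rough sketch of a per-call-site polymorphic inline cache in plain C. Everything in it is hypothetical (`CallSiteCache`, `pic_send`, `slow_lookup`, the toy `Cat`/`Dog` classes, the two-way cache size); it is not how objc_msgSend or any real runtime is implemented. It only models the shape of the idea: each call site keeps its own small table of (class, implementation) pairs as data and reaches the target through an indirect call, so no code is patched at runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy object model, for illustration only. */
typedef int (*Imp)(int);
typedef struct { const char *sel; Imp imp; } Method;
typedef struct { const char *name; const Method *methods; size_t count; } Class;
typedef struct { const Class *isa; } Object;

#define PIC_WAYS 2  /* assumed: "a couple" of entries per call site */

/* One cache per call site, stored as data (nothing here rewrites code). */
typedef struct {
    const Class *cls[PIC_WAYS];
    Imp imp[PIC_WAYS];
    unsigned next;    /* round-robin replacement index */
    unsigned misses;  /* counts trips through the slow path */
} CallSiteCache;

/* Slow path: linear search of the class's method list by selector name. */
static Imp slow_lookup(const Class *cls, const char *sel) {
    for (size_t i = 0; i < cls->count; i++)
        if (strcmp(cls->methods[i].sel, sel) == 0)
            return cls->methods[i].imp;
    return NULL;
}

/* Fast path: compare the receiver's class against this site's entries,
   then make an indirect call through the cached function pointer. */
static int pic_send(CallSiteCache *site, const Object *obj,
                    const char *sel, int arg) {
    for (unsigned i = 0; i < PIC_WAYS; i++)
        if (site->cls[i] == obj->isa)
            return site->imp[i](arg);

    Imp imp = slow_lookup(obj->isa, sel);
    assert(imp != NULL);  /* the sketch does not model unknown selectors */
    site->misses++;
    unsigned slot = site->next;
    site->next = (site->next + 1) % PIC_WAYS;
    site->cls[slot] = obj->isa;
    site->imp[slot] = imp;
    return imp(arg);
}

/* Two toy classes that both respond to "speak". */
static int cat_speak(int x) { return x + 1; }
static int dog_speak(int x) { return x + 2; }
static const Method cat_methods[] = { { "speak", cat_speak } };
static const Method dog_methods[] = { { "speak", dog_speak } };
static const Class CatClass = { "Cat", cat_methods, 1 };
static const Class DogClass = { "Dog", dog_methods, 1 };
```

A call site would own one zero-initialized `CallSiteCache` and pass it to every `pic_send` from that site. After the first send to each class, later sends to the same classes are served from the cache without the slow lookup.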
Also I didn't see anything about BTB history in https://xania.org/201602/bpu-part-three … and follow-up posts: it seems like just a plain old address cache...
-
I don’t know much about the Intel side or the specific algorithms on any architecture. If I remember correctly, Apple’s ARM CPU designers were once worried about the high mispredict rates they observed in objc_msgSend, but then they solved it with no software changes.
-
My understanding is that history was always good enough, and the misprediction problem was just that objc_msgSend introduced so many indirect branches that it was blowing out their cache.