you can go a long way with inline caches, see ObjC
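For readers who have not met the term, here is a minimal sketch of a monomorphic inline cache at a single call site. It illustrates the general technique only and is not how the ObjC runtime dispatches. `CallSite`, `slow_lookup`, `Circle`, and `Square` are hypothetical names made up for this example.

```python
def slow_lookup(cls, selector):
    """Full method lookup: walk the class hierarchy until the name is found."""
    for klass in cls.__mro__:
        if selector in klass.__dict__:
            return klass.__dict__[selector]
    raise AttributeError(selector)


class CallSite:
    """One call site with a one-entry (monomorphic) inline cache."""

    def __init__(self, selector):
        self.selector = selector
        self.cached_type = None   # receiver class seen on the last miss
        self.cached_impl = None   # implementation found for that class
        self.hits = 0
        self.misses = 0

    def send(self, receiver, *args):
        cls = type(receiver)
        if cls is self.cached_type:
            # Fast path: same receiver class as before, skip the lookup.
            self.hits += 1
        else:
            # Slow path: do the full lookup and overwrite the cache entry.
            self.misses += 1
            self.cached_impl = slow_lookup(cls, self.selector)
            self.cached_type = cls
        return self.cached_impl(receiver, *args)


class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return 3 * self.r * self.r  # crude integer approximation of pi


class Square:
    def __init__(self, s):
        self.s = s

    def area(self):
        return self.s * self.s


site = CallSite("area")
shapes = [Square(2), Square(3), Circle(1), Square(4)]
areas = [site.send(shape) for shape in shapes]
```

The cache only pays off while the receiver class at the site stays stable. Every change of class costs a full lookup plus a cache rewrite, which is part of the code size and memory trade-off the replies discuss.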
Replying to @whitequark @mcclure111
Hm, the approaches I know of to add inline caches or method caches to ObjC all ended up getting rolled back because they didn’t speed things up enough to justify the code size and memory cost. ObjC is weird though because when you care about perf you just…don’t use objects
I think the biggest benefit isn’t so much ICs but speculative devirtualization leading to opportunistic inlining, which is really tough without a heat profile of the code (PGO or JIT)
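A rough sketch of what speculative devirtualization looks like after the transformation, written by hand: a type guard, the inlined body of the expected target, and a fallback to ordinary dynamic dispatch. The `Counter` profile is a stand-in for the heat profile a PGO build or JIT would collect. `Point`, `Point3`, and all the function names are hypothetical.

```python
from collections import Counter


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def norm1(self):
        return abs(self.x) + abs(self.y)


class Point3(Point):
    def __init__(self, x, y, z):
        super().__init__(x, y)
        self.z = z

    def norm1(self):
        return abs(self.x) + abs(self.y) + abs(self.z)


def profile_receivers(objs):
    """Stand-in for a heat profile: count receiver classes seen at the call site."""
    return Counter(type(o) for o in objs)


def total_generic(objs):
    """Baseline: every call goes through dynamic dispatch."""
    return sum(o.norm1() for o in objs)


def total_speculated(objs):
    """Version specialised on the assumption that Point dominates the profile."""
    total = 0
    for o in objs:
        if type(o) is Point:
            # Guard passed: inlined body of Point.norm1, no dispatch.
            total += abs(o.x) + abs(o.y)
        else:
            # Guard failed: fall back to ordinary dynamic dispatch.
            total += o.norm1()
    return total


objs = [Point(1, -2), Point(3, 4), Point3(1, 1, 1)]
hottest_class = profile_receivers(objs).most_common(1)[0][0]
```

Without the profile there is no principled way to choose which class to put in the guard, which is the difficulty this tweet points at.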
Yeah that’s pretty much exactly the problem with ObjC. PGO is of course hard to do well, and people generally manually optimize out the dispatch in known hot paths anyway
Like, a modern CPU’s BTB will mostly do a good job with virtual call overhead. (objc_msgSend is weird but I’m guessing it’s keyed off {pc,lr} on Apple’s chips or something to deal with that.) But inlining, that obviously opens up arbitrarily many optimizations.
Even Intel’s branch predictors do a fine job with objc_msgSend these days. But since msgSends tend to fall along what would have been ABI boundaries across dylibs, often the opportunity for IPO isn’t there anyway
I trust you, but I don’t understand how that works on Intel unless you’re calling the same method over and over… RIP is always the same, should always miss?
One thing to remember: objc_msgSend does a tail call (i.e. an indirect jump), not an indirect call like vtable dispatches. The return address predictor is useful here.
Replying to @clattner_llvm @jckarter and
Aha, maybe that's what's going on. Might not even need a fancy BTB
Replying to @pcwalton @clattner_llvm and
Well the indirect branch predictor is also good enough, on Haswell and later at least, that changing the compiler to generate separate method lookup and call branches did not improve performance at all
Really interesting. I'd love to know exactly how that works at the hardware level... everything is guesswork with Intel
Related, Agner says AMD is now using perceptrons for branch prediction in Ryzen 
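As a rough illustration of the idea, here is a textbook-style perceptron predictor. It is a sketch of the general technique and makes no claim about how Ryzen or any other shipping core implements it. The class name, table size, history length, and training threshold are all assumptions chosen for the example.

```python
class PerceptronPredictor:
    """Toy perceptron branch predictor (illustrative, not any real CPU's design)."""

    def __init__(self, history_len=8, table_size=64):
        self.table_size = table_size
        # Training threshold: an assumed heuristic value for this sketch.
        self.threshold = int(1.93 * history_len + 14)
        # One weight vector per table entry: a bias plus one weight per history bit.
        self.weights = [[0] * (history_len + 1) for _ in range(table_size)]
        # Global history of recent outcomes: +1 = taken, -1 = not taken.
        self.history = [1] * history_len

    def _output(self, pc):
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict taken when the weighted sum is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on the real outcome, then shift it into the history."""
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the sum is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, hi in enumerate(self.history):
                w[i + 1] += t * hi
        self.history = self.history[1:] + [t]


# Usage: one branch at a made-up address that strictly alternates taken / not taken.
predictor = PerceptronPredictor()
correct = 0
for i in range(300):
    taken = (i % 2 == 0)
    if i >= 200 and predictor.predict(0x40) == taken:
        correct += 1  # score only the last 100 branches, after warm-up
    predictor.update(0x40, taken)
```

The weights record how strongly each past outcome correlates with the current branch. An alternating pattern that a simple saturating counter handles poorly becomes easy once the most recent history bit gets a negative weight.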