It's difficult for a compiler to know if it happens though. It would need to do whole-program analysis to build a potential call graph to avoid having to insert a massive number of these performance-killing intrinsics. Indirect calls also make it much harder to figure out.
It penalizes code without any recursion, because the compiler doesn't know that. You really just end up needing to do the same analysis LLVM should be doing yourself in the higher-level compiler: an internal function attribute for always_returns, and every function without it gets the workaround.
If a function is calling another function without a locally-visible definition, it can't be assumed to be pure anyway, no? If definitions are locally visible, potential for recursion is visible.
LLVM will go through the program, mark things as pure, and then bubble that up. It has a function attribute pass responsible for bubbling up internal function attributes like noreturn, etc. It could do that with always_returns too.
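A minimal sketch of what that kind of bubbling-up could look like for always_returns, assuming a simplified call graph. This is not LLVM's pass: `infer_always_returns`, the `UNKNOWN` marker and the `has_unbounded_loop` input are all hypothetical names made up for illustration. A function is only marked once every callee is already marked, so recursion and any external or indirect call stay unproven.

```python
# Hypothetical marker for a callee the analysis cannot see
# (a function with no visible definition, or an indirect call).
UNKNOWN = "<unknown>"


def infer_always_returns(call_graph, has_unbounded_loop):
    """Return the set of functions that can be proven to always return.

    call_graph maps a function name to the set of names it calls.
    has_unbounded_loop is the set of functions whose own body might loop
    forever; how that is decided is outside this sketch.
    """
    proven = set()
    changed = True
    while changed:  # iterate until no new function can be marked
        changed = False
        for fn, callees in call_graph.items():
            if fn in proven or fn in has_unbounded_loop:
                continue
            # Every callee must already be proven. UNKNOWN is never proven,
            # and functions on a recursive cycle wait on each other forever,
            # so both cases stay conservatively unmarked.
            if all(callee in proven for callee in callees):
                proven.add(fn)
                changed = True
    return proven


example_graph = {
    "leaf": set(),
    "wrapper": {"leaf"},
    "recursive": {"recursive", "leaf"},
    "calls_extern": {UNKNOWN},
    "via_pointer": {UNKNOWN, "leaf"},
    "spins": set(),
    "calls_spins": {"spins"},
}
always_returns = infer_always_returns(example_graph, {"spins"})
```

With this input only `leaf` and `wrapper` end up marked. Everything else would need whatever fallback the frontend uses for functions that may not return.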
So, LLVM is in a position to do this correctly already, in a way that's essentially ideal. Instead, they are just ignoring non-termination as an effect, and providing a painful and costly workaround for frontends that actually care about correctness.
They do have 'noreturn' and bubble it up, just no 'always_returns' or the opposite 'may_return' which would accomplish the same thing. Their optimizations treat 'noreturn' as an effect already too... and they aren't going to remove that, since it should be one.
That's why in my example (gist.github.com/thestinger/139) removing the conditional return will stop the breakage from happening. LLVM treats a call to a 'noreturn' function as an effect, but not a call to a function that MAY not return. And they just come up with excuses / workarounds instead of a fix.
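A toy model of that distinction, only to make the claim concrete. It is not LLVM code and not the gist's contents: `Returns`, `can_delete_unused_call` and the `may_diverge_is_effect` switch are hypothetical names. It models one decision, whether a call whose result is unused can be deleted.

```python
from enum import Enum


class Returns(Enum):
    ALWAYS = "always"  # provably returns
    MAYBE = "maybe"    # might return, might not (e.g. a conditional return)
    NEVER = "never"    # what 'noreturn' expresses


def can_delete_unused_call(has_side_effects, returns, may_diverge_is_effect):
    """Decide whether a call with an unused result may be removed.

    may_diverge_is_effect=False models the behavior complained about in the
    thread; True models possible non-termination being treated as an effect.
    """
    if has_side_effects:
        return False
    if returns is Returns.NEVER:
        return False  # 'noreturn' counts as an effect in both variants
    if returns is Returns.MAYBE:
        return not may_diverge_is_effect
    return True


# Side-effect-free callee that never returns: the call is kept.
kept_noreturn = can_delete_unused_call(False, Returns.NEVER, False)
# Same callee with a conditional return added: the call becomes deletable.
deleted_maybe = can_delete_unused_call(False, Returns.MAYBE, False)
# With possible non-termination treated as an effect, it is kept again.
fixed_maybe = can_delete_unused_call(False, Returns.MAYBE, True)
```

In this model the call is deleted only in the middle case, which matches the observation that removing the conditional return makes the problem go away.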
I honestly have absolutely no idea why they aren't just addressing it properly. The way they are doing it is wrong for C and C++ too, despite them portraying it and addressing it as if other languages simply need a different approach because LLVM is C centric. It's a red herring.
It's far from one of the most important of these issues, but I feel like it's an incredibly useful case study of how broken the development process and decision making for LLVM are. It's a nice, simple, easily demonstrated example of what happens in much more complex cases.
Basically, they are often right when they brush aside complaints about optimizations by saying something is undefined. That's fine. However, they also use that excuse to brush aside complaints about, and justify, things that are legitimately wrong, and those things actually end up staying that way.
So for example even if there are official 'pointer provenance' semantics, LLVM and GCC are still just going to do things their own way without actual compliance with the standard, and will just rely on the standard being readjusted again and again until it matches one day.

