I get that LLVM *does* perform optimizations that rely on inbounds assumptions. I’m arguing that it shouldn’t, because it doesn’t need them to generate fast and correct code. The slowdown wouldn’t be big enough to dissuade the people who want this.
It's not an LLVM problem. The frontend can choose not to mark instructions with inbounds, nsw and nuw. Those are mostly theoretical issues, though, and they don't cause a substantial number of real-world correctness, safety or security issues.
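To make the nsw point concrete, here is a minimal C sketch (the function names are hypothetical, not from the thread). By default clang lowers signed `int` addition to `add nsw`, which lets the optimizer assume the addition never overflows; unsigned addition is defined to wrap and carries no such flag. The `-fwrapv` flag in GCC and clang is one existing way to ask for wrapping signed arithmetic instead.

```c
#include <limits.h>
#include <stdbool.h>

/* Signed overflow is undefined in C, so an optimizer may fold this
   comparison to `true` for every input, including INT_MAX. */
bool next_is_larger_signed(int x) {
    return x + 1 > x;
}

/* Well-defined alternative: test for the overflow case before adding. */
bool next_is_larger_checked(int x) {
    return x != INT_MAX;
}

/* Unsigned arithmetic is defined to wrap modulo 2^N, so this never
   invokes undefined behavior. */
unsigned next_wrapping(unsigned x) {
    return x + 1u;
}
```

Calling `next_is_larger_signed(INT_MAX)` is the undefined case; the other two functions are defined for every input.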
One of the tweets that started this thread was the one where I said that clang/LLVM could be refactored to eliminate UB. So, when I say LLVM, I mean it in the broader clang+LLVM sense. Also, LLVM could fix it by just ignoring/stripping nsw/nuw/etc.
Memory-safety violations are certainly UB and are certainly heavily impacted by optimizations. They would need to trap on memory corruption / type confusion bugs in order to get rid of undefined behavior while still being able to optimize heavily without changing runtime behavior.
Tons of programs have latent memory corruption bugs and are depending on the specific way that the compiler, malloc implementation, etc. chose to lay things out in memory, what happens to be zeroed based on what they chose to do, etc. It's certainly UB with similar impacts.
It’s only UB because the spec is symbolic. The corruption *does something* as defined by the combination of things in memory. The binary is the definition.
No, it doesn't, because the compiler is optimizing based on the assumption that UB doesn't happen. If you write past the end of the array or to a freed object, you don't know what exactly is going to happen at runtime. C pointers are not treated as addresses by the compiler.
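A small C sketch of what "optimizing based on the assumption that UB doesn't happen" can look like. The function names and the sizes are made up for illustration; the point is only that `guard` is not an address the out-of-bounds store is guaranteed to hit or miss.

```c
#include <stddef.h>

/* Hypothetical example: idx == 4 writes past the end of buf, which is
   undefined. The address of guard is never taken, so the optimizer may
   treat it as the constant 7 and never place it in memory next to buf,
   whatever idx was. */
int store_then_read_guard(size_t idx) {
    int guard = 7;
    int buf[4] = {0, 0, 0, 0};
    buf[idx] = 1;
    return guard + buf[0];
}

/* Well-defined variant: reject the out-of-bounds index before storing. */
int store_then_read_guard_checked(size_t idx) {
    int guard = 7;
    int buf[4] = {0, 0, 0, 0};
    if (idx >= 4) {
        return -1;
    }
    buf[idx] = 1;
    return guard + buf[0];
}
```

For in-range indexes both functions agree. For `idx == 4` only the checked version has a result you can reason about from the source.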
C pointers can trivially be treated as addresses by the compiler, and LLVM will happily do that for you if you use int math rather than GEP. The compiler is not optimizing based on the assumption that writing past the end of an array can’t happen unless you explicitly tell it to.
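For readers who don't read LLVM IR, "int math rather than GEP" corresponds roughly to the two C functions below. This is a sketch with hypothetical names; the IR mentioned in the comments is what clang typically emits before optimization, and later passes may rewrite it.

```c
#include <stddef.h>
#include <stdint.h>

/* Sample data for trying the two functions. */
int demo_values[4] = {10, 20, 30, 40};

/* Pointer arithmetic: clang typically lowers this to
   `getelementptr inbounds`. */
int *advance_pointer(int *base, size_t count) {
    return base + count;
}

/* Integer arithmetic: cast to an integer, add a byte offset, cast back
   (ptrtoint / add / inttoptr in unoptimized IR). */
int *advance_integer(int *base, size_t count) {
    uintptr_t address = (uintptr_t)base + count * sizeof(int);
    return (int *)address;
}
```

On mainstream platforms both return the same address for in-bounds offsets. Whether the second form changes what the optimizer may assume about accesses through the result is exactly what the replies dispute.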
That's not true. Not marking GEPs as inbounds means that it's permitted to index outside of the bounds of the object. It absolutely does NOT mean that it's okay to dereference the pointer outside the bounds of the object. That's still incorrect and completely undefined.
There is no way to turn that undefined behavior off either. There is no opt-out and no switch to pass to make it well defined. GCC and LLVM use a concept of pointer provenance, where a pointer must be based on the object that it is being used to access. That's how it works.
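A rough C illustration of provenance, using hypothetical names and two one-element arrays that may or may not end up adjacent in memory. It is a sketch of the commonly cited pattern, not a claim about what any particular compiler version does with it.

```c
#include <stdint.h>

static int first[1];
static int second[1];

/* one_past is based on `first`. Forming it is legal, but storing through
   it is undefined even when its address equals that of `second`, so the
   optimizer may assume second[0] is still 1 at the return. */
int write_through_one_past(void) {
    int *one_past = first + 1;
    second[0] = 1;
    if ((uintptr_t)one_past == (uintptr_t)second) {
        *one_past = 2; /* undefined: wrong provenance for this object */
    }
    return second[0];
}

/* Well-defined: own_pointer is based on `second`, the object it writes,
   so the store must be visible to the following load. */
int write_through_own_pointer(void) {
    int *own_pointer = second;
    second[0] = 1;
    *own_pointer = 2;
    return second[0];
}
```

`write_through_own_pointer` always returns 2. The first function has no single answer you can derive from the addresses alone, which is the point being made about pointers not being plain addresses.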
Passing -fno-strict-aliasing to disable emitting type-based alias metadata does not mean there is no alias analysis. It means there is no type-based alias analysis. There's still object-based alias analysis, along with other optimizations that don't just treat pointers as addresses.
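The difference between type-based and object-based alias analysis can be sketched in C like this. The function and variable names are hypothetical; the comments describe what an optimizer is permitted to assume, not what a given build will actually emit.

```c
/* Sample objects for trying the functions. */
int shared_int;
float shared_float;

/* Type-based: with strict aliasing enabled, an int* and a float* are
   assumed not to alias, so the compiler may return 1 without reloading
   *i. -fno-strict-aliasing removes this particular assumption. */
int store_int_then_float(int *i, float *f) {
    *i = 1;
    *f = 0.0f;
    return *i;
}

/* Same type: a and b may legally alias, so *a has to be reloaded. */
int store_two_ints(int *a, int *b) {
    *a = 1;
    *b = 2;
    return *a;
}

/* Object-based: the address of `local` is never taken, so no valid
   pointer can refer to it. The compiler may return the constant 1 with
   or without -fno-strict-aliasing. */
int store_through_unknown(int *p) {
    int local = 1;
    *p = 2;
    return local;
}
```

Only the first function's assumption is affected by -fno-strict-aliasing; the reasoning in the third one does not depend on types at all.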

