The fact that llvm has bugs doesn't really change the answer. Even a compiler that was designed to avoid UB could have miscompile bugs. Also, the context of the conversation allows for making changes to llvm - so this seems like the kind of thing that could be fixed.
I guess I don't understand the context. It seems to be about C, and I don't see how you can resolve that problem for C without coming up with a model to enforce a form of memory safety. What is the scope of UB that should be avoided? You mean, for a language like Rust or Swift?
The question of whether memory unsafety implies UB is sort of at the heart of the disconnect between the C spec and C practitioners. As a practitioner (and compiler guy) I view memory unsafety as a separate thing - after all a “bad” store still stores to a well defined place.
There is nothing well defined about what an out-of-bounds access or use-after-free will access. The compiler, linker and even runtime environment are assuming that is never going to happen and there's nothing defined about what the consequences are going to be from the C code.
I get that llvm *does* optimizations that involve inbounds assumptions. I’m arguing that it shouldn’t. Because it doesn’t have to if it wants to generate fast and correct code. The slowdown isn’t going to be big enough to dissuade those folks who want this.
It's not an LLVM problem. The frontend can choose not to mark instructions with inbounds, nsw and nuw. Those are mostly theoretical issues not causing a substantial number of real world correctness, safety or security issues though.
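For anyone following along, here is a rough C sketch of where nsw comes from. It assumes a typical clang setup: signed `+` is usually lowered to `add nsw` because signed overflow is UB in ISO C, while unsigned `+` is a plain wrapping `add`. The function names are made up for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Signed overflow is UB in ISO C, so the frontend may tag this add with
   nsw and the optimizer may then fold the comparison to "false". */
bool next_is_smaller_signed(int x) {
    return x + 1 < x;
}

/* Unsigned arithmetic wraps modulo 2^N by definition, so no nuw/nsw
   assumption applies and the comparison has to be kept. */
bool next_is_smaller_unsigned(unsigned x) {
    return x + 1u < x;
}
```

Building with `-fwrapv` (accepted by both GCC and clang) asks the frontend to treat signed overflow as wrapping, which is one way of "choosing not to mark" the instruction.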
One of the tweets that started this thread is when I said that clang/llvm could be refactored to eliminate UB. So, when I say llvm, I mean it in the broader clang+llvm sense. Also, llvm could fix it by just ignoring/stripping nsw/nuw/etc.
Memory safety is certainly UB and certainly heavily impacted by optimizations. They would need to trap on memory corruption / type confusion bugs in order to get rid of undefined behavior while also still being able to heavily optimize without changing runtime behavior.
Tons of programs have latent memory corruption bugs and are depending on the specific way that the compiler, malloc implementation, etc. chose to lay things out in memory, what happens to be zeroed based on what they chose to do, etc. It's certainly UB with similar impacts.
It’s only UB because the spec is symbolic. The corruption *does something* as defined by the combination of things in memory. The binary is the definition.
No, it doesn't, because the compiler is optimizing based on the assumption that UB doesn't happen. If you write past the end of the array or to a freed object, you don't know what exactly is going to happen at runtime. C pointers are not treated as addresses by the compiler.
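A small sketch of the kind of code this claim is about. The table and function name are hypothetical; the version below is the correct one, and the comment describes what the standard allows if the loop bound is off by one.

```c
#include <stdbool.h>
#include <stddef.h>

#define TABLE_LEN 4
static int table[TABLE_LEN] = {3, 1, 4, 1};

/* Correct version: only indices 0..TABLE_LEN-1 are read.
   If the condition were `i <= TABLE_LEN`, the final iteration would read
   table[4], which is UB. A compiler is then permitted to reason that the
   loop must return before reaching that iteration and compile the whole
   function as "return true", regardless of what sits after the array. */
bool exists_in_table(int v) {
    for (size_t i = 0; i < TABLE_LEN; i++) {
        if (table[i] == v) {
            return true;
        }
    }
    return false;
}
```

Whether a given compiler actually performs that transformation depends on the version and flags; the point is only that the C standard does not forbid it.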
C pointers can trivially be treated as addresses by the compiler and llvm will happily do that for you if you use int math rather than gep. The compiler is not optimizing based on the assumption that writing past the end of an array can’t happen unless you explicitly tell it to.
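To make "int math rather than gep" concrete, here is a minimal sketch in C. It assumes `uintptr_t` is available and that pointer/integer conversions round-trip the way they do on mainstream flat-memory targets (that part is implementation-defined). The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Ordinary pointer arithmetic: clang typically lowers this to a
   getelementptr, usually with the inbounds flag. */
int *elem_via_gep(int *base, size_t i) {
    return base + i;
}

/* Address computed with integer math: typically lowered to
   ptrtoint / add / inttoptr, with no inbounds flag involved. */
int *elem_via_int_math(int *base, size_t i) {
    uintptr_t addr = (uintptr_t)(void *)base + i * sizeof(int);
    return (int *)(void *)addr;
}
```

For in-bounds indices both return the same pointer. What it means to dereference the result when the address falls outside the original object is exactly the point in dispute here.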
That's not true. Not marking GEPs as inbounds means that it's permitted to index outside of the bounds of the object. It absolutely does NOT mean that it's okay to dereference the pointer outside the bounds of the object. That's still incorrect and completely undefined.
(I feel like you’re trolling me at this point with obviously incorrect claims so you get a free compiler lecture on Twitter.)

