So, even though it's known that this is broken for many years, they have kept the optimization enabled. No one has been motivated to deal with implementing a 'halts' attribute and adding support for detecting / propagating it in the function attribute pass and making it required.
Conversation
That's just broken, but not too surprising, since it's not uncommon for compilers to have miscompile bugs in tough and unusual corner cases. But the context of this conversation is: do you need a new compiler to fix UB?
1
The fact that llvm has bugs doesn't really change the answer. Even a compiler that was designed to avoid UB could have miscompile bugs. Also, the context of the conversation allows for making changes to llvm - so this seems like the kind of thing that could be fixed.
1
1
I guess I don't understand the context. It seems to be about C, and I don't see how you can resolve that problem for C without coming up with a model to enforce a form of memory safety. What is the scope of UB that should be avoided? You mean, for a language like Rust or Swift?
2
The question of whether memory unsafety implies UB is sort of at the heart of the disconnect between the C spec and C practitioners. As a practitioner (and compiler guy) I view memory unsafety as a separate thing - after all a “bad” store still stores to a well defined place.
2
1
2
There is nothing well defined about what an out-of-bounds access or use-after-free will access. The compiler, linker and even runtime environment are assuming that is never going to happen and there's nothing defined about what the consequences are going to be from the C code.
3
1
I get that llvm *does* optimizations that involve inbounds assumptions. I’m arguing that it shouldn’t. Because it doesn’t have to if it wants to generate fast and correct code. The slowdown isn’t going to be big enough to dissuade those folks who want this.
1
It's not an LLVM problem. The frontend can choose not to mark instructions with inbounds, nsw and nuw. Those are mostly theoretical issues not causing a substantial number of real world correctness, safety or security issues though.
2
One of the tweets that started this thread is when I said that clang/llvm could be refactored to eliminate UB. So, when I say llvm, I mean it in the broader clang+llvm sense. Also, llvm could fix it by just ignoring/stripping nsw/nuw/etc.
1
Memory safety is certainly UB and certainly heavily impacted by optimizations. They would need to trap on memory corruption / type confusion bugs in order to get rid of undefined behavior while also still being able to heavily optimize without changing runtime behavior.
2
2
Tons of programs have latent memory corruption bugs and are depending on the specific way that the compiler, malloc implementation, etc. chose to lay things out in memory, what happens to be zeroed based on what they chose to do, etc. It's certainly UB with similar impacts.
You might be surprised by how many bugs are uncovered by simply doing something like having malloc zero memory immediately on free. Using a non-zero byte value will uncover even more bugs. The bugs are often relatively benign too. Most of them aren't really security bugs.
1
You’re not telling me anything I don’t know. You’re just defining UB too broadly. “Things interacting in interesting ways” is not the same thing as “compiler can do whatever it wants”.
1
Show replies
It’s only UB because the spec is symbolic. The corruption *does something* as defined by the combination of things in memory. The binary is the definition.
1
2
No, it doesn't, because the compiler is optimizing based on the assumption that UB doesn't happen. If you write past the end of the array or to a freed object, you don't know what exactly is going to happen at runtime. C pointers are not treated as addresses by the compiler.
2
Show replies

