That's just broken, but not too surprising, since it's not uncommon for compilers to have miscompile bugs in tough and unusual corner cases. But the context of this conversation is: do you need a new compiler to fix UB?
The fact that llvm has bugs doesn't really change the answer. Even a compiler that was designed to avoid UB could have miscompile bugs. Also, the context of the conversation allows for making changes to llvm - so this seems like the kind of thing that could be fixed.
I guess I don't understand the context. It seems to be about C, and I don't see how you can resolve that problem for C without coming up with a model to enforce a form of memory safety. What is the scope of UB that should be avoided? You mean, for a language like Rust or Swift?
The question of whether memory unsafety implies UB is sort of at the heart of the disconnect between the C spec and C practitioners. As a practitioner (and compiler guy) I view memory unsafety as a separate thing - after all a “bad” store still stores to a well defined place.
There is nothing well defined about what an out-of-bounds access or use-after-free will access. The compiler, linker and even runtime environment are assuming that is never going to happen and there's nothing defined about what the consequences are going to be from the C code.
I get that llvm *does* optimizations that involve inbounds assumptions. I’m arguing that it shouldn’t. Because it doesn’t have to if it wants to generate fast and correct code. The slowdown isn’t going to be big enough to dissuade those folks who want this.
It's not an LLVM problem. The frontend can choose not to mark instructions with inbounds, nsw and nuw. Those are mostly theoretical issues not causing a substantial number of real world correctness, safety or security issues though.
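For readers following along, here is a minimal sketch of what "not marking an add with nsw" looks like at the source level. With Clang, a plain signed `a + b` is normally lowered to `add nsw`, while `-fwrapv` makes it a plain wrapping `add`. The helper name `wrapping_add_i32` is hypothetical and only illustrates the wrapping semantics; it is not something anyone in the thread proposed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the thread): 32-bit signed addition that
   wraps on overflow instead of being undefined. The sum is computed on
   unsigned values, where C defines wraparound modulo 2^32. */
int32_t wrapping_add_i32(int32_t a, int32_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    int32_t result;
    /* Copy the bit pattern back; int32_t is two's complement by definition,
       so this avoids the implementation-defined unsigned-to-signed cast. */
    memcpy(&result, &sum, sizeof result);
    return result;
}
```

Calling `wrapping_add_i32(INT32_MAX, 1)` yields `INT32_MIN` here, whereas the plain signed expression `INT32_MAX + 1` is undefined in standard C.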
One of the tweets that started this thread was the one where I said that clang/llvm could be refactored to eliminate UB. So, when I say llvm, I mean it in the broader clang+llvm sense. Also, llvm could fix it by just ignoring/stripping nsw/nuw/etc.
Memory unsafety is certainly UB and certainly heavily impacted by optimizations. They would need to trap on memory corruption / type confusion bugs in order to get rid of undefined behavior while also still being able to heavily optimize without changing runtime behavior.
That’s just simply not true. Memory corruption and UB are different things. Optimizations could be sound under the assumption that a write goes anywhere but compiler-private state like the stack. That would give a memory-unsafe language without UB.
It's true. It's how the specification defines these things and how it gets implemented. You're talking about the specification and compilers implementing that specification, so why not stick to the agreed-upon definitions of terms and argue your case without playing word games?
I’m saying that the specification is wrong. Your points about compilers having to optimize based on in-bounds are a statement about a sad and easily fixable reality and I am proposing to change that reality.
No, you're misunderstanding / misinterpreting what I was saying. I was talking about a lot more than the GEP inbounds marker for pointer arithmetic. LLVM and GCC fundamentally treat pointers as more than addresses and objects as more than data laid out at an address in memory.

