One of the tweets that started this thread is when I said that clang/llvm could be refactored to eliminate UB. So, when I say llvm, I mean it in the broader clang+llvm sense. Also, llvm could fix it by just ignoring/stripping nsw/nuw/etc.
Conversation
Memory safety is certainly UB and certainly heavily impacted by optimizations. They would need to trap on memory corruption / type confusion bugs in order to get rid of undefined behavior while also still being able to heavily optimize without changing runtime behavior.
2
2
That’s just simply not true. Memory corruption and UB are different things. Optimizations could be sound under the assumption that a write goes anywhere but compiler-private state like the stack. That would give a memory-unsafe language without UB.
1
It's true. It's how the specification defines these things and how it gets implemented. You're talking about the specification and compilers implementing that specification, so why not stick to the agreed upon definitions of terms and argue your case without playing word games.
1
1
I’m saying that the specification is wrong. Your points about compilers having to optimize based on in-bounds are a statement about a sad and easily fixable reality and I am proposing to change that reality.
1
1
No, you're misunderstanding / misinterpreting what I was saying. I was talking about a lot more than the GEP inbounds marker for pointer arithmetic. LLVM and GCC fundamentally treat pointers as more than addresses and objects as more than data laid out at an address in memory.
2
1
You can trivially make both compilers think that all pointers point into the same very large and diverse object.
1
Sure, and you lose a whole lot of the optimizations that make C efficient. Pointers and aliasing are huge blockers for optimization even with all the extensive leveraging of the aliasing / indexing guarantees which go a long way to making optimizations possible.
1
2
The most important optimizations still work in the C that I describe. I’ve tried variations. I think the only one that really hurts is strict aliasing. Without that you lose too much load elimination. But that one doesn’t produce full UB - it just creates extra load motion.
1
Type-based alias analysis (-fstrict-aliasing) is only a small portion of the overall alias analysis. The basic baseline alias analysis and reasoning about memory even without AA is extremely important for basic optimizations / code movement, and there's no switch for disabling.
2
1
You're saying that you want that all entirely disabled, with pointers treated as addresses. That's asking for a whole lot more than -fno-strict-aliasing and not marking pointer arithmetic inbounds (and the latter definitely has a huge impact on important loop optimizations, etc).
I'm certainly not saying that it can't be done or that it isn't useful but that you're understating how much needs to be changed and the impact of it. Memory corruption also doesn't become predictable, just *less* impacted by optimization. It's still always going to be impacted.
1
1
Here's something that's undefined: accessing uninitialized data.
int a; if (cond1) { a = 5; } else { if (cond2) { a = 10; } } use(a); do_other_stuff(); use(a);
Lets say that cond1 & cond2. Should both calls to use(a) be guaranteed to read the same value from uninitialized data?
2
1
2
Show replies

