Memory safety violations are certainly UB and are certainly heavily impacted by optimizations. They would need to trap on memory corruption / type confusion bugs in order to get rid of undefined behavior while still being able to optimize heavily without changing runtime behavior.
That’s just simply not true. Memory corruption and UB are different things. Optimizations could be sound under the assumption that a write goes anywhere but compiler-private state like the stack. That would give a memory-unsafe language without UB.
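A minimal sketch of the model proposed in that reply, assuming "compiler-private state" means locals whose address is never taken. The function name `sum_with_stores` is hypothetical and used only for illustration. Under such a model a stray write through `out` could land anywhere in addressable memory, yet it could never touch `acc` or `i`, so keeping them in registers stays sound.

```c
/* `acc` and `i` never have their address taken, so no pointer can
   name them: they are compiler-private in the sense assumed here. */
int sum_with_stores(int n, int *out) {
    int acc = 0;
    for (int i = 0; i < n; i++) {
        *out = i;   /* store through an arbitrary caller-supplied pointer */
        acc += i;   /* acc may stay in a register across that store */
    }
    return acc;
}
```

Called with a valid pointer, for example `int x; sum_with_stores(5, &x);`, the function is well defined in standard C as well. The disagreement in the thread is only about what happens when `out` is not valid.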
It's true. It's how the specification defines these things and how it gets implemented. You're talking about the specification and compilers implementing that specification, so why not stick to the agreed-upon definitions of terms and argue your case without playing word games?
I’m saying that the specification is wrong. Your points about compilers having to optimize based on in-bounds assumptions are a statement about a sad and easily fixable reality, and I am proposing to change that reality.
No, you're misunderstanding / misinterpreting what I was saying. I was talking about a lot more than the GEP inbounds marker for pointer arithmetic. LLVM and GCC fundamentally treat pointers as more than addresses and objects as more than data laid out at an address in memory.
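A small sketch of what "objects as more than data laid out at an address" can mean in practice. The names `arr_a`, `arr_b` and `store_b_then_read_a` are hypothetical. Because indexing `arr_b` is assumed to stay inside `arr_b`, an optimizer may conclude that the store cannot modify `arr_a`, even if the two arrays happen to sit next to each other in memory.

```c
static int arr_a[4];
static int arr_b[4];

int store_b_then_read_a(int i) {
    arr_a[0] = 1;
    arr_b[i] = 7;      /* assumed to stay within arr_b (0 <= i < 4) */
    return arr_a[0];   /* an optimizer may fold this to `return 1;` */
}
```

For an in-range index the result is 1 either way. With an out-of-range `i` the behavior is undefined in standard C, and that case is what lets the compiler skip the reload.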
You can trivially make both compilers think that all pointers point into the same very large and diverse object.
Sure, and you lose a whole lot of the optimizations that make C efficient. Pointers and aliasing are huge blockers for optimization even with all the extensive leveraging of the aliasing / indexing guarantees which go a long way to making optimizations possible.
The most important optimizations still work in the C that I describe. I’ve tried variations. I think the only one that really hurts is strict aliasing. Without that you lose too much load elimination. But that one doesn’t produce full UB - it just creates extra load motion.
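A hedged sketch of the load elimination that strict aliasing enables. The function name `store_then_load` is hypothetical. With `-fstrict-aliasing`, GCC and Clang may assume that an `int *` and a `float *` do not refer to the same object, so the final load can be replaced by the constant 1. With `-fno-strict-aliasing` the compiler has to reload `*ip` after the store through `fp`.

```c
int store_then_load(int *ip, float *fp) {
    *ip = 1;
    *fp = 2.0f;   /* type-based aliasing: assumed not to modify *ip */
    return *ip;   /* may be forwarded from the first store */
}
```

Called with two distinct objects, as in `int i; float f; store_then_load(&i, &f);`, the function returns 1 under either flag. The two settings can only differ when both pointers refer to the same storage, which standard C does not allow here.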
Type-based alias analysis (-fstrict-aliasing) is only a small portion of the overall alias analysis. The basic baseline alias analysis and reasoning about memory even without AA is extremely important for basic optimizations / code movement, and there's no switch for disabling it.
Must-alias aliasing rules don’t introduce UB problems. Only the type based may-alias ones do. You can disable type based aliasing.
That's not accurate, and there are more rules based on provenance / aliasing used for optimization than just the type-based aliasing metadata output by the frontend. Sure, you can both use non-inbounds GEP and disable all of this by changing LLVM, but it has a significant cost.
I think we already established that the provenance stuff is pointless and easy to disable.
I'm not talking about whether something is easy to disable in anything that I've said. I'm pointing out that it's the basis for a huge amount of the compiler optimization. They infer as much as they can by assuming memory safety to optimize well without tons of alias metadata.

