I get that LLVM *does* optimizations that involve inbounds assumptions. I’m arguing that it shouldn’t, because it doesn’t need them to generate fast and correct code. The slowdown isn’t going to be big enough to dissuade those folks who want this.
It's not an LLVM problem. The frontend can choose not to mark instructions with inbounds, nsw and nuw. Those are mostly theoretical issues, though, and aren't causing a substantial number of real-world correctness, safety or security issues.
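To make the nsw point concrete, here is a minimal C sketch. The function names are hypothetical and only for illustration. Because signed overflow is undefined in C, a frontend may tag the signed add with nsw, and the optimizer is then allowed to fold the first comparison to a constant. The second version routes the add through unsigned arithmetic, which wraps by definition, so no nsw assumption applies.

```c
#include <limits.h>
#include <stdbool.h>

/* Signed overflow is undefined, so a compiler may assume x + 1 never wraps
 * and reduce this whole function to "return true". Calling it with INT_MAX
 * is undefined behavior. */
bool next_is_greater(int x) {
    return x + 1 > x;
}

/* Same question with wrapping semantics spelled out. Unsigned addition wraps
 * by definition. Converting the out-of-range result back to int is
 * implementation-defined before C23, but GCC and Clang wrap modulo 2^N. */
bool next_is_greater_wrapping(int x) {
    int next = (int)((unsigned int)x + 1u);
    return next > x;
}
```

With GCC or Clang, building with -fwrapv asks the compiler to give signed overflow wrapping semantics, which is roughly the "frontend doesn't emit nsw" option applied to the whole translation unit.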
One of the tweets that started this thread was the one where I said that clang/llvm could be refactored to eliminate UB. So, when I say LLVM, I mean it in the broader clang+llvm sense. Also, LLVM could fix it by just ignoring/stripping nsw/nuw/etc.
Memory safety violations are certainly UB and certainly heavily impacted by optimizations. Compilers would need to trap on memory corruption / type confusion bugs in order to get rid of undefined behavior while still being able to heavily optimize without changing runtime behavior.
Tons of programs have latent memory corruption bugs and are depending on the specific way that the compiler, malloc implementation, etc. chose to lay things out in memory, what happens to be zeroed based on what they chose to do, etc. It's certainly UB with similar impacts.
You might be surprised by how many bugs are uncovered by simply doing something like having malloc zero memory immediately on free. Using a non-zero byte value will uncover even more bugs. The bugs are often relatively benign too. Most of them aren't really security bugs.
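As a rough sketch of the scrub-on-free idea, assuming a hypothetical wrapper allocator rather than any real malloc's internals: the wrapper records each allocation's size in a small header so that the free path knows how many bytes to overwrite. The names poison_malloc, poison_scrub, poison_free and the fill byte 0xA5 are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POISON_BYTE 0xA5u /* arbitrary non-zero fill value (assumption) */

/* Header stored in front of each allocation. The union keeps the user
 * pointer aligned for any object type. */
typedef union {
    size_t size;
    max_align_t align;
} poison_header;

void *poison_malloc(size_t size) {
    if (size > SIZE_MAX - sizeof(poison_header)) {
        return NULL; /* header + size would overflow */
    }
    poison_header *header = malloc(sizeof *header + size);
    if (header == NULL) {
        return NULL;
    }
    header->size = size;
    return header + 1; /* user memory starts right after the header */
}

/* Overwrite the user bytes. The volatile pointer keeps the compiler from
 * dropping the stores as dead writes just before free(). */
void poison_scrub(void *ptr) {
    poison_header *header = (poison_header *)ptr - 1;
    volatile unsigned char *bytes = ptr;
    for (size_t i = 0; i < header->size; i++) {
        bytes[i] = POISON_BYTE;
    }
}

void poison_free(void *ptr) {
    if (ptr == NULL) {
        return;
    }
    poison_scrub(ptr);
    free((poison_header *)ptr - 1);
}
```

A program that keeps reading through a stale pointer now tends to see 0xA5 bytes instead of its old data, so the latent bug shows up as a visibly wrong value instead of passing silently. Setting POISON_BYTE to 0 gives the zero-on-free variant.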
You’re not telling me anything I don’t know. You’re just defining UB too broadly. “Things interacting in interesting ways” is not the same thing as “compiler can do whatever it wants”.
I'm using the term per the standard that came up with this meaning of it. I'd never use that term on my own when designing and defining a language not tied to C or C compilers. It's C standard jargon. Compiler choices certainly matter for memory safety.
Quote Tweet
Replying to @DanielMicay @filpizlo and 4 others
Here's something that's undefined: accessing uninitialized data.
int a;
if (cond1) {
    a = 5;
} else {
    if (cond2) {
        a = 10;
    }
}
use(a);
do_other_stuff();
use(a);
Let's say that neither cond1 nor cond2 holds, so a is never assigned. Should both calls to use(a) be guaranteed to read the same value from uninitialized data?
It matters how they lay things out in memory, how their register allocation / stack spilling works, how they reuse stack space (use-after-free variant with variables going out of scope) and the same goes for how malloc is implemented, etc. Many programs have these latent bugs.
And I don't just mean they can be triggered with edge cases but that most large C programs have latent memory corruption occurring at runtime that is mostly benign due to the specific details of how the compiler chooses to optimize and generate code, which then breaks on changes.
I don't really agree that an issue like an undefined shift is different from a program depending on the value read from an uninitialized variable not changing between reads, or from spatial / temporal memory safety bugs like a read of freed memory with the result unused or just logged.
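For reference, the undefined shift case looks like this in C. Shifting a 32-bit value by 32 or more is undefined, so the compiler may assume it never happens. The helper below is a hypothetical sketch that picks one defined result (zero) for oversized counts. That choice is an assumption for the example, not something the standard mandates.

```c
#include <stdint.h>

/* value << count is undefined when count >= 32 for a 32-bit operand.
 * This helper defines the result as 0 for those counts instead. */
uint32_t shl32_defined(uint32_t value, unsigned int count) {
    if (count >= 32) {
        return 0; /* chosen semantics: all bits shifted out */
    }
    return value << count;
}
```

Hardware disagrees on what an oversized shift does (x86 masks the count to the low 5 bits for 32-bit operands, for instance), which is one reason the standard left the result undefined.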

