For example: introduce a pass that runs before the LLVM opt pipeline that removes all TBAA, changes all GEPs to int math, removes all nsw/nuw flags from int math, replaces all undefs with 0, and 0-initializes all allocas. That gets you very close to no UB.
Not really, but it would reduce the amount of UB exploitation. There's a long tail scattered around many passes that you can't find so easily; it requires a fine-toothed comb.
Not sure that’s really true. WebKit’s LLVM-based FTL JIT encountered no such problems to my knowledge. High probability we would have known. We even ran tests with the full -O3 pipeline. Maybe there are bugs, but I wouldn’t conflate that with UB.
To name one off the top of my head: You have to do something to sanitize float-to-int casts or else they become undefs if out of range. For a long time “array[x as usize]” with x: f64 could cause UB in safe Rust for this reason.
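For reference, a minimal Rust sketch of how this case looks today. Since Rust 1.45, float-to-integer `as` casts saturate (and NaN becomes 0), so the cast itself has a defined result and an out-of-range index is caught by the ordinary bounds check. The helper name `index_or_none` is hypothetical and not from the thread.

```rust
/// Hypothetical helper: index a slice with a float, returning None when the
/// (saturated) index is out of bounds.
fn index_or_none(array: &[i32], x: f64) -> Option<i32> {
    // Saturating cast: values above usize::MAX clamp to usize::MAX,
    // negative values clamp to 0, and NaN becomes 0.
    let i = x as usize;
    array.get(i).copied()
}

fn main() {
    let array = [10, 20, 30];

    // In-range values truncate toward zero.
    assert_eq!(index_or_none(&array, 1.9), Some(20));

    // A huge value saturates to usize::MAX and fails the bounds check.
    assert_eq!(1e300_f64 as usize, usize::MAX);
    assert_eq!(index_or_none(&array, 1e300), None);

    // Negative values and NaN both map to index 0.
    assert_eq!(-1.0_f64 as usize, 0);
    assert_eq!(f64::NAN as usize, 0);
}
```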
There's also the perpetually annoying 'a function calling itself in an infinite loop is UB'
Check out this example using a loop:
gist.github.com/thestinger/7e6
LLVM considers noreturn to be an effect, and yet it doesn't consider a function that *may* not return to have an effect. This is a bug, but an intentional one because they chose to keep an unsafe optimization around.
They properly preserve functions that are pure but not nounwind, such as a chain of them like foo(); foo(); foo(); being optimized to foo(); but never being completely removed. They are missing an attribute for 'returns' or 'halts' and yet optimize without checking anyways.
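A small Rust sketch of the kind of function at issue: it has no side effects, but it only returns for some inputs. The function name `halts_only_for_even` is hypothetical, and the sketch only illustrates why "pure" is not enough to justify deleting an unused call. It does not show what any particular LLVM version actually does with it.

```rust
/// Hypothetical example: no side effects, but returns only when `n` is even.
/// For odd `n`, the wrapping subtraction never lands on 0, so the loop
/// never exits.
fn halts_only_for_even(mut n: u32) -> u32 {
    let mut steps: u32 = 0;
    while n != 0 {
        n = n.wrapping_sub(2);
        steps = steps.wrapping_add(1);
    }
    steps
}

fn main() {
    // The result is unused and the call has no side effects. Removing this
    // call is unobservable because it does return.
    let _ = halts_only_for_even(4);

    // By contrast, `halts_only_for_even(3)` would spin forever. Deleting that
    // call would let the program reach the line below, which changes its
    // behavior. An optimizer needs to know the callee returns before it can
    // drop such a call.
    println!("done");
}
```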
So, even though this has been known to be broken for many years, they have kept the optimization enabled. No one has been motivated to implement a 'halts' attribute, add support for detecting / propagating it in the function attribute pass, and make it required.
That's just broken, but not too surprising, since it's not uncommon for compilers to have miscompile bugs in tough and unusual corner cases. But the context of this conversation is: do you need a new compiler to fix UB?
The fact that LLVM has bugs doesn't really change the answer. Even a compiler that was designed to avoid UB could have miscompile bugs. Also, the context of the conversation allows for making changes to LLVM, so this seems like the kind of thing that could be fixed.
I guess I don't understand the context. It seems to be about C, and I don't see how you can resolve that problem for C without coming up with a model to enforce a form of memory safety. What is the scope of UB that should be avoided? You mean, for a language like Rust or Swift?
I was just responding about the infinite recursion thing, particularly because I don't believe that's actually undefined in C. It doesn't talk about a stack, let alone say that overflowing it is undefined or that infinite recursion can just be removed. I think that's wrong.
I don't think features like opt-in to undefined behavior on signed or unsigned overflow (nsw, nuw) are an issue in LLVM since frontends can avoid emitting it (like Rust and Swift). It's only a major issue when there isn't a decent alternative or when UB is poorly defined/unclear.
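As a source-level illustration of a frontend giving overflow a defined meaning, here is a minimal Rust sketch using the standard `wrapping_add` and `checked_add` methods. The wrapper names `add_wrapping` and `add_checked` are hypothetical. The sketch shows only the language-level results and makes no claim about the exact IR a given compiler version emits.

```rust
/// Hypothetical wrapper: two's-complement wraparound, defined for all inputs.
fn add_wrapping(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Hypothetical wrapper: returns None instead of overflowing.
fn add_checked(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

fn main() {
    // Overflow wraps around to the minimum value.
    assert_eq!(add_wrapping(i32::MAX, 1), i32::MIN);

    // Overflow is reported as a value the caller must handle.
    assert_eq!(add_checked(i32::MAX, 1), None);
    assert_eq!(add_checked(2, 3), Some(5));
}
```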
The question of whether memory unsafety implies UB is sort of at the heart of the disconnect between the C spec and C practitioners. As a practitioner (and compiler guy) I view memory unsafety as a separate thing - after all a “bad” store still stores to a well defined place.
Some folks cannot imagine memory safety without UB; for them I guess you can’t say that there is a C without UB. But for me, I know what that means enough that it’s an implementable thing to me.



