In many programming languages, undefined behaviour (UB) is a lack of guarantees about how some code will execute, i.e., it is a hole in the language’s specification. C and C++ are rife with UB because they focus on low-level programming. Safer, modern languages like Java or Python define far more of their behaviour, at the cost of some performance.
A basic example is using an uninitialised variable. In Golang, every type has a “default” (for int64, that would be 0) that a variable is initialised to if it is not explicitly initialised. In C, there is no such guarantee for local variables. The reason is performance: for large data structures, Golang essentially has to memset the memory to 0 beforehand, and C avoids paying that cost.
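A minimal sketch of what this means in C, assuming a hypothetical struct Buffer that is not part of any real API: zeroing has to be requested explicitly, either with memset or with an initialiser. Reading the local before doing so would be the undefined case.

```c
#include <string.h>

/* Hypothetical structure, used only for illustration. */
struct Buffer {
    long long values[1024];
};

/* Returns the sum of a buffer that was zeroed explicitly with memset. */
long long sum_zeroed(void) {
    struct Buffer buf;
    /* C does not zero locals automatically; the cost is paid here, on request. */
    memset(&buf, 0, sizeof buf);

    long long sum = 0;
    for (size_t i = 0; i < sizeof buf.values / sizeof buf.values[0]; i++) {
        sum += buf.values[i];
    }
    return sum;
}

/* Alternative: an initialiser zeroes every member of the struct. */
long long first_of_initialised(void) {
    struct Buffer buf = {0};
    return buf.values[0];
}
```

Both functions return 0. Dropping the memset or the initialiser would leave the contents of buf indeterminate.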
Oftentimes, linters have rules that can catch many forms of UB statically. In practice, UB is up to the compiler to handle, and GCC/Clang may handle each case differently.
Compiler optimisations
Another implication of UB is that it lets compilers make assumptions that enable more powerful optimisations. For example, Rust’s compiler assumes certain memory semantics, and it assumes that unsafe code doesn’t violate them.
x = (x * 2) / 2 will be optimised away if x is a signed int, since signed overflow is undefined and the compiler may therefore assume x * 2 won’t overflow. For an unsigned type, overflow is defined to wrap around, so the expression has to be computed.
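The contrast between signed and unsigned arithmetic can be sketched as follows. The function names are made up for illustration, and whether a given compiler actually folds the signed version depends on the compiler and its flags.

```c
#include <limits.h>

/* Signed: overflow is undefined, so a compiler may simplify this to `return x;`. */
int roundtrip_signed(int x) {
    return (x * 2) / 2;
}

/* Unsigned: overflow wraps modulo UINT_MAX + 1, so the result can differ from x. */
unsigned roundtrip_unsigned(unsigned x) {
    return (x * 2u) / 2u;
}

/* Shows the wrap: the result is UINT_MAX / 2, not UINT_MAX. */
unsigned roundtrip_of_max(void) {
    return roundtrip_unsigned(UINT_MAX);
}
```

Because roundtrip_unsigned(UINT_MAX) is not equal to UINT_MAX, folding the unsigned version to x would change the result. For the signed version, the only inputs where folding would matter are those that overflow, and those are already undefined.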
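Another example is a null check that comes after a dereference:
*x = 0;
if (x == NULL) { // optimised away, since x cannot be NULL
    printf("error!");
}
This may also happen if the check comes before the write, because the later dereference still lets the compiler assume x is not NULL. This effect is often called “time travel”.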
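One way to keep such a check meaningful is to perform it before the pointer is ever dereferenced and to leave the function on failure. A minimal sketch, where store_zero is a hypothetical helper:

```c
#include <stddef.h>

/* Hypothetical helper: returns -1 for a null pointer, 0 after a successful write. */
int store_zero(int *x) {
    if (x == NULL) {
        return -1; /* leave before any dereference can happen */
    }
    *x = 0;
    return 0;
}
```

Since no execution path dereferences x while it is NULL, the compiler has no grounds to drop the check.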
In C/C++
Data accesses:
- Using an uninitialised variable
- Out of bounds array accesses
- Dereferencing invalid pointers (nullptr, freed memory)
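The out-of-bounds and null cases above can be guarded against with an explicit check before the access. The sketch below uses a hypothetical helper, get_checked. It cannot detect freed memory, which has to be prevented by managing lifetimes correctly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: copies arr[index] into *out only if the access is valid. */
bool get_checked(const int *arr, size_t len, size_t index, int *out) {
    if (arr == NULL || out == NULL || index >= len) {
        return false; /* refuse instead of reading out of bounds */
    }
    *out = arr[index];
    return true;
}
```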
Operators:
- Signed integer overflow
- Bit shifts by a negative amount or by at least the width of the type
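Both operator cases can be avoided by testing the operands before performing the operation. This is a sketch with hypothetical helper names; the shift check assumes unsigned has no padding bits, so that its width is sizeof(unsigned) * CHAR_BIT.

```c
#include <limits.h>
#include <stdbool.h>

/* Adds a and b only if the mathematical result fits in an int. */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false; /* a + b would overflow */
    }
    *out = a + b;
    return true;
}

/* Shifts left only if the amount is in the range [0, width of unsigned). */
bool checked_shl(unsigned value, int amount, unsigned *out) {
    if (amount < 0 || amount >= (int)(sizeof value * CHAR_BIT)) {
        return false; /* shift amount out of range */
    }
    *out = value << amount;
    return true;
}
```

The overflow test is written so that the check itself never overflows: INT_MAX - b is only evaluated for positive b, and INT_MIN - b only for negative b.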
Resources
- What Every C Programmer Should Know About Undefined Behavior, by Chris Lattner
- Advanced C: The UB and optimizations that trick good programmers, by Eskil Steenberg