In many programming languages, undefined behaviour (UB) is a lack of guarantees about how some code will execute, i.e., it is a hole in the language's specification. C and C++ are rife with UB because they target low-level programming. Safer, modern languages like Java or Python have well-defined behaviour, at the cost of some performance.

A basic example is using an uninitialised variable. In Go, every type has a zero value (for int64, that is 0) that a variable is default-initialised to if not explicitly initialised. C makes no such guarantee: zero-initialising a large data structure essentially means memsetting it to 0 first, and C avoids that performance cost.
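
A minimal sketch of the C case (nothing here guarantees what, if anything, gets printed):

#include <stdio.h>

int main(void) {
	int x;             // never initialised
	printf("%d\n", x); // UB: reads an indeterminate value, so any output is possible
	return 0;
}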

Linters often have rules that can catch many forms of UB statically. In practice, how UB actually manifests is up to the compiler, and GCC and Clang may handle the same case differently.

Compiler optimisations

Another implication is that UB allows compilers to make assumptions that enable more powerful optimisations. For example, Rust's compiler assumes certain memory semantics (e.g. its aliasing rules) hold, and it assumes that unsafe code doesn't violate them.

x = (x * 2) / 2 will be optimised to x = x if x is a signed int: signed overflow is undefined, so the compiler assumes x * 2 won't overflow. (For an unsigned int this doesn't hold, because unsigned overflow is defined to wrap.)
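
A sketch of the difference (function names are illustrative, and exact codegen depends on the compiler):

// signed: overflow is UB, so the compiler may fold this to just "return x;"
int roundtrip_signed(int x) {
	return (x * 2) / 2;
}

// unsigned: overflow wraps, so this cannot be folded to just "return x;"
unsigned roundtrip_unsigned(unsigned x) {
	return (x * 2) / 2;
}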

Another example: checking a pointer for NULL after it has already been dereferenced (wrapped in a hypothetical function so it compiles):

#include <stdio.h>

void clear(int *x) {
	*x = 0;
	if (x == NULL) { // optimised away: the write above means x cannot be NULL here
		printf("error!");
	}
}

This can also happen if the check comes before the write: because the later dereference makes x == NULL undefined behaviour, the compiler may assume x is non-NULL throughout and delete the earlier check as well. This optimisation is often called time travel, since UB later in the program can affect code that ran before it.
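
A hedged sketch of the check-before-write case (the function is hypothetical):

#include <stdio.h>

void store_zero(int *x) {
	if (x == NULL)          // may be deleted: execution falls through to the
		printf("error!\n"); //   write below, so x == NULL would be UB anyway
	*x = 0;                 // unconditional dereference
}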

In C/C++

Data accesses:

  • Using an uninitialised variable
  • Out-of-bounds array accesses
  • Dereferencing invalid pointers (nullptr/NULL, freed memory); a short sketch follows this list
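
A hedged sketch of the latter two cases (the values and sizes are purely illustrative):

#include <stdlib.h>

int main(void) {
	int a[4] = {0};
	int oob = a[4];            // UB: index 4 is one past the last valid element

	int *p = malloc(sizeof *p);
	if (p == NULL)
		return 1;
	free(p);
	int dangling = *p;         // UB: reads memory that has already been freed

	return oob + dangling;     // use the values so the reads aren't trivially dropped
}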

Operators:

  • Signed integer overflow
  • Shifting by a negative amount or by at least the bit width of the type; a short sketch follows this list
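
A hedged sketch of the operator cases (the values are chosen only to trigger the UB):

#include <limits.h>

int main(void) {
	int big = INT_MAX;
	int overflow = big + 1;   // UB: signed integer overflow

	int n = 1;
	int bad_shift1 = n << 32; // UB: shift count >= bit width of int (assuming 32-bit int)
	int bad_shift2 = n << -1; // UB: negative shift count

	return overflow + bad_shift1 + bad_shift2;
}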

Resources