In many programming languages, undefined behaviour (UB) is a lack of guarantees about how some code will execute, i.e., it is a hole in the language's specification. C and C++ are rife with UB because they target low-level programming. Safer, modern languages like Java or Python have well-defined behaviour, at the cost of some performance.

A basic example is using an uninitialised variable. In Go, every type has a zero value (for int64, that is 0) that a variable is default-initialised to if not explicitly initialised. C makes no such guarantee: zero-initialising a large data structure essentially means memsetting it to 0 first, and C avoids that performance cost.
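
A minimal sketch of the C case (nothing here guarantees what, if anything, gets printed):

#include <stdio.h>

int main(void) {
	int x;             // never initialised
	printf("%d\n", x); // UB: reads an indeterminate value, so any output is possible
	return 0;
}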

Linters often have rules that can catch many forms of UB statically. In practice, how UB actually manifests is up to the compiler, and GCC and Clang may handle the same case differently.

Compiler optimisations

Another implication is that UB allows compilers to make assumptions that enable more powerful optimisations. For example, Rust's compiler assumes certain memory semantics (e.g. its aliasing rules) hold, and it assumes that unsafe code doesn't violate them.

x = (x * 2) / 2 will be optimised to x = x if x is a signed int: signed overflow is undefined, so the compiler assumes x * 2 won't overflow. (For an unsigned int this doesn't hold, because unsigned overflow is defined to wrap.)
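
A sketch of the difference (function names are illustrative, and exact codegen depends on the compiler):

// signed: overflow is UB, so the compiler may fold this to just "return x;"
int roundtrip_signed(int x) {
	return (x * 2) / 2;
}

// unsigned: overflow wraps, so this cannot be folded to just "return x;"
unsigned roundtrip_unsigned(unsigned x) {
	return (x * 2) / 2;
}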

Another example: checking a pointer for NULL after it has already been dereferenced (wrapped in a hypothetical function so it compiles):

#include <stdio.h>

void clear(int *x) {
	*x = 0;
	if (x == NULL) { // optimised away: the write above means x cannot be NULL here
		printf("error!");
	}
}

This can also happen if the check comes before the write: because the later dereference makes x == NULL undefined behaviour, the compiler may assume x is non-NULL throughout and delete the earlier check as well. This optimisation is often called time travel, since UB later in the program can affect code that ran before it.
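
A hedged sketch of the check-before-write case (the function is hypothetical):

#include <stdio.h>

void store_zero(int *x) {
	if (x == NULL)          // may be deleted: execution falls through to the
		printf("error!\n"); //   write below, so x == NULL would be UB anyway
	*x = 0;                 // unconditional dereference
}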

In C/C++

Data accesses:

  • Using an uninitialised variable
  • Out-of-bounds array accesses
  • Dereferencing invalid pointers (nullptr/NULL, freed memory); a short sketch follows this list
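
A hedged sketch of the latter two cases (the values and sizes are purely illustrative):

#include <stdlib.h>

int main(void) {
	int a[4] = {0};
	int oob = a[4];            // UB: index 4 is one past the last valid element

	int *p = malloc(sizeof *p);
	if (p == NULL)
		return 1;
	free(p);
	int dangling = *p;         // UB: reads memory that has already been freed

	return oob + dangling;     // use the values so the reads aren't trivially dropped
}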

Operators:

  • Signed integer overflow
  • Shifting by a negative amount or by at least the bit width of the type; a short sketch follows this list
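
A hedged sketch of the operator cases (the values are chosen only to trigger the UB):

#include <limits.h>

int main(void) {
	int big = INT_MAX;
	int overflow = big + 1;   // UB: signed integer overflow

	int n = 1;
	int bad_shift1 = n << 32; // UB: shift count >= bit width of int (assuming 32-bit int)
	int bad_shift2 = n << -1; // UB: negative shift count

	return overflow + bad_shift1 + bad_shift2;
}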

Resources