Compiler Correctness - I | Giacomo Sansone

This is a text I made in Summer 2024, while tweaking my own C compiler.

As I am coding my own hand-crafted compiler, I ended up with a small epiphany during this warm afternoon.

From Engineering A Compiler, Cooper & Torczon, Second Edition:

“After all, if transformations need not preserve meaning, why not replace the entire procedure with a single nop?”

It is generally accepted that using -O2 or -O3 optimization levels in compiler might be risky. I want to show you an example of this statement. Let’s take the following function f:

int f(int n) {
    for(int i = 10; i == n; i = i) {}
    return 10;
}

int main(){
    return f(10);
}

The behaviour of f is: return 10 if the input is not 10, otherwise never end.

By using -O1 (gcc 11.4.0), this is what you get:

_Z1fi:
.L2:
cmpl $10, %edi
je .L2
movl $10, %eax
ret

Which is identical to what I have described above. However, by increasing the level of optimization, you end up with:

_Z1fi:
movl $10, %eax
ret

Oh no! The compiler thought I don’t need the for computation in order to provide the return value. This is an example of Dead Code Removal. Interestingly enough, the semantics is not preserved, as you can observe by executing the program in the two cases.

I should point out that the compiler is much smarter than we are: I had to pick a weird situation in order to end up in this scenario, mainly because the behavior depends on a value (n) which is not determined at compile time. When all values are known beforehand, the results are always correct.

How should we interpret this situation? Clearly an endless loop is something that you want to avoid in 99.99% of the cases. And then, if you are aware of what you are doing, you can ask the compiler to skip some specific optimizations until you get what you need. However, it’s interesting how compiler engineers decided to handle these corner cases by ignoring the basic principle of keeping the original semantic.

Sources: 1, 2, 3.