The uninitialized variable anathema: non-deterministic C++


Allowing a variable to hold an undefined value is a terrible language failure, especially when programs tend to work anyway. It’s a significant fault of C++ to allow this easy-to-create and hard-to-detect situation. I was recently treated to this insanity with my Leaf project. Various uninitialized structure values made my program fail on different platforms. There’s simply no need for this. Initial values should be guaranteed at all times.

The problem

Local variables, member variables and certain structure allocations result in uninitialized values. C++ has separate rules for uninitialized, default-initialized, and zero-initialized variables. It’s an overly complicated mess.
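To make the difference concrete, here is a minimal sketch of the forms that do guarantee a zero value. The `Point` struct and the function name are made up for illustration. The forms that leave a value indeterminate appear only in a comment, since reading them would be exactly the bug this post is about.

```cpp
#include <cassert>
#include <memory>

// Hypothetical aggregate used only for illustration.
struct Point {
    int x;
    int y;
};

// Returns true if every guaranteed-zero form below really is zero.
bool value_initialized_forms_are_zero() {
    int a{};                                // empty braces: guaranteed 0
    int b = int();                          // value initialization: guaranteed 0
    Point p{};                              // empty braces: every member is zeroed
    static int s;                           // static storage: zero initialized
    auto heap = std::make_unique<Point>();  // uses new Point(), members zeroed

    // For contrast, these are default initialized and hold indeterminate
    // values, so they are listed here but never declared or read:
    //   int c;
    //   Point q;
    //   Point* r = new Point;

    return a == 0 && b == 0 && p.x == 0 && p.y == 0 && s == 0
        && heap->x == 0 && heap->y == 0;
}
```

The only visible difference between the safe and unsafe forms is a pair of braces or parentheses, which is a large part of why the mistake is so easy to make.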

Given how often variables go uninitialized in C++, it seems like the problem would be easy to spot. It isn’t. While the language states they are uninitialized, a great number of variables nonetheless end up with zero values. Consider the program below.

#include <iostream>

int main() {
    bool a;  // deliberately left uninitialized
    if( a ) {
        std::cout << "True" << std::endl;
    } else {
        std::cout << "False" << std::endl;
    }

    int b;   // deliberately left uninitialized
    std::cout << b << std::endl;
}

For me this always prints False and 0. It prints the same on ideone. Yet according to the standard, reading either variable is undefined behavior, so the program could print True and whatever integer value it wants. Both a and b are uninitialized. A tool like valgrind can point out the problem in this case.

The problem is insidious precisely because such programs tend to work. While developing, the error may simply not show up, since zero happens to be the desired initial value. A test suite is incapable of picking up this error until it is run on a different platform. In some projects I’ve actually included valgrind as part of the regular testing, but I think that is rare, and even then I didn’t make it part of the always-run automated tests (too many false positives).

Why zero

But why does it always tend to be zero? This is the ironic part. The OS will not give a program uninitialized memory. This is a significant security mechanism. The underlying memory on the system is a shared resource but it must be protected. Program A writes to a page, frees it, then program B happens to get allocated the same page. Program B should not be able to read what Program A wrote to that memory. To prevent this the kernel initializes all memory. On Linux it happens to do this with zeros.
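The kernel behavior described above can be observed directly. This is a minimal, Linux/POSIX-specific sketch that asks for a fresh anonymous mapping with `mmap` and checks that every byte is zero. The function name and the 4096-byte size are arbitrary choices for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Requests one fresh anonymous mapping from the kernel and reports whether
// every byte in it is zero. The size is an arbitrary illustrative choice.
bool fresh_page_is_zeroed() {
    const std::size_t size = 4096;
    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return false;  // could not get memory, so nothing to check
    }

    const unsigned char* bytes = static_cast<const unsigned char*>(mem);
    bool all_zero = true;
    for (std::size_t i = 0; i < size; ++i) {
        if (bytes[i] != 0) {
            all_zero = false;
            break;
        }
    }

    munmap(mem, size);
    return all_zero;
}
```

On Linux, anonymous mappings are documented as zero-filled, so this should return true there. It says nothing about memory the process has already used and recycled.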

There’s no reason it has to be done this way. I believe OpenBSD uses a different initial value. And apparently Arch Linux running inside VirtualBox does something different as well (this is where Leaf failed). It may not even be the OS; the program can also pick up memory that it had previously allocated and released. In this case nothing will re-zero this memory since it’s within the same program.
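The reuse case is easy to simulate without relying on any particular allocator. The sketch below uses a hypothetical `OneSlotPool` that hands out the same 16 bytes every time, standing in for any allocator that recycles memory inside one process. The second user of the slot sees whatever the first user left behind.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical one-slot pool, standing in for any allocator that recycles
// memory within a single process.
class OneSlotPool {
public:
    static constexpr std::size_t slot_size = 16;

    unsigned char* acquire() { in_use_ = true; return slot_; }
    void release() { in_use_ = false; }   // note: the bytes are not cleared

private:
    unsigned char slot_[slot_size] = {};  // starts out zeroed, like a fresh page
    bool in_use_ = false;
};

// Returns the first byte seen by the second user of the slot.
unsigned char first_byte_after_reuse() {
    OneSlotPool pool;

    unsigned char* first = pool.acquire();
    std::memset(first, 0xAB, OneSlotPool::slot_size);  // first user writes data
    pool.release();

    unsigned char* second = pool.acquire();  // the same storage comes back
    return second[0];                        // still 0xAB, not 0
}
```

A program that assumed "fresh memory is zero" would work on the first acquisition and break on the second, which matches the kind of platform-dependent failure described above.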

Apparently OpenBSD’s basic free/malloc will reinitialize the data on each allocation. It’s a security feature that mitigates the negative potential of buffer overflows. Curiously it might have prevented the Heartbleed defect, but OpenSSL sort of bypassed that mechanism anyway.

The solution

A language should simply not allow uninitialized variables. I’m not saying that an explicit value must always be given, but the language should guarantee the value. C++ should have stated that all variables are default initialized, which in turn means zero initialized for fundamentals. It should not matter how I created the variable, in which scope it resides, or whether it is a member variable.
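Until the language makes that guarantee, default member initializers get close to it for structures. The following is a sketch with hypothetical `Settings` and `Holder` types: every field has a defined value whether the object is a local, a member of another object, or created with a plain `new`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical structure; the field names are for illustration only.
struct Settings {
    bool verbose = false;  // default member initializers give every
    int retries = 0;       // field a defined value no matter how the
    double scale = 0.0;    // object is created
};

struct Holder {
    Settings settings;     // used as a member variable
};

// Returns true if all three ways of creating Settings yield defined values.
bool all_settings_defined() {
    Settings local;                                // local, no initializer
    Holder holder;                                 // nested member
    std::unique_ptr<Settings> heap(new Settings);  // plain new, no parentheses

    return !local.verbose && local.retries == 0 && local.scale == 0.0
        && !holder.settings.verbose && holder.settings.retries == 0
        && holder.settings.scale == 0.0
        && !heap->verbose && heap->retries == 0 && heap->scale == 0.0;
}
```

This only helps for types I control, and it does nothing for a bare `int` or `bool` local, which is why I would prefer the rule to live in the language itself.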

There might be some exceptions for performance, or low-level data manipulation. These are however the outlying situations. Mostly the optimizer can handle the basic case of unused values and eliminate them. If we want a block of uninitialized memory we can always allocate it on our own. In this case I simply don’t expect the data to be initialized and thus don’t get caught in the trap.

Just for completeness a language might offer a special noinit keyword that indicates a variable should not be initialized.
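Such an opt-out can be approximated in a library today. This is only a sketch: `noinit_t`, `noinit` and `Value` are hypothetical names that do not exist in the standard library. The wrapper zeroes its contents by default and skips initialization only when asked to explicitly.

```cpp
#include <cassert>

// Hypothetical tag type and constant; neither exists in the standard library.
struct noinit_t {};
constexpr noinit_t noinit{};

// Wrapper that zero-initializes by default and skips initialization only
// when the caller opts out explicitly.
template <typename T>
struct Value {
    T v;

    Value() : v() {}             // default: value initialized (0 for fundamentals)
    explicit Value(noinit_t) {}  // opt-out: v is left indeterminate
    Value(T init) : v(init) {}   // explicit initial value
};

int sum_after_fill() {
    Value<int> counter;          // guaranteed 0
    Value<int> scratch(noinit);  // indeterminate until assigned
    scratch.v = 41;              // must be written before it is read
    counter.v += 1;
    return counter.v + scratch.v;
}
```

The point of the design is that the unsafe choice has to be spelled out at the declaration, so it shows up in code review instead of hiding behind a missing pair of braces.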

I even think this should be modified in C++ now. Since the values were previously undefined, making them defined now won’t change the correctness of any existing programs. It’s entirely backwards compatible and would significantly improve the quality of C++.
