Defective Language

The uninitialized variable anathema: non-deterministic C++

Allowing a variable with an undefined value is a terrible language failure. Especially when programs tend to work anyway. It’s a significant fault of C++ to allow this easy-to-create and hard-to-detect situation. I was recently treated to this insanity with my Leaf project. Various uninitialized structure values made my program fail on different platforms. There’s simply no need for this. Initial values should be guaranteed at all times.

The problem

Local variables, member variables and certain structure allocations result in uninitialized values. C++ has rules for uninitialized, default initialized, and zero initialized. It’s an overly complicated mess.
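To make those rules a little more concrete, here is a minimal sketch of the forms that do guarantee a zero value in current C++. The names (`Point`, `global_counter`, and the helper functions) are made up for illustration; the sketch deliberately never reads a default-initialized local, since that read would be undefined.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical aggregate used only for this illustration.
struct Point {
    int x;
    int y;
};

// Static storage duration: zero-initialized before the program runs.
static int global_counter;

int read_global_counter() { return global_counter; }

// Empty braces value-initialize a fundamental, which zeroes it.
// A plain `int n;` here would instead leave n indeterminate.
int value_initialized_int() {
    int n{};
    return n;
}

// Empty braces on an aggregate value-initialize every member.
Point value_initialized_point() {
    Point p{};
    return p;
}

// `new int[count]()` value-initializes each element; dropping the
// trailing parentheses would leave the elements indeterminate.
int sum_of_value_initialized_array(std::size_t count) {
    std::unique_ptr<int[]> data(new int[count]());
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += data[i];
    }
    return sum;
}
```

The difference between the safe and unsafe spellings is a pair of braces or parentheses, which is part of why the mistake is so easy to make.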

Given how often a variable is left uninitialized in C++, it seems like the problem would be easy to spot. It isn’t. While the language states they are uninitialized, a great number of variables nonetheless end up with zero values. Consider the program below.

#include <iostream>

int main() {
    bool a;
    if( a ) {
        std::cout << "True" << std::endl;
    } else {
        std::cout << "False" << std::endl;
    }

    int b;
    std::cout << b << std::endl;
}

For me this always prints False and 0. It prints the same on ideone. Yet according to the standard it could print True and whatever integer value it wants. Both a and b are uninitialized. A tool like valgrind can point out the problem in this case.
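For contrast, a minimal sketch of the same logic with explicit initial values, which is the only portable way to get the "False and 0" behaviour today. The function name `describe_defaults` is hypothetical, and the output is collected in a string rather than printed so it can be checked.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Same logic as the program above, but both variables are given
// explicit initial values, so the result is identical on every platform.
std::string describe_defaults() {
    bool a = false;
    int b = 0;

    std::ostringstream out;
    if (a) {
        out << "True" << '\n';
    } else {
        out << "False" << '\n';
    }
    out << b << '\n';

    assert(!out.fail());  // writing to a string stream should not fail here
    return out.str();
}
```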

What makes the problem insidious is precisely that such programs tend to work. During development the error may simply not show up, since zero happens to be the desired initial value. A test suite cannot pick up this error until it is run on a different platform. In some projects I’ve actually included valgrind as part of the regular testing, but I think that is rare, and even then I didn’t make it part of the always-run automated tests (too many false positives).

Why zero

But why does it always tend to be zero? This is the ironic part. The OS will not give a program uninitialized memory. This is a significant security mechanism. The underlying memory on the system is a shared resource but it must be protected. Program A writes to a page, frees it, then program B happens to get allocated the same page. Program B should not be able to read what Program A wrote to that memory. To prevent this the kernel initializes all memory. On Linux it happens to do this with zeros.

There’s no reason it has to be done this way. I believe OpenBSD uses a different initial value. And apparently ArchLinux running inside VirtualBox does something different as well (this is where Leaf failed). It may not even be the OS; the program can also pick up memory that it previously had allocated. In this case nothing will re-zero this memory since it’s within the same program.
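The memory-reuse effect can be imitated without any undefined behaviour. The sketch below uses a made-up `ToyArena` class that stands in for an allocator: its buffer starts out zeroed, like a fresh page from the OS, but nothing clears it when it is handed out a second time. Real allocators are far more involved; this only illustrates why "it was zero last time" is not a guarantee.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy allocator: one fixed buffer handed out again and again.
class ToyArena {
public:
    static constexpr std::size_t size = 16;

    // Hands out the start of the buffer; nothing is cleared on reuse.
    unsigned char* acquire() { return storage_.data(); }

    // "Freeing" is a no-op: the old bytes simply stay where they are.
    void release(unsigned char*) {}

private:
    // Starts zeroed, like a fresh page from the OS.
    std::array<unsigned char, size> storage_{};
};

// The first user sees zeros and leaves a pattern behind;
// the second user of the same memory sees that pattern, not zeros.
bool second_acquire_sees_stale_data() {
    ToyArena arena;

    unsigned char* first = arena.acquire();
    const bool fresh_was_zero = (first[0] == 0);
    std::memset(first, 0xAB, ToyArena::size);
    arena.release(first);

    unsigned char* second = arena.acquire();
    assert(second == first);  // same memory is handed out again
    return fresh_was_zero && second[0] == 0xAB;
}
```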

Apparently OpenBSD’s basic free/malloc will reinitialize the data on each allocation. It’s a security feature that mitigates the negative potential of buffer overflows. Curiously it might have prevented the Heartbleed defect, but OpenSSL sort of bypassed that mechanism anyway.

The solution

A language should simply not allow uninitialized variables. I’m not saying that an explicit value must always be given, but the language should guarantee the value. C++ should have stated that all variables are default initialized, which in turn means zero initialized for fundamentals. It should not matter how I created the variable, in which scope it resides, or whether it is a member variable.

There might be some exceptions for performance, or low-level data manipulation. These are however the outlying situations. Mostly the optimizer can handle the basic case of unused values and eliminate them. If we want a block of uninitialized memory we can always allocate it on our own. In this case I simply don’t expect the data to be initialized and thus don’t get caught in the trap.
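As a rough sketch of what "allocate it on our own" can look like today: `std::malloc` returns storage whose contents are indeterminate, and the caller fills it before reading anything back. The names `FreeDeleter`, `RawBuffer`, `allocate_raw` and `sum_after_fill` are invented for this example, and error handling is reduced to an assertion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Deleter so that malloc'ed storage is released with free.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};
using RawBuffer = std::unique_ptr<unsigned char[], FreeDeleter>;

// Returns raw storage; its bytes are indeterminate until written.
RawBuffer allocate_raw(std::size_t byte_count) {
    return RawBuffer(static_cast<unsigned char*>(std::malloc(byte_count)));
}

// Fills the raw storage from `source` first and only then reads it back.
// Assumes count > 0, since malloc(0) is allowed to return a null pointer.
int sum_after_fill(const int* source, std::size_t count) {
    RawBuffer raw = allocate_raw(count * sizeof(int));
    assert(raw != nullptr);

    std::memcpy(raw.get(), source, count * sizeof(int));

    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        int value = 0;
        std::memcpy(&value, raw.get() + i * sizeof(int), sizeof(value));
        sum += value;
    }
    return sum;
}
```

Because the buffer is requested explicitly as raw storage, nobody reading this code expects it to contain zeros.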

Just for completeness a language might offer a special noinit keyword that indicates a variable should not be initialized.
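C++ has no such keyword, but the idea can be approximated in library code. The following is only a sketch under that assumption: `noinit_t`, `noinit` and the wrapper `Var` are hypothetical names, not part of any standard or existing library. The wrapper zero-initializes by default and skips initialization only when asked to explicitly.

```cpp
#include <cassert>

// Hypothetical tag type standing in for a `noinit` keyword.
struct noinit_t {
    explicit noinit_t() = default;
};
constexpr noinit_t noinit{};

// Hypothetical wrapper: zeroed by default, uninitialized only on request.
template <typename T>
class Var {
public:
    Var() : value_{} {}              // guaranteed value (zero for fundamentals)
    explicit Var(noinit_t) {}        // value_ is deliberately left indeterminate
    Var(T initial) : value_(initial) {}

    Var& operator=(T v) {
        value_ = v;
        return *this;
    }

    // Must not be called on a noinit Var before a value has been assigned.
    T get() const { return value_; }

private:
    T value_;
};

int default_then_assigned() {
    Var<int> zeroed;        // safe to read immediately
    Var<int> raw(noinit);   // must be assigned before it is read
    assert(zeroed.get() == 0);
    raw = 42;
    return zeroed.get() + raw.get();
}
```

The point of the design is that the dangerous case has to be spelled out at the declaration, while the short form is the safe one.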

I even think this should be modified in C++ now. Since the values were previously undefined, making them defined now won’t change the correctness of any existing programs. It’s entirely backwards compatible and would significantly improve the quality of C++.

15 replies »

  1. I agree. This has brought many crashes to my programs when porting to other platforms. Now I’ve hopefully learned to always initialize everything, but it can be a lot of work if you have many constructors.

    Especially pointers and floating point numbers should be initialized to null (nullptr, as it is called nowadays) and 0.0, by default. And then have something like your suggested “noinit” keyword for those who really need it for something.

  2. “A language should simply not allow uninitialized variables. I’m not saying that an explicit value must always be given, but the language should guarantee the value.”

    That seems completely sensible to me, but I wonder if it’s not in the language specification because it adds performance overhead? A lot of C++ applications are high performance, and when you’re trying to squeeze every last bit out of the language that might be a problem.

    I don’t know C++ that well, but my thought would be to update the spec to say the compiler would require that a variable must be initialized before it is used.

    • I’ve done a lot of high performance/low-latency code and it simply wouldn’t be a problem if values were initialized by default. If I’m doing manual memory management I’ll have the option. Additionally with a `noinit` keyword I can fine tune situations where it’s essential nothing is initialized — though I’m not sure I would ever have needed it. I just don’t consider the performance aspect to be a reasonable objection.

  3. I respectfully disagree, since default initialization of variables
    – does not prevent bugs, it just makes them harder to find, and is thus unsafer!
    – it is slower, and can dominate the performance of useful programs,
    – uninitialized memory is required by some programs, thus forcing default initialization prevents those programs from being written in your language.

    About the safety point. You mention the use of valgrind, which detects access to uninitialized memory. The problem is not uninitialized memory, the problem is accessing it. In a project we decided to forbid default initializing some variables to be able to find bugs in our initialization functions with valgrind because people were default initializing them to zero to remove valgrind warnings instead of fixing the bugs in the initialization functions.

    So the safety point is moot. Default initialization does not prevent bugs, it makes them harder to find. So I argue that it is not safer, and that having “reproducible bugs across platforms” is not a good selling argument for a programming language.

    About uninitialized memory being useful. It is really useful e.g. in serialization. This is an extremely common problem in CS that a lot of programs have to solve. Forbidding uninitialized memory will make any program that serializes data slower. In some fields (HPC, HFT, Servers, Embedded devices…) your PL would be too slow to be applicable. This is how it works: reading some serialized data from a file/network/interprocess memory you 1) get the type and the number of elements to be read, 2) allocate an uninitialized buffer, 3) read the data into the buffer. Depending on the application, step 2) can be as expensive as 1) and 3), thus making it slower does make your whole program slower.

    So yes, uninitialized memory is _very_ useful.

    IMO the right thing is to forbid the reading of uninitialized memory at the type system level such that it is impossible to do (Rust memory model is a good step in this direction). Still, I’d say that even if you can forbid reading it at the type-system level, you should allow a way to do so (e.g. Rust unsafe blocks), since you might want to write a “valgrind” in your PL that will need to read uninitialized memory.

    I know doing this at the type-system level is a very hard problem, but forbidding uninitialized memory is just the wrong answer.

    • It does prevent defects. The problem with uninitialized values is that the program can end up behaving differently from run to run or from platform to platform. When a value is guaranteed this doesn’t happen. It will always behave the same: determinism is a nice thing to have. And if it is the wrong value it is a lot easier to find since it won’t accidentally work just sometimes.

      I disagree that performance here is a significant issue. There are only a few rare cases where default initialization might cause an issue. Mostly these programs already do their own memory allocation and thus wouldn’t be subject to the default initialization. In any case, I’ve even said that a `noinit` keyword could be available to satisfy even the odd need. Plus, since the OS always initializes memory it’s hard to imagine the performance overhead being extremely limiting.

      No program can require uninitialized memory. Since the language/libraries never guarantee what is in memory it is not reasonable to depend on it being not initialized. Your desire for serialization is covered by my allowance for custom memory management.

    • Again, uninitialized values are not the problem, the problem is accessing uninitialized values. For variables that have complex initialization (e.g. in a switch case after the variable), you can check at compile time that it has always been initialized before you access it. This eliminates this whole group of bugs from all your programs. If you instead give variables a default initialization, this check cannot be done anymore, and the only way to be sure that your program is correct is via testing. As you recognize, valgrind helps here in C/C++ but it won’t help in your case since all your values have been initialized with something, so you can’t even rely on automatic tools like valgrind anymore, but have to rely on user defined tests.

      So yes, I think that if you can forbid these initialization bugs from your programs at compile-time, you should. However, I still think that default initialization just makes these impossible to find at compile-time, and harder to find at run-time.

    • I agree with anon: Default-initialization doesn’t necessarily fix bugs. It only helps when the code that uses the uninitialized variable expects exactly the default value. Otherwise, it’s not really different: You are still working with data in an unexpected state.
      The actual problem doesn’t go away: You still have code that wants to read values that were never written. Yes, you get some determinism if you set them to a default value but the code (probably) still won’t work correctly.
      I can only think of one solution if you want to actually catch and fix these bugs: Reading from a variable *must* cause a compilation error whenever the compiler can’t *prove* that the variable was assigned.

    • It doesn’t prevent the bug but it makes it detectable. To me that is the same as introducing bugs. My test suite should work the same on various platforms for basic things like variable values.

      I think you’re making a very minor distinction when you say you shouldn’t be able to use an undefined value. Since proving something is uninitialized is impossible in the general case (algorithmically), the only way to guarantee this is to default initialize.

  4. C++ also lets you omit a return statement in functions that are supposed to return something. More garbage! =D

  5. In Visual C++ at least you can enable all warnings/warnings as errors. The code you posted would not compile in this mode.

    • GCC also has this mode, however it doesn’t detect all the places where uninitialized data is used. The simple case as posted here will be detected, but often with allocated structures it does not know.

      As this is programming code, which is Turing complete, identifying the uninitialized data would actually be equivalent to the halting problem. Thus it isn’t possible for the compiler to always be correct.

  6. You understand that this isn’t something the designers of C/C++ “forgot” or “overlooked.” The language standard explicitly says that automatic objects, if not explicitly initialized, have indeterminate values. They looked at the pros and cons and made a choice; I think that choice is well-aligned with the C/C++ philosophy.

    These languages are intended to be “close to the metal” and useful on a wide variety of platforms, particularly embedded platforms. In those contexts, not requiring initialization makes more sense than it does on a general programming level.

    Many general programming languages do guarantee object initialization and are usually a better choice for general programming tasks. I would recommend against using C/C++ unless circumstances require it. They are unsafe languages and using them correctly requires a level of expertise that’s hard to achieve. There are situations that require their use, but if you have a choice, modern languages offer much better choices.

    Per the old saying about programming languages and foot wounds, C/C++ doesn’t just allow you to shoot yourself in the foot. It hands you a loaded gun with a hair trigger, wraps your finger around that trigger, pushes the muzzle towards your foot and yells “BANG” in your ear.

    • A lot of the decisions for C/C++ were made at a time when computing power was extremely limited. The lack of initialization I think is one of those decisions stemming from that time.

      I can obviously think of all sorts of situations where you wouldn’t want to initialize a variable, but they have become the rarity now, even in C++.

  7. I do like your solution, as long as it includes the “noinit”. I think D has this with “void”, but I don’t really like D for other reasons.

    • I agree 100% with Mortoray. I’m so glad to see someone else post about this! This is a problem, and it’s irritated me for two decades. Fixing this would make C++ a better language for application programming, for both novices and experts.

      I agree that C++ needs to retain the ability to leave data undefined for performance reasons. This capability will always be important to C++ library implementors, and usually only them. That means the solution doesn’t even have to be pretty. The ability to have undefined data should be justified under the philosophy “if you know what you’re doing, and have a good reason, you can open the hood and get at such details” (paraphrasing Herb Sutter). Its current place as default behavior is a mistake that should be corrected with the benefit of hindsight.

      I agree with something like ‘noinit’, but I don’t think it needs to rise to the level of language keyword.

      I’d personally like to see a new std type and value – perhaps std::undefined which would be of type std::undefined_t, not at all unlike nullptr and std::nullptr_t. As with std::nullptr_t, a user-defined type can provide a constructor taking std::undefined_t if it could benefit from one. The programmer could do anything they wanted in such a constructor, just like they can when accepting std::nullptr_t, but generally they’d initialize select members with std::undefined as desired.

      All of the built-in types (int, float, void*, etc…) would also accept construction by std::undefined, and in ONLY this case the compiler would leave the object undefined. Default construction of built-in types would assign 0, just as is done now when doing “int x{};” instead of just “int x;”.

      For example:

      int x; // value of x is zero

      int x = std::undefined; // value of x is left undefined, could be anything

      int x = 77;
      x = std::undefined; // compiler error – built-ins wouldn’t copy from std::undefined

      int* buffer = new int[100]; // all 100 are set to zero

      int* buffer = new int[100](std::undefined); // all 100 are left undefined

      class Foo {

      int x_;

      Foo() { } // value of x_ is zero

      Foo(std::undefined_t) : x_(std::undefined) { } // value of x_ is undefined

      // ….. but you wouldn’t often see constructors taking std::undefined_t,
      // because you almost never want undefined data.

      };

      And as a bonus, this would make the default value of “int x;” consistent no matter what scope it’s declared in. (Currently, C++ gives “int x;” a default value of 0 only when declared in static/global scope).
