Tags

, , , , ,

In the world of new languages it seems like garbage collection is standard feature. A way for the runtime to locate unused bits of memory and release them. It has become so common that some people can’t even imagine using a language without it. Programmers, particularly ones new to the trade, tend to be exposed only to the virtues whereas the negatives are quietly left in the dark. Perhaps this was a decent reaction to the memory management of C, but as a whole, in general purpose languages, garbage collection is more of a detriment than a benefit.

What is It?

When we speak of garbage collection most people think of the Java style mark-and-sweep, tracing, copying or other scanning like algorithms. Something continually, or reguarly, searches the memory of the program looking for items that can be deleted. That is, something is looking for garbage to collect and discard.

Though often falling under the same umbrella term, we don’t simply mean systems where memory is automatically freed when it is no longer used. It is invalid to refer to properly tracked and managed memory as garbage in the same sense. We are speaking strictly of the search and destroy type algorithms. Things like reference counting and process cleanup are simply not the same. They aren’t looking for things to delete, they know exactly what has to be removed.

Solves only one problem

Garbage collection is a method to free the programmer from worrying about basic memory management. It isn’t the only option; simple reference counting also covers the vast majority of memory management needs. Sweeping collection has a distinct feature of being able to resolve cyclic dependencies. That is when A points to B and B points to A. Reference counting cannot manage this in the general case.

Obviously if you have such a feature you will make use of it. I don’t see however that cyclic dependencies are a significant obstacle in most programs. Such object relationships can normally be resolved, or significantly mitigated, by one of a number of other techniques.

Does not solve that problem

Even with a garbage collector cyclic references still pose somewhat of a problem. Often objects need to execute some code when they are to be destroyed, they can’t just silently disappear like a block of memory. Within a cyclic reference it is, in the general sense, impossible to determine which destructor has to be called first. So instead garbage collectors introduce the concept of finalization, which is a rather marginalized form of a destructor.

Cyclic loops with special cleanup dependencies need to have a programmer clarify what should be done. A finalizer system could possibly get it wrong, thus forcing a programmer to manage the cleanup prior to garbage collection. In situations where the collector can assuredly do the right thing, there is likely an easy alternative for the programmer to avoid the cycle all together.

Requires a simplified memory model

Scanning memory for unused objects requirese a very clear idea of what memory actually is. The collector has to to undestand all the places where a pointer might be stored. In many high-level languages this is possible solely because the memory model has been simplified to just a stack and a heap.  Computers however have all sorts of other places where pointers could be stored. They have shared memory, specialized vector processors, registers, graphics texture memory, in fact almost every chip in the system has some manner in which to store and retrieve data.

It’s simply not feasible, or even possible, for a garbage collector to properly scan all this memory. Programs that need these resources have to be aware of its implications on the garbage collector. Perhaps the programmer won’t use such memory, but those same implications also apply to a compiler trying to optimize the code. Furthermore, the simplified model requires hiding the true nature of memory making it more difficult to implement certain inter-process and concurrent programming algorithms.

It holds on to resources too long

Object based languages tend to use a concept known as RAII (resource acquisition is initialization). The simple idea is that, by example, creating a File object will open the file for you and close it once you stop using the object. With scoped based variables, or reference counting, this is great since the moment the last user of the file is done, it will be closed. With a garbage collection you need to wait for the scanner to find and finalize the object before the file is closed.

This results in a late deallocation. For many resources this causes an exhaustion issue. An OS has a limit on the number of open files. A database can only have so many open cursors. Client sockets are limited, and likely the server will throttle as well. A network protocol may send a packet to end the connecton. The file object will flush the data to the disk. It’s impossible to say in a generic manner what might need to happen at this point.

So in Java, and other such languages, you’ll often see programmers explicitly calling close functions on their objects. This harks a lot back to the idea of manual memory management, the one thing that garbage collection was supposed to avoid. Prompt deallocation of unused resources, via the RAII pattern, is critical. Garbage collection can’t offer it.

It doesn’t prevent memory leaks

Most memory leaks I encounter are not usually the result of a missing free or delete statement. Usually they result because somebody is storing objects in a cache. Nothing ever clears this cachce so over time it grows. It hoards every object added to it until the program exits or runs out of memory. Garbage collection can’t do anything here. Those objects are technically in use and thus cannot be freed.

Garbage collection can prevent certain types of memory leaks. But so can reference counting. Moreso these types of memory leaks are very easy to recognize by readily accessible leak analysis programs. The type of program you’ll want to run to catch the above problems as well, even if you have a garbage collector.

It is slow and doesn’t scale

Compared to manual memory management, or reference counting, or virtually any other non-scanning scheme, garbage collection is slow. This is easy to forget since you can find any number of benchmarks or studies showing that it is just as fast as something else. Look closely, or even a cursory glance in most cases, and you’ll see what is actually being measured is nonsensical, oversimplified, or just plain wrong.

Actually, you don’t even need to look closely, you can just view it theoreritcally. Garbage collection has to scan memory, thus its complexity must be a function of N, where N may be the number of bytes or the number of objects. It must be bound by a linear function of N, since ultimately it needs to scan all of these objects. Even a highly sophisticated, incremental, distributed collector must go through all the objects. This means that every additional block of memory used  by a program will ultimately make it slower.

It pauses and interrupts

Just being slow or consuming extra processing time is not a significant issue for many programs. Collectors have this an additional drawback of occasionally locking the memory. Locking the memory in most cases ends up locking several threads, or the whole program. With a solid design it is possible to significantly reduce the frequency and duration of interruptions. None are perfect however and they all end up blocking the program for some noticable period of time.

What is noticeable depends on the application. Obviously for a real-time, or low-latency application, even a few microseconds pose a significant problem. Most apparent however are user applications. We have all sorts of programs which simply stop reacting for a while, video which pauses in the middle, audio with snaps and crackles, web pages that sometimes take 10 times as long to load, and the list goes on. This is a very unfortunate backwards step in user experience.

Conclusion

Nobody is denying that the original style of allocate and deallocate leads to problems. Though not all people are however opposed to that style, I think in general we can agree that some kind of automatic memory management is required. Looking at its set of features and drawbacks, scanning garbage collection just isn’t the rigth approach for a general purpose language. Recall in fact that garbage collection has only one unique feature: the ability to resolve cyclic loops.  A problem that it can’t actually solve correctly and one that has several other alternate solutions.

Perhaps certain domain-specific-languages could use scanning garbage collection as a reasonable way to implement their features.  Some languages might consider a hybrid approach.  Maybe an ingenius new technique will solve some of the drawbacks. So far however, garbage collection in generic languages has introduced more problems than it has solved.

About these ads