The Joy of Reference Counting
David Chappell - August 1997
One of the standard parts of object technology is the notion of
an object's lifecycle, the key aspects of which are creation and
deletion. To create a new instance of some class, for example, C++
and Java programmers can use the new operator, while a COM programmer
might call CoCreateInstance to create a new COM object. While it
exists, this object instance can be accessed by one or more clients,
and some memory is typically allocated for its state.
Figuring out how to get rid of an object instance when it's no longer
needed is a bit more challenging. One option is to assume omniscience
on the part of the programmer: just as she knows when to create
a new object, she also knows when to destroy it. This is essentially
the strategy used in C++, where an object allocated with new can
be destroyed with delete.
In Java, however, a client can create an object, use it for a while,
then just forget about it. Objects with no active clients are eventually
deleted by Java's garbage collector, the existence of which frees
programmers from worrying about explicitly deleting objects. Automatic
garbage collection is a very nice thing to have, since it's easy
to forget to delete an object when you're done using it.
Deleting COM Objects
Unlike C++, COM doesn't provide an explicit delete operation. COM
also doesn't necessarily assume the presence of an automatic garbage
collector. Instead, COM relies on reference counting to determine
when an object can safely delete itself. To accomplish this, each
COM object maintains a count of how many clients hold references
to its interfaces. Every time the object hands out a pointer to
one of its interfaces, it adds one to that reference count. In some
cases, when a client acquires an interface pointer to an object
from a source other than the object itself, that client must call
the object's AddRef method, which also causes the object to increment
its reference count by one. Whenever a client is finished using
an interface pointer, it calls Release on that pointer, and the
object subtracts one from the reference count. When all clients
have finished using all of an object's interfaces, i.e., when the
reference count falls to zero, the object typically commits suicide,
freeing any resources it has been consuming.
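The counting rules just described can be sketched in a few lines of
C++. This is only an illustration, not real COM: the class name
CountedObject and the destroyed flag are hypothetical, there is no
IUnknown or QueryInterface here, and the counter is not thread-safe.
It assumes the creator receives the first reference, so the count
starts at one.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM object (not a real IUnknown implementation).
class CountedObject {
public:
    // 'destroyed' is an optional flag set when the object deletes itself;
    // it exists only so a caller can observe the deletion.
    explicit CountedObject(bool* destroyed = nullptr)
        : refs_(1), destroyed_(destroyed) {}  // creator holds the first reference

    // Called when another client acquires a pointer to this object.
    unsigned long AddRef() { return ++refs_; }

    // Called when a client is finished with its pointer.
    unsigned long Release() {
        assert(refs_ > 0);  // releasing more often than acquiring is a bug
        unsigned long remaining = --refs_;
        if (remaining == 0) {
            delete this;  // no clients left: the object removes itself
        }
        return remaining;
    }

private:
    // Private destructor: only Release can destroy the object.
    ~CountedObject() {
        if (destroyed_) *destroyed_ = true;
    }

    unsigned long refs_;
    bool* destroyed_;
};
```

A client that creates the object, hands a second reference to someone
else with AddRef, and then sees both references released will find
the object gone after the second Release and not before.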
It doesn't take a whole lot of thought to realize that requiring
software developers to correctly call AddRef and Release all the
time might lead to problems. Some developers (not readers of this
magazine, of course, but some other developers) might forget to
call AddRef when they should, or they might call Release too often
or not often enough. Errors like these will result in odd behavior,
such as objects that die prematurely or wear out their welcome by
hanging around after all of their clients are gone. Yet the original
view of C++ COM programming assumed just this kind of correctness
on the part of COM programmers. The predictable result was that,
sometimes, things didn't work as expected.
COM Reference Counting Today
The situation is different today. For one thing, COM is supported
by lots of different languages, including Visual Basic, Java, and
many more. In most of these cases, reference counting is hidden
from the programmer, who never needs to call AddRef or Release. Instead,
the language runtime takes care of the work required to make reference
counting work correctly.
For example, in Microsoft's implementation of the Java Virtual Machine,
a Java client can treat an external COM object as if it were a Java
object. When the client is done using the object, it can forget
about it, just as with any other Java object. As always, the Java
garbage collector notices the now-unused object and proceeds to
delete it. If the garbage collector in Microsoft's Java VM determines
that the object in question is a COM object, it simply calls Release
on the object. Reference counting is still used to control the lifetime
of every COM object, but the Java programmer is freed from the responsibility
of worrying about it; it's taken care of automatically.
Even C++ programmers can now avoid this potentially error-prone
task. Microsoft's Visual C++ 5.0 can automatically generate smart
pointer classes that know how to do reference counting correctly.
While many C++ COM programmers have long used this technique to
simplify their lives, having the classes produced automatically
is quite handy. Reference counting remains the underlying mechanism
used for controlling the lifetime of COM objects, but it no longer
need be the programmer's concern.
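To give a feel for what a smart pointer of this kind does, here is
a minimal hand-written sketch. It is not the code Visual C++ generates;
the names RefPtr and Widget are hypothetical, and Widget is a toy
counted object standing in for a COM interface. The idea is simply
that copying the pointer calls AddRef and destroying it calls Release,
so the programmer never writes either call.

```cpp
#include <cassert>
#include <utility>

// Hypothetical counted object, used only to exercise the smart pointer.
struct Widget {
    int refs = 1;  // the creator holds the first reference
    int* live;     // external count of live Widgets, for observation only

    explicit Widget(int* live_counter) : live(live_counter) { ++*live; }
    void AddRef() { ++refs; }
    void Release() {
        assert(refs > 0);
        if (--refs == 0) delete this;
    }

private:
    ~Widget() { --*live; }  // reachable only through Release
};

// Hypothetical smart pointer: AddRef on copy, Release on destruction.
template <typename T>
class RefPtr {
public:
    // Adopts a reference the caller already owns, so no AddRef here.
    explicit RefPtr(T* p = nullptr) : p_(p) {}

    RefPtr(const RefPtr& other) : p_(other.p_) {
        if (p_) p_->AddRef();
    }

    // Copy-and-swap: 'other' is a counted copy; the old pointer is
    // released when 'other' goes out of scope.
    RefPtr& operator=(RefPtr other) {
        std::swap(p_, other.p_);
        return *this;
    }

    ~RefPtr() {
        if (p_) p_->Release();
    }

    T* operator->() const { return p_; }
    T* get() const { return p_; }

private:
    T* p_;
};
```

With this in place, a line such as RefPtr<Widget> b = a; raises the
count, and the count drops again when b goes out of scope.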
Reference Counting for Remote Objects
Hiding reference counting from programmers is unquestionably a good
thing, but it doesn't affect another potential problem: how can
reference counting be done effectively across a network? Distributed
COM (DCOM) faces exactly this problem, since COM uses reference
counting to control the lifetimes of both local and remote objects.
There are two main things to worry about with remote reference counting.
The first concern is efficiency: if clients call AddRef and Release
frequently (and they typically do), one could imagine that lots
of network traffic might be generated just keeping each remote object's
reference count up to date. The second problem is making sure that
unexpected client failures (remember, we're talking about clients
running Windows here) don't result in garbage objects that hang
around forever.
DCOM's designers handled the first problem by realizing that all
an object needs to avoid premature self-immolation is a non-zero
reference count. Whether the count's value is one or one hundred,
the object won't go away. This means that a large number of the
AddRef and Release calls clients make on remote objects can be handled
locally on the client's system: as long as an object thinks it has
at least one client, it won't go away. In DCOM, most calls to AddRef
and Release go no farther than the client proxy (DCOM's term for
a client stub); they aren't actually sent across the network. Only
when the last Release occurs is a message actually sent to the remote
object. While this means that the object's reference count won't
always match its actual number of clients, who cares? The result
is the same (objects die only when they're supposed to), and a good
deal of unnecessary network traffic is avoided.
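The proxy's trick can be sketched as follows. This is a simplified
model, not DCOM's actual proxy: the class ClientProxy and its members
are hypothetical, the network is simulated by a message counter, and
the sketch assumes the proxy already holds one remote reference on
the client's behalf when it is created.

```cpp
#include <cassert>

// Hypothetical client-side proxy; the network is simulated by a counter.
class ClientProxy {
public:
    // Assumption: the proxy starts out holding one remote reference,
    // matched by one local reference owned by the client.
    ClientProxy() : local_refs_(1), remote_held_(true), messages_sent_(0) {}

    // Handled entirely on the client's machine.
    unsigned long AddRef() { return ++local_refs_; }

    // Only the final Release results in a message to the remote object.
    unsigned long Release() {
        assert(local_refs_ > 0);
        if (--local_refs_ == 0) {
            send_remote_release();
        }
        return local_refs_;
    }

    int messages_sent() const { return messages_sent_; }
    bool remote_reference_held() const { return remote_held_; }

private:
    // Stand-in for the one network message that tells the object to
    // drop the reference held for this client.
    void send_remote_release() {
        remote_held_ = false;
        ++messages_sent_;
    }

    unsigned long local_refs_;
    bool remote_held_;
    int messages_sent_;
};
```

A client could call AddRef and Release a hundred times each on such
a proxy without generating any traffic; only the Release that takes
the local count to zero produces a message.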
The second problem is a little harder. When a client dies unexpectedly,
it will never be able to call Release on the objects to which it
holds references. Without some way for the remote object to learn
about its client's untimely demise, it will never be able to delete
itself. If this happens a lot (as it will in a network of any reasonable
size), servers will grow without bound, and eventually need to be
shut down and restarted to free those resources. This is an unattractive
solution, one that's bound to annoy existing clients that are using
those servers.
There's a standard way for an object to learn about its client's
death: the client can periodically send the object's server a packet,
reassuring the object that its client is still alive. If enough
time passes without the arrival of one of these keepalive packets,
the server can assume the client is dead and take the appropriate
action. This is exactly what DCOM does. Every two minutes a ping
packet is sent to the server; if no such packets arrive for three
ping periods (six minutes), the server assumes the client is dead and
calls Release as needed.
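The server's side of this bargain amounts to a small piece of
bookkeeping. The sketch below uses the two-minute period and the
three-period limit given above; everything else is an assumption
made for illustration, including the class name ClientLiveness, the
use of plain seconds for time, and the choice to treat a silence of
exactly six minutes as fatal.

```cpp
#include <cassert>

// Values taken from the text: a ping every two minutes, and three
// silent periods before the client is given up for dead.
constexpr long kPingPeriodSeconds = 120;
constexpr long kMissedPeriodsAllowed = 3;
constexpr long kTimeoutSeconds = kPingPeriodSeconds * kMissedPeriodsAllowed;

// Hypothetical server-side record for one client.
class ClientLiveness {
public:
    explicit ClientLiveness(long now_seconds) : last_ping_(now_seconds) {}

    // Called whenever a ping arrives from the client.
    void on_ping(long now_seconds) {
        assert(now_seconds >= last_ping_);  // time is assumed not to run backward
        last_ping_ = now_seconds;
    }

    // True once three full ping periods have passed in silence.
    // Assumption: the boundary itself (exactly six minutes) counts as dead.
    bool presumed_dead(long now_seconds) const {
        return now_seconds - last_ping_ >= kTimeoutSeconds;
    }

private:
    long last_ping_;
};
```

When presumed_dead returns true, the server would go on to call
Release for each reference the departed client held.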
The prospect of every client sending a ping packet to every object
it holds a reference to on every server is chilling. If done naively,
it's easy to imagine all of a network's bandwidth being eaten up
by pinging. Fortunately, DCOM's designers didn't choose to do things
this way. Instead, the DCOM infrastructure automatically creates
ping sets consisting of every client/referenced object combination
on each pair of machines. The entire ping set is then kept alive
with a single packet sent every two minutes between those two machines.
One packet every two minutes between each pair of machines, while
not exactly free, is much better than blindly sending a separate
ping packet for each client/object pair.
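The saving is easy to see by counting packets. The following sketch
compares the two approaches for a given list of remote references.
It models only the arithmetic, not DCOM's actual ping set machinery;
the RemoteReference structure and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one client/object pair.
struct RemoteReference {
    std::string client_machine;
    std::string server_machine;
    int object_id;
};

// Naive scheme: one ping per client/object pair in each period.
std::size_t naive_pings_per_period(const std::vector<RemoteReference>& refs) {
    return refs.size();
}

// Ping sets: one ping per distinct pair of machines in each period.
std::size_t ping_set_pings_per_period(const std::vector<RemoteReference>& refs) {
    std::set<std::pair<std::string, std::string>> machine_pairs;
    for (const RemoteReference& r : refs) {
        machine_pairs.insert({r.client_machine, r.server_machine});
    }
    // Grouping can only reduce the number of packets, never increase it.
    assert(machine_pairs.size() <= refs.size());
    return machine_pairs.size();
}
```

For example, if machine A holds references to three objects on server
S1 and one on server S2, and machine B holds one on S1, the naive
scheme sends five packets per period while ping sets send three.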
Complaining About Problems vs. Solving Them
The OMG has been fiercely critical of DCOM's pinging approach to
distributed garbage collection, claiming that it will never scale.
But the CORBA standards deal with this problem by pretending it
doesn't exist: they don't define a solution. As a result, different
CORBA-based products do different things. Some essentially require
servers to be periodically shut down and restarted to avoid infinite
growth due to garbage objects maintained for crashed clients. Others
rely on some kind of pinging, whether based on TCP keepalives or
an ORB-specific mechanism. The key point is that these allegedly
standard products solve this crucial problem in different ways,
which doesn't bode well for interoperability among them.
Reference counting isn't without cost. But for true component architectures,
where an all-knowing component that knows when to delete everything
can't exist, it's the only viable solution to the problem of deleting
objects when they're no longer needed (in fact, even the apparently
dead OpenDoc relied on reference counting). And when hidden beneath
language-appropriate mechanisms, reference counting can be simple: programmers
don't even need to know it's there.