David Chappell

Understanding .NET's Common Type System
David Chappell - December 14, 2001

What is a programming language? One way to think about it is as a specific syntax with a set of keywords that can be used to define data and express operations on that data. While language syntaxes differ, the underlying abstractions of most popular languages today are very similar. All of them support various data types such as integers and strings, all allow grouping executable code into methods, and all provide a way to group data and methods into classes. When a new programming language is defined, the usual approach is to define underlying abstractions such as these—key aspects of the language's semantics—concomitantly with the language's syntax.

Yet there are other possibilities. Suppose you chose to define core programming abstractions without mapping them to any particular syntax. If the abstractions were general enough, they could then be used in many different programming languages. Rather than inextricably mingling syntax and semantics, they could be kept separate, allowing different languages to be used with the same set of underlying abstractions.

This is exactly what's done in the Common Type System (CTS). A fundamental part of the .NET Framework's Common Language Runtime (CLR), the CTS specifies no particular syntax or keywords, but instead defines a common set of types that can be used with many different language syntaxes. Each language is free to define any syntax it wishes, but if that language is built on the CLR, it will use at least some of the types defined by the CTS. While the creator of a CLR-based language is free to implement only a subset of the types defined by the CTS, and even to add types of her own to her language, most languages built on the CLR make extensive use of the CTS-defined types. Visual Basic.NET, C#, and pretty much every other language used with the .NET Framework rely heavily on the CTS.

Figure 1 The CTS defines reference and value types, all of which inherit from a common Object type.

Figure 1 shows a substantial subset of the types defined by the CTS. The first thing to note is that every type inherits either directly or indirectly from a base Object type. Notice, too, that every type defined by the CTS is either a reference type or a value type. As their names suggest, an instance of a reference type always contains a reference to a value of that type, while an instance of a value type contains the value itself. The reference types shown in the figure inherit directly from Object, while value types inherit from a type called ValueType, which in turn inherits from Object.
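The inheritance relationships described above can be checked from code. What follows is a minimal sketch in C#, chosen here only as one CLR-based language; the class name TypeHierarchySketch and the helper Chain are hypothetical names invented for illustration. It uses reflection to walk the chain of base types for one value type and one reference type.

```csharp
using System;

public static class TypeHierarchySketch
{
    // Walks the BaseType chain of a type and renders it as "A -> B -> C".
    // (Chain is a hypothetical helper written for this illustration.)
    public static string Chain(Type type)
    {
        string result = type.Name;
        for (var t = type.BaseType; t != null; t = t.BaseType)
        {
            result += " -> " + t.Name;
        }
        return result;
    }

    public static void Main()
    {
        // A value type reaches Object by way of ValueType.
        Console.WriteLine(Chain(typeof(int)));          // Int32 -> ValueType -> Object

        // String is a reference type that inherits directly from Object.
        Console.WriteLine(Chain(typeof(string)));       // String -> Object

        Console.WriteLine(typeof(int).IsValueType);     // True
        Console.WriteLine(typeof(string).IsValueType);  // False
    }
}
```

C#'s int and string keywords are simply that language's names for the CTS types Int32 and String, which is why the CTS names appear in the output.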

Value types tend to be simple. As Figure 1 shows, the types in this category include Byte, Char, signed integers of various lengths, unsigned integers of various lengths, single- and double-precision floating point, Decimal, Boolean, and more. Reference types, by contrast, are more complex. As shown in the figure, for instance, the CTS's reference types include the following:

Class: A CTS class can have methods, events, and properties; it can maintain its state in one or more fields; and it can contain nested types. Classes have one or more constructors, which are initialization methods that execute when a new instance of the class is created. A class can directly inherit from at most one other class, but it can act as the direct parent for any number of inheriting child classes. A class can also implement one or more interfaces, described next.
Interface: An interface can include methods, properties, and events. Unlike a class, an interface can inherit from one or more other interfaces simultaneously.
Array: An array is a group of values of the same type. Arrays can have one or more dimensions, and their upper and lower bounds can be set more or less arbitrarily (although languages built on the CTS commonly restrict this freedom).
String: A string is a sequence of Unicode characters. Strings can't be modified once they're created.
Delegate: A delegate is effectively a pointer to a method. Delegates are commonly used for event handling and callbacks.
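To make the list above more concrete, here is one way these reference types might look in C#. This is a sketch, not a canonical example: every type and member name in it (IReadable, IWritable, IReadWrite, Note, TimestampedNote, Transform, ReferenceTypesSketch, Shout) is hypothetical and exists only for illustration.

```csharp
using System;

// An interface can inherit from several other interfaces at once.
public interface IReadable { string Read(); }
public interface IWritable { void Write(string value); }
public interface IReadWrite : IReadable, IWritable { }

// A class keeps state in fields, has a constructor, and can implement interfaces.
public class Note : IReadWrite
{
    private string text = "";
    public Note(string initial) { text = initial; }
    public string Read() { return text; }
    public void Write(string value) { text = value; }
}

// A class names at most one direct base class.
public class TimestampedNote : Note
{
    public TimestampedNote(string initial) : base(initial) { }
}

// A delegate type describes methods with a particular signature.
public delegate string Transform(string input);

public static class ReferenceTypesSketch
{
    public static string Shout(string input) { return input.ToUpperInvariant(); }

    public static void Main()
    {
        IReadWrite note = new TimestampedNote("draft");
        note.Write("final");
        Console.WriteLine(note.Read());               // final

        int[,] grid = new int[2, 3];                  // a two-dimensional array
        Console.WriteLine(grid.Rank);                 // 2

        // The CTS allows a nonzero lower bound, although C# array syntax does not.
        Array oneBased = Array.CreateInstance(typeof(int), new int[] { 3 }, new int[] { 1 });
        Console.WriteLine(oneBased.GetLowerBound(0)); // 1

        string original = "cts";
        string upper = original.ToUpperInvariant();   // produces a new string
        Console.WriteLine(original);                  // cts
        Console.WriteLine(upper);                     // CTS

        Transform transform = Shout;                  // the delegate now refers to Shout
        Console.WriteLine(transform("hello"));        // HELLO
    }
}
```

The call to Array.CreateInstance illustrates the point about array bounds: the runtime accepts a lower bound of 1, even though C#'s own array syntax always starts at zero.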

To really understand the difference between value types and reference types—a fundamental distinction in the CTS—you must first understand how memory is allocated for instances of each type. In managed code, values can have their memory allocated either on the stack managed by the CLR or on a CLR-managed heap. Variables allocated on the stack are typically created when a method is called or when a running method creates them. In either case, the memory used by stack variables is automatically freed when the method in which they were created returns. Variables allocated on the heap, however, don't have their memory freed when the method that created them ends. Instead, the memory used by these variables is freed via a process called garbage collection.

A basic difference between value types and reference types is that an instance of a value type, when used as a local variable, has its value allocated on the stack, while an instance of a reference type has only a reference to its actual value allocated on the stack. The value itself is allocated on the heap. Figure 2 shows an abstract picture of how this looks. In the case shown here, three instances of value types (Int16, Char, and Int32) have been created on the managed stack, while one instance of the reference type String exists on the managed heap. Note that even the reference type instance has an entry on the stack, a reference to the memory on the heap, but the instance's contents are stored on the heap.

Figure 2 Instances of value types are allocated on the managed stack, whereas instances of reference types are allocated on the managed heap.
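The practical consequence of this difference shows up on assignment: copying a value type instance copies the value, while copying a reference type instance copies only the reference. The following C# sketch illustrates this under the assumption of two hypothetical types, PointValue (a struct, and so a value type) and PointRef (a class, and so a reference type). None of these names comes from the CTS itself.

```csharp
using System;

public struct PointValue { public int X; }   // value type
public class PointRef { public int X; }      // reference type

public static class CopySemanticsSketch
{
    public static int CopyValueThenMutate()
    {
        PointValue a = new PointValue();
        a.X = 1;
        PointValue b = a;   // copies the value itself
        b.X = 2;            // changes only b's copy
        return a.X;
    }

    public static int CopyReferenceThenMutate()
    {
        PointRef a = new PointRef();
        a.X = 1;
        PointRef b = a;     // copies the reference; both refer to one heap object
        b.X = 2;            // the change is visible through a as well
        return a.X;
    }

    public static void Main()
    {
        Console.WriteLine(CopyValueThenMutate());      // 1
        Console.WriteLine(CopyReferenceThenMutate());  // 2
    }
}
```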

There are cases when an instance of a value type needs to be treated as an instance of a reference type. For situations like this, a value type instance can be converted into a reference type instance through a process called boxing. When a value type instance is boxed, storage is allocated on the heap and the instance's value is copied into that space. A reference to this storage is placed on the stack. The boxed value is an object, a reference type that contains the contents of the value type instance. A boxed value type instance can also be converted back to its original form, a process called unboxing.
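Boxing and unboxing can be sketched in a few lines of C#. In that language, assigning a value type instance to a variable of type object boxes it, and an explicit cast unboxes it. The class and method names here (BoxingSketch, BoxThenChangeOriginal) are hypothetical.

```csharp
using System;

public static class BoxingSketch
{
    public static int BoxThenChangeOriginal()
    {
        int original = 42;
        object boxed = original;    // boxing: the value is copied into heap storage
        original = 7;               // the boxed copy is unaffected
        int unboxed = (int)boxed;   // unboxing: the value is copied back out
        return unboxed;
    }

    public static void Main()
    {
        Console.WriteLine(BoxThenChangeOriginal());  // 42

        object boxed = 42;
        Console.WriteLine(boxed.GetType().Name);     // Int32
        Console.WriteLine(boxed is ValueType);       // True
    }
}
```

Because boxing copies the value, later changes to the original variable don't affect the boxed object, which is why the method returns 42 rather than 7.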

CLR-based programming languages such as C# and Visual Basic.NET construct their own type systems on top of the CTS types. Despite their different representations, however, the semantics of these types are essentially the same in typical CLR-based languages. Because of this, no matter which CLR-based language you're working in—C#, VB.NET, or something else—the CTS underlies a large part of what you're doing. As Windows development shifts more and more to .NET, the CTS will become the foundation for a growing part of the world's software.