When something works well on its own with a high level of reliability, we tend to take it for granted and forget about the mechanics of how it works. We turn the key of our car and expect the engine to fire up. No thought is given to the inner workings of the internal combustion engine.
We write .NET code and expect the CLR and its garbage collector to manage memory for us without having to think much about value types, reference types, the stack and the heap. We just write code that works and thank our lucky stars that we've seen the last of malloc. Sure, we still have to manually dispose of some objects that hold system or other outside resources, but that's simple work.
But sometimes it's fun to look under the hood. And who knows, it might show up on a test.
So what are the Stack and the Heap? What is the difference between a reference type and a value type? And what do they all have to do with garbage collection?
You're right. I'm about to tell you. Now I'm not a compiler or CLR guru and this is not a graduate level CS class. This is a blog post, so we'll try to keep it real. There are far better explanations out there on the web and in texts that go into much more detail. I hope this post will serve as a summary reminder of these concepts and help us understand the engine under the hood a bit better. And remember, this post is based on what I know, so if I'm wrong about something here, please feel free to correct me.
What is the Stack?
The Stack is essentially a LIFO (last in, first out) structure that tracks execution for a given thread (each thread gets its own Stack). The most recently called method sits on top, and its entry (a stack frame) holds its parameters, its local value types (more on that later), and references or pointers to data items in the Heap. The CLR, working with the JIT compiler, manages what goes on the Stack. When a method returns or throws an exception it doesn't handle, its frame is removed from the top of the Stack and control passes to the method below it. Sometimes you will see the Stack referred to as the "call stack."
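To see the last in, first out behavior in action, here's a minimal sketch. The names CallStackDemo, First, Second and Log are made up for this illustration. It just records the order in which two nested methods are entered and left.

```csharp
using System;
using System.Collections.Generic;

namespace StackHeap
{
    public static class CallStackDemo
    {
        // Records the order in which methods are entered and left.
        public static readonly List<string> Log = new List<string>();

        public static void Main()
        {
            Log.Clear();
            First();
            foreach (string entry in Log)
            {
                Console.WriteLine(entry);
            }
        }

        private static void First()
        {
            Log.Add("enter First");
            Second();   // a frame for Second goes on top of the frame for First
            Log.Add("leave First");
        }

        private static void Second()
        {
            int local = 7;  // a local value type that belongs to this call
            Log.Add("enter Second, local = " + local);
            Log.Add("leave Second");
        }
    }
}
```

Second is the last method called and the first one to finish, so the log reads: enter First, enter Second, leave Second, leave First.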
What is the Heap?
The Heap's purpose is to hold information. The Heap is like a filing cabinet where data is stored. While the Stack is only ever pushed and popped in a LIFO fashion, anything in the Heap can be accessed at any time, as long as you hold a reference to it. The Heap holds the objects that are instances of reference types. When we're done with things in the Heap, they have to be cleaned up to make room for other things. That's the job of the garbage collector.
What is a Value Type?
A value type is a type that derives implicitly from System.ValueType, which overrides the System.Object virtual methods (such as Equals and GetHashCode) with implementations more appropriate to value types. Value types fall into two main categories:
Structs
Structs fall into these categories:
• numeric types
- integral types (byte, char, short, int, long, sbyte, ushort, uint, ulong)
- floating-point types (double, float)
- decimal
• boolean
• user defined structs
Enumerations
Enumerations are a set of named constants with an underlying type which can be any integral type except System.Char.
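Here's a small sketch of an enumeration with an explicit underlying type. The Priority enum and its members are hypothetical and exist only for this example.

```csharp
using System;

namespace StackHeap
{
    // Hypothetical enum that uses byte as its underlying type.
    public enum Priority : byte
    {
        Low = 1,
        Normal = 2,
        High = 3
    }

    public static class EnumDemo
    {
        public static void Main()
        {
            Priority p = Priority.High;
            byte raw = (byte)p;   // explicit cast to the underlying type

            Console.WriteLine("{0} is stored as {1}", p, raw);            // High is stored as 3
            Console.WriteLine(Enum.GetUnderlyingType(typeof(Priority)));  // System.Byte
            Console.WriteLine(typeof(Priority).IsValueType);              // True
        }
    }
}
```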
A value type's data lives wherever the variable that holds it lives. A local variable or parameter of a value type is generally stored on the Stack, while a value type that is a field of another type is stored inline in that containing type, which means on the Heap when the container is a reference type. When a value type is used like an object, its value is copied into a wrapper on the Heap so that it looks like a reference type. This is called boxing. Copying the value back out of that wrapper into a value type variable is called unboxing.
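Boxing and unboxing are easier to see than to describe, so here's a minimal sketch. BoxingDemo is just a name I picked for the example.

```csharp
using System;

namespace StackHeap
{
    public static class BoxingDemo
    {
        public static void Main()
        {
            int number = 42;
            object boxed = number;      // boxing: the value is copied into an object on the Heap
            number = 100;               // changes only the original variable, not the boxed copy
            int unboxed = (int)boxed;   // unboxing: the value is copied back out

            Console.WriteLine("number = {0}, unboxed = {1}", number, unboxed);  // number = 100, unboxed = 42
            Console.WriteLine(boxed.GetType());                                  // System.Int32
        }
    }
}
```

The boxed copy is independent of the original, which is why changing number afterwards has no effect on the value we unbox.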
The most important thing to remember about value types is that the assignment of one value type to another results in the data being copied. For example:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace StackHeap
{
    class Program
    {
        static void Main(string[] args)
        {
            MyData me = new MyData();
            me.Age = 42;
            me.Name = "Sam";
            me.Relationship = "married";
            MyData her = me;
            her.Age = 44;
            her.Name = "Mary";
            Console.WriteLine("I am {0} years old. My name is {1}.", me.Age, me.Name);
            Console.WriteLine("She is {0} years old. Her name is {1}.", her.Age, her.Name);
            Console.WriteLine("I am {0} to her. She is {1} to me.", me.Relationship, her.Relationship);
            Console.ReadLine();
        }
    }
    public struct MyData
    {
        public int Age;
        public string Name;
        public string Relationship;
    }
}
Output
I am 42 years old. My name is Sam.
She is 44 years old. Her name is Mary.
I am married to her. She is married to me.
The assignments to Age and Name on her apply only to the copy, so me is unchanged. MyData is a value type.
Now consider the output if we make MyData a reference type by changing it to a class.
public class MyData
{
    public int Age;
    public string Name;
    public string Relationship;
}
Output
I am 44 years old. My name is Mary.
She is 44 years old. Her name is Mary.
I am married to her. She is married to me.
The assignments of the Age and Name values show up through both variables now, because me and her refer to the same single object created by the "new MyData()" call. Assigning me to her copied the reference, not the object.
The same behavior can be observed in using value types and reference types as parameters. Consider this:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace StackHeap
{
    class Program
    {
        static void Main(string[] args)
        {
            MyData me = new MyData();
            me.Age = 42;
            me.Name = "Sam";
            me.Relationship = "married";
            MyData her = me;
            her.Age = 44;
            her.Name = "Mary";
            ModifySomeData(me, ref her);
            Console.WriteLine("I am {0} years old. My name is {1}.", me.Age, me.Name);
            Console.WriteLine("She is {0} years old. Her name is {1}.", her.Age, her.Name);
            Console.WriteLine("I am {0} to her. She is {1} to me.", me.Relationship, her.Relationship);
            Console.ReadLine();
        }
        // me arrives as a copy; her arrives as a reference to the caller's variable.
        private static void ModifySomeData(MyData me, ref MyData her)
        {
            me.Age = 50;
            her.Age = 50;
            Console.WriteLine("\nMy age changed to {0} as value type parameter.", me.Age);
            Console.WriteLine("Her age changed to {0} as value type parameter passed by ref.\n", her.Age);
        }
    }
    public struct MyData
    {
        public int Age;
        public string Name;
        public string Relationship;
    }
}
Output
My age changed to 50 as value type parameter.
Her age changed to 50 as value type parameter passed by ref.
I am 42 years old. My name is Sam.
She is 50 years old. Her name is Mary.
I am married to her. She is married to me.
Notice that her.Age changed permanently, while me.Age changed only in the copy inside the ModifySomeData method because it was not passed as a "by ref" parameter. Now what happens to the output if we change MyData to a class rather than a struct? Here's the reference type version of MyData and the output it produces:
public class MyData
{
    public int Age;
    public string Name;
    public string Relationship;
}
Output
My age changed to 50 as value type parameter.
Her age changed to 50 as value type parameter passed by ref.
I am 50 years old. My name is Mary.
She is 50 years old. Her name is Mary.
I am married to her. She is married to me.
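So what happened when we passed a value type "by ref" in the example above? No boxing was involved. The method received a reference to the caller's her variable itself rather than a copy of its data, so the change was made directly to the caller's struct. And when we changed MyData into a class, me and her referred to the same object on the Heap. A reference type parameter without the "ref" keyword is still passed by value, but the value that gets copied is the reference, so the method can change the object the caller sees. For a reference type, the "ref" keyword only matters if the method assigns a whole new object to the parameter. This is an important distinction to learn when dealing with parameter values.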
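To make the parameter passing rules concrete, here's a small sketch covering the combinations of struct or class, with and without "ref". The Point and Person types and the method names are hypothetical and exist only for this example.

```csharp
using System;

namespace StackHeap
{
    // Hypothetical types for illustration only.
    public struct Point { public int X; }
    public class Person { public string Name; }

    public static class ParameterDemo
    {
        // Struct by value: the method works on its own copy.
        public static void MoveCopy(Point p) { p.X = 99; }

        // Struct by ref: the method works on the caller's variable. No boxing.
        public static void MoveOriginal(ref Point p) { p.X = 99; }

        // Class by value: the reference is copied, the object is shared.
        public static void Rename(Person p) { p.Name = "Mary"; }

        // Class by value: pointing the copied reference at a new object is invisible to the caller.
        public static void Replace(Person p) { p = new Person { Name = "Alex" }; }

        // Class by ref: the caller's variable itself is redirected to the new object.
        public static void ReplaceByRef(ref Person p) { p = new Person { Name = "Alex" }; }

        public static void Main()
        {
            Point a = new Point();
            MoveCopy(a);
            Console.WriteLine(a.X);        // 0
            MoveOriginal(ref a);
            Console.WriteLine(a.X);        // 99

            Person sam = new Person { Name = "Sam" };
            Rename(sam);
            Console.WriteLine(sam.Name);   // Mary
            Replace(sam);
            Console.WriteLine(sam.Name);   // Mary
            ReplaceByRef(ref sam);
            Console.WriteLine(sam.Name);   // Alex
        }
    }
}
```

The Replace and ReplaceByRef pair is the interesting one: both assign a new object to the parameter, but only the "ref" version changes which object the caller's variable points to.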
What is a Reference Type?
Reference types can be a class, interface, delegate or array. All classes are ultimately derived from System.Object. Interfaces and delegates are special reference types. Exactly what they are and how they are used is a subject for another day. The built-in reference types include object and string. These keywords are aliases for System.Object and System.String, so you don't write a class definition for them in your own code. All other reference types are defined in existing framework assemblies, third party assemblies or in your own code.
When you create an instance of a reference type, the object itself is placed in the Heap and the variable holds a reference (a pointer) to it. For a local variable or a parameter, that reference sits in the call Stack. While local value types are cleaned up when their method's frame is removed from the Stack, the data on the Heap must be cleaned up by the garbage collector (GC).
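Here's a minimal sketch of two variables holding references to one object in the Heap. It reuses a trimmed-down MyData class, and ReferenceDemo is a made-up name for the example.

```csharp
using System;

namespace StackHeap
{
    public class MyData
    {
        public int Age;
        public string Name;
    }

    public static class ReferenceDemo
    {
        public static void Main()
        {
            MyData me = new MyData { Age = 42, Name = "Sam" };
            MyData alias = me;   // copies the reference, not the object

            Console.WriteLine(object.ReferenceEquals(me, alias));  // True

            me = null;           // drops one reference; the object is still reachable through alias
            Console.WriteLine(alias.Name);                         // Sam
        }
    }
}
```

Setting me to null does not destroy anything. The object stays in the Heap for as long as some reference, here alias, can still reach it.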
What is the GC?
The garbage collector in the .NET CLR is the intelligent mechanism that deals with data on the Heap that can no longer be reached from any live reference. From time to time it starts from the "roots" (references in the call Stacks, static fields and a few other places) and follows them to find every object in the Heap that is still reachable. Objects that are not reachable are removed from the Heap, returning that memory space to the runtime and eventually to the system.
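If you want to watch the GC make this decision, here's a rough sketch using WeakReference, which tracks an object without keeping it alive. CollectionDemo, Run and CreateUnreferencedObject are names I made up. Forcing a collection with GC.Collect is for demonstration only, and the result for the dropped object is typical rather than guaranteed because it depends on the runtime and build settings.

```csharp
using System;
using System.Runtime.CompilerServices;

namespace StackHeap
{
    public static class CollectionDemo
    {
        // Creates an object that nothing refers to strongly once this method returns.
        // NoInlining keeps the temporary out of the caller's frame.
        [MethodImpl(MethodImplOptions.NoInlining)]
        private static WeakReference CreateUnreferencedObject()
        {
            return new WeakReference(new object());
        }

        // Returns { keptIsAlive, droppedIsAlive } after a forced collection.
        public static bool[] Run()
        {
            object kept = new object();
            WeakReference keptTracker = new WeakReference(kept);
            WeakReference droppedTracker = CreateUnreferencedObject();

            GC.Collect();                    // demonstration only
            GC.WaitForPendingFinalizers();

            bool[] result = { keptTracker.IsAlive, droppedTracker.IsAlive };
            GC.KeepAlive(kept);              // guarantees kept is still referenced up to this point
            return result;
        }

        public static void Main()
        {
            bool[] result = Run();
            Console.WriteLine("kept is alive: {0}", result[0]);
            Console.WriteLine("dropped is alive: {0}", result[1]);
        }
    }
}
```

The kept object is always reported alive because a live local still refers to it. The dropped object usually is not, since nothing reaches it any more.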
The GC uses a "generations" approach to improve the speed with which garbage collection is done. The reason this matters is that while a garbage collection is running (walking the roots and the Heap), your code generally cannot. In the simplest case, all managed threads are paused while garbage collection is performed, so shorter collections mean shorter pauses.
The GC assigns each object in the Heap a generation: 0, 1 or 2. New objects start in generation 0, and each collection an object survives promotes it to the next generation, up to 2. The reason for this is that objects that make it through two garbage collection passes are likely to be objects that will live a long time, such as a System.Windows.Forms.Form object. On the other hand, short-lived objects, such as a temporary string built inside a method, will generally not survive one garbage collection pass. By tracking generations, the GC can prioritize its work. For example, if sufficient memory is recovered by examining only the objects in generation 0, then further garbage collection is not required and processing may continue. This way you get a good balance between memory use and performance.
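You can ask the runtime which generation an object is in. Here's a small sketch, with GenerationDemo and Run as made-up names. Forcing collections like this is for demonstration only, and the exact numbers can vary by runtime and GC settings, so I've left the expected output out.

```csharp
using System;

namespace StackHeap
{
    public static class GenerationDemo
    {
        // Returns the generation of one object before and after two forced collections.
        public static int[] Run()
        {
            object survivor = new object();

            int atStart = GC.GetGeneration(survivor);
            GC.Collect();   // demonstration only
            int afterOne = GC.GetGeneration(survivor);
            GC.Collect();
            int afterTwo = GC.GetGeneration(survivor);

            GC.KeepAlive(survivor);   // make sure the object is referenced through both collections
            return new int[] { atStart, afterOne, afterTwo };
        }

        public static void Main()
        {
            int[] generations = Run();
            Console.WriteLine("At start: generation {0}", generations[0]);
            Console.WriteLine("After one collection: generation {0}", generations[1]);
            Console.WriteLine("After two collections: generation {0}", generations[2]);
            Console.WriteLine("Highest generation: {0}", GC.MaxGeneration);
            Console.WriteLine("Generation 0 collections so far: {0}", GC.CollectionCount(0));
        }
    }
}
```

On a typical desktop runtime you should see the object climb from generation 0 toward generation 2 as it survives each collection.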
Summary
That's about all I have to say on this subject for now. I'm sure that virtual forests have been consumed in addressing these topics, but writing this up has been a good exercise for me. If you find mistakes, please let me know but go easy on me. :)