Tempering Garbage Collection - Synopse Open Source

Some general observations about garbage collection (GC):

It is almost impossible to know at runtime how much memory a data structure uses, since objects that are no longer needed (e.g. items already removed from a queue) may not have been marked as garbage yet, so they still occupy memory;
You may expect object references to be handled by some internal reference-counting mechanism, which works until you create circular references. Sadly, most GC algorithms are much more complex than simple reference counting: since a GC favors allocation speed, it tends to allocate as many objects as possible, re-using and collecting them as late as possible;
You can force the GC to collect memory, but a collection is usually a blocking process, so forcing one may be a bad idea in a real-time service;
And since the GC does not behave deterministically, you cannot be sure which heap-use threshold would be a good trigger for a collection;
Some authors state that most GC algorithms expect 3 to 5 times the used memory to be available (e.g. if you expect 200 MB of live data, you need about 800 MB of free RAM for your process), mostly as a performance optimization;
On the other hand, giving the process too much memory may have the opposite effect and reduce overall performance, depending on how the VM works;
In my experiments, the .Net memory model seemed more aggressive than Java's, especially in multi-threaded processes.
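
To illustrate the first points on the JVM, here is a minimal sketch built on the standard `Runtime.totalMemory()`, `Runtime.freeMemory()` and `System.gc()` calls; the `HeapProbe` class is a hypothetical helper, not part of any library. The "used" figure includes garbage that has not been collected yet, and `System.gc()` is only a request that the JVM may ignore, or honor with a pause, which is why such numbers make poor collection triggers.

```java
// Hypothetical helper: rough view of Java heap use, for illustration only.
final class HeapProbe {
    private HeapProbe() {}

    /** Bytes currently taken in the heap: live objects plus uncollected garbage. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Returns {used before, used after} around a collection request.
     * System.gc() is only a hint: the JVM may ignore it, and when it does
     * collect, application threads may be paused.
     */
    static long[] usedBeforeAndAfterGc() {
        long before = usedBytes();
        System.gc();
        long after = usedBytes();
        return new long[] {before, after};
    }
}
```

Running `usedBeforeAndAfterGc()` several times in a row can give different figures from one call to the next, which is the non-deterministic behavior described above.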

Some usual fixes and optimization paths:

Re-use existing objects instead of creating new instances (e.g. using object pools);
Use arrays of pre-allocated objects, and restrict yourself to POJOs/POCOs (plain old Java/CLR objects);
Some data structures use less memory and have less overhead (e.g. in C#, an array of structs is much faster and uses much less memory than a list of objects);
Limit object cloning, marshaling and wrapping as much as possible, and pass data by reference;
Pre-allocate and re-use memory, e.g. for building text (the typical efficient pattern is a string builder);
Multi-threaded processing (object locking and monitoring) consumes a lot of resources, so instead of locking at object level, use mutexes around small parts of the code, which is much more efficient;
Do not create more threads than the number of CPU cores the process runs on. In general, one optimized thread is more efficient than several: the main processing should happen in one non-blocking thread, while other threads pre-process or post-process the data, e.g. for slow operations such as serialization or network access;
Profile the execution, then identify the real bottlenecks to optimize (for instance, working with many small individual files is an awful practice);
Use fast unmanaged storage (e.g. in-process SQLite3 or BerkeleyDB, or an external cache like memcached…) instead of storing long-term objects in GC memory.
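
Object pooling and string builder re-use can be combined. Below is a minimal, single-threaded object pool sketched in Java; `ObjectPool` and `PoolDemo` are hypothetical names, not a library API. Instances are created up front and recycled, and the usage example pools `StringBuilder`s, reset with `setLength(0)`, so each call re-uses an existing builder and its buffer instead of allocating new ones.

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical single-threaded pool: instances are created once, then recycled.
final class ObjectPool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final Consumer<T> reset;
    private int created;

    ObjectPool(Supplier<T> factory, Consumer<T> reset, int preallocate) {
        this.factory = factory;
        this.reset = reset;
        for (int i = 0; i < preallocate; i++) {
            free.push(newInstance()); // pre-allocate at startup
        }
    }

    private T newInstance() {
        created++;
        return factory.get();
    }

    /** Takes a pooled instance, allocating only when the pool is empty. */
    T acquire() {
        T item = free.poll();
        return item != null ? item : newInstance();
    }

    /** Resets the instance and returns it to the pool for later re-use. */
    void release(T item) {
        reset.accept(item);
        free.push(item);
    }

    /** Total number of instances this pool ever allocated. */
    int createdCount() {
        return created;
    }
}

// Usage: pooled StringBuilders with a pre-sized buffer (hypothetical example).
final class PoolDemo {
    static final ObjectPool<StringBuilder> BUILDERS =
        new ObjectPool<>(() -> new StringBuilder(256), sb -> sb.setLength(0), 2);

    static String format(int id, String name) {
        StringBuilder sb = BUILDERS.acquire();
        try {
            return sb.append(id).append(':').append(name).toString();
        } finally {
            BUILDERS.release(sb); // builder goes back, ready for the next call
        }
    }
}
```

Note that `toString()` still allocates the resulting `String`; the pool only saves the builder and its internal buffer. A multi-threaded service would need a thread-safe pool, or one pool per thread.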
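
Standard Java has no user-defined value types like C# structs, so the nearest equivalent of an array of structs is a set of parallel primitive arrays, pre-allocated once. The hypothetical `PointStore` below keeps N points in two `double[]` arrays: two allocations in total, with no per-element object headers or references for the collector to trace, whereas a `List<Point>` holds N separate objects.

```java
import java.util.Objects;

// Hypothetical "structure of arrays": one pre-allocated primitive array per field.
final class PointStore {
    private final double[] xs;
    private final double[] ys;
    private int size;

    PointStore(int capacity) {
        xs = new double[capacity]; // allocated once, up front
        ys = new double[capacity];
    }

    /** Appends a point; fails rather than growing past the pre-allocated capacity. */
    void add(double x, double y) {
        if (size == xs.length) {
            throw new IllegalStateException("capacity exceeded: " + xs.length);
        }
        xs[size] = x;
        ys[size] = y;
        size++;
    }

    /** Overwrites an existing slot in place, without allocating anything. */
    void set(int index, double x, double y) {
        Objects.checkIndex(index, size);
        xs[index] = x;
        ys[index] = y;
    }

    int size() {
        return size;
    }

    /** Sums the x coordinates with a tight loop over a contiguous double[]. */
    double sumX() {
        double total = 0;
        for (int i = 0; i < size; i++) {
            total += xs[i];
        }
        return total;
    }

    /** Forgets the content; the arrays are kept for re-use, not freed. */
    void clear() {
        size = 0;
    }
}
```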
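
The threading advice can be sketched with the standard `java.util.concurrent` API (`ParallelSum` is a hypothetical example class): a fixed pool sized with `Runtime.availableProcessors()`, workers that accumulate into a local variable without any synchronization, and a `ReentrantLock` held only for the single addition that merges each partial result.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example: sums 0..n-1 (n >= 0) with one worker per CPU core.
final class ParallelSum {
    private ParallelSum() {}

    static long sum(long n) {
        int workers = Runtime.getRuntime().availableProcessors(); // no more threads than cores
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        ReentrantLock lock = new ReentrantLock();
        long[] total = {0};
        long chunk = (n + workers - 1) / workers;
        try {
            for (int w = 0; w < workers; w++) {
                long from = w * chunk;
                long to = Math.min(n, from + chunk);
                pool.execute(() -> {
                    long local = 0; // private accumulator: no lock needed
                    for (long i = from; i < to; i++) {
                        local += i;
                    }
                    lock.lock(); // critical section is a single addition
                    try {
                        total[0] += local;
                    } finally {
                        lock.unlock();
                    }
                });
            }
        } finally {
            pool.shutdown();
        }
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        lock.lock(); // read the shared total under the same lock
        try {
            return total[0];
        } finally {
            lock.unlock();
        }
    }
}
```

For a plain counter like this one, returning partial sums through `Future`s or using `java.util.concurrent.atomic.LongAdder` would avoid the lock altogether; the explicit lock is kept here to show how small the critical section can be.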

For server processes or mobile execution, unmanaged environments like Delphi are still a perfect fit!