Some general patterns about Garbage Collection (GC):

  • It is almost impossible to know how much memory a data structure actually uses at runtime, since objects may not have been collected yet, and so are still in memory even if they are not in the queue any more;
  • Simple reference counting can handle direct object references, but breaks down as soon as you create circular references. Sadly, most GC algorithms are therefore much more complex than plain reference counting: since a GC favors allocation speed, its tendency is to allocate as many objects as possible, re-using and collecting them as late as possible;
  • You can force the GC to collect memory, but it is usually a blocking process (so it may be a bad idea in a real-time service);
  • And since the GC does not behave deterministically, you cannot be sure which heap-usage threshold would be a good trigger for a collection;
  • Some authors state that most GC algorithms expect 3 to 5 times the used memory to be available (i.e. if you expect 200 MB of data, you need 800 MB of free RAM for your process) - this is mostly for performance reasons;
  • On the other hand, giving the VM too much memory may have the opposite effect, i.e. reduce overall performance, depending on how the VM works;
  • From my experiments, the .Net memory model seems to be more aggressive than the one in Java, especially in multi-threaded processes.
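To illustrate the points about forced collection and non-determinism, here is a minimal Java sketch (the `GcHint` class name and the 16 MB allocation loop are my own illustration, not from any library):

```java
public class GcHint {
    /** Current heap usage in bytes (total reserved minus free). */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeap();

        // Allocate ~16 MB of short-lived garbage.
        for (int i = 0; i < 1024; i++) {
            byte[] garbage = new byte[16 * 1024];
        }

        // System.gc() is only a *hint*: the JVM may ignore it, and when
        // honored it typically triggers a blocking stop-the-world pause.
        System.gc();

        // The delta is not predictable: unreferenced objects may or may
        // not have been reclaimed at this point.
        System.out.println("used before=" + before + " after=" + usedHeap());
    }
}
```

Running this twice rarely prints the same numbers, which is exactly why a fixed heap threshold is a poor collection trigger.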

Some usual fixes/optimizations paths:

  • Re-use existing objects instead of creating new instances (using object pools);
  • Use arrays of pre-allocated objects, and restrict yourself to POJOs/POCOs;
  • Some memory structures use less memory and have less overhead (e.g. an array of structs in C# is much faster and uses much less memory than a list of objects);
  • Limit object cloning/marshaling/wrapping as much as possible, and pass data by reference;
  • Pre-allocate and re-use memory, e.g. for storing text (the string builder is a typical efficient pattern);
  • Multi-threaded synchronization (object locking and monitoring) consumes a lot of resources, so instead of locking at the object level, mutexes around small critical sections of the code are much more efficient;
  • Do not create more threads than the number of CPU cores the process runs on – in general, one optimized thread is more efficient than multiple threads: the main processing should happen in one non-blocking thread, while other threads pre-process or post-process the data, e.g. when something slow like serialization or network access takes place;
  • Profile the execution, then identify the real bottlenecks to optimize – for instance, working with many individual small files is an awful practice;
  • Use fast unmanaged in-process storage (e.g. SQLite3, BerkeleyDB, memcached…) instead of keeping long-term objects in GC memory.
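The object-pool advice above can be sketched like this in Java (the `BufferPool` class and its parameters are illustrative names of my own, not from any library):

```java
import java.util.ArrayDeque;

// Minimal object pool: callers borrow instances instead of allocating,
// and return them when done, so the GC has far fewer objects to track.
final class BufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;

    BufferPool(int bufferSize, int preallocate) {
        this.bufferSize = bufferSize;
        for (int i = 0; i < preallocate; i++)
            free.push(new byte[bufferSize]);
    }

    synchronized byte[] acquire() {
        byte[] b = free.poll();
        return (b != null) ? b : new byte[bufferSize]; // grow on demand
    }

    synchronized void release(byte[] b) {
        if (b.length == bufferSize)
            free.push(b); // recycle instead of leaving it to the GC
    }

    synchronized int available() { return free.size(); }
}
```

A pool like this turns per-request allocations into cheap push/pop operations, at the cost of keeping the buffers alive for the lifetime of the pool.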
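The pre-allocated string-builder pattern mentioned in the list could look as follows (the `CsvWriter` class and its 4096-character initial capacity are arbitrary assumptions for the sketch):

```java
import java.util.List;

final class CsvWriter {
    // Pre-allocate once; 4096 chars is an arbitrary starting capacity.
    private final StringBuilder sb = new StringBuilder(4096);

    // Re-uses the same internal char buffer across calls, instead of
    // concatenating Strings (which allocates a new object per '+').
    String join(List<String> values) {
        sb.setLength(0); // reset the length, but keep the allocated buffer
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(values.get(i));
        }
        return sb.toString();
    }
}
```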
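The point about small critical sections, rather than object-level locking, can be sketched like this (the `Counter` class and its `expensiveTransform` placeholder are hypothetical):

```java
import java.util.concurrent.locks.ReentrantLock;

final class Counter {
    private final ReentrantLock lock = new ReentrantLock();
    private long total;

    // The (potentially expensive) computation happens outside the lock;
    // only the shared-state update is inside the critical section.
    void add(long rawValue) {
        long processed = expensiveTransform(rawValue); // no lock held here
        lock.lock();
        try {
            total += processed; // short critical section
        } finally {
            lock.unlock();
        }
    }

    long total() {
        lock.lock();
        try { return total; } finally { lock.unlock(); }
    }

    private static long expensiveTransform(long v) {
        return v * 2; // placeholder for real per-item work
    }
}
```

Keeping the lock scope this small reduces contention far more than wrapping whole methods in `synchronized`.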
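Finally, the thread-count advice can be illustrated by sizing a worker pool from the available cores (the `BoundedWorkers` class and its chunking scheme are my own illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BoundedWorkers {
    // Size the worker pool by the CPU cores actually available,
    // rather than spawning one thread per task.
    static long parallelSum(long[] values) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            int chunk = Math.max(1, values.length / cores);
            for (int start = 0; start < values.length; start += chunk) {
                final int s = start, e = Math.min(values.length, start + chunk);
                futures.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = s; i < e; i++) sum += values[i];
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) total += f.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```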

For server processes, or mobile execution, unmanaged environments like Delphi are still a perfect fit!